Class StringDataInstance

java.lang.Object
ghidra.program.model.data.StringDataInstance
Direct Known Subclasses:
StringDataInstance.StaticStringInstance

public class StringDataInstance extends Object
Represents an instance of a string in a MemBuffer.

This class handles all the details of detecting a terminated string's length, converting the bytes in the MemBuffer into a native Java String, and converting the raw String into a formatted, human-readable version, according to the various SettingsDefinitions attached to the string data location.

  • Field Details

  • Constructor Details

    • StringDataInstance

      protected StringDataInstance()
    • StringDataInstance

      public StringDataInstance(DataType dataType, Settings settings, MemBuffer buf, int length)
      Creates a string instance using the data in the MemBuffer and the settings pulled from the string data type.
      Parameters:
      dataType - DataType of the string, either an AbstractStringDataType derived type or an ArrayStringable element-of-char-array type.
      settings - Settings attached to the data location.
      buf - MemBuffer containing the data.
      length - Length passed from the caller to the datatype. -1 indicates a 'probe' trying to detect the length of an unknown string, otherwise it will be the length of the containing field of the data instance.
    • StringDataInstance

      public StringDataInstance(DataType dataType, Settings settings, MemBuffer buf, int length, boolean isArrayElement)
      Creates a string instance using the data in the MemBuffer and the settings pulled from the string data type.
      Parameters:
      dataType - DataType of the string, either an AbstractStringDataType derived type or an ArrayStringable element-of-char-array type.
      settings - Settings attached to the data location.
      buf - MemBuffer containing the data.
      length - Length passed from the caller to the datatype. -1 indicates a 'probe' trying to detect the length of an unknown string, otherwise it will be the length of the containing field of the data instance.
      isArrayElement - boolean flag, true indicates that the specified dataType is an element in an array (i.e. char[] vs. just a plain char), causing the string layout to be forced to StringLayoutEnum.NULL_TERMINATED_BOUNDED.
  • Method Details

    • isString

      public static boolean isString(Data data)
      Returns true if the Data instance is a 'string'.
      Parameters:
      data - Data instance to test, null ok.
      Returns:
      boolean true if string data.
    • isStringDataType

      public static boolean isStringDataType(DataType dt)
      Returns true if the specified DataType is (or could be) a string.

      Arrays of char-like elements (see ArrayStringable) are treated as string data types. The actual data instance needs to be inspected to determine if the array is an actual string.

      Parameters:
      dt - DataType to test
      Returns:
      boolean true if data type is or could be a string
    • isChar

      public static boolean isChar(Data data)
      Returns true if the Data instance is one of the many 'char' data types.
      Parameters:
      data - Data instance to test, null ok
      Returns:
      boolean true if char data
    • getCharRepresentation

      public static String getCharRepresentation(DataType dataType, byte[] bytes, Settings settings)
      Returns a string representation of the character(s) contained in the byte array, suitable for display as a single character, or as a sequence of characters.

      Parameters:
      dataType - the DataType of the element containing the bytes (most likely a ByteDataType)
      bytes - the big-endian ordered bytes to convert to a char representation
      settings - the Settings object for the location where the bytes came from, or null
      Returns:
      formatted string (typically with quotes around the contents): single character: 'a', multiple characters: "a\x12bc"
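      The two output shapes named above (single quotes for one character, double quotes with hex escapes for a sequence) can be illustrated with a small standalone sketch. It is not Ghidra's implementation: the class CharReprSketch and its represent method are hypothetical names, the dataType and settings arguments are ignored, and every byte is assumed to be a one-byte ASCII-range character.

```java
final class CharReprSketch {
    /**
     * Renders bytes as 'x' for a single byte or "xyz" for several,
     * escaping non-printable bytes as \xNN. Assumes 1-byte characters.
     */
    static String represent(byte[] bytes) {
        // One byte is shown as a character literal, anything else as a string.
        char quote = (bytes.length == 1) ? '\'' : '"';
        StringBuilder sb = new StringBuilder().append(quote);
        for (byte b : bytes) {
            int v = b & 0xFF;
            boolean printable = v >= 0x20 && v < 0x7F && v != quote && v != '\\';
            if (printable) {
                sb.append((char) v);
            } else {
                sb.append(String.format("\\x%02x", v));
            }
        }
        return sb.append(quote).toString();
    }
}
```

      Under these assumptions, the bytes 61 12 62 63 render as "a\x12bc" and the single byte 61 renders as 'a', matching the examples in the Returns text.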
    • getStringDataInstance

      public static StringDataInstance getStringDataInstance(Data data)
      Returns a new StringDataInstance using the bytes in the data codeunit.

      Parameters:
      data - Data item
      Returns:
      new StringDataInstance, never NULL. See NULL_INSTANCE.
    • getStringDataInstance

      public static StringDataInstance getStringDataInstance(DataType dataType, MemBuffer buf, Settings settings, int length)
      Returns a new StringDataInstance using the bytes in the MemBuffer.

      Parameters:
      dataType - DataType of the bytes in the buffer.
      buf - memory buffer containing the bytes.
      settings - the Settings object
      length - the length of the data.
      Returns:
      new StringDataInstance, never NULL. See NULL_INSTANCE.
    • makeStringLabel

      public static String makeStringLabel(String prefixStr, String str, DataTypeDisplayOptions options)
      Formats a string value so that it is in the form of a symbol label.
      Parameters:
      prefixStr - data type prefix, see AbstractStringDataType.getDefaultLabelPrefix()
      str - string value
      options - display options
      Returns:
      string, suitable to be used as a label
    • getCharsetName

      public String getCharsetName()
      Returns the string name of the charset.
      Returns:
      string charset name
    • getAddress

      public Address getAddress()
      Returns the address of the MemBuffer.
      Returns:
      Address of the MemBuffer.
    • getEndAddress

      public Address getEndAddress()
    • getAddressRange

      public AddressRange getAddressRange()
    • getDataLength

      public int getDataLength()
      Returns the length of this string's data, in bytes.
      Returns:
      number of bytes in this string.
    • getStringLength

      public int getStringLength()
      Returns the length, in bytes, of the string data object contained in the MemBuffer, or -1 if the length could not be determined.

      This is not the same as the number of characters in the string, or the number of bytes occupied by the characters. For instance, pascal strings have a 1 or 2 byte length field that increases the size of the string data object beyond the characters in the string, and null-terminated strings don't include the null character in their value, but its presence is included in the size of the string object.

      For length-specified string data types that do not use null-terminators and with a known data instance length (i.e. not a probe), this method just returns the value specified in the constructor length parameter; otherwise a null-terminator is searched for.

      When searching for a null-terminator, the constructor length parameter will be respected or ignored depending on the StringLayoutEnum.

      When the length parameter is ignored (i.e. "unbounded" searching), the search is limited to MAX_STRING_LENGTH bytes.

      The MemBuffer's endianness is used to determine which end of the padded character field contains the n-bit character that will be tested for null-ness (not the endianness of the character set name, e.g. "UTF-16BE").

      Returns:
      length of the string (INCLUDING null term if null term probe), in bytes, or -1 if no terminator found.
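      The length rules described above can be sketched in plain Java over a byte array. This is an illustration only, not the Ghidra code: StringLengthSketch, its methods, and the MAX_SEARCH_BYTES value are hypothetical stand-ins, and character padding and StringLayoutEnum handling are left out.

```java
final class StringLengthSketch {
    /** Hypothetical cap for unbounded searches; stands in for MAX_STRING_LENGTH. */
    static final int MAX_SEARCH_BYTES = 16 * 1024;

    /**
     * Returns the size in bytes of a null-terminated string starting at
     * offset 0, INCLUDING the terminator, or -1 if no terminator is found.
     *
     * @param bytes    raw memory contents
     * @param charSize size of one character in bytes (1, 2 or 4)
     * @param length   field length in bytes, or -1 for an unbounded probe
     */
    static int nullTerminatedLength(byte[] bytes, int charSize, int length) {
        int limit = (length < 0) ? MAX_SEARCH_BYTES : length;
        limit = Math.min(limit, bytes.length);
        for (int off = 0; off + charSize <= limit; off += charSize) {
            if (isNullChar(bytes, off, charSize)) {
                return off + charSize;
            }
        }
        return -1;
    }

    /** True if all charSize bytes at off are zero. */
    private static boolean isNullChar(byte[] bytes, int off, int charSize) {
        for (int i = 0; i < charSize; i++) {
            if (bytes[off + i] != 0) {
                return false;
            }
        }
        return true;
    }

    /** Size of a pascal string with a 1-byte length prefix: prefix plus characters. */
    static int pascal255Length(byte[] bytes) {
        if (bytes.length == 0) {
            return -1;
        }
        int chars = bytes[0] & 0xFF;
        return (1 + chars <= bytes.length) ? 1 + chars : -1;
    }
}
```

      Note how both results exceed the byte count of the characters alone: the terminator or the length prefix is part of the string data object.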
    • isMissingNullTerminator

      public boolean isMissingNullTerminator()
      Returns true if the string should have a trailing NULL character and doesn't.
      Returns:
      boolean true if the trailing NULL character is missing, false if string type doesn't need a trailing NULL character or if it is present.
    • getStringValue

      public String getStringValue()
      Returns the string contained in the specified MemBuffer, or null if all the bytes of the string could not be read.

      This method deals in characters of size charSize, that might be padded to a larger size. The raw n-byte characters are converted into a Java String using a Java Charset or by using a custom Ghidra conversion. (see convertBytesToStringCustomCharset)

      The MemBuffer's endianness is used to determine which end of the padded field contains the charSize character bytes that will be used to create the Java String.

      Returns:
      String containing the characters in buf or null if unable to read all length bytes from the MemBuffer.
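      To make the padding rule concrete, here is a minimal sketch of pulling charSize bytes out of each padded field before decoding. It assumes fixed-width fields and a standard Java Charset; PaddedCharSketch and decodePadded are hypothetical names, and returning null for a partial trailing field is an assumption made to echo the behaviour described above.

```java
import java.nio.charset.Charset;

final class PaddedCharSketch {
    /**
     * Decodes characters of charSize bytes that are each stored in a
     * field of paddedSize bytes.
     *
     * @param bigEndian true if the buffer is big-endian, in which case the
     *                  character bytes sit at the high-address end of each field
     */
    static String decodePadded(byte[] bytes, int charSize, int paddedSize,
            boolean bigEndian, Charset charset) {
        if (bytes.length % paddedSize != 0) {
            return null; // assumption: a partial trailing field counts as unreadable
        }
        int count = bytes.length / paddedSize;
        byte[] packed = new byte[count * charSize];
        // Little-endian: character bytes come first; big-endian: they come last.
        int skip = bigEndian ? paddedSize - charSize : 0;
        for (int i = 0; i < count; i++) {
            System.arraycopy(bytes, i * paddedSize + skip, packed, i * charSize, charSize);
        }
        return new String(packed, charset);
    }
}
```

      For example, the little-endian bytes 68 00 00 00 69 00 00 00 and the big-endian bytes 00 00 00 68 00 00 00 69 both decode to the same two-character string when charSize is 1 and paddedSize is 4.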
    • getStringRepresentation

      public String getStringRepresentation()
      Returns a formatted version of the string returned by getStringValue().

      The resulting string will be formatted with quotes around the parts that contain plain ASCII alpha characters (and simple escape sequences), and out-of-range byte-ish values listed as comma-separated, hex-encoded values:

      Example (quotes are part of result): "Test\tstring",01,02,"Second\npart",00

      Returns:
      formatted String, or the translated value if present and the "show translated" setting is enabled for this string's location
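      The quoted-run and hex-byte format shown in the example can be reproduced with a short standalone sketch. It is only an approximation for single-byte data: StringReprSketch and its methods are hypothetical names, the exact set of escapes and the hex letter case are assumptions, and translated values and settings are ignored.

```java
final class StringReprSketch {
    /** Formats bytes as quoted printable runs separated from hex bytes by commas. */
    static String represent(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        for (byte b : bytes) {
            int v = b & 0xFF;
            String text = quotedForm(v);
            if (text != null) {
                if (!inQuotes) {
                    // Start a new quoted run, separated from any previous element.
                    if (out.length() > 0) {
                        out.append(',');
                    }
                    out.append('"');
                    inQuotes = true;
                }
                out.append(text);
            } else {
                if (inQuotes) {
                    out.append('"');
                    inQuotes = false;
                }
                if (out.length() > 0) {
                    out.append(',');
                }
                out.append(String.format("%02X", v));
            }
        }
        if (inQuotes) {
            out.append('"');
        }
        return out.toString();
    }

    /** Returns the in-quotes form of a byte, or null if it must be shown as hex. */
    private static String quotedForm(int v) {
        switch (v) {
            case '\t': return "\\t";
            case '\n': return "\\n";
            case '\r': return "\\r";
            case '"':  return "\\\"";
            case '\\': return "\\\\";
            default:
                return (v >= 0x20 && v < 0x7F) ? String.valueOf((char) v) : null;
        }
    }
}
```

      Fed the bytes of Test, a tab, string, 01, 02, Second, a newline, part, and 00, this sketch produces the same text as the example line above.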
    • getStringRepresentation

      public String getStringRepresentation(boolean originalOrTranslated)
      Returns a formatted version of the string returned by getStringValue().

      The resulting string will be formatted with quotes around the parts that contain plain ASCII alpha characters (and simple escape sequences), and out-of-range byte-ish values listed as comma-separated, hex-encoded values:

      Example (quotes are part of result): "Test\tstring",01,02,"Second\npart",00

      Parameters:
      originalOrTranslated - boolean flag, if true returns the representation of the string value, if false returns the representation of the translated value
      Returns:
      formatted String
    • hasTranslatedValue

      public boolean hasTranslatedValue()
      Returns true if this string has a translated value that could be displayed.
      Returns:
      boolean true if translated value is present, false if no value is present
    • getTranslatedValue

      public String getTranslatedValue()
      Returns the value of the stored translated settings string.

      Returns:
      previously translated string.
    • isShowTranslation

      public boolean isShowTranslation()
      Returns true if the user should be shown the translated value of the string instead of the real value.
      Returns:
      boolean true if should show previously translated value.
    • getCharRepresentation

      public String getCharRepresentation()
      Convert a char value (or sequence of char values) in memory into its canonical unicode representation, using attached charset and encoding information.

      Returns:
      String containing the representation of the char.
    • getLabel

      public String getLabel(String prefixStr, String abbrevPrefixStr, String defaultStr, DataTypeDisplayOptions options)
    • getOffcutLabelString

      public String getOffcutLabelString(String prefixStr, String abbrevPrefixStr, String defaultStr, DataTypeDisplayOptions options, int byteOffset)
    • getByteOffcut

      public StringDataInstance getByteOffcut(int byteOffset)
      Returns a new StringDataInstance that points to the string characters that start at byteOffset from the start of this instance.

      If the requested offset is not valid, StringDataInstance.NULL_INSTANCE is returned.

      Parameters:
      byteOffset - number of bytes from start of data instance to start new instance.
      Returns:
      new StringDataInstance, or StringDataInstance.NULL_INSTANCE if offset not valid.
    • getCharOffcut

      public StringDataInstance getCharOffcut(int offsetChars)
      Creates a new StringDataInstance that points to a portion of this instance, starting at a character offset (wherever that may be) into the data.

      Parameters:
      offsetChars - number of characters from the beginning of the string to start the new StringDataInstance.
      Returns:
      new StringDataInstance pointing to a subset of characters, or this instance if there was an error.
    • getStringDataTypeGuess

      public DataType getStringDataTypeGuess()
      Maps a StringDataInstance (this type) to the String DataType that best can handle this type of data.

      I dare myself to type Type one more time.

      Returns:
      DataType, defaulting to StringDataType if no direct match found.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • encodeReplacementFromStringValue

      public byte[] encodeReplacementFromStringValue(CharSequence value) throws CharacterCodingException
      Encode a string to replace the current value
      Parameters:
      value - the value to encode
      Returns:
      the encoded value
      Throws:
      CharacterCodingException - if a character could not be encoded
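      The strict encoding contract described here (fail rather than substitute when a character cannot be encoded) corresponds to the standard java.nio.charset encoder behaviour. The sketch below shows that behaviour in isolation; EncodeSketch and encodeStrict are hypothetical names, the charset is passed in explicitly rather than taken from the string's settings, and no terminator or length prefix is added.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class EncodeSketch {
    /**
     * Encodes value with the given charset, throwing instead of silently
     * replacing characters that cannot be represented.
     */
    static byte[] encodeStrict(CharSequence value, Charset charset)
            throws CharacterCodingException {
        ByteBuffer encoded = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(value));
        // Copy only the bytes actually produced by the encoder.
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }
}
```

      With US-ASCII as the charset, a plain ASCII value encodes to one byte per character, while a value containing an accented letter raises UnmappableCharacterException, a subclass of CharacterCodingException.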
    • encodeReplacementFromStringRepresentation

      public byte[] encodeReplacementFromStringRepresentation(CharSequence repr) throws MalformedInputException, UnmappableCharacterException, StringRenderParser.StringParseException
      Parse and encode a string from its representation to replace the current value
      Parameters:
      repr - the representation of the string
      Returns:
      the encoded value
      Throws:
      StringRenderParser.StringParseException - if the representation could not be parsed
      UnmappableCharacterException - if a character could not be encoded
      MalformedInputException - if the input contains invalid character sequences
    • encodeReplacementFromCharValue

      public byte[] encodeReplacementFromCharValue(char[] value) throws CharacterCodingException
      Encode a single character to replace the current value
      Parameters:
      value - a single code point to encode
      Returns:
      the encoded value
      Throws:
      CharacterCodingException - if the character could not be encoded
    • encodeReplacementFromCharRepresentation

      public byte[] encodeReplacementFromCharRepresentation(CharSequence repr) throws MalformedInputException, UnmappableCharacterException, StringRenderParser.StringParseException
      Parse and encode a single character from its representation to replace the current value
      Parameters:
      repr - the representation of a single character
      Returns:
      the encoded value
      Throws:
      StringRenderParser.StringParseException - if the representation could not be parsed
      UnmappableCharacterException - if a character could not be encoded
      MalformedInputException - if the input contains invalid character sequences