Package ghidra.util

Class StringUtilities

java.lang.Object
ghidra.util.StringUtilities

public class StringUtilities extends Object
Class with static methods that deal with string manipulation.
  • Field Details

    • DOUBLE_QUOTED_STRING_PATTERN

      public static final Pattern DOUBLE_QUOTED_STRING_PATTERN
    • LINE_SEPARATOR

      public static final String LINE_SEPARATOR
      The platform specific string that is the line separator.
    • UNICODE_REPLACEMENT

      public static final int UNICODE_REPLACEMENT
    • UNICODE_BE_BYTE_ORDER_MARK

      public static final int UNICODE_BE_BYTE_ORDER_MARK
      Unicode byte order mark (BOM) characters are special characters in the Unicode character space that signal the endianness of the text.

      The big-endian value (0xFEFF) works for both 16- and 32-bit character values.

      Little-endian byte order marks have separate values for 16- and 32-bit characters because the 32-bit value is the 16-bit value shifted left by 16 bits.

    • UNICODE_LE16_BYTE_ORDER_MARK

      public static final int UNICODE_LE16_BYTE_ORDER_MARK
    • UNICODE_LE32_BYTE_ORDER_MARK

      public static final int UNICODE_LE32_BYTE_ORDER_MARK
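      The three BOM constants can be used to guess the byte order of UTF-16 or UTF-32 data from its first code unit. The following is a minimal sketch, not Ghidra's code: the class BomSniffer and its detect method are hypothetical, and the constant values are assumed to be 0xFEFF (big-endian), 0xFFFE (16-bit little-endian), and 0xFFFE0000 (32-bit little-endian, the 16-bit value shifted left by 16 bits).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper; not part of Ghidra. */
final class BomSniffer {
    // Assumed values: big-endian BOM, 16-bit little-endian BOM, and the
    // 32-bit little-endian BOM (the 16-bit value shifted left by 16 bits).
    static final int BE_BOM = 0xFEFF;
    static final int LE16_BOM = 0xFFFE;
    static final int LE32_BOM = LE16_BOM << 16;

    /**
     * Reads the first code unit of data as a big-endian value and compares it
     * against the BOM constants. Returns null when no BOM is recognized.
     */
    static ByteOrder detect(byte[] data, int charSize) {
        if (charSize != 2 && charSize != 4) {
            throw new IllegalArgumentException("charSize must be 2 or 4");
        }
        if (data.length < charSize) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        int unit = (charSize == 2) ? (buf.getShort(0) & 0xFFFF) : buf.getInt(0);
        if (unit == BE_BOM) {
            return ByteOrder.BIG_ENDIAN;
        }
        int leBom = (charSize == 2) ? LE16_BOM : LE32_BOM;
        return (unit == leBom) ? ByteOrder.LITTLE_ENDIAN : null;
    }
}
```

      Because the first unit is always read big-endian, a little-endian BOM shows up as the byte-swapped value, which is exactly why separate LE16 and LE32 constants are needed. Data without a BOM yields null, so a caller would fall back to some default byte order.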
    • DEFAULT_TAB_SIZE

      public static final int DEFAULT_TAB_SIZE
  • Method Details

    • isControlCharacterOrBackslash

      public static boolean isControlCharacterOrBackslash(char c)
      Returns true if the given character is a special character, for example '\n' or '\\'. A value of 0 is not considered special for this purpose; it is handled separately because it has more varied use cases.
      Parameters:
      c - the character
      Returns:
      true if the given character is a special character
    • isControlCharacterOrBackslash

      public static boolean isControlCharacterOrBackslash(int codePoint)
      Returns true if the given code point (i.e., a full 32-bit Unicode character) is a special character, for example '\n' or '\\'. A value of 0 is not considered special for this purpose; it is handled separately because it has more varied use cases.
      Parameters:
      codePoint - the code point (i.e., character); see String.codePointAt(int)
      Returns:
      true if the given character is a special character
    • isDoubleQuoted

      public static boolean isDoubleQuoted(String str)
      Determines if a string is enclosed in double quotes (ASCII 34, 0x22).
      Parameters:
      str - String to test for double-quote enclosure
      Returns:
      True if the first and last characters are the double-quote character, false otherwise
    • extractFromDoubleQuotes

      public static String extractFromDoubleQuotes(String str)
      If the given string is enclosed in double quotes, extract the inner text. Otherwise, return the given string unmodified.
      Parameters:
      str - String to match and extract from
      Returns:
      The inner text of a doubly-quoted string, or the original string if not double-quoted.
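      The two double-quote methods above can be sketched as follows. This is a hypothetical illustration, not Ghidra's implementation (which may instead rely on DOUBLE_QUOTED_STRING_PATTERN); the class name QuoteSketch is invented, and it assumes that a lone '"' does not count as a quoted string.

```java
/** Hypothetical sketch; not Ghidra's implementation. */
final class QuoteSketch {
    /** True when the first and last characters are both '"'; a lone '"' is assumed not to count. */
    static boolean isDoubleQuoted(String str) {
        return str != null
            && str.length() >= 2
            && str.charAt(0) == '"'
            && str.charAt(str.length() - 1) == '"';
    }

    /** Strips one pair of enclosing double quotes, or returns the input unchanged. */
    static String extractFromDoubleQuotes(String str) {
        return isDoubleQuoted(str) ? str.substring(1, str.length() - 1) : str;
    }
}
```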
    • isDisplayable

      public static boolean isDisplayable(int c)
      Returns true if the character is in the displayable character range.
      Parameters:
      c - the character
      Returns:
      true if the character is in the displayable character range
    • isAllBlank

      public static boolean isAllBlank(CharSequence... sequences)
      Returns true if all the given sequences are either null or only whitespace
      Parameters:
      sequences - the sequences to check
      Returns:
      true if all the given sequences are either null or only whitespace.
      See Also:
      • StringUtils.isNoneBlank(CharSequence...)
      • StringUtils.isNoneEmpty(CharSequence...)
      • StringUtils.isAnyBlank(CharSequence...)
      • StringUtils.isAnyEmpty(CharSequence...)
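      A minimal sketch of the isAllBlank contract, assuming "whitespace" means Character.isWhitespace (the same test Apache Commons StringUtils.isBlank uses) and that a null or empty varargs array counts as all blank. The class name BlankSketch is hypothetical.

```java
/** Hypothetical sketch; not Ghidra's implementation. */
final class BlankSketch {
    /** True if every sequence is null or consists only of whitespace characters. */
    static boolean isAllBlank(CharSequence... sequences) {
        if (sequences == null) {
            return true;
        }
        for (CharSequence seq : sequences) {
            if (seq == null) {
                continue; // null counts as blank
            }
            for (int i = 0; i < seq.length(); i++) {
                if (!Character.isWhitespace(seq.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```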
    • characterToString

      public static String characterToString(char c)
      Converts the character into a string. If the character is special, its escape sequence is rendered instead; for example, given '\n' the output would be "\\n".
      Parameters:
      c - the character to convert into a string
      Returns:
      the converted character
    • countOccurrences

      public static int countOccurrences(String string, char occur)
      Returns a count of how many times the 'occur' char appears in the string.
      Parameters:
      string - the string to look inside
      occur - the character to look for
      Returns:
      a count of how many times the 'occur' char appears in the string
    • equals

      public static boolean equals(String s1, String s2, boolean caseSensitive)
    • endsWithWhiteSpace

      public static boolean endsWithWhiteSpace(String string)
    • toQuotedString

      public static String toQuotedString(byte[] bytes)
      Generate a quoted string from US-ASCII character bytes assuming 1-byte chars.

      Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \uHHHH, etc.). If a character size other than 1 byte is required, use the alternate form of this method.

      The result string will be single-quoted (i.e., ') if the input byte array is 1 byte long; otherwise, the result will be double-quoted (").

      Parameters:
      bytes - character string bytes
      Returns:
      escaped string for display use
    • toQuotedString

      public static String toQuotedString(byte[] bytes, int charSize)
      Generate a quoted string from US-ASCII characters, where each character is charSize bytes.

      Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \uHHHH, etc.).

      The result string will be single-quoted (i.e., ') if the input byte array is one character long (i.e., charSize bytes); otherwise, the result will be double-quoted (").

      Parameters:
      bytes - array of bytes
      charSize - number of bytes per character (1, 2, 4).
      Returns:
      escaped string for display use
    • startsWithIgnoreCase

      public static boolean startsWithIgnoreCase(String string, String prefix)
      Returns true if the given string starts with prefix ignoring case.

      Note: This method is equivalent to calling:

       string.regionMatches(true, 0, prefix, 0, prefix.length());
       
      Parameters:
      string - the string which may contain the prefix
      prefix - the prefix to test against
      Returns:
      true if the given string starts with prefix ignoring case.
    • endsWithIgnoreCase

      public static boolean endsWithIgnoreCase(String string, String postfix)
      Returns true if the given string ends with postfix, ignoring case.

      Note: This method is equivalent to calling:

       int startIndex = string.length() - postfix.length();
       string.regionMatches(true, startIndex, postfix, 0, postfix.length());
       
      Parameters:
      string - the string which may end with postfix
      postfix - the string for which to test existence
      Returns:
      true if the given string ends with postfix, ignoring case.
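      The two equivalences quoted in the notes above can be written out as methods. This sketch is hypothetical (the class name CaseAffixSketch is invented); it relies on the documented behavior of String.regionMatches, which returns false for a negative offset or an out-of-range region, so a postfix longer than the string fails cleanly instead of throwing.

```java
/** Hypothetical sketch of the regionMatches equivalences; not Ghidra's code. */
final class CaseAffixSketch {
    static boolean startsWithIgnoreCase(String string, String prefix) {
        return string.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String string, String postfix) {
        int startIndex = string.length() - postfix.length();
        // A negative startIndex (postfix longer than string) makes regionMatches return false.
        return string.regionMatches(true, startIndex, postfix, 0, postfix.length());
    }
}
```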
    • containsAll

      public static boolean containsAll(CharSequence toSearch, CharSequence... searches)
      Returns true if all the given searches are contained in the given string.
      Parameters:
      toSearch - the string to search
      searches - the strings to find
      Returns:
      true if all the given searches are contained in the given string.
    • containsAllIgnoreCase

      public static boolean containsAllIgnoreCase(CharSequence toSearch, CharSequence... searches)
      Returns true if all the given searches are contained in the given string, ignoring case.
      Parameters:
      toSearch - the string to search
      searches - the strings to find
      Returns:
      true if all the given searches are contained in the given string.
    • containsAnyIgnoreCase

      public static boolean containsAnyIgnoreCase(CharSequence toSearch, CharSequence... searches)
      Returns true if any of the given searches are contained in the given string, ignoring case.
      Parameters:
      toSearch - the string to search
      searches - the strings to find
      Returns:
      true if any of the given searches are contained in the given string.
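      The containsAll/containsAllIgnoreCase/containsAnyIgnoreCase trio differs only in case handling and in "all" versus "any". A minimal sketch, assuming non-null inputs and case folding via toLowerCase(Locale.ROOT); the real methods may fold case differently, and the class name ContainsSketch is hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch; assumes non-null arguments. */
final class ContainsSketch {
    static boolean containsAll(CharSequence toSearch, CharSequence... searches) {
        String text = toSearch.toString();
        for (CharSequence s : searches) {
            if (!text.contains(s)) {
                return false;
            }
        }
        return true;
    }

    static boolean containsAllIgnoreCase(CharSequence toSearch, CharSequence... searches) {
        String text = toSearch.toString().toLowerCase(Locale.ROOT);
        for (CharSequence s : searches) {
            if (!text.contains(s.toString().toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    static boolean containsAnyIgnoreCase(CharSequence toSearch, CharSequence... searches) {
        String text = toSearch.toString().toLowerCase(Locale.ROOT);
        for (CharSequence s : searches) {
            if (text.contains(s.toString().toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```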
    • indexOfWord

      public static int indexOfWord(String text, String searchWord)
      Returns the index of the first whole-word occurrence of the search word within the given text. An occurrence is a whole word if neither the character immediately before it nor the character immediately after it is a Java identifier part (see Character.isJavaIdentifierPart(char)).
      Parameters:
      text - the text to be searched.
      searchWord - the word to search for.
      Returns:
      the index of the first whole word occurrence of the search word within the given text, or -1 if not found.
    • isWholeWord

      public static boolean isWholeWord(String text, int startIndex, int length)
      Returns true if the substring of text that starts at startIndex and has the given length is a whole word, i.e., neither the character immediately before it nor the character immediately after it is a Java identifier part (see Character.isJavaIdentifierPart(char)).
      Parameters:
      text - the text containing the potential word.
      startIndex - the start index of the potential word within the text.
      length - the length of the potential word
      Returns:
      true if the substring within the text string starting at startIndex and having the given length is a whole word.
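      The whole-word rule can be sketched with Character.isJavaIdentifierPart, which the definition above names. This is a hypothetical illustration (class WholeWordSketch is invented); it assumes that the start or end of the text counts as a valid word boundary and that indexOfWord simply retries indexOf until it finds a whole-word match.

```java
/** Hypothetical sketch; not Ghidra's implementation. */
final class WholeWordSketch {
    static boolean isWholeWord(String text, int startIndex, int length) {
        int end = startIndex + length;
        // Text boundaries are assumed to count as word boundaries.
        boolean startOk = startIndex == 0
            || !Character.isJavaIdentifierPart(text.charAt(startIndex - 1));
        boolean endOk = end >= text.length()
            || !Character.isJavaIdentifierPart(text.charAt(end));
        return startOk && endOk;
    }

    static int indexOfWord(String text, String searchWord) {
        int index = text.indexOf(searchWord);
        while (index >= 0) {
            if (isWholeWord(text, index, searchWord.length())) {
                return index;
            }
            index = text.indexOf(searchWord, index + 1); // skip partial matches such as "counter"
        }
        return -1;
    }
}
```

      For example, searching "counter + count" for "count" skips the match at 0 (followed by 'e') and returns 10.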
    • convertTabsToSpaces

      public static String convertTabsToSpaces(String str)
      Convert tabs in the given string to spaces using a default tab width of 8 spaces.
      Parameters:
      str - string containing tabs
      Returns:
      string that has spaces for tabs
    • convertTabsToSpaces

      public static String convertTabsToSpaces(String str, int tabSize)
      Convert tabs in the given string to spaces.
      Parameters:
      str - string containing tabs
      tabSize - the width of a tab, in spaces
      Returns:
      string that has spaces for tabs
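      A common way to expand tabs is to advance to the next tab stop (a multiple of tabSize columns) rather than always inserting tabSize spaces. The sketch below assumes that tab-stop behavior and that the column resets after each '\n'; whether Ghidra's method does the same is not stated above. The class name TabSketch is hypothetical.

```java
/** Hypothetical tab-stop expansion; not Ghidra's implementation. */
final class TabSketch {
    static String convertTabsToSpaces(String str, int tabSize) {
        StringBuilder out = new StringBuilder(str.length());
        int column = 0; // position within the current line
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\t') {
                int spaces = tabSize - (column % tabSize); // distance to the next tab stop
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
                column += spaces;
            } else {
                out.append(c);
                column = (c == '\n') ? 0 : column + 1;
            }
        }
        return out.toString();
    }
}
```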
    • toLines

      public static String[] toLines(String str)
      Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.

      This method creates an empty string entry in the result array for leading and trailing separator chars, as well as for consecutive separators.

      Parameters:
      str - the string to parse
      Returns:
      an array of lines; an empty array if the given value is null or empty
      See Also:
      • StringUtils.splitPreserveAllTokens(String, char)
    • toLines

      public static String[] toLines(String s, boolean preserveTokens)
      Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.
      Parameters:
      s - the string to parse
      preserveTokens - true signals to treat consecutive newlines as multiple lines; false signals to treat consecutive newlines as a single line break
      Returns:
      an array of lines; an empty array if the given value is null or empty
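      The two toLines variants differ only in how empty tokens are treated. A minimal sketch, assuming that preserveTokens=false behaves like dropping every empty token (as Apache Commons StringUtils.split does); the class name LinesSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch; not Ghidra's implementation. */
final class LinesSketch {
    static String[] toLines(String s, boolean preserveTokens) {
        if (s == null || s.isEmpty()) {
            return new String[0];
        }
        // A limit of -1 keeps leading, trailing and consecutive empty tokens.
        String[] parts = s.split("\n", -1);
        if (preserveTokens) {
            return parts;
        }
        // Collapse consecutive newlines by discarding the empty tokens.
        return Arrays.stream(parts).filter(p -> !p.isEmpty()).toArray(String[]::new);
    }
}
```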
    • toFixedSize

      public static String toFixedSize(String s, char pad, int size)
      Enforces the given length upon the given string by trimming and then padding as necessary.
      Parameters:
      s - the String to fix
      pad - the pad character to use if padding is required
      size - the desired size of the string
      Returns:
      the fixed string
    • pad

      public static String pad(String source, char filler, int length)
      Pads the source string to the specified length, using the filler character as the pad. If length is negative, the string is left-justified (the filler is appended); if length is positive, the string is right-justified (the filler is prepended).
      Parameters:
      source - the original string to pad.
      filler - the character with which to pad
      length - the desired length of the padded string; its sign selects the justification (0 results in no changes)
      Returns:
      the padded string
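      The sign convention of pad can be illustrated with a short sketch. It is hypothetical (class PadSketch is invented) and assumes that length is the target total length and that a source already at or beyond that length is returned unchanged rather than truncated.

```java
/** Hypothetical sketch of pad's sign convention; not Ghidra's code. */
final class PadSketch {
    static String pad(String source, char filler, int length) {
        int fill = Math.abs(length) - source.length();
        if (length == 0 || fill <= 0) {
            return source; // assumption: never truncates
        }
        StringBuilder padding = new StringBuilder();
        for (int i = 0; i < fill; i++) {
            padding.append(filler);
        }
        // Positive length: right-justify (pad on the left); negative: left-justify.
        return length > 0 ? padding + source : source + padding;
    }
}
```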
    • indentLines

      public static String indentLines(String s, String indent)
      Splits the given string into lines using \n, prefixes each line with the given indent string, and then joins the updated lines back into a single string.

      This is useful for constructing complicated toString() representations.

      Parameters:
      s - the input string
      indent - the indent string; this will be prepended to each line
      Returns:
      the output string
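      A minimal sketch of indentLines, assuming lines are rejoined with '\n' and no trailing newline is added; the real method's handling of a trailing newline is not specified above. The class name IndentSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical sketch; not Ghidra's implementation. */
final class IndentSketch {
    static String indentLines(String s, String indent) {
        return Arrays.stream(s.split("\n", -1))
            .map(line -> indent + line)         // prefix every line
            .collect(Collectors.joining("\n")); // rejoin without a trailing newline
    }
}
```

      In a toString() this might be used as "Foo {\n" + IndentSketch.indentLines(body, "  ") + "\n}", so nested fields line up under the header.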
    • findWord

      public static String findWord(String s, int index)
      Finds the word at the given index in the given string. For example, given the string "The tree is green" and an index of 5, the result would be "tree".
      Parameters:
      s - the string to search
      index - the index into the string to "seed" the word.
      Returns:
      the word contained at the given index.
    • findWord

      public static String findWord(String s, int index, char[] charsToAllow)
      Finds the word at the given index in the given string, treating the characters in charsToAllow as valid word characters. For example, given the string "The tree* is green", an index of 5, and '*' in charsToAllow, the result would be "tree*".

      If the search yields only whitespace, then the empty string will be returned.

      Parameters:
      s - the string to search
      index - the index into the string to "seed" the word.
      charsToAllow - chars that would normally be considered invalid (e.g., '*') but should be allowed in the returned word
      Returns:
      the word contained at the given index.
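      Finding a word reduces to expanding left and right from the seed index while characters pass a word-character test. The sketch below is hypothetical (class FindWordSketch is invented) and assumes a "word char" is an ASCII letter, digit, or '_', plus anything in charsToAllow; the real isWordChar definition is only loosely described above.

```java
/** Hypothetical sketch; the word-char definition is an assumption. */
final class FindWordSketch {
    static boolean isWordChar(char c, char[] charsToAllow) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_') {
            return true;
        }
        if (charsToAllow != null) {
            for (char allowed : charsToAllow) {
                if (c == allowed) {
                    return true;
                }
            }
        }
        return false;
    }

    static String findWord(String s, int index, char[] charsToAllow) {
        if (index < 0 || index >= s.length() || !isWordChar(s.charAt(index), charsToAllow)) {
            return ""; // seed is not inside a word (e.g., whitespace)
        }
        int start = index;
        while (start > 0 && isWordChar(s.charAt(start - 1), charsToAllow)) {
            start--;
        }
        int end = index + 1;
        while (end < s.length() && isWordChar(s.charAt(end), charsToAllow)) {
            end++;
        }
        return s.substring(start, end);
    }

    static String findWord(String s, int index) {
        return findWord(s, index, null);
    }
}
```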
    • findWordLocation

      public static WordLocation findWordLocation(String s, int index, char[] charsToAllow)
    • isWordChar

      public static boolean isWordChar(char c, char[] charsToAllow)
      Returns true if the character is a 'word char', loosely defined as a character we would expect to be normal ASCII content meant for consumption by a human. Any character in charsToAllow also passes the test.
      Parameters:
      c - the char to check
      charsToAllow - characters that will cause this method to return true
      Returns:
      true if it is a 'word char'
    • findLastWordPosition

      public static int findLastWordPosition(String s)
      Finds the starting position of the last word in the given string.
      Parameters:
      s - the string to search
      Returns:
      the starting position of the last word, or -1 if not found
    • getLastWord

      public static String getLastWord(String s, String separator)
      Takes a path-like string and retrieves the last non-empty item. Examples:
      • StringUtilities.getLastWord("/This/is/my/last/word/", "/") returns word
      • StringUtilities.getLastWord("This.is.my.last.word", ".") returns word
      • StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", ".") returns java
      • StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", "/") returns MyFile.java
      Parameters:
      s - the string from which to get the last word
      separator - the separator of words
      Returns:
      the last word
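      The getLastWord examples can be reproduced by splitting on the literal separator and taking the last non-empty item. This is a hypothetical sketch (class LastWordSketch is invented); Pattern.quote is needed because String.split takes a regex and "." would otherwise match any character. Returning "" when no non-empty item exists is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; not Ghidra's implementation. */
final class LastWordSketch {
    static String getLastWord(String s, String separator) {
        // Quote the separator so characters like "." are matched literally.
        String[] parts = s.split(Pattern.quote(separator));
        for (int i = parts.length - 1; i >= 0; i--) {
            if (!parts[i].isEmpty()) {
                return parts[i];
            }
        }
        return ""; // hypothetical fallback when every item is empty
    }
}
```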
    • toString

      public static String toString(int value)
      Converts an integer into a string. For example, given an integer 0x41424344, the returned string would be "ABCD".
      Parameters:
      value - the integer value
      Returns:
      the converted string
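      The 0x41424344 to "ABCD" example corresponds to reading the integer's bytes most-significant first and mapping each byte to one character. A minimal sketch, assuming ISO-8859-1 so every byte maps to exactly one char; how the real method treats zero bytes is not stated above (this sketch keeps them as '\0' characters). The class name IntToStringSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; not Ghidra's implementation. */
final class IntToStringSketch {
    static String toString(int value) {
        // ByteBuffer is big-endian by default: most significant byte first.
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```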
    • toStingJson

      public static String toStingJson(Object o)
      Creates a JSON string for the given object using all of its fields. To control the fields that are in the result string, see Json.

      This is here as a marker to point users to the real Json String utility.

      Parameters:
      o - the object for which to create a string
      Returns:
      the string
    • toStringWithIndent

      public static String toStringWithIndent(Object o)
    • mergeStrings

      public static String mergeStrings(String string1, String string2)
      Merges two strings into one. If one string contains the other, the larger is returned. If both strings are null, null is returned. If both strings are empty, the empty string is returned. If the two strings differ, the second string is appended to the first, separated by a newline.
      Parameters:
      string1 - the first string
      string2 - the second string
      Returns:
      the merged string
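      The merge rules can be sketched directly. This is a hypothetical illustration (class MergeSketch is invented); the case where exactly one argument is null is not described above, so returning the non-null one is an assumption.

```java
/** Hypothetical sketch of the documented merge rules. */
final class MergeSketch {
    static String mergeStrings(String string1, String string2) {
        if (string1 == null) {
            return string2; // assumption; also covers both-null returning null
        }
        if (string2 == null) {
            return string1; // assumption
        }
        if (string1.contains(string2)) {
            return string1; // also covers equal and both-empty strings
        }
        if (string2.contains(string1)) {
            return string2;
        }
        return string1 + "\n" + string2;
    }
}
```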
    • trim

      public static String trim(String original, int max)
      Limits the given string to the given max number of characters. If the string is longer than max, it is truncated so that it fits within max characters after the ellipses ("...") are added.

      The given max value must be at least 4. This is to ensure that, at a minimum, we can display the "..." plus one character.

      Parameters:
      original - The string to be limited
      max - The maximum number of characters to display (including ellipses, if trimmed).
      Returns:
      the trimmed string
      Throws:
      IllegalArgumentException - If the given max value is less than 4.
    • trimTrailingNulls

      public static String trimTrailingNulls(String s)
    • trimMiddle

      public static String trimMiddle(String s, int max)
      Trims the given string to the max number of characters by removing content from the middle. Ellipses ("...") are added to signal that content was removed; thus, the actual number of removed characters will be (s.length() - max) + 3, the extra 3 being the length of "...".

      If the string fits within the max, then the string will be returned.

      The given max value must be at least 5. This is to ensure that, at a minimum, we can display the "..." plus one character from the front and back of the string.

      Parameters:
      s - the string to trim
      max - the max number of characters to allow.
      Returns:
      the trimmed string
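      The arithmetic behind trim and trimMiddle can be checked with a small sketch. It is hypothetical (class TrimSketch is invented), uses the documented minimums of 4 and 5, and assumes that when the kept characters cannot be split evenly, the front gets the extra one.

```java
/** Hypothetical sketch; not Ghidra's implementation. */
final class TrimSketch {
    private static final String ELLIPSES = "...";

    static String trim(String original, int max) {
        if (max < 4) {
            throw new IllegalArgumentException("max must be at least 4");
        }
        if (original.length() <= max) {
            return original;
        }
        return original.substring(0, max - ELLIPSES.length()) + ELLIPSES;
    }

    static String trimMiddle(String s, int max) {
        if (max < 5) {
            throw new IllegalArgumentException("max must be at least 5");
        }
        if (s.length() <= max) {
            return s;
        }
        int keep = max - ELLIPSES.length(); // characters kept from the original
        int back = keep / 2;
        int front = keep - back;            // front gets the extra char when keep is odd (assumption)
        return s.substring(0, front) + ELLIPSES + s.substring(s.length() - back);
    }
}
```

      For "abcdefgh" with max 5, trimMiddle keeps "a" and "h" and removes 6 characters, matching (8 - 5) + 3.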
    • fixMultipleAsterisks

      public static String fixMultipleAsterisks(String value)
      This method replaces each run of successive asterisks (i.e., "**") with a single asterisk, which is an equivalent usage in Ghidra. This is necessary because some symbol names cause the pattern-matching process to become unusable. An example string that causes this problem is "s_CLSID\{ADB880A6-D8FF-11CF-9377-00AA003B7A11}\InprocServer3_01001400".
      Parameters:
      value - The string to be checked.
      Returns:
      The updated string.
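      Collapsing runs of asterisks is a one-line regular-expression replacement. The sketch below is a hypothetical equivalent (class AsteriskSketch is invented), not necessarily how Ghidra implements it; the pattern \*{2,} matches two or more consecutive asterisks.

```java
/** Hypothetical sketch; not Ghidra's implementation. */
final class AsteriskSketch {
    static String fixMultipleAsterisks(String value) {
        // Replace every run of two or more '*' with a single '*'.
        return value.replaceAll("\\*{2,}", "*");
    }
}
```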
    • isValidCLanguageChar

      public static boolean isValidCLanguageChar(char c)
      Returns true if the character is OK to be contained inside a C language string; that is, the string should not be tokenized on this char.
      Parameters:
      c - the char
      Returns:
      true if it is allowed in a C string
    • isAsciiChar

      public static boolean isAsciiChar(char c)
      Returns true if the given character is within the ASCII range.
      Parameters:
      c - the char to check
      Returns:
      true if the given character is within the ASCII range.
    • isAsciiChar

      public static boolean isAsciiChar(int codePoint)
      Returns true if the given code point is within the ASCII range.
      Parameters:
      codePoint - the codePoint to check
      Returns:
      true if the given character is within the ASCII range.
    • convertEscapeSequences

      public static String convertEscapeSequences(String str)
      Replaces escape sequences in a string with the corresponding control characters. For example, a backslash character followed by an 'n' character would be replaced with a single line feed (0x0a) character. One use for this is to allow users to type strings in a text field and include control characters such as line feeds and tabs. The string that contains 'a', 'b', 'c', '\', 'n', 'd', '\', 'u', '0', '0', '0', '1', 'e' would become 'a', 'b', 'c', 0x0a, 'd', 0x01, 'e'.
      Parameters:
      str - The string to convert escape sequences to control characters.
      Returns:
      a new string with escape sequences converted to control characters.
    • convertControlCharsToEscapeSequences

      public static String convertControlCharsToEscapeSequences(String str)
      Replaces known control characters in a string with the corresponding escape sequences. For example, a line feed character would be converted to a backslash character followed by an 'n' character. One use for this is to display strings in a way that makes the embedded control characters easy to see. The string that contains 'a', 'b', 'c', 0x0a, 'd', 0x01, 'e' would become 'a', 'b', 'c', '\', 'n', 'd', 0x01, 'e'.
      Parameters:
      str - The string to convert control characters to escape sequences
      Returns:
      a new string with all the control characters converted to escape sequences.
    • convertCodePointToEscapeSequence

      public static String convertCodePointToEscapeSequence(int codePoint)
      Maps known control characters to the corresponding escape sequences. For example, a line feed character would be converted to a backslash ('\\') character followed by an 'n' character. One use for this is to display strings in a way that makes the embedded control characters easy to see.
      Parameters:
      codePoint - The character to convert to escape sequence string
      Returns:
      a string containing the equivalent escape sequence, or the original character (as a string) if it is not in the control character mapping.
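      The two directions of escape conversion described above can be sketched as a pair of loops. This is hypothetical (class EscapeSketch is invented) and assumes a small mapping: \n, \t, \r, and \\ plus four-hex-digit \u escapes when decoding, and only \n, \t, and \r when encoding, so that 0x01 passes through unchanged as in the documented example. Unrecognized escapes are kept verbatim; Ghidra's actual mapping may be larger.

```java
/** Hypothetical sketch with an assumed, deliberately small escape mapping. */
final class EscapeSketch {
    static String convertEscapeSequences(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c != '\\' || i + 1 >= str.length()) {
                out.append(c);
                continue;
            }
            char next = str.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case '\\': out.append('\\'); break;
                case 'u':
                    // backslash-u followed by exactly four hex digits
                    if (i + 4 < str.length() && isHex(str, i + 1, i + 5)) {
                        out.append((char) Integer.parseInt(str.substring(i + 1, i + 5), 16));
                        i += 4;
                    } else {
                        out.append('\\').append(next);
                    }
                    break;
                default:
                    out.append('\\').append(next); // unknown escape: keep verbatim
            }
        }
        return out.toString();
    }

    static String convertControlCharsToEscapeSequences(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\t': out.append("\\t"); break;
                case '\r': out.append("\\r"); break;
                default: out.append(c); // unmapped chars such as 0x01 pass through
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```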
    • wrapToWidth

      public static String wrapToWidth(String str, int width)
      Wraps the given string at whitespace to best fit within the given line width.

      If it is not possible to fit a word in the given width, it will be put on a line by itself, and that line will be allowed to exceed the given width.

      Parameters:
      str - the string to wrap
      width - the max width of each line, unless a single word exceeds it
      Returns: