Class StringUtilities
-
Nested Class Summary
Modifier and Type | Class | Description
static class | | About the worst way to wrap lines ever
-
Field Summary
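Modifier and Type | Field | Description
static final int | DEFAULT_TAB_SIZE |
static final Pattern | DOUBLE_QUOTED_STRING_PATTERN |
static final String | LINE_SEPARATOR | The platform specific string that is the line separator.
static final int | UNICODE_BE_BYTE_ORDER_MARK | Unicode Byte Order Marks (BOM) characters are special characters in the Unicode character space that signal endian-ness of the text.
static final int | UNICODE_LE16_BYTE_ORDER_MARK |
static final int | UNICODE_LE32_BYTE_ORDER_MARK |
static final int | UNICODE_REPLACEMENT |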
-
Method Summary
Modifier and Type | Method | Description
static String | characterToString(char c) | Converts the character into a string.
static boolean | containsAll(CharSequence toSearch, CharSequence... searches) | Returns true if all the given searches are contained in the given string.
static boolean | containsAllIgnoreCase(CharSequence toSearch, CharSequence... searches) | Returns true if all the given searches are contained in the given string, ignoring case.
static boolean | containsAnyIgnoreCase(CharSequence toSearch, CharSequence... searches) | Returns true if any of the given searches are contained in the given string, ignoring case.
static String | convertCodePointToEscapeSequence(int codePoint) | Maps known control characters to corresponding escape sequences.
static String | convertControlCharsToEscapeSequences | Replaces known control characters in a string with corresponding escape sequences.
static String | convertEscapeSequences | Replaces escaped characters in a string with corresponding control characters.
static String | convertTabsToSpaces | Convert tabs in the given string to spaces using a default tab width of 8 spaces.
static String | convertTabsToSpaces(String str, int tabSize) | Convert tabs in the given string to spaces.
static int | countOccurrences(String string, char occur) | Returns a count of how many times the 'occur' char appears in the string.
static boolean | endsWithIgnoreCase(String string, String postfix) | Returns true if the given string ends with postfix, ignoring case.
static boolean | endsWithWhiteSpace(String string) |
static boolean | equals |
static String | extractFromDoubleQuotes | If the given string is enclosed in double quotes, extract the inner text.
static int | findLastWordPosition | Finds the starting position of the last word in the given string.
static String | findWord | Finds the word at the given index in the given string.
static String | findWord | Finds the word at the given index in the given string; if the word contains the given charToAllow, then allow it in the string.
static WordLocation | findWordLocation(String s, int index, char[] charsToAllow) |
static String | fixMultipleAsterisks(String value) | This method looks for all occurrences of successive asterisks (i.e., "**") and replaces them with a single asterisk, which is an equivalent usage in Ghidra.
static String | getLastWord(String s, String separator) | Takes a path-like string and retrieves the last non-empty item.
static String | indentLines(String s, String indent) | Splits the given string into lines using \n and then pads each string with the given pad string.
static int | indexOfWord(String text, String searchWord) | Returns the index of the first whole word occurrence of the search word within the given text.
static boolean | isAllBlank(CharSequence... sequences) | Returns true if all the given sequences are either null or only whitespace.
static boolean | isAsciiChar(char c) | Returns true if the given character is within the ASCII range.
static boolean | isAsciiChar(int codePoint) | Returns true if the given code point is within the ASCII range.
static boolean | isControlCharacterOrBackslash(char c) | Returns true if the given character is a special character.
static boolean | isControlCharacterOrBackslash(int codePoint) | Returns true if the given codePoint (i.e., a full 32-bit Unicode character) is a special character.
static boolean | isDisplayable(int c) | Returns true if the character is in the displayable character range.
static boolean | isDoubleQuoted(String str) | Determines if a string is enclosed in double quotes (ASCII 34 (0x22)).
static boolean | isValidCLanguageChar(char c) | Returns true if the character is OK to be contained inside a C language string.
static boolean | isWholeWord(String text, int startIndex, int length) | Returns true if the substring within the text string starting at startIndex and having the given length is a whole word.
static boolean | isWordChar(char c, char[] charsToAllow) | Loosely defined as a character that we would expect to be normal ASCII content meant for consumption by a human.
static String | mergeStrings(String string1, String string2) | Merge two strings into one.
static String | pad | Pads the source string to the specified length, using the filler string as the pad.
static boolean | startsWithIgnoreCase(String string, String prefix) | Returns true if the given string starts with prefix, ignoring case.
static String | toFixedSize(String s, char pad, int size) | Enforces the given length upon the given string by trimming and then padding as necessary.
static String[] | toLines | Parses a string containing multiple lines into an array where each element in the array contains only a single line.
static String[] | toLines | Parses a string containing multiple lines into an array where each element in the array contains only a single line.
static String | toQuotedString(byte[] bytes) | Generate a quoted string from US-ASCII character bytes assuming 1-byte chars.
static String | toQuotedString(byte[] bytes, int charSize) | Generate a quoted string from US-ASCII characters, where each character is charSize bytes.
static String | toStingJson | Creates a JSON string for the given object using all of its fields.
static String | toString(int value) | Converts an integer into a string.
static String | toStringWithIndent |
static String | trim | Limits the given string to the given max number of characters.
static String | trimMiddle(String s, int max) | Trims the given string to the max number of characters.
static String | trimTrailingNulls |
static String | wrapToWidth(String str, int width) | Wrap the given string at whitespace to best fit within the given line width.
-
Field Details
-
DOUBLE_QUOTED_STRING_PATTERN
-
LINE_SEPARATOR
The platform specific string that is the line separator.
-
UNICODE_REPLACEMENT
public static final int UNICODE_REPLACEMENT
-
UNICODE_BE_BYTE_ORDER_MARK
public static final int UNICODE_BE_BYTE_ORDER_MARK
Unicode Byte Order Mark (BOM) characters are special characters in the Unicode character space that signal the endianness of the text.
The value for the big-endian version (0xFEFF) works for both 16-bit and 32-bit character values.
There are separate values for little-endian Byte Order Marks for 16-bit and 32-bit characters because the 32-bit value is shifted left by 16 bits.
-
UNICODE_LE16_BYTE_ORDER_MARK
public static final int UNICODE_LE16_BYTE_ORDER_MARK
-
UNICODE_LE32_BYTE_ORDER_MARK
public static final int UNICODE_LE32_BYTE_ORDER_MARK
-
DEFAULT_TAB_SIZE
public static final int DEFAULT_TAB_SIZE
-
-
Method Details
-
isControlCharacterOrBackslash
public static boolean isControlCharacterOrBackslash(char c)
Returns true if the given character is a special character, for example '\n' or '\\'. A value of 0 is not considered special for this purpose; it is handled separately because it has more varied use cases.
- Parameters:
c
- the character
- Returns:
- true if the given character is a special character
-
isControlCharacterOrBackslash
public static boolean isControlCharacterOrBackslash(int codePoint)
Returns true if the given codePoint (i.e., a full 32-bit Unicode character) is a special character, for example '\n' or '\\'. A value of 0 is not considered special for this purpose; it is handled separately because it has more varied use cases.
- Parameters:
codePoint
- the codePoint (i.e., character), see String.codePointAt(int)
- Returns:
- true if the given character is a special character
-
isDoubleQuoted
Determines if a string is enclosed in double quotes (ASCII 34 (0x22)).
- Parameters:
str
- String to test for double-quote enclosure
- Returns:
- True if the first and last characters are the double-quote character, false otherwise
-
extractFromDoubleQuotes
If the given string is enclosed in double quotes, extract the inner text. Otherwise, return the given string unmodified.
- Parameters:
str
- String to match and extract from
- Returns:
- The inner text of a double-quoted string, or the original string if not double-quoted.
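The quote check and extraction described above can be sketched in a few lines. This is only an illustration of the documented behavior, not the library's source; the class name DoubleQuoteSketch is hypothetical, and treating a lone '"' (length 1) as not double-quoted is an assumption.

```java
// Sketch only: mirrors the documented behavior, not Ghidra's actual implementation.
final class DoubleQuoteSketch {

    // Assumption: a single '"' character on its own does not count as double-quoted.
    static boolean isDoubleQuoted(String str) {
        return str.length() >= 2
            && str.charAt(0) == '"'
            && str.charAt(str.length() - 1) == '"';
    }

    // Returns the inner text when quoted, otherwise the input unchanged.
    static String extractFromDoubleQuotes(String str) {
        return isDoubleQuoted(str) ? str.substring(1, str.length() - 1) : str;
    }

    public static void main(String[] args) {
        System.out.println(extractFromDoubleQuotes("\"hello\""));
        System.out.println(extractFromDoubleQuotes("hello"));
    }
}
```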
-
isDisplayable
public static boolean isDisplayable(int c)
Returns true if the character is in the displayable character range.
- Parameters:
c
- the character
- Returns:
- true if the character is in the displayable character range
-
isAllBlank
Returns true if all the given sequences are either null or only whitespace.
- Parameters:
sequences
- the sequences to check
- Returns:
- true if all the given sequences are either null or only whitespace.
-
characterToString
Converts the character into a string. If the character is special, its escaped form is rendered rather than the raw character. For example, given '\n' the output would be "\\n".
- Parameters:
c
- the character to convert into a string
- Returns:
- the converted character
-
countOccurrences
Returns a count of how many times the 'occur' char appears in the string.
- Parameters:
string
- the string to look inside
occur
- the character to look for
- Returns:
- a count of how many times the 'occur' char appears in the string
-
equals
-
endsWithWhiteSpace
-
toQuotedString
Generate a quoted string from US-ASCII character bytes assuming 1-byte chars.
Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \uHHHH, etc.). If a character size other than 1 byte is required, the alternate form of this method should be used.
The result string will be single-quoted (i.e., with ') if the input byte array is 1 byte long; otherwise the result will be double-quoted (with ").
- Parameters:
bytes
- character string bytes
- Returns:
- escaped string for display use
-
toQuotedString
Generate a quoted string from US-ASCII characters, where each character is charSize bytes.
Special characters and non-printable characters will be escaped using C character escape conventions (e.g., \t, \n, \uHHHH, etc.).
The result string will be single-quoted (i.e., with ') if the input byte array is 1 character long (i.e., charSize bytes); otherwise the result will be double-quoted (with ").
- Parameters:
bytes
- array of bytes
charSize
- number of bytes per character (1, 2, 4).
- Returns:
- escaped string for display use
-
startsWithIgnoreCase
Returns true if the given string starts with prefix, ignoring case.
Note: This method is equivalent to calling:
string.regionMatches(true, 0, prefix, 0, prefix.length());
- Parameters:
string
- the string which may contain the prefix
prefix
- the prefix to test against
- Returns:
- true if the given string starts with prefix, ignoring case.
-
endsWithIgnoreCase
Returns true if the given string ends with postfix, ignoring case.
Note: This method is equivalent to calling:
int startIndex = string.length() - postfix.length();
string.regionMatches(true, startIndex, postfix, 0, postfix.length());
- Parameters:
string
- the string which may end with postfix
postfix
- the string for which to test existence
- Returns:
- true if the given string ends with postfix, ignoring case.
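The regionMatches equivalences noted for startsWithIgnoreCase and endsWithIgnoreCase can be tried directly with java.lang.String. The sketch below is illustrative only: the class name IgnoreCaseSketch is hypothetical, and it assumes non-null arguments, since null handling is not described here.

```java
// Sketch only: assumes non-null arguments; not Ghidra's actual source.
final class IgnoreCaseSketch {

    static boolean startsWithIgnoreCase(String string, String prefix) {
        return string.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String string, String postfix) {
        // When postfix is longer than string, startIndex is negative and
        // String.regionMatches returns false rather than throwing.
        int startIndex = string.length() - postfix.length();
        return string.regionMatches(true, startIndex, postfix, 0, postfix.length());
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("Ghidra.EXE", "ghi"));
        System.out.println(endsWithIgnoreCase("Ghidra.EXE", ".exe"));
        System.out.println(endsWithIgnoreCase("a", "abc"));
    }
}
```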
-
containsAll
Returns true if all the given searches are contained in the given string.
- Parameters:
toSearch
- the string to search
searches
- the strings to find
- Returns:
- true if all the given searches are contained in the given string.
-
containsAllIgnoreCase
Returns true if all the given searches are contained in the given string, ignoring case.
- Parameters:
toSearch
- the string to search
searches
- the strings to find
- Returns:
- true if all the given searches are contained in the given string, ignoring case.
-
containsAnyIgnoreCase
Returns true if any of the given searches are contained in the given string, ignoring case.
- Parameters:
toSearch
- the string to search
searches
- the strings to find
- Returns:
- true if any of the given searches are contained in the given string, ignoring case.
-
indexOfWord
Returns the index of the first whole word occurrence of the search word within the given text. An occurrence is a whole word when neither the character before it nor the character after it is a JavaIdentifierPart.
- Parameters:
text
- the text to be searched.
searchWord
- the word to search for.
- Returns:
- the index of the first whole word occurrence of the search word within the given text, or -1 if not found.
-
isWholeWord
Returns true if the substring within the text string starting at startIndex and having the given length is a whole word. A substring is a whole word when neither the character before it nor the character after it is a JavaIdentifierPart.
- Parameters:
text
- the text containing the potential word.
startIndex
- the start index of the potential word within the text.
length
- the length of the potential word
- Returns:
- true if the substring within the text string starting at startIndex and having the given length is a whole word.
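The whole-word rule used by indexOfWord and isWholeWord can be illustrated with Character.isJavaIdentifierPart from the JDK. This is a minimal sketch of the stated definition, not the library's code; the class name WholeWordSketch is hypothetical, and returning -1 for an empty search word is an assumption.

```java
// Sketch only: implements the whole-word definition given above.
final class WholeWordSketch {

    static boolean isWholeWord(String text, int startIndex, int length) {
        // The character just before the candidate must not be an identifier part.
        if (startIndex > 0 && Character.isJavaIdentifierPart(text.charAt(startIndex - 1))) {
            return false;
        }
        // The character just after the candidate must not be an identifier part.
        int end = startIndex + length;
        return end >= text.length() || !Character.isJavaIdentifierPart(text.charAt(end));
    }

    static int indexOfWord(String text, String searchWord) {
        if (searchWord.isEmpty()) {
            return -1; // assumption: an empty search word never matches
        }
        int index = text.indexOf(searchWord);
        while (index >= 0) {
            if (isWholeWord(text, index, searchWord.length())) {
                return index;
            }
            index = text.indexOf(searchWord, index + 1);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfWord("category cat", "cat"));
        System.out.println(indexOfWord("concatenate", "cat"));
    }
}
```

In "category cat" the first occurrence of "cat" is followed by 'e', so it is rejected and the search continues to the standalone word.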
-
convertTabsToSpaces
Convert tabs in the given string to spaces using a default tab width of 8 spaces.
- Parameters:
str
- string containing tabs
- Returns:
- string that has spaces for tabs
-
convertTabsToSpaces
Convert tabs in the given string to spaces.
- Parameters:
str
- string containing tabs
tabSize
- length of the tab
- Returns:
- string that has spaces for tabs
-
toLines
Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.
This method creates an empty string entry in the result array for initial and trailing separator chars, as well as for consecutive separators.
- Parameters:
str
- the string to parse
- Returns:
- an array of lines; an empty array if the given value is null or empty
-
toLines
Parses a string containing multiple lines into an array where each element in the array contains only a single line. The "\n" character is used as the delimiter for lines.
- Parameters:
s
- the string to parse
preserveTokens
- true signals to treat consecutive newlines as multiple lines; false signals to treat consecutive newlines as a single line break
- Returns:
- an array of lines; an empty array if the given value is null or empty
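The effect of preserveTokens can be sketched with String.split. This is an illustration under assumptions, not the library's implementation: the class name ToLinesSketch is hypothetical, and with preserveTokens set to false the sketch drops every empty line, including leading and trailing ones, which the text above does not specify.

```java
import java.util.Arrays;

// Sketch only: models the two documented modes of toLines.
final class ToLinesSketch {

    static String[] toLines(String s, boolean preserveTokens) {
        if (s == null || s.isEmpty()) {
            return new String[0];
        }
        if (preserveTokens) {
            // A limit of -1 keeps empty entries for leading, trailing and consecutive "\n".
            return s.split("\n", -1);
        }
        // Assumption: collapsing consecutive newlines amounts to dropping empty lines.
        return Arrays.stream(s.split("\n"))
            .filter(line -> !line.isEmpty())
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toLines("a\n\nb\n", true)));
        System.out.println(Arrays.toString(toLines("a\n\nb\n", false)));
    }
}
```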
-
toFixedSize
Enforces the given length upon the given string by trimming and then padding as necessary.
- Parameters:
s
- the String to fix
pad
- the pad character to use if padding is required
size
- the desired size of the string
- Returns:
- the fixed string
-
pad
Pads the source string to the specified length, using the filler string as the pad. If length is negative, left justifies the string, appending the filler; if length is positive, right justifies the source string.
- Parameters:
source
- the original string to pad.
filler
- the type of characters with which to pad
length
- the length of padding to add (0 results in no changes)
- Returns:
- the padded string
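The sign convention for length can be illustrated as follows. This is a sketch under assumptions: the class name PadSketch is hypothetical, the filler is taken to be a single char, and the absolute value of length is treated as the target total width, which is one reading of "pads the source string to the specified length".

```java
// Sketch only: one interpretation of the documented padding rules.
final class PadSketch {

    static String pad(String source, char filler, int length) {
        if (length == 0) {
            return source; // documented: 0 results in no changes
        }
        boolean rightJustify = length > 0;
        int missing = Math.abs(length) - source.length();
        StringBuilder padding = new StringBuilder();
        for (int i = 0; i < missing; i++) {
            padding.append(filler);
        }
        // Positive length: filler goes first; negative length: filler is appended.
        return rightJustify ? padding + source : source + padding;
    }

    public static void main(String[] args) {
        System.out.println(pad("42", '0', 5));
        System.out.println(pad("ab", '.', -4));
    }
}
```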
-
indentLines
Splits the given string into lines using \n and then pads each string with the given pad string. Finally, the updated lines are formed into a single string.
This is useful for constructing complicated toString() representations.
- Parameters:
s
- the input string
indent
- the indent string; this will be appended as needed
- Returns:
- the output string
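The split, pad, and rejoin steps can be sketched as below. This is illustrative only: the class name IndentSketch is hypothetical, and placing the indent at the start of each line and rejoining with "\n" are assumptions about details the description leaves open.

```java
// Sketch only: split on "\n", prefix each line, then rejoin.
final class IndentSketch {

    static String indentLines(String s, String indent) {
        String[] lines = s.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(indent).append(lines[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Typical use: nesting one object's multi-line toString() inside another's.
        System.out.println("Parent {\n" + indentLines("child=1\nchild=2", "  ") + "\n}");
    }
}
```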
-
findWord
Finds the word at the given index in the given string. For example, given the string "The tree is green" and the index 5, the result would be "tree".
- Parameters:
s
- the string to search
index
- the index into the string to "seed" the word.
- Returns:
- String the word contained at the given index.
-
findWord
Finds the word at the given index in the given string; if the word contains any of the given charsToAllow, they are allowed in the word. For example, given the string "The tree* is green", the index 5, and '*' in charsToAllow, the result would be "tree*".
If the search yields only whitespace, then the empty string will be returned.
- Parameters:
s
- the string to search
index
- the index into the string to "seed" the word.
charsToAllow
- chars that normally would be considered invalid, e.g., '*', so that the word can be returned with those chars
- Returns:
- String the word contained at the given index.
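The two findWord examples can be reproduced with a small sketch that grows a word outward from the seed index. This is not the library's algorithm: the class name FindWordSketch is hypothetical, and treating only letters, digits, and the explicitly allowed chars as word characters is a simplifying assumption.

```java
// Sketch only: expands left and right from the seed index.
final class FindWordSketch {

    // Assumption: a word char is a letter, a digit, or one of charsToAllow.
    static boolean isWordChar(char c, char[] charsToAllow) {
        if (charsToAllow != null) {
            for (char allowed : charsToAllow) {
                if (allowed == c) {
                    return true;
                }
            }
        }
        return Character.isLetterOrDigit(c);
    }

    static String findWord(String s, int index, char[] charsToAllow) {
        if (index < 0 || index >= s.length() || !isWordChar(s.charAt(index), charsToAllow)) {
            return ""; // e.g. the seed index lands on whitespace
        }
        int start = index;
        while (start > 0 && isWordChar(s.charAt(start - 1), charsToAllow)) {
            start--;
        }
        int end = index + 1;
        while (end < s.length() && isWordChar(s.charAt(end), charsToAllow)) {
            end++;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(findWord("The tree is green", 5, null));
        System.out.println(findWord("The tree* is green", 5, new char[] {'*'}));
    }
}
```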
-
findWordLocation
-
isWordChar
public static boolean isWordChar(char c, char[] charsToAllow)
Loosely defined as a character that we would expect to be normal ASCII content meant for consumption by a human. Also, any provided allowed chars will pass the test.
- Parameters:
c
- the char to check
charsToAllow
- characters that will cause this method to return true
- Returns:
- true if it is a 'word char'
-
findLastWordPosition
Finds the starting position of the last word in the given string.
- Parameters:
s
- the string to search
- Returns:
- int the starting position of the last word, -1 if not found
-
getLastWord
Takes a path-like string and retrieves the last non-empty item. Examples:
- StringUtilities.getLastWord("/This/is/my/last/word/", "/") returns word
- StringUtilities.getLastWord("This.is.my.last.word", ".") returns word
- StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", ".") returns java
- StringUtilities.getLastWord("/This/is/my/last/word/MyFile.java", "/") returns MyFile.java
- Parameters:
s
- the string from which to get the last word
separator
- the separator of words
- Returns:
- the last word
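The examples listed for getLastWord can be reproduced with a short sketch. It is illustrative rather than the library's code: the class name LastWordSketch is hypothetical, the separator is treated as a literal string, and returning the empty string when no non-empty item exists is an assumption.

```java
import java.util.regex.Pattern;

// Sketch only: returns the last non-empty piece between separators.
final class LastWordSketch {

    static String getLastWord(String s, String separator) {
        // Pattern.quote keeps separators such as "." from being read as regex syntax.
        String[] parts = s.split(Pattern.quote(separator));
        for (int i = parts.length - 1; i >= 0; i--) {
            if (!parts[i].isEmpty()) {
                return parts[i];
            }
        }
        return ""; // assumption: nothing but separators yields an empty result
    }

    public static void main(String[] args) {
        System.out.println(getLastWord("/This/is/my/last/word/", "/"));
        System.out.println(getLastWord("/This/is/my/last/word/MyFile.java", "."));
    }
}
```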
-
toString
Converts an integer into a string. For example, given an integer 0x41424344, the returned string would be "ABCD".
- Parameters:
value
- the integer value
- Returns:
- the converted string
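The 0x41424344 example corresponds to reading the integer's four bytes from most significant to least significant as characters. A minimal sketch follows; the class name IntToStringSketch is hypothetical, and how non-printable or zero bytes are handled is not described above, so the sketch emits every byte as-is.

```java
// Sketch only: maps each byte of the int, high byte first, to a char.
final class IntToStringSketch {

    static String toString(int value) {
        StringBuilder sb = new StringBuilder(4);
        for (int shift = 24; shift >= 0; shift -= 8) {
            sb.append((char) ((value >>> shift) & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toString(0x41424344));
    }
}
```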
-
toStingJson
Creates a JSON string for the given object using all of its fields. To control the fields that are in the result string, see Json.
This is here as a marker to point users to the real Json String utility.
- Parameters:
o
- the object for which to create a string
- Returns:
- the string
-
toStringWithIndent
-
mergeStrings
Merge two strings into one. If one string contains the other, then the larger is returned. If both strings are null then null is returned. If both strings are empty, the empty string is returned. If the original two strings differ, this adds the second string to the first separated by a newline.
- Parameters:
string1
- the first string
string2
- the second string
- Returns:
- the merged string
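The merge rules above can be sketched as a short chain of checks. This is an illustration, not the library's source: the class name MergeSketch is hypothetical, and returning the non-null string when exactly one argument is null is an assumption, since that case is not described.

```java
// Sketch only: follows the documented merge rules in order.
final class MergeSketch {

    static String mergeStrings(String string1, String string2) {
        if (string1 == null) {
            return string2; // also covers both null: returns null
        }
        if (string2 == null) {
            return string1; // assumption for the one-null case
        }
        if (string1.contains(string2)) {
            return string1; // covers equal strings and both empty
        }
        if (string2.contains(string1)) {
            return string2;
        }
        return string1 + "\n" + string2;
    }

    public static void main(String[] args) {
        System.out.println(mergeStrings("first note", "second note"));
        System.out.println(mergeStrings("a longer comment", "comment"));
    }
}
```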
-
trim
Limits the given string to the given max number of characters. If the string is longer than the given length, it will be trimmed to fit that length after adding ellipses.
The given max value must be at least 4. This is to ensure that, at a minimum, we can display the "..." plus one character.
- Parameters:
original
- The string to be limited
max
- The maximum number of characters to display (including ellipses, if trimmed).
- Returns:
- the trimmed string
- Throws:
IllegalArgumentException
- If the given max value is less than 4.
-
trimTrailingNulls
-
trimMiddle
Trims the given string to the max number of characters by removing characters from the middle. Ellipses will be added to signal that content was removed. Thus, the actual number of removed characters will be (s.length() - max) plus the length of "...".
If the string fits within the max, then the string will be returned.
The given max value must be at least 5. This is to ensure that, at a minimum, we can display the "..." plus one character from the front and back of the string.
- Parameters:
s
- the string to trim
max
- the max number of characters to allow.
- Returns:
- the trimmed string
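The difference between trim and trimMiddle can be seen in a small sketch. It is written under assumptions rather than taken from the library: the class name TrimSketch is hypothetical, trim is assumed to place the ellipses at the end and to reject max values below 4, trimMiddle is assumed to reject max values below 5, and giving the front the extra character when the kept length is odd is an arbitrary choice.

```java
// Sketch only: end-trimming versus middle-trimming with "...".
final class TrimSketch {

    private static final String ELLIPSES = "...";

    static String trim(String original, int max) {
        if (max < 4) {
            throw new IllegalArgumentException("max must be at least 4");
        }
        if (original.length() <= max) {
            return original;
        }
        return original.substring(0, max - ELLIPSES.length()) + ELLIPSES;
    }

    static String trimMiddle(String s, int max) {
        if (max < 5) {
            throw new IllegalArgumentException("max must be at least 5");
        }
        if (s.length() <= max) {
            return s;
        }
        int keep = max - ELLIPSES.length();
        int front = (keep + 1) / 2; // assumption: front gets the odd character
        int back = keep - front;
        return s.substring(0, front) + ELLIPSES + s.substring(s.length() - back);
    }

    public static void main(String[] args) {
        System.out.println(trim("abcdefghij", 6));
        System.out.println(trimMiddle("abcdefghij", 7));
    }
}
```

Both results have exactly max characters, because the three ellipsis characters count toward the limit.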
-
fixMultipleAsterisks
This method looks for all occurrences of successive asterisks (i.e., "**") and replaces them with a single asterisk, which is an equivalent usage in Ghidra. This is necessary due to some symbol names which cause the pattern matching process to become unusable. An example string that causes this problem is "s_CLSID\{ADB880A6-D8FF-11CF-9377-00AA003B7A11}\InprocServer3_01001400".
- Parameters:
value
- The string to be checked.
- Returns:
- The updated string.
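Collapsing runs of asterisks can be expressed as a single regular-expression replacement. The sketch below only illustrates the described effect; the class name AsteriskSketch is hypothetical and the library may implement this differently.

```java
// Sketch only: collapses any run of two or more '*' into one.
final class AsteriskSketch {

    static String fixMultipleAsterisks(String value) {
        return value.replaceAll("\\*{2,}", "*");
    }

    public static void main(String[] args) {
        System.out.println(fixMultipleAsterisks("a**b***c*"));
    }
}
```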
-
isValidCLanguageChar
public static boolean isValidCLanguageChar(char c)
Returns true if the character is OK to be contained inside a C language string. That is, the string should not be tokenized on this char.
- Parameters:
c
- the char
- Returns:
- boolean true if it is allowed in a C string
-
isAsciiChar
public static boolean isAsciiChar(char c)
Returns true if the given character is within the ASCII range.
- Parameters:
c
- the char to check
- Returns:
- true if the given character is within the ASCII range.
-
isAsciiChar
public static boolean isAsciiChar(int codePoint)
Returns true if the given code point is within the ASCII range.
- Parameters:
codePoint
- the codePoint to check
- Returns:
- true if the given character is within the ASCII range.
-
convertEscapeSequences
Replaces escaped characters in a string with corresponding control characters. For example, a string containing a backslash character followed by an 'n' character would be replaced with a single line feed (0x0a) character. One use for this is to allow users to type strings in a text field and include control characters such as line feeds and tabs. The string that contains 'a','b','c', '\', 'n', 'd', '\', 'u', '0', '0', '0', '1', 'e' would become 'a','b','c',0x0a,'d', 0x01, 'e'.
- Parameters:
str
- The string to convert escape sequences to control characters.
- Returns:
- a new string with escape sequences converted to control characters.
-
convertControlCharsToEscapeSequences
Replaces known control characters in a string with corresponding escape sequences. For example, a string containing a line feed character would be converted to a backslash character followed by an 'n' character. One use for this is to display strings in a manner that makes the embedded control characters easy to see. The string that contains 'a','b','c',0x0a,'d', 0x01, 'e' would become 'a','b','c', '\', 'n', 'd', 0x01, 'e'.
- Parameters:
str
- The string to convert control characters to escape sequences
- Returns:
- a new string with all the control characters converted to escape sequences.
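The example above, where the line feed is escaped but 0x01 passes through unchanged, can be sketched with a small lookup table. This is illustrative only: the class name EscapeSketch is hypothetical, and the three-entry mapping is an assumed subset, since the full set of "known" control characters is not listed here.

```java
import java.util.Map;

// Sketch only: escapes characters found in an assumed mapping, leaves others as-is.
final class EscapeSketch {

    // Assumption: a small subset of the control characters the library knows about.
    private static final Map<Character, String> ESCAPES =
        Map.of('\n', "\\n", '\t', "\\t", '\r', "\\r");

    static String convertControlCharsToEscapeSequences(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            String escaped = ESCAPES.get(c);
            if (escaped != null) {
                sb.append(escaped);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a','b','c',0x0a,'d',0x01,'e' from the description above
        String converted = convertControlCharsToEscapeSequences("abc\nd\u0001e");
        System.out.println(converted.length());
    }
}
```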
-
convertCodePointToEscapeSequence
Maps known control characters to corresponding escape sequences. For example, a line feed character would be converted to a backslash '\\' character followed by an 'n' character. One use for this is to display strings in a manner that makes the embedded control characters easy to see.
- Parameters:
codePoint
- The character to convert to escape sequence string
- Returns:
- a new string containing the equivalent escape sequence, or the original character (as a string) if it is not in the control character mapping.
-
wrapToWidth
Wrap the given string at whitespace to best fit within the given line width.
If it is not possible to fit a word in the given width, it will be put on a line by itself, and that line will be allowed to exceed the given width.
- Parameters:
str
- the string to wrap
width
- the max width of each line, unless a single word exceeds it
- Returns:
-
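The wrapping rule for wrapToWidth, including the over-long word case, can be sketched with a greedy word loop. This is an illustration under assumptions, not the library's algorithm: the class name WrapSketch is hypothetical, runs of whitespace (including existing newlines) are collapsed to single spaces, and lines are joined with "\n".

```java
// Sketch only: greedy wrapping at whitespace.
final class WrapSketch {

    static String wrapToWidth(String str, int width) {
        StringBuilder out = new StringBuilder();
        int lineLength = 0;
        for (String word : str.trim().split("\\s+")) {
            if (lineLength > 0 && lineLength + 1 + word.length() > width) {
                // The word does not fit on the current line: start a new one.
                out.append('\n');
                lineLength = 0;
            } else if (lineLength > 0) {
                out.append(' ');
                lineLength++;
            }
            // A word longer than width still goes on its own line, exceeding the width.
            out.append(word);
            lineLength += word.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrapToWidth("aa bb cc", 5));
        System.out.println(wrapToWidth("a verylongword b", 4));
    }
}
```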