BSim Search

The BSim Search feature allows the user to search an existing BSim database for functions similar to those in the current program. See the BSim Overview for a full description of BSim. This section describes the actions and related GUIs for conducting either an overview search or a similar functions search, and the follow-on actions that can be performed on the search results.

Enabling the BSim Search Plugin

The BSim Database feature comes with a Ghidra GUI for initiating searches. This GUI integrates with Ghidra via a plug-in that can be added to the main Ghidra Code Browser tool. This plug-in is currently called the BSimSearchPlugin within Ghidra and can be enabled from within the Configure Tool menu. If the plug-in is enabled, the Code Browser will contain a BSim menu with actions for managing BSim database definitions, performing an overview query, or performing a similar functions search. A button will also appear in the toolbar as a shortcut for the similar functions search action.

Defining and Managing BSim Databases

Before a BSim overview or search can be performed, one or more BSim database specifications must be defined. BSim database specifications are managed by the BSim Server Manager dialog.

The server dialog can be invoked either from the main tool (BSim->Manage Servers) or by using the button in either the BSim Overview or BSim Search dialogs.

The dialog displays a table showing all the currently defined BSim databases/servers. Each entry shows a name for the BSim database, its type (postgres, elastic, or file), a host IP and port (if applicable), and the number of active connections.

There are four primary actions for this dialog:

Defining a new BSim server/database

Pressing the add button brings up the Add BSim Server dialog.



Choose the type of BSim database first, as that determines what information must be specified. For postgres and elastic, you need to enter a host and port. For file, a button opens a file chooser for picking the file that is the local BSim H2 database.
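The dependency between database type and required fields can be sketched as follows. This is a minimal illustration in Python, not Ghidra's implementation: the class BSimServerDefinition, its field names, and the missing_fields method are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Database types named in the text. postgres and elastic are network servers;
# file refers to a local H2 database file.
NETWORK_TYPES = ("postgres", "elastic")
FILE_TYPE = "file"


@dataclass
class BSimServerDefinition:
    """Hypothetical stand-in for one entry in the server table."""
    name: str
    db_type: str
    host: Optional[str] = None
    port: Optional[int] = None
    file_path: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        if self.db_type in NETWORK_TYPES:
            required = {"host": self.host, "port": self.port}
        elif self.db_type == FILE_TYPE:
            required = {"file_path": self.file_path}
        else:
            raise ValueError(f"unknown BSim database type: {self.db_type!r}")
        return [field for field, value in required.items() if value in (None, "")]
```

A definition is complete when missing_fields() returns an empty list; a postgres entry with a host but no port would report ["port"].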

BSim Overview Query

The BSim Overview action does a search against all functions in the current program, but instead of returning all the specific results for each function, it returns a count of the matching results for each function.

To invoke the overview search dialog, select BSim->Overview...

To start the overview task, select a predefined BSim database server from the combo box or press the button to bring up the Manage BSim Servers dialog. Then adjust the similarity and confidence settings as desired, and press the Overview button. See the settings for the Similar Functions Search for more information about similarity and confidence values.

BSim Overview Results

After performing a BSim Overview Query, the BSim Function Overview window will appear.



The table displays an entry for each function that had at least one hit. Each entry displays the address of the function, its name, the number of hits BSim found, and the self significance of the function.

An information action pops up a dialog displaying the search criteria used to generate this overview result set.

A navigation toggle action controls whether single-clicking in the table navigates the listing to the selected function. Even if the toggle is off, navigation still occurs on a double-click.

The Make a selection action creates a selection in the listing for the selected row(s).

There are also several pop-up actions that work on the selected rows.

When trying to use the Executable Match Summary to determine if there is a significant match between the currently active program and executables in the database, there are two potential sources of noise in the resulting scores. Very small functions can produce false positives which artificially increase the confidence score for a particular executable. Also, some functions, like those provided by standard language libraries, may be used by a large portion of the executables in the corpus, and incorporating their matches into confidence scores obscures more significant matches. In both cases, the functions have an overly large number of matches in the database. This table, preferably sorted on the Hit Count column, serves as an overview of what, within the context of the active program, are the most common and least common functions in the database. This makes it easy to filter out precisely these kinds of problem functions.

The standard procedure is to choose an upper bound for a function's Hit Count, select every row in the table below that threshold, and then transfer that selection to the main Code Browser window by clicking on the Make a selection icon in the upper right corner of the table. Then, with the selection active, invoke the Search Similar Functions action.
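The thresholding step of this procedure can be sketched in Python as follows. The function name rows_within_hit_bound, the sample function names, and the hit counts are all hypothetical, and treating the bound as inclusive is an assumption; in the GUI the same result is obtained by sorting on Hit Count and selecting rows by hand.

```python
from typing import Dict, List


def rows_within_hit_bound(hit_counts: Dict[str, int], max_hits: int) -> List[str]:
    """Return function names whose hit count is at or below max_hits, rarest first."""
    kept = [(count, name) for name, count in hit_counts.items() if count <= max_hits]
    return [name for count, name in sorted(kept)]


# Hypothetical overview results: function name -> number of hits in the database.
overview = {
    "parse_header": 3,
    "memcpy_wrapper": 4120,  # library-like function matched almost everywhere
    "check_license": 1,
    "tiny_stub": 950,        # very small function producing many false positives
}

selected = rows_within_hit_bound(overview, max_hits=10)
print(selected)
```

Only the functions kept by the bound would then be passed on to the similar functions search, so that very common functions do not dominate the per-executable scores.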

BSim Similar Function Search

The BSim Similar Function Search action performs a BSim search against one or more functions in the current program. If there is a selection, it will search all functions in the selection; otherwise, it will search on the one function containing the cursor. If the cursor is not in the body of any function, an error dialog will appear.
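The rule for choosing which functions to query can be sketched as below. This is an illustrative Python model rather than Ghidra code: functions_to_query and the tuple layouts are hypothetical, and counting a function as "in the selection" when its entry point falls in a selected range is an assumption made for the example.

```python
from typing import List, Optional, Sequence, Tuple

# A function body is modeled as (name, start_address, end_address), end inclusive.
Function = Tuple[str, int, int]
# A selected address range is modeled as (start_address, end_address), end inclusive.
AddressRange = Tuple[int, int]


def functions_to_query(functions: Sequence[Function],
                       cursor: int,
                       selection: Optional[Sequence[AddressRange]] = None) -> List[str]:
    """Pick the functions a search would run on, following the rule described above."""
    if selection:
        # With a selection, the cursor is ignored and every selected function is used.
        return [name for name, start, _end in functions
                if any(lo <= start <= hi for lo, hi in selection)]
    for name, start, end in functions:
        if start <= cursor <= end:
            return [name]
    # Corresponds to the error dialog mentioned in the text.
    raise LookupError("cursor is not inside the body of any function")
```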

To invoke the similar functions search dialog, select BSim->Search Functions... or press the search button in the main toolbar.




This dialog allows you to configure the BSim search. The fields are as follows:

Standard Fields

Filters

Filters allow the search to be further restricted by allowing the user to choose from a list of predefined filter criteria and then specify a value for each criterion. Supported filters include:

Most of these filters also have NOT versions where only functions that don't match the criteria are included in the results.

Once all the fields have valid values, press the Search button to initiate the BSim Function search.

A BSim search can also be initiated from either the listing or decompiler by right-clicking to bring up the popup menu and selecting either BSim->Search Function(s) or BSim->Search Function(s)... The only difference is whether or not to bring up the BSim Search Dialog before performing the search. Once one search has been done, subsequent searches can be done using the same settings as the previous search without bringing up the dialog.

When this action is invoked from the listing, it applies to the function containing the cursor, unless there is a selection, in which case it applies to all functions in the selection. When the action is invoked from the decompiler, it applies to the function whose name is directly under the cursor.

Similar Function Search Results

After a BSim Similar Function Search is initiated, a BSim Search Results window will appear.


There are two panels associated with each result set. The top panel is the Function Matches table and the bottom panel is the Executables Summary table. The Executables Summary table can be hidden using the toolbar button. Each row of the Function Matches table shows columns pertaining to a particular match, including the name of the original function queried, the name of the matching function, and the corresponding similarity score. If a single function produces more than one match, each match produces a separate row. Clicking on a column header sorts the results on that column, and clicking on an individual row navigates the Code Browser to the original function that produced that particular match.

Function Matches Panel

The Function Matches Panel displays one function match result per row. There can be multiple rows/matches for any queried function. Each row displays the name of the function being queried, the name of the matching function, its associated match scores, and other related information described below. Clicking on a column header sorts on that column, and clicking on a row navigates the tool to the function being queried. The columns include (not all are visible by default):

Executable categories added to the specific database instance will also be available here as additional columns. See Executable Categories. The column name will match the formal category name, and the string values can be sorted like any other column. It is possible for multiple values to be assigned to the same category for a single executable. In this case, the results table will still display a single column, but the cell will display all the values as a sorted and comma-separated list. If a new date column is specifically added to the database, this will replace the existing column called 'Ingest Date'. In either case, this column will sort and filter as a proper date.
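How several values for one category collapse into a single cell can be sketched in a line of Python. The helper name category_cell and the sample values are hypothetical, and the exact separator (a comma followed by a space) is an assumption.

```python
from typing import Iterable


def category_cell(values: Iterable[str]) -> str:
    """Render all values of one category for one executable as a single cell."""
    return ", ".join(sorted(values))


# An executable assigned two values for the same (hypothetical) category.
print(category_cell(["vendorB", "vendorA"]))
```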

Each function tag registered with the BSim instance will produce an additional column available here. See Function Tags. The column will be labeled with the tag name, and the row entry will be a check-box, indicating whether the tag was present for that function or not.

Executable Match Summary

The Executable Match Summary table displays a row for each executable that has at least one matching function for the queried function(s). Every function returned as a match is associated with its own executable. This table lists exactly one row for every executable associated with some function match, even if there is more than one such match. Many of the columns are the same as for the Function Matches table, but there are two columns that show an aggregated value over all function matches that share that same executable.
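The grouping described above can be sketched as follows. This Python example is an assumption-laden illustration, not the plug-in's code: the type and function names are hypothetical, and the two aggregates are assumed here to be the number of matches and the sum of their confidence scores per executable.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FunctionMatch(NamedTuple):
    """One row of the function matches table (hypothetical, reduced field set)."""
    queried_function: str
    matched_function: str
    executable: str
    confidence: float


class ExecutableSummary(NamedTuple):
    """Aggregated values for one executable (assumed: match count and summed confidence)."""
    match_count: int
    total_confidence: float


def summarize_by_executable(matches: Iterable[FunctionMatch]) -> Dict[str, ExecutableSummary]:
    """Collapse function matches into exactly one summary row per executable."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for match in matches:
        counts[match.executable] += 1
        totals[match.executable] += match.confidence
    return {exe: ExecutableSummary(counts[exe], totals[exe]) for exe in counts}
```

Because every match contributes to its executable's total, many low-value matches can outweigh a few significant ones, which is why filtering the queried functions beforehand matters.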

The executables with the highest aggregate confidence scores share the highest amount of functionality with the subset of functions in the active program that were queried. Users should be aware that this shared functionality is not necessarily the most important functionality. Small functions can produce false positive matches that artificially inflate a confidence score, and matches to library functions increase the score even though the shared functionality is not significant. Proper filtering of the queried subset and of the results may be crucial to getting a meaningful result. See The Overview Query.

Actions

Toolbar Actions

Popup Actions on Functions Table

Popup Actions on Executables Summary Table

Comparing Functions

For an interesting function match, the user can invoke the CodeDiffPlugin in order to display the decompilation of the two matching functions side-by-side and highlight the differences between them. In order for this to happen, the tool needs to have access to the matching executable. The executable can be pulled in automatically if the Ghidra server corresponding to the BSim Database is running. Every executable record in the database has a URL field providing the host, repository, and path for retrieval. If the executable records were ingested from a Ghidra server using the standard tools, this field should be populated correctly.
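As an illustration of how the URL field carries host, repository, and path, the sketch below splits such a URL with Python's standard library. It assumes a layout of the form ghidra://host[:port]/repository/path; the names ExecutableLocation and parse_executable_url, as well as the example host and path, are hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class ExecutableLocation(NamedTuple):
    host: str
    port: Optional[int]
    repository: str
    path: str


def parse_executable_url(url: str) -> ExecutableLocation:
    """Split an executable record's URL into host, repository, and path.

    Assumes the first path component names the repository and the rest is
    the path of the program inside that repository.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    repository, _, rest = parts.path.lstrip("/").partition("/")
    if not repository:
        raise ValueError(f"URL has no repository: {url!r}")
    return ExecutableLocation(parts.hostname, parts.port, repository, "/" + rest)


# Hypothetical record URL.
location = parse_executable_url("ghidra://ghidra.example.org:13100/firmware/router/bin/httpd")
print(location)
```

A record whose URL is missing or cannot be split this way corresponds to the case where the executable has to be loaded manually before a comparison can be made.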

The code comparison is triggered by right-clicking on a particular entry in the Function Match table and selecting Compare Function from the resulting pop-up. If the Ghidra server containing the executable is running, it will be loaded as a separate program directly into the current Code Browser and a comparison window will be displayed. If the Ghidra server is not running, or if the URL field is missing from the record, the comparison will still be triggered if the matching executable is loaded manually as a program in the same Code Browser. The menu action will identify the executable by name.

Loading Executables

An executable can be loaded into the Code Browser, without immediately triggering a function comparison, by right-clicking on a row in the Executables Summary table and selecting Load Executable from the corresponding pop-up menu.

Authentication

Depending on the configuration of the database (see Security and Authentication), the user may need to authenticate with the BSim server. This check is performed immediately upon selecting a server definition. If the server requires a password, a separate dialog will be brought up.

By default Ghidra will connect as the username reported by the OS, but in the password dialog, a different username can be entered if this doesn't match the account established on the server. The title bar of the main dialog indicates the username being used for the current connection.

If the BSim server requires PKI authentication, the user must register their certificate with the Ghidra client. This is accomplished from the main Project window by selecting Set PKI Certificate... from the Edit menu and pointing the dialog at the certificate file. The same certificate is used for authenticating with BSim and with a Ghidra server, if either requires PKI. Ghidra will typically bring up a password dialog once per session to unlock the certificate at the first point it is required.