C-Parser

The C-Parser plugin can be used to extract data type information from C-Header files.  Data type definitions such as structures, enums, typedefs, and function signatures are all extracted.  The extracted data types can be added to a currently open program, or saved to a data type archive for re-use in multiple programs.  An archive file can also be imported into an open project as a shared data type archive.

Why would you want to do this?

One benefit of using the C-Parser over other methods of data type extraction (e.g., debug information embedded in a program) is that every #define with an integer value is added to the archive as an Equate.  For example, #defines are sometimes used to set up error return codes.  These can be very useful to have when annotating a program undergoing reverse engineering.
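
As a small illustration of the kind of integer-valued #defines described above, the following sketch shows a header fragment with error return codes.  All names and values here are hypothetical and not taken from any real library; the helper function only shows the codes in use.

```c
/* Hypothetical header fragment: names and values are invented for illustration. */
#define ERR_OK         0
#define ERR_NOT_FOUND  2
#define ERR_TIMEOUT    0x10

/* Maps a raw code back to its symbolic name, the same lookup an analyst
 * does by hand when only the integer is visible in a disassembly. */
static const char *err_name(int code)
{
    switch (code) {
    case ERR_OK:        return "ERR_OK";
    case ERR_NOT_FOUND: return "ERR_NOT_FOUND";
    case ERR_TIMEOUT:   return "ERR_TIMEOUT";
    default:            return "unknown";
    }
}
```

With the three #defines available as Equates, a constant such as 0x10 in a comparison can be displayed as ERR_TIMEOUT instead of a bare number.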

Dialog: C Parser

The C-Parser has a C-Preprocessor (CPP) phase and a parsing phase.  During the CPP phase, traditional CPP directives (#define, #ifdef, etc.) are used to build up a single file that has all CPP #define macro directives expanded, just as in a normal compilation process.  The second phase parses the output of the CPP phase according to C syntax, extracting the actual data type definitions.

The CPP macro expanded file is placed in the user's home directory and called CParserPlugin.out.  This file is VERY useful for debugging parsing problems, include order, and necessary "-D" directives.  Every attempt is made to include line number information for each included file as part of this larger file.
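
The two phases can be pictured with a small header.  This is only a sketch: BUF_LEN, USE_WIDE_IDS, rec_id_t, and record_t are hypothetical names, and the comments describe what any standard C preprocessor would produce rather than the exact contents of CParserPlugin.out.

```c
/* Hypothetical header: all names below are invented for illustration. */
#define BUF_LEN 32

#ifdef USE_WIDE_IDS              /* would be supplied as "-DUSE_WIDE_IDS" */
typedef unsigned long long rec_id_t;
#else
typedef unsigned int rec_id_t;   /* branch kept when USE_WIDE_IDS is not defined */
#endif

typedef struct record {
    rec_id_t id;
    char     name[BUF_LEN];      /* after preprocessing: char name[32]; */
} record_t;
```

After the CPP phase only one rec_id_t typedef remains and BUF_LEN has been replaced by 32, so the parsing phase sees plain C declarations.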

The C-Parser has been successfully used on Visual Studio, GCC, and Objective-C header files.  The include files for GCC, Windows, MacOS, and ANSI C were all parsed with the C-Parser plugin.  Most vanilla C-Header files can be parsed using the C-Parser.  However, just as in C software development, the correct include order and "-D" pre-defines must be specified.  Getting this correct can be much like porting an application from one platform to another (for example, a Linux program to Visual Studio).  The first time you compile the ported program you will get all sorts of undefined data type errors, because the new platform has some data type defined in a different header file or include location than the original platform.

Setting up to Parse

The C-Parser dialog has three sections:

The newly created data type archive will become dependent on any data type archives currently open in the Data Type Manager.  For example, the Cocoa data type archive is dependent on the mac_osx data type archive.

Dialog: Use Open Archives?

It is strongly suggested that the basic data type archive for the particular platform be open in the Data Type Manager.  When parsing C-Header files, types that are not defined in the headers will be taken from the open archives.  If you are unsure of your target platform's core data types, it is suggested that the "generic_C_lib" archive, which defines ANSI-C functions and data types, be open.

C++ Header files

There is currently no support for parsing information from C++ header files.  It is possible to import information from C++ header files by compiling a program with the desired header files included and the Debug option turned on for the compiler.  After successful compilation and linking, import the program into Ghidra.  If the debug format is fully supported, all function signatures and data type information used in the program should be preserved.  Extracting data type information and function signatures in this way does not recover as much information as parsing full C-Header files.  However, it can lay out structure definitions more accurately.
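
One reason debug information can describe structure layout more precisely is that field offsets and padding are decided by the compiler that built the program.  The sketch below, using a hypothetical packet_hdr structure, shows how those compiler-chosen values can be queried in C; the exact numbers depend on the platform and are deliberately not stated here.

```c
#include <stddef.h>

/* Hypothetical structure: the field names are invented for illustration. */
struct packet_hdr {
    char  tag;     /* a single byte; padding usually follows on most ABIs */
    int   length;  /* typically placed at an int-aligned offset */
    short flags;
};

/* Offset of 'length' as chosen by the compiler that builds this file. */
static size_t length_offset(void)
{
    return offsetof(struct packet_hdr, length);
}

/* Total size of the structure, including any padding the compiler added. */
static size_t hdr_size(void)
{
    return sizeof(struct packet_hdr);
}
```

A header parser has to assume alignment and packing rules to arrive at the same offsets, whereas debug information records the values the compiler actually used.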

Tips:

Getting a new set of header files to parse can be frustrating.  Make use of the CParserPlugin.out file produced in your home directory.

Use the line numbers to determine where in the file the parse error occurred.

The last valid data type parsed, which is displayed in the parse error dialog, can be useful as well.  Search for it in the CParserPlugin.out file and then look at the next defined data type for the parse error.

Parse Error dialog

You can use the "-D<name>=<value>" directive to "define" away or redefine nasty compiler-specific directives.  For example, "-D__builtin_va_list=void" redefines "__builtin_va_list" to "void".

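The effect of such a "-D" directive can be sketched in plain C.  To keep the example safe to compile with a real compiler, it uses a hypothetical built-in named __vendor_va_list as a stand-in for "__builtin_va_list", and a #define in the file plays the role of the "-D" option; vendor_va_ptr and vendor_arg_is_null are likewise invented names.

```c
/* Stand-in for passing "-D__vendor_va_list=void" as a parse option.
 * __vendor_va_list is a hypothetical compiler built-in, playing the role
 * that __builtin_va_list plays in GCC headers. */
#define __vendor_va_list void

/* Hypothetical header line; expands to: typedef void *vendor_va_ptr; */
typedef __vendor_va_list *vendor_va_ptr;

/* Hypothetical function that is readable once the built-in is redefined. */
static int vendor_arg_is_null(vendor_va_ptr args)
{
    return args == 0;
}
```

Without the definition, the typedef refers to a name the parser has never seen; with it, the line is ordinary C.
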
When adding a file to the list of source files to parse, you can specify a directory.  Every file in the directory will be added to the list of source files.  This is very useful if the original programmers did a good job protecting against double inclusion of header files.  This is the norm in most modern-day source code, but was definitely not always the case.
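
Protection against double inclusion usually takes the form of an include guard.  The sketch below uses a hypothetical header, "shapes.h", and repeats its text twice to mimic the file being pulled in a second time; SHAPES_H and point_t are invented names.

```c
/* First inclusion of the hypothetical header "shapes.h". */
#ifndef SHAPES_H
#define SHAPES_H

typedef struct point {
    int x;
    int y;
} point_t;

#endif /* SHAPES_H */

/* Second inclusion of the same text: SHAPES_H is already defined, so the
 * preprocessor skips everything up to #endif and point_t is not redefined. */
#ifndef SHAPES_H
#define SHAPES_H

typedef struct point {
    int x;
    int y;
} point_t;

#endif /* SHAPES_H */
```

Headers written this way can be listed in any number and still be processed once each, which is what makes adding a whole directory practical.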