Version Tracking in Ghidra

When a binary is reverse engineered using Ghidra, various annotations are applied to aid in understanding the binary’s behaviour. These annotations take the form of comments, renamed functions, variables and arguments, and more. Collectively, these annotations are known as “markup” and are specific to a single binary in the Ghidra project.

For long-running reverse engineering projects, new versions of a binary may be released periodically, which raises the prospect of restarting the reverse engineering process from scratch against each new version. This is far from ideal and can lead to a lot of repeated effort. Applying markup from the previous version(s) of the binary can save a significant amount of time.

This is where Ghidra’s Version Tracking tool comes in handy. The Version Tracking tool works by using “correlators” to match specific parts of a source binary to parts of a destination binary. If a match is valid, then the markup from the source binary can be applied to the destination binary in a semi-automated way.

This blog post walks through using the Version Tracking tool to apply markup from an old binary to a newer one, and explains the main concepts along the way.

Creating a Version Tracking Session

Before starting a Version Tracking session, make sure the older binary (with markup) and the newer binary are both loaded into the same Ghidra project. Auto-analysis should already have been run on the newer binary (although this can also be performed later during the Version Tracking session).

Next, load the relevant Ghidra project and press the footsteps icon to open the Version Tracking tool.

Open the version tracking tool

This will load the main Version Tracking window, in which a new session needs to be created. To do this, press the footsteps icon in this new window.

Start the new tracking session

Fill out the required information, specifying the source program (the one with markup) and the destination program (the one without markup).

Select the source and destination programs

For this example, multiple versions of the open source firmware for the Prusa Mini 3D printer were compiled. To simulate a binary with significant markup, an older version (v4.0.3) of the firmware was compiled with debugging symbols. All other versions of the firmware were compiled without debugging symbols and then stripped.

Press “Next >>” and run the “Precondition Checks” in the new window.

Precondition checks

These checks attempt to spot any obvious differences between the source and destination programs that may cause problems when transferring markup. One example is a difference in the number of memory blocks each program has, or in the permissions of those blocks.
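To make the idea concrete, the sketch below compares two simplified lists of memory blocks in the spirit of the precondition checks. It is a hypothetical model written in standalone Python, not Ghidra’s implementation: the MemoryBlock class, the compare_memory_blocks() helper and the block names used with it are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryBlock:
    """Simplified stand-in for a memory block: a name plus r/w/x permissions."""
    name: str
    read: bool
    write: bool
    execute: bool

    def perms(self) -> str:
        """Render permissions in the familiar 'rwx' style, e.g. 'r-x'."""
        return ("r" if self.read else "-") + ("w" if self.write else "-") + ("x" if self.execute else "-")


def compare_memory_blocks(source: list[MemoryBlock], destination: list[MemoryBlock]) -> list[str]:
    """Return human-readable warnings about differences between the two block lists."""
    warnings = []
    if len(source) != len(destination):
        warnings.append(f"block count differs: source has {len(source)}, destination has {len(destination)}")
    dest_by_name = {block.name: block for block in destination}
    for block in source:
        other = dest_by_name.get(block.name)
        if other is None:
            warnings.append(f"block {block.name} missing from destination")
        elif block.perms() != other.perms():
            warnings.append(f"block {block.name} permissions differ: {block.perms()} vs {other.perms()}")
    return warnings
```

Matching blocks by name is a simplification made for the sketch; real programs can rename, split or merge blocks between versions.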

After the checks have run, press the “Next >>” button and then “Finish”. This will open two additional windows, similar to the usual Code Browser window, which hold the source and destination binaries.

The window titles include either “(SOURCE TOOL)” or “(DESTINATION TOOL)”, making it obvious which window contains which binary.

Preparing to Apply the Markup

Before applying the markup from the older version (v4.0.3) to the newer version (v5.0.1), it’s worth looking at what each version contains.

As mentioned previously, the older version was compiled with symbols to simulate a binary with lots of markup. This older version has 2927 functions, all of which are correctly named and have the correct argument types.

Source binary with functions marked up

The newer version, compiled without symbols and stripped, contains 4869 functions, all of which have Ghidra’s default function names of the form FUN_xyz, where xyz is the address of the function.
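Because unmarked functions keep these default names, counting non-default names gives a rough measure of how much markup a binary carries. The sketch below assumes the default name is “FUN_” followed by the entry address in lowercase hex, zero-padded to 8 digits for a 32-bit target (an assumption about address width); default_function_name() and markup_coverage() are hypothetical helpers, and the regular expression deliberately ignores any other auto-generated name forms.

```python
import re

# Assumed shape of a default function name: "FUN_" followed by a hex address.
DEFAULT_NAME = re.compile(r"^FUN_[0-9a-fA-F]+$")


def default_function_name(address: int, width: int = 8) -> str:
    """Build a FUN_-style name, zero-padding the hex address to `width` digits."""
    return f"FUN_{address:0{width}x}"


def markup_coverage(names) -> float:
    """Fraction of function names that are NOT default FUN_ names (0.0 for an empty list)."""
    names = list(names)
    if not names:
        return 0.0
    named = sum(1 for name in names if not DEFAULT_NAME.match(name))
    return named / len(names)
```

Run over the two binaries in this example, such a measure would be close to 1.0 for the source and 0.0 for the destination before any markup is applied.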

Destination binary with no functions marked up

To apply the markup to the newer binary, press the wand icon in the main Version Tracking window. This automatically runs a set of “correlators” that attempt to find matches between the source and destination programs, and any sufficiently good match is automatically accepted. “Accepted” means the match is classed as valid in the Version Tracking session, but this doesn’t by itself apply the markup to the destination binary.

Run correlators wand icon

Before running the correlators, a few options need to be tweaked to force the Version Tracking tool to overwrite function signatures and parameters in the destination binary. Without these changes, only the destination function’s name will change, not its parameters or return type.

In the main Version Tracking window, navigate to “Edit -> Tool Options” and change the options to the values shown in the screenshot below.

Version Tracking options

After the wand icon is pressed, a set of eight correlators runs, including correlators that find matches between similar blocks of data, instructions and mnemonics. Additional correlators can be enabled by pressing the “+” button in the main Version Tracking window. These are worth exploring (especially the BSim correlator), but for now the default selection will be used.

Once the correlators have run, the main Version Tracking window is populated with the results, which include matches for both functions and defined data. Because multiple correlators can find the same match, the results table may contain several entries for it. The rightmost column shows which correlator found each entry.
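When reviewing the table, it can help to think of each entry as a (source address, destination address) pair plus the correlator that proposed it. The hypothetical sketch below collapses duplicate entries into one record per pair, listing every correlator that found it; the MatchRow fields, the addresses and the correlator name strings are illustrative assumptions rather than Ghidra’s data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRow:
    """One entry in a simplified results table (hypothetical model)."""
    source: int        # address of the item in the source binary
    destination: int   # address of the proposed match in the destination binary
    correlator: str    # name of the correlator that proposed the match


def group_by_pair(rows):
    """Map each (source, destination) pair to its sorted, de-duplicated correlator names."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[(row.source, row.destination)].add(row.correlator)
    return {pair: sorted(names) for pair, names in grouped.items()}


# Example: two correlators agree on one match, a third finds a different one.
rows = [
    MatchRow(0x08001000, 0x08002000, "Exact Function Bytes"),
    MatchRow(0x08001000, 0x08002000, "Exact Function Instructions"),
    MatchRow(0x08003000, 0x08004400, "Exact Function Mnemonics"),
]
```

Seeing several independent correlators agree on the same pair is useful supporting evidence, although it is no substitute for checking the diff views.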

Results

Scores and Confidence Values

The “Score” and “Confidence” values in the results table can be used to quickly separate matches that are likely correct from those that are likely incorrect.

The “Score” value represents how similar the destination item is to the source item and ranges from -1.0 to 1.0, with 1.0 being a perfect match. Each correlator calculates and returns a score for every potential match it identifies.

For example, if the “Exact Function Bytes” correlator identified a function in the destination binary containing exactly the same bytes as a function in the source binary, it would return a perfect score of 1.0 for that match. Depending on the correlator, however, a perfect score of 1.0 doesn’t guarantee a correct match, especially for short functions: a tiny function, such as one that simply returns, can have exactly the same bytes as several unrelated functions.
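The problem with short functions can be shown with a toy exact-bytes matcher. This is a hypothetical Python sketch, not how Ghidra’s “Exact Function Bytes” correlator is implemented: it only pairs functions whose byte sequences are unique in both binaries and sets aside anything that occurs more than once. The addresses and bytes are assumptions; 70 47 is the little-endian encoding of the Thumb instruction bx lr, a typical body for a function that simply returns on ARM.

```python
from collections import defaultdict


def index_by_bytes(functions):
    """Group function addresses by their exact byte content."""
    by_bytes = defaultdict(list)
    for address, body in functions.items():
        by_bytes[body].append(address)
    return by_bytes


def exact_bytes_matches(source_funcs, dest_funcs):
    """Return (unique_matches, ambiguous_sources).

    unique_matches: (source, destination) pairs whose bytes occur exactly once in each binary.
    ambiguous_sources: source addresses whose bytes do appear in the destination but occur
    more than once in either binary, so a byte-for-byte match alone proves little.
    """
    src_index = index_by_bytes(source_funcs)
    dst_index = index_by_bytes(dest_funcs)
    unique_matches, ambiguous_sources = [], []
    for body, src_addrs in src_index.items():
        dst_addrs = dst_index.get(body, [])
        if not dst_addrs:
            continue  # no identical function in the destination
        if len(src_addrs) == 1 and len(dst_addrs) == 1:
            unique_matches.append((src_addrs[0], dst_addrs[0]))
        else:
            ambiguous_sources.extend(src_addrs)
    return sorted(unique_matches), sorted(ambiguous_sources)


RETURN_ONLY = bytes.fromhex("7047")              # Thumb "bx lr"
LONGER_BODY = bytes.fromhex("10b500f001f810bd")  # stand-in for a longer, unique function

source = {0x08000100: RETURN_ONLY, 0x08000104: RETURN_ONLY, 0x08000200: LONGER_BODY}
destination = {0x08010100: RETURN_ONLY, 0x08010300: LONGER_BODY}
```

With this data, both return-only functions end up in the ambiguous list, while the longer function is paired unambiguously.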

The “Confidence” value represents how likely a match is to be correct. Confidence values range from -9.999 to 9.999, and each correlator calculates and returns a confidence value for every match it identifies.

Since each correlator calculates its score and confidence values using its own algorithm, it is not recommended to directly compare the values returned by different correlators for a conflicting match. For example, a confidence value of 0.8 from one correlator may indicate a more reliable match than a confidence value of 1.0 from a different correlator. In these situations, the Ghidra documentation for the correlators in question can be very helpful.
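One way to act on this advice is to detect conflicts first and route them to manual review, rather than resolving them by comparing raw confidence values. The hypothetical sketch below represents each result as a (source, destination, correlator, confidence) tuple, an assumption made for illustration, and reports any source item proposed for more than one destination and any destination claimed by more than one source.

```python
from collections import defaultdict


def find_conflicts(rows):
    """rows: iterable of (source, destination, correlator, confidence) tuples.

    Returns two dicts: source -> candidate destinations, and destination -> candidate
    sources, each containing only entries with more than one candidate.
    Confidence is ignored on purpose: values from different correlators are not
    directly comparable, so conflicts are flagged for manual review instead.
    """
    by_source = defaultdict(set)
    by_destination = defaultdict(set)
    for source, destination, _correlator, _confidence in rows:
        by_source[source].add(destination)
        by_destination[destination].add(source)
    source_conflicts = {s: sorted(d) for s, d in by_source.items() if len(d) > 1}
    destination_conflicts = {d: sorted(s) for d, s in by_destination.items() if len(s) > 1}
    return source_conflicts, destination_conflicts
```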

The “Score Filter” and “Confidence Filter” towards the bottom right of the results window can be used to show only likely matches. Start by entering a score filter of 1 and a confidence filter of 1, which will show the likely correct matches.

Score and Confidence filters

Selecting a row in the results table shows decompiler and listing diff views at the bottom of the main Version Tracking window. These can be used to quickly determine whether a function is a valid match. The left side shows the source binary and the right side shows the destination.

Decompiler diff view

Since this match has high score and confidence values, the Version Tracking tool has automatically applied the markup, which is why the source and destination decompilation look identical, including the called function (remember_feedrate_and_scaling()) and the global variable (feedrate_percentage). So far, the markup has only been applied inside the Version Tracking session, not in the actual destination binary.

The listing diff view for the same function shows a few differences in the addresses used, but overall the function is still a match.
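The intuition that a function can still match despite different addresses can be sketched by masking address-like operands before comparing instructions. This is a simplified, hypothetical illustration in the spirit of mnemonic-style matching, not Ghidra’s algorithm; it assumes the listing is available as plain instruction strings with hexadecimal operands written with a 0x prefix.

```python
import re

# Any hexadecimal literal, e.g. an absolute address or a branch target.
HEX_LITERAL = re.compile(r"0x[0-9a-f]+")


def normalise(instruction):
    """Lower-case an instruction and replace every hex literal with a placeholder."""
    return HEX_LITERAL.sub("ADDR", instruction.strip().lower())


def same_shape(source_listing, dest_listing):
    """True if both listings are identical once hex literals are masked."""
    return [normalise(i) for i in source_listing] == [normalise(i) for i in dest_listing]
```

The trade-off is that genuine hexadecimal constants are masked too, so two functions differing only in an immediate value would also be reported as having the same shape.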

Listing diff view

Applying the Markup

For each row in the results table, the suggested match must be either accepted (if it hasn’t been already) or rejected, so that the Version Tracking tool knows which markup to apply and which to ignore.

One quick way to do this is to sort the results table by confidence by clicking the “Confidence” column heading (don’t forget to remove the “Confidence Filter” if it’s still applied).

Any match with a confidence value below 0 is likely invalid and should be rejected. Take care with smaller functions, however, as they can be valid matches despite a low confidence value. Also, as mentioned previously, if multiple correlators present conflicting matches, simply picking the one with the highest confidence value isn’t recommended. Instead, review the Ghidra documentation for the correlators and inspect the decompiler and listing diff views.

With the results sorted, rejecting the low confidence matches is as simple as multi-selecting them in the table, right-clicking and pressing “Reject”.

Rejecting low confidence matches

This will change the “Status” of the selected matches to “Rejected” and therefore the Version Tracking tool will not apply the markup for these matches in the destination binary.

At this point, it’s worth glancing over the matches with a confidence value greater than 1 to ensure they are genuine. More often than not they will be, but be careful with conflicts, where the Version Tracking tool found multiple candidate matches and may have automatically accepted the wrong one. Also, keep an eye on the “Score” value, as a high confidence value doesn’t always coincide with a perfect score of 1.

The remaining matches, with confidence values between 0 and 1, should be inspected manually and either accepted or rejected by right-clicking the relevant row in the results window and pressing “Accept” or “Reject”.
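The triage rules from this section can be summarised as a small decision function. This is a hypothetical Python sketch: the thresholds follow the rules of thumb above (treating a confidence of exactly 1 as high, to match the filter value used earlier), and the conflicting and short_function flags are assumed to come from your own review, since there is no precise definition of a “short” function. Matches the sketch would accept should still be glanced over.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    REVIEW = "review"  # inspect the decompiler and listing diff views by hand


def triage(score, confidence, conflicting=False, short_function=False):
    """Suggest a decision for one match using the rules of thumb described above."""
    if conflicting:
        # Confidence values from different correlators aren't comparable.
        return Decision.REVIEW
    if confidence < 0:
        # Usually invalid, but short functions can be valid despite low confidence.
        return Decision.REVIEW if short_function else Decision.REJECT
    if confidence >= 1:
        # High confidence without a perfect score deserves a second look.
        return Decision.ACCEPT if score == 1.0 else Decision.REVIEW
    # Confidence between 0 and 1: inspect manually.
    return Decision.REVIEW
```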

Once this process is complete, clear any filters set on the results window, press “Ctrl+A” to select every row in the results table, then right-click and press “Apply Markup” to persist the changes to the destination binary. When the destination binary is next opened in the normal Ghidra Code Browser window, the markup will be there.

Apply the markup to the destination binary

Anything marked as “Rejected” will not be applied to the destination binary.
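The effect of selecting every row and applying markup can be modelled as below. The Status enum and markup_to_apply() are hypothetical and only mirror the workflow in this post; the sketch makes no claim about how Ghidra itself treats rows that are still undecided, it simply reports them so they can be triaged first.

```python
from enum import Enum


class Status(Enum):
    AVAILABLE = "Available"  # neither accepted nor rejected yet
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"


def markup_to_apply(rows):
    """rows: iterable of (match_name, Status) pairs.

    Returns (to_apply, undecided): names of accepted matches whose markup would be
    applied, and names still awaiting a decision. Rejected matches are skipped.
    """
    to_apply, undecided = [], []
    for name, status in rows:
        if status is Status.ACCEPTED:
            to_apply.append(name)
        elif status is Status.AVAILABLE:
            undecided.append(name)
    return to_apply, undecided
```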

Conclusion

The Version Tracking tool is a great way to ensure previous work isn’t lost when a new version of the target binary is released. Whilst this post described applying markup from an older version of the binary to a newer version, there is no reason why it can’t be the other way around.

Also, if multiple versions of the binary have been released, it may be best to apply the markup to them in chronological order. This will likely minimise the changes between each pair of versions and therefore give version tracking the best chance of migrating the markup from the oldest version all the way through to the latest.
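Planning that chain is mostly a matter of sorting the releases by version number and pairing each one with its successor, with each pair becoming one Version Tracking session. The sketch below assumes simple dotted numeric tags such as v4.0.3 and v5.0.1; parse_version() and tracking_chain() are hypothetical helpers. Versions are compared numerically because plain string sorting would place v4.10.0 before v4.2.0.

```python
def parse_version(tag):
    """Turn a tag such as 'v4.0.3' into a tuple of integers, e.g. (4, 0, 3)."""
    if tag.startswith("v"):
        tag = tag[1:]
    return tuple(int(part) for part in tag.split("."))


def tracking_chain(tags):
    """Return (source, destination) pairs that walk the releases in chronological order."""
    ordered = sorted(tags, key=parse_version)
    return list(zip(ordered, ordered[1:]))
```

For the two versions in this post, tracking_chain(["v5.0.1", "v4.0.3"]) yields a single session from v4.0.3 to v5.0.1. The approach assumes release order matches version order, which may not hold for projects that maintain several branches in parallel.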