Why would you fuzz? People fuzz for many reasons, depending on the industry they are in, from reliable assurance through to testing and validation. In security research, our primary goal is to discover potential vulnerabilities or weaknesses. Fuzzing allows us do this in an automated, if not somewhat less rigorous, manner. This is the first of two entries reviewing fuzzing techniques and tools. The first gives an overview of what fuzzing actually is, while the second will further review some real-world fuzzing tools.
What makes a vulnerability
In a simplistic model, a typical software application can be described as a collection of blocks. Each block is designed to take an input, process it and provide an output. These blocks work together to produce the behaviours that make the application do whatever it is the application is meant to do.
There are many limitations and intricacies to be aware of when creating these blocks of logic and often these can be overlooked. Due to this, irregular input can result in behaviours that the original developer did not expect. As the result of these behaviours propagates to other blocks that were not originally designed to accept these values new, unplanned scenarios are played out in the software which often result in the application entering a state in which it cannot continue, and ultimately crashes.
Sometimes this is the end of the matter. Sometimes however, with enough knowledge, it is possible to steer these new stories in a manner which makes the software do something entirely counter to the applications’ original goals. This is where vulnerabilities are born.
In the majority of cases, the problems described above occur as the result of “bad input” to the system. This can be as a result of the original developer being unaware of the little details, or making bad assumptions about the input. A classic example of this is where a developer assumes input to the system comes from a “trusted source” such as another system under their control, and doesn’t verify the values being passed to them properly. Eventually, a malicious user finds a way to send data to the system, which is accepted because it is never checked for erroneous values.
A brief introduction to fuzzing
As security researchers, it is often our goal to find these problems before other people who might wish to abuse the situation. In this entry we are going to describe “fuzzing”, one of the techniques available to us that can help identify issues with bad input handling. Input for this tooling often focuses on where computer generated data can enter the system. This could include for example from a file or network connection. It is less concerned with user input, which is typically assumed to be far more irregular and better validated and managed.
The principal behind fuzzing is quite basic. In its most rudimentary form it simply requires identifying a publically exposed interface and pushing random data in to it, and observing the resulting behaviour. If anything exceptional happens, such as the application crashing or ‘freezing’, the input will be stored for a researcher to review. Then the process repeats. It’ll often take many iterations to get results, if you get any at all. As noted before, not all results are useable for malicious purposes, although in some cases simply getting a crash to occur can be useful as a “denial of service” attack vector, if the input can be passed to the application remotely.
Improving the process
Realistically firing entirely random data at a system doesn’t give a good return on investment. This is because data will often be expected to arrive in a specific format and it is highly unlikely that purely random data will align properly to form a valid structure. As a clear example of this, many data formats expect “magic patterns” at the start (or other fixed offsets from the start) to indicate the format of the data, for example, all BMP files are expected to start with the byte values 0x42, 0x4D which correlates to the string “BM” when translated as an ASCII byte sequence. Most BMP viewers will validate this before trying to load the file. Assuming truly random data, passing this check alone will only occur on average once in every 65,535 iterations.
To mitigate this issue and improve our results we can take a reference of valid system input (in the above example, an actual BMP file) and then randomly modify one or more parts of the file. The exact number, length and nature of the modifications can be controlled to suit individual applications, but overall most of our input will now be valid with only a few changes. With some luck, these changes will impact an important structure in an unexpected manner and cause the target application to do something wrong.
Whilst this delivers a far better result, it still has a limited return and we can go further to help our fuzzing mechanism deliver the best results. If we consider a basic BITMAP file for example, this could loosely be described as having a few, small critical control structures at the start of the file, followed by a vast amount of pixel data following. If we randomly alter a reference image, chances are high that our modifications will impact the pixel data rather than the control structures. This might result in a few discoloured pixels, but as each byte in this data can quite validly contain any value in its range, these changes should never result in any abnormal behaviours. As such, many fuzzing mechanisms will give you the ability to target specific regions of the input data and define the format of any structures. Using this method, we are able to target specific properties within the file and help steer our modifications towards causing behavioural changes in the target application. If there are any vulnerabilities, hopefully they’ll fall out of this testing routine and they can be fixed before they are used for other purposes.
To contact Nettitude’s editor, please email email@example.com.