Going down the rabbit hole with go-fuzz

Fuzzing has been a well-known automated testing technique for decades, especially among security professionals. It used to be “dumb”, meaning it was only capable of generating random data and feeding it to the program that was being tested. It has evolved a lot in the last few years. Modern fuzzing tools now track code coverage and leverage symbolic execution to systematically explore different code paths in the program (watch The Smart Fuzzer Revolution if you are interested in the state of the art in the field). american fuzzy lop brought fuzzing to the masses. It didn’t have any revolutionary features, but thanks to its ease of use, we are now living in the fuzzing renaissance.

Inspired by AFL, Dmitry Vyukov created go-fuzz, which is a fuzz testing tool for the Go programming language. It’s one of the most useful tools in the Go toolchain, but it’s not as widely known as it should be. I discovered it a long time ago and immediately knew that it was a pretty impressive library, but I didn’t have the time to play with it back then. Ever since, its GitHub page has been sitting in my bookmarks, until a few weeks ago, when I found this great article that explained go-fuzz usage step by step. I decided to finally give it a try.

Getting started

Other people have already written good go-fuzz tutorials, so I won’t go into all the details of using go-fuzz in this post, but I will point you to the articles and talks that will get you up to speed.

The best way to get started with go-fuzz is to read Damian Gryski’s great article go-fuzz github.com/arolek/ase. Another great starting point is the GitHub page for go-fuzz. For slightly more advanced usage, you should read about Filippo Valsorda’s adventures with DNS parsing in his blog post DNS parser, meet Go fuzzer (if you prefer watching videos, you can find his GopherCon 2015 talk about it here). Finally, if you want to know more about go-fuzz internals, watch the author’s presentation about Go Dynamic Tools.

Fuzzing Runtastic Archiver

Naturally, I wanted to start by fuzzing my own Go programs. Fuzzing is most effective at finding bugs in code that parses complex data (it can be used in a lot of other ways, but that won’t be the topic today). The closest thing to that was the parsing of a custom binary protocol in Runtastic Archiver, which is a tool that downloads Runtastic activities and converts them to TCX or GPX format. Activities are retrieved from the server in JSON format, but the GPS trace field is actually a binary blob that has to be parsed according to some serialization rules. The function that parses the trace has the following signature:

func parseGPSData(trace string) ([]gpsPoint, error)

Functions like this are ideal candidates for fuzzing in most situations. The binary format in this case was super simple, and the function was really short (just a few dozen lines of code), so I didn’t expect to find any crashes in it. Because I didn’t have a better candidate, I decided to fuzz it anyway. The great thing about go-fuzz is that you can place the fuzz function anywhere in your code. That means you can easily test your private functions (notice that parseGPSData is private), thus focusing on the areas that are most likely to produce bugs. Here is the fuzz function that I came up with:

func Fuzz(data []byte) int {
  if _, err := parseGPSData(string(data)); err != nil {
    return 0
  }

  return 1
}

Fuzz functions follow the same general pattern in most cases. You call the function that you want to test, then return 0 if the input was invalid and 1 if it was valid. go-fuzz gives higher priority to inputs that return 1 when it generates new ones, and hopefully catches some inputs that cause a panic along the way.
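To make the pattern concrete, here is a minimal, self-contained sketch. The point type and the parsePoint function are hypothetical stand-ins for a real parser (they are not part of Runtastic Archiver); only the shape of Fuzz matters. The main function is there just so the file runs on its own, since go-fuzz calls Fuzz directly.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// point is a hypothetical parsed value, standing in for a type like gpsPoint.
type point struct {
	lat, lon float64
}

var errMalformed = errors.New("malformed point")

// parsePoint is a toy parser (an assumption made for this example) that
// accepts input of the form "lat,lon".
func parsePoint(s string) (point, error) {
	parts := strings.Split(s, ",")
	if len(parts) != 2 {
		return point{}, errMalformed
	}
	lat, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return point{}, errMalformed
	}
	lon, err := strconv.ParseFloat(parts[1], 64)
	if err != nil {
		return point{}, errMalformed
	}
	return point{lat: lat, lon: lon}, nil
}

// Fuzz has the signature that go-fuzz expects. It returns 1 for inputs that
// parsed successfully and 0 for inputs that the parser rejected.
func Fuzz(data []byte) int {
	if _, err := parsePoint(string(data)); err != nil {
		return 0
	}
	return 1
}

func main() {
	// Call Fuzz by hand with one valid and one invalid input.
	fmt.Println(Fuzz([]byte("44.8,20.4")))
	fmt.Println(Fuzz([]byte("not a point")))
}
```

A parser that panics on some input would crash inside Fuzz, and that is exactly the event the fuzzer records as a crasher.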

Here is another similar example. A few months ago, I was playing with various parsing algorithms (precedence climbing and Shunting yard), so I wanted to test that package, too. The Fuzz function followed the same pattern:

func Fuzz(data []byte) int {
  if _, err := New().Parse(string(data)); err != nil {
    return 0
  }

  return 1
}

In both cases, go-fuzz didn’t find anything. Not because of my mad programming skills, mind you, but because the code that did the parsing was in both cases really simple, so there was never a good chance that it could crash. This was not a satisfactory conclusion; I was on a mission from God to find bugs, so I needed to find some other libraries to fuzz.

Fuzzing UniDoc

I started looking through the list of all dependencies in my Go projects on GitHub, and the ideal candidate for fuzzing showed up immediately. It was UniDoc, the PDF library for Go (it’s also the engine behind FoxyUtils). I was using it for decrypting and merging the individual magazine pages in both Future plc downloader and Zinio DRM removal. UniDoc is a really great library, but PDF is also an incredibly complex format, so I was almost 100% certain that if the library’s authors hadn’t fuzzed it, go-fuzz would find some bugs in it. Again, writing the Fuzz function was pretty easy: I just had to convert the input bytes into an io.ReadSeeker and parse it by calling pdf.NewPdfReader.

func Fuzz(data []byte) int {
  b := bytes.NewReader(data)

  if _, err := pdf.NewPdfReader(b); err != nil {
    return 0
  }

  return 1
}

The other important thing was to find a good initial corpus of PDF files. I just did a Google search for “PDF sample” and downloaded the first three results. Everything was ready for running go-fuzz. After only a few minutes, the first crashers started to appear. I left go-fuzz running for about an hour, after which the final count of crashing inputs was well over 50. Most of them were actually duplicates. go-fuzz keeps only crashes with unique stack traces, but unique stack traces don’t guarantee that the crashes happened in different places. In the case of UniDoc, most of the crashes were in the same function, but the call could happen at an arbitrary depth in the call stack because of the hierarchical nature of the PDF format. After I looked through all the crashers, the number of unique bugs settled at four. Here are all of them:

panic: interface conversion: pdf.PdfObject is nil, not *pdf.PdfObjectInteger
panic: runtime error: makeslice: len out of range
panic: runtime error: invalid memory address or nil pointer dereference
runtime: goroutine stack exceeds 1000000000-byte limit
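I sorted the duplicates by hand, but one way to automate that kind of triage is to group crash outputs by the function at the top of the stack instead of by the whole trace. The following is only a sketch under assumptions: topFrame and groupByTopFrame are hypothetical helpers (not part of go-fuzz), the traces in main are made up, and the parsing assumes the usual text layout of a Go panic trace.

```go
package main

import (
	"fmt"
	"strings"
)

// topFrame returns the first non-runtime function in the running goroutine's
// stack, with its argument list stripped. It returns "" if none is found.
func topFrame(trace string) string {
	inStack := false
	for _, line := range strings.Split(trace, "\n") {
		switch {
		case strings.HasPrefix(line, "goroutine ") && strings.HasSuffix(line, "[running]:"):
			inStack = true
		case !inStack:
			continue
		case strings.TrimSpace(line) == "":
			return "" // the running goroutine's stack ended without a match
		case strings.HasPrefix(line, "\t") || strings.HasPrefix(line, " "):
			continue // indented lines hold the file and line number
		case strings.HasPrefix(line, "panic(") || strings.HasPrefix(line, "runtime."):
			continue // skip frames that belong to the runtime itself
		default:
			if i := strings.LastIndex(line, "("); i >= 0 {
				return line[:i]
			}
			return line
		}
	}
	return ""
}

// groupByTopFrame counts how many traces share the same top frame.
func groupByTopFrame(traces []string) map[string]int {
	counts := make(map[string]int)
	for _, t := range traces {
		counts[topFrame(t)]++
	}
	return counts
}

func main() {
	// Two made-up traces that crash in the same function at different depths.
	shallow := "goroutine 1 [running]:\n" +
		"example.parseObject(0x1)\n\t/src/example/parser.go:10 +0x10\n" +
		"main.main()\n\t/src/example/main.go:5 +0x20\n"
	deep := "goroutine 1 [running]:\n" +
		"example.parseObject(0x1)\n\t/src/example/parser.go:10 +0x10\n" +
		"example.parseArray(0x2)\n\t/src/example/parser.go:40 +0x30\n" +
		"main.main()\n\t/src/example/main.go:5 +0x20\n"

	for frame, n := range groupByTopFrame([]string{shallow, deep}) {
		fmt.Printf("%s: %d crash(es)\n", frame, n)
	}
}
```

Grouping by the top frame is a heuristic: it can merge distinct bugs that happen to crash in the same function, so each group still needs a quick manual check.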

Within less than two days, the library’s authors fixed all four bugs, but that was not all: I suggested that they fuzz the library with a larger initial corpus, which helped them find six additional bugs! Thanks to go-fuzz, we now have a much more reliable PDF library for Go.

At this point I was wondering whether I should continue playing with go-fuzz and some different libraries, or just call it a day. It was a fun and useful process, so I decided to test one more library.

Fuzzing goftp

The next interesting library was goftp, which I used to write LFTP server. Writing FTP clients involves some non-trivial parsing, so even though the library was small, I was hoping I would find something interesting. This time, the process was a little bit more involved. The library is designed to connect to real FTP servers, but I wanted to test only the parsing code, which was internal to the package. As I said earlier, the great thing about go-fuzz is that you can easily test even private methods. Combine that with the open source nature of Go packages and you’ll get the ability to go get any package and test its private methods, which was exactly what I did. In the parse.go file I found this function:

func parseListLine(line string) (*Entry, error)

That’s exactly what I was looking for! Now I had to find some test inputs. The most common way to find them is to just take the data that you feed your tests with and create a separate file for each test input. Fortunately, the FTP package was well tested, so I had more than 20 inputs in the initial corpus. The Fuzz function was again pretty easy to write:

func Fuzz(data []byte) int {
  if _, err := parseListLine(string(data)); err != nil {
    return 0
  }

  return 1
}
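Turning test inputs into an initial corpus, with one file per input, can be scripted. The sketch below makes a few assumptions: writeCorpus and countFiles are hypothetical helpers, the file naming scheme is arbitrary, and the two LIST lines are illustrative samples rather than the library's actual test data. The output directory is named corpus because go-fuzz reads its initial inputs from a corpus directory inside its working directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeCorpus stores every input in its own file inside dir, creating the
// directory if it does not exist yet.
func writeCorpus(dir string, inputs []string) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	for i, in := range inputs {
		// The naming scheme is arbitrary; any unique file name would do.
		name := filepath.Join(dir, fmt.Sprintf("input-%03d", i))
		if err := os.WriteFile(name, []byte(in), 0o644); err != nil {
			return err
		}
	}
	return nil
}

// countFiles reports how many entries dir contains.
func countFiles(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	// Illustrative directory listing lines, not copied from the library's tests.
	inputs := []string{
		"drwxr-xr-x 3 110 1002 3 Dec 02 2009 pub",
		"-rw-r--r-- 1 ftp ftp 1234 Jan 01 12:00 file.txt",
	}
	if err := writeCorpus("corpus", inputs); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := countFiles("corpus")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("corpus now holds %d file(s)\n", n)
}
```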

After a few minutes, go-fuzz found two crashing inputs, but stayed at that number even after one hour. Finding two crashes was good enough for me, so I stopped the fuzzing process and started looking through the inputs that caused them. In the first file, the input that broke the line parser was 000000000x (notice the whitespace at the end; this would be really difficult to find without fuzzing). I filed an issue on GitHub, and it was quickly resolved. But the second crash turned out to be much more interesting.

Crash in the standard library

Here is the input that caused the second crash:

-000000000 0 r 0 0 --- 4 0000

And here’s the stack trace after the crash:

panic: runtime error: index out of range

goroutine 1 [running]:
time.parse(0x10e4643, 0x13, 0xc42000c320, 0x12, 0x115d800, 0x115fa80, 0xc42000c320, 0xc42000c328, 0xc420049cd0, 0x0, ...)
	/usr/local/Cellar/go/1.8.3/libexec/src/time/format.go:1015 +0x2c77
time.Parse(0x10e4643, 0x13, 0xc42000c320, 0x12, 0xc42000c320, 0x12, 0xc0, 0xc0, 0x10bd700)
	/usr/local/Cellar/go/1.8.3/libexec/src/time/format.go:743 +0x68
github.com/jlaffaye/ftp.(*Entry).setTime(0xc4200163c0, 0xc42008a050, 0x3, 0x7, 0x0, 0x0)
	/Users/Metalnem/Go/src/github.com/jlaffaye/ftp/parse.go:237 +0x1c9
github.com/jlaffaye/ftp.parseLsListLine(0x10e64b4, 0x1d, 0x114c200, 0xc4200107d0, 0xc420010701)
	/Users/Metalnem/Go/src/github.com/jlaffaye/ftp/parse.go:135 +0x575
github.com/jlaffaye/ftp.parseListLine(0x10e64b4, 0x1d, 0x100499c, 0xc42001a0b8, 0x0)
	/Users/Metalnem/Go/src/github.com/jlaffaye/ftp/parse.go:209 +0x68
github.com/jlaffaye/ftp.ParseListLine(0x10e64b4, 0x1d, 0x10c63e0, 0xc420014660, 0x0)
	/Users/Metalnem/Go/src/github.com/jlaffaye/ftp/parse.go:218 +0x35
main.main()
	/Users/Metalnem/Go/src/github.com/jlaffaye/ftp/main/main.go:11 +0x3a
exit status 2

Wait, what? The top of the stack was not in the FTP library, but in the src/time/format.go file, which belongs to the Go standard library! That was the last thing I expected, because the Go standard library has been extremely well tested with go-fuzz.

I called the parseListLine function again with the line that was causing the crash, but this time with the debugger attached, because I wanted to isolate the string that made the time.Parse function panic. Here is the call that was causing the panic:

time.Parse("_2 Jan 06 15:04 MST", "4 --- 00 00:00 GMT")

Would you ever be able to come up with such a silly test case on your own? Another victory for go-fuzz! Again, I filed an issue on GitHub. It was marked as a release-blocker, which meant I was affecting millions of lives by delaying the next release of Go for at least 10 minutes!
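A crasher like this is worth keeping as a small reproduction. Here is a sketch that wraps the call in recover, so the outcome can be inspected on any Go version without killing the program. tryParse is a hypothetical helper; the layout and value are the ones from the call above. On Go 1.8.3 this call panicked, and on a release that includes the fix I would expect an ordinary parse error instead.

```go
package main

import (
	"fmt"
	"time"
)

// tryParse calls time.Parse and converts a panic into a regular error, so the
// caller can tell a crash apart from a normal parse failure.
func tryParse(layout, value string) (t time.Time, panicked bool, err error) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	t, err = time.Parse(layout, value)
	return t, false, err
}

func main() {
	_, panicked, err := tryParse("_2 Jan 06 15:04 MST", "4 --- 00 00:00 GMT")
	fmt.Println("panicked:", panicked)
	fmt.Println("error:", err)
}
```

The same helper drops easily into a regression test that fails whenever panicked is true.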

Conclusion

If you didn’t fuzz it, you can’t say it’s correct.

— Dmitry Vyukov

It’s really difficult to catch all bugs without fuzzing, no matter how hard you try to test your software. Thanks to go-fuzz, fuzz testing has never been easier (proposals exist to make it even easier by adding fuzzing support as a first-class citizen to Go tooling). You should consider fuzzing not just your own code, but also your dependencies; you can always find some interesting bugs that way, as you have seen in this post. Also, don’t forget to add your findings to the go-fuzz list of trophies. Happy fuzzing!