More on static analysis with gcc - meet dehydra

Sunday, 31 May 2009 | Brad Hards

As reported in a previous blog, I've spent some time working with Taras Glek on Dehydra and Treehydra.

It is stabilising and getting easier to build. However, I thought I'd show a simple example of actually using GCC and Dehydra to check code against a policy decision, along the lines of the things that EBN does.

Let's say that we want to avoid use of the strcpy() function. Instead, all developers are supposed to use an xstrcpy() equivalent.

Unfortunately, some developers forget, and use the strcpy() version anyway. It would be useful if the compiler could produce a warning or error message for each use of strcpy().

So we'll create a dehydra script that produces an error message (and stops compilation) when the strcpy() function is used.

There are a few basic callbacks that our script could make use of:

process_decl(decl), which dehydra calls for every global variable, function, or template declaration.
process_function(decl, body), which dehydra calls for each function definition (declarations without bodies are not included), including top-level functions, class member functions, and inline class member functions.
process_type(type), which dehydra calls for each class, struct, enum, union and typedef declaration.
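To make the three callbacks concrete, here is a minimal skeleton that just logs each call. It is a sketch, not a reference: describe() is a hypothetical helper, and it assumes that the objects passed to process_decl and process_type carry name and loc properties in the same way that decl does in process_function. It falls back to console.log so it can also be tried outside dehydra, for example in Node.

```javascript
// dehydra provides print(); fall back to console.log when it is absent
// so that this sketch can also be run in Node.
var log = typeof print === "function" ? print : console.log;

// Hypothetical helper (not part of dehydra): format one line per callback.
// Assumes the item has `name` and `loc` properties.
function describe(kind, item) {
    return kind + ": " + item.name + " at location: " + item.loc;
}

// Called for every global variable, function, or template declaration.
function process_decl(decl) {
    log(describe("Declaration", decl));
}

// Called for each function definition; body is an array of body items.
function process_function(decl, body) {
    log(describe("Function definition", decl) + " (" + body.length + " body items)");
}

// Called for each class, struct, enum, union and typedef declaration.
function process_type(type) {
    log(describe("Type", type));
}
```

A script only needs to define the callbacks it cares about; the rest of this post uses process_function alone.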

In the situation we are looking at, we're going to need to look at places where strcpy() might be used, which is in function definitions. So we'll use the process_function callback.

Now since we're not yet very familiar with how dehydra works, it might be useful to look at the various things we'll be working with. Here is an example of a simple JavaScript script to feed to dehydra:

    function process_function(decl, body) {
        /* print out the function name and location */
        print("Function found: " + decl.name + " at location: " + decl.loc);

        /* Loop over the function body */
        for each (let bodyItem in body) {
            for each (let stmtItem in bodyItem.statements) {
                print("Statement item: " + stmtItem.name + " at location: " + bodyItem.loc);
            }
        }
    }

If we had a little test program, such as:

    #include "tutorial1.h"

    char* do_something(bool useFirst, char* first, char* second)
    {
        char* my_copy;
        if (useFirst) {
            xstrcpy(my_copy, first);
        } else {
            strcpy ( my_copy, second);
        }
        return my_copy;
    }

and ran the dehydra script as

    $ g++ -Wall -fplugin=gcc_dehydra.so -fplugin-arg-gcc_dehydra-=error_strcpy_initial.js -c tutorial1.cc -o /dev/null

then the output will look something like:

tutorial1.cc: In function ‘char* do_something(bool, char*, char*)’:
tutorial1.cc:7: warning: ‘my_copy’ may be used uninitialized in this function
Function found: do_something(bool, char*, char*) at location: tutorial1.cc:3:7
Statement item: my_copy at location: tutorial1.cc:5:11
Statement item: useFirst at location: tutorial1.cc:6:5
Statement item: xstrcpy(char*, const char*) at location: tutorial1.cc:7:32
Statement item: strcpy(char*, const char*) at location: tutorial1.cc:9:34
Statement item: my_copy at location: tutorial1.cc:11:12

Note that the first two lines are the usual warnings from g++ - the compiler is still doing normal checks.

By looking at the output, we can see how each line in the source code is presented. So to detect use of strcpy(), all we have to do is compare the statement name to strcpy(char*, const char*), and emit an error message. So by updating the test script to be:

    function process_function(decl, body) {
        /* Loop over the function body, which is an array of {loc:, statements:array of Variable Types} */
        for each (let bodyItem in body) {
            for each (let stmtItem in bodyItem.statements) {
                if (stmtItem.name == "strcpy(char*, const char*)") {
                    error("Use of strcpy() at " + bodyItem.loc + " (in " + decl.name + ")");
                }
            }
        }
    }

and running it as

    g++ -Wall -fplugin=gcc_dehydra.so -fplugin-arg-gcc_dehydra-=error_strcpy.js -c tutorial1.cc -o /dev/null

we will get the required error message:

tutorial1.cc: In function ‘char* do_something(bool, char*, char*)’:
tutorial1.cc:7: warning: ‘my_copy’ may be used uninitialized in this function
tutorial1.cc: At global scope:
tutorial1.cc:12: error: Use of strcpy() at tutorial1.cc:9:34 (in do_something(bool, char*, char*))
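If you want to experiment with the matching logic without a plugin-enabled GCC to hand, something like the following sketch can be run in plain Node. It is only an approximation: mockBody is hand-written to resemble the statement output shown earlier (the real objects carry far more information), findBanned() is a hypothetical helper rather than anything dehydra provides, and it uses the standard for...of loop instead of the for each form that the dehydra scripts use.

```javascript
// Hand-written stand-in for the `body` array, modelled on the
// "Statement item" output shown earlier in this post.
const mockBody = [
    { loc: "tutorial1.cc:5:11", statements: [{ name: "my_copy" }] },
    { loc: "tutorial1.cc:6:5", statements: [{ name: "useFirst" }] },
    { loc: "tutorial1.cc:7:32", statements: [{ name: "xstrcpy(char*, const char*)" }] },
    { loc: "tutorial1.cc:9:34", statements: [{ name: "strcpy(char*, const char*)" }] },
    { loc: "tutorial1.cc:11:12", statements: [{ name: "my_copy" }] },
];

// Hypothetical helper: collect one message for every statement whose
// name exactly matches the banned signature.
function findBanned(declName, body, bannedName) {
    const messages = [];
    for (const bodyItem of body) {
        for (const stmtItem of bodyItem.statements) {
            if (stmtItem.name === bannedName) {
                messages.push("Use of " + bannedName + " at " + bodyItem.loc +
                              " (in " + declName + ")");
            }
        }
    }
    return messages;
}

const found = findBanned("do_something(bool, char*, char*)", mockBody,
                         "strcpy(char*, const char*)");
console.log(found.join("\n"));
```

Because the comparison is against the whole signature string, the xstrcpy(char*, const char*) entry is not reported; only the strcpy call at tutorial1.cc:9:34 is.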

This technique is more powerful than using grep (or a similar application, such as ack) or perl, because it actually understands the code. So it doesn't suffer from false positives where you try to grep for strcpy and hit comments:

    /* don't use strcpy here */

and it doesn't suffer from false negatives where you try to exclude xstrcpy:

    strcpy(xstrcpy, ystrcpy);
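To see both failure modes side by side, here is a small sketch that imitates the two text-based searches with regular expressions. The sample lines and the function names naiveGrep() and grepExcludingX() are made up for illustration; they stand in for something like grep strcpy, with and without a second filter that drops lines mentioning xstrcpy.

```javascript
// Three sample source lines: a comment, a real strcpy() call whose
// arguments happen to be named xstrcpy and ystrcpy, and an approved
// xstrcpy() call.
const sampleLines = [
    "/* don't use strcpy here */",
    "strcpy(xstrcpy, ystrcpy);",
    "xstrcpy(dst, src);",
];

// Imitates a plain search for "strcpy": matches any line containing the text.
function naiveGrep(lines) {
    return lines.filter(function (line) { return /strcpy/.test(line); });
}

// Imitates searching for "strcpy" and then dropping lines that mention "xstrcpy".
function grepExcludingX(lines) {
    return lines.filter(function (line) {
        return /strcpy/.test(line) && !/xstrcpy/.test(line);
    });
}

const naiveHits = naiveGrep(sampleLines);
const filteredHits = grepExcludingX(sampleLines);
console.log("plain search:", naiveHits);
console.log("excluding xstrcpy:", filteredHits);
```

The plain search reports all three lines, including the comment and the approved call. The filtered search keeps only the comment and drops the one line that really does call strcpy().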

It is still just a slightly more sophisticated version of grep (and ack), but we can come up with more complex tests, which I'll look at in a future blog entry.