C macros are probably one of my least favourite things… So when I encounter them, my spontaneous reaction is to do away with them. They have a nasty resemblance to functions, but their semantics are completely different (textual replacement), and they are usually almost impossible to get under test. So it’s better to get rid of them.
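Here is a minimal sketch of the "different semantics" point. It uses a hypothetical DOUBLE_MACRO (not from c-xrefactory) that is deliberately written without parentheses. At the call site it looks just like the function doubleFunction. However, the macro argument and the surrounding expression are combined as text, so the two give different answers. Only the function can be called directly from a unit test.

```c
#include <assert.h>

/* Hypothetical macro, deliberately left unparenthesised */
#define DOUBLE_MACRO(x) x + x

/* The function equivalent: x is evaluated once and passed by value */
static int doubleFunction(int x) {
    return x + x;
}

/* 3 * DOUBLE_MACRO(2) is pasted as the text 3 * 2 + 2, which is 8 */
static int tripledViaMacro(void) {
    return 3 * DOUBLE_MACRO(2);
}

/* 3 * doubleFunction(2) is 3 * 4, which is 12 */
static int tripledViaFunction(void) {
    return 3 * doubleFunction(2);
}

/* The function can be exercised directly by a unit test */
static void testDoubleFunction(void) {
    assert(doubleFunction(2) == 4);
    assert(doubleFunction(-3) == -6);
}
```

Parenthesising the macro body would fix this particular case. The underlying difference remains: a macro is text pasted into its caller, not a value computed and returned.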
In my current hobby project, refactoring the C refactoring tool c-xrefactory (https://github.com/thoni56/c-xrefactory) into a maintainable state, I encountered a particularly macro-infested area: the lexer. I’ll leave my speculations about how the code got this way for another day. Instead, I’ll focus on a fairly mechanical sequence of steps that I have been using to refactor a number of macros into nice, clean, testable C functions.
Here is a short excerpt of some code using a macro:
static void processLine(void) {
Lexem lexem;
int l, h, v=0, len;
Position pos;
...
PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);
if (lexem != CONSTANT) return;
...
Here’s (again, a part of) the macro (yes, it’s long…):
#define PassLex(input, lexem, lineval, val, hash, pos, length, linecount) { \
if (lexem > MULTI_TOKENS_START) { \
if (isIdentifierLexem(lexem)){ \
char *tmpcc,tmpch; \
hash = 0; \
for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
SYMTAB_HASH_FUN_INC(hash, tmpch); \
} \
SYMTAB_HASH_FUN_FINAL(hash); \
tmpcc ++; \
GetLexPosition((pos),tmpcc); \
input = tmpcc; \
} else if (lexem == STRING_LITERAL) { \
char *tmpcc,tmpch; \
for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc); \
tmpcc ++; \
GetLexPosition((pos),tmpcc); \
input = tmpcc; \
} else if (lexem == LINE_TOK) { \
GetLexToken(lineval,input); \
...
As you can see, there are multiple levels of macros invoking macros invoking macros. This makes it extremely difficult and frustrating to follow the expansion manually. Of course, you could let the C preprocessor do it, which I tried. But that turns the code into something almost impossible to recognize, so it did not work out well.
The strategy I devised was to first manually expand one of the invocations. Next, I would make whatever adjustments were needed to prepare for the final step: extracting the expanded code into a “real” C function. Luckily, c-xrefactory is fully operational, so I could use it to do the extraction.
Here are the detailed steps:
1. Align local variable names with macro arguments
Macro semantics are textual replacement. Each formal argument of a macro stands for the text you use as the actual argument in the invocation, not for its value or even its address. A clever way to neutralise this replacement is to use actual arguments whose names are exactly the same as the formal arguments. Then you can simply paste the body of the macro in place of the invocation.
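To illustrate step 1, here is a sketch built around a hypothetical SkipIdentifier macro. It is a heavily simplified stand-in for the identifier branch of PassLex. It hashes a NUL-terminated lexem with an assumed multiply-by-31 hash (not SYMTAB_HASH_FUN_INC) and then advances input past the terminating NUL.

- With mismatched actual arguments (p, h), the expansion is the body with every input and hash rewritten.
- With aligned names, the expansion is the body exactly as written in the #define, which is what makes pasting it in safe.

All function names here are hypothetical too.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for one branch of PassLex:
   hash the NUL-terminated lexem at input, then advance input past its
   terminating NUL. The multiply-by-31 hash is an assumption. */
#define SkipIdentifier(input, hash) { \
        char *tmpcc, tmpch; \
        hash = 0; \
        for (tmpcc = input, tmpch = *tmpcc; tmpch; tmpch = *++tmpcc) { \
            hash = hash * 31 + (unsigned char)tmpch; \
        } \
        tmpcc++; \
        input = tmpcc; \
    }

/* Before step 1: the actual arguments p and h differ from the formal
   names, so the expansion has input/hash rewritten to p/h. */
static char *beforeAlignment(char *p, unsigned *hashOut) {
    unsigned h;
    SkipIdentifier(p, h);
    *hashOut = h;
    return p;
}

/* After step 1: the locals carry the formal argument names, so the
   expansion is the macro body exactly as written. */
static char *afterAlignment(char *input, unsigned *hashOut) {
    unsigned hash;
    SkipIdentifier(input, hash);
    *hashOut = hash;
    return input;
}

/* A characterization check: renaming must not change behaviour */
static void checkAlignmentPreservesBehaviour(void) {
    char lexems[] = "abc\0def";
    unsigned h1, h2;
    char *r1 = beforeAlignment(lexems, &h1);
    char *r2 = afterAlignment(lexems, &h2);
    assert(r1 == r2 && r1 == lexems + 4);
    assert(h1 == h2);
}
```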
In the example we would change
Lexem lexem;
int l, h, v=0, len;
Position pos;
...
PassLex(cInput.currentLexem, lexem, l, v, h, pos, len, 1);
so that variable names align, to
Lexem lexem;
int lineval, hash, val=0, length;
Position pos;
...
PassLex(cInput.currentLexem, lexem, lineval, val, hash, pos, length, 1);
Now, if we were to paste the body of the macro here, most uses of arguments in the body would simply match local variables.
2. Create adapter variables
The first and last actual arguments are not local variables that we can change to match. But we can create such local variables:
Lexem lexem;
int lineval, hash, val=0, length;
Position pos;
...
{
char *input = cInput.currentLexem;
int linecount = 1;
PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
...
cInput.currentLexem = input;
}
...
Of course, in the general case you would also have to restore any modified data. In this example, that means copying input back to cInput.currentLexem.
And the good thing is that at this point all your tests should still pass. You have test coverage for this code, right?
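Here is a minimal sketch of the adapter-variable idea, again with hypothetical names. SkipIdentifier is a simplified stand-in for PassLex with an assumed multiply-by-31 hash, and LexInput/cInput only loosely model cInput.currentLexem. The adapter variable input gets the formal argument's name, the macro updates it, and the block copies the result back. A constant actual argument, such as the 1 for linecount, gets an adapter like int linecount = 1; in the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one branch of PassLex */
#define SkipIdentifier(input, hash) { \
        char *tmpcc, tmpch; \
        hash = 0; \
        for (tmpcc = input, tmpch = *tmpcc; tmpch; tmpch = *++tmpcc) { \
            hash = hash * 31 + (unsigned char)tmpch; \
        } \
        tmpcc++; \
        input = tmpcc; \
    }

/* Hypothetical input state, loosely modelled on cInput.currentLexem */
typedef struct {
    char *currentLexem;
} LexInput;

static LexInput cInput;

static unsigned processLexem(void) {
    unsigned hash;
    assert(cInput.currentLexem != NULL);
    {
        /* Adapter variable named exactly like the formal argument */
        char *input = cInput.currentLexem;
        SkipIdentifier(input, hash);
        /* Restore the modified value to where it came from */
        cInput.currentLexem = input;
    }
    return hash;
}
```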
3. Replace the invocation with the body
As the heading says, comment out the invocation and paste the body of the macro just after it:
Lexem lexem;
int lineval, hash, val=0, length;
Position pos;
...
{
char *input = cInput.currentLexem;
int linecount = 1;
// PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
if (lexem > MULTI_TOKENS_START) { \
if (isIdentifierLexem(lexem)){ \
char *tmpcc,tmpch; \
hash = 0; \
for(tmpcc=input,tmpch= *tmpcc; tmpch; tmpch = *++tmpcc) { \
SYMTAB_HASH_FUN_INC(hash, tmpch); \
} \
SYMTAB_HASH_FUN_FINAL(hash); \
tmpcc ++; \
GetLexPosition((pos),tmpcc); \
input = tmpcc;
...
}
As we can see, the parameters to the macro will now directly correspond to local variables in the context of the replacement.
Run your tests! They should still pass.
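Here is a sketch of what step 3 can look like. The names SkipIdentifier, LexInput, cInput and processLexem are hypothetical stand-ins for the real code, and the hash is an assumed multiply-by-31 scheme.

The trailing backslashes copied along with the macro body are harmless outside a #define. The preprocessor splices every backslash-newline pair very early, before comments and tokens are recognised, so the pasted lines simply become one long logical line. The one thing to avoid is a // comment line ending in a backslash, which would swallow the next line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state, loosely modelled on cInput.currentLexem */
typedef struct {
    char *currentLexem;
} LexInput;

static LexInput cInput;

static unsigned processLexem(void) {
    unsigned hash;
    assert(cInput.currentLexem != NULL);
    {
        char *input = cInput.currentLexem;
        // SkipIdentifier(input, hash);
        /* Pasted macro body, continuation backslashes and all */
        char *tmpcc, tmpch; \
        hash = 0; \
        for (tmpcc = input, tmpch = *tmpcc; tmpch; tmpch = *++tmpcc) { \
            hash = hash * 31 + (unsigned char)tmpch; \
        } \
        tmpcc++; \
        input = tmpcc;
        cInput.currentLexem = input;
    }
    return hash;
}
```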
4. Clean up
Now we can clean up the code in preparation for the extraction, for example by removing the line-continuation backslashes that came along with the pasted body.
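Here is one possible clean-up of the pasted body, sketched on a hypothetical stand-in. SkipIdentifier, LexInput, cInput and processLexem are illustrative names, and the multiply-by-31 hash is an assumption. The changes:

- The backslashes are gone.
- The loop walks a single pointer instead of keeping tmpch in sync.

The behaviour is unchanged, so the tests should still pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state, loosely modelled on cInput.currentLexem */
typedef struct {
    char *currentLexem;
} LexInput;

static LexInput cInput;

static unsigned processLexem(void) {
    unsigned hash;
    assert(cInput.currentLexem != NULL);
    {
        char *input = cInput.currentLexem;
        // SkipIdentifier(input, hash);
        char *tmpcc = input;
        hash = 0;
        while (*tmpcc != '\0') {
            hash = hash * 31 + (unsigned char)*tmpcc;
            tmpcc++;
        }
        tmpcc++;            /* step past the terminating NUL */
        input = tmpcc;
        cInput.currentLexem = input;
    }
    return hash;
}
```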
5. Extract the function
With the help of a C refactoring browser that does semantic flow analysis, such as c-xrefactory, the extraction will be smooth and optimal. Variables that are neither input nor output will be restricted to local use inside the function.
In this particular example, extracting a C function from the first line after the commented out invocation to just before any restoring of values will give us
Lexem lexem;
int lineval, hash, val=0, length;
Position pos;
...
{
char *input = cInput.currentLexem;
int linecount = 1;
// PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
input = passLex(input, lexem, pos, linecount);
cInput.currentLexem = input;
}
...
And you have turned a macro into a function!
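For a hypothetical stand-in like the simplified SkipIdentifier branch (with an assumed multiply-by-31 hash), an extraction along these lines might produce the following. The function name skipIdentifier and the parameter choices are illustrative, not actual c-xrefactory output. They follow the same in/out/local reasoning:

- input is read and modified, so its new value is returned.
- hash is only an output, so it becomes a pointer parameter.
- tmpcc is used only inside, so it stays local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input state, loosely modelled on cInput.currentLexem */
typedef struct {
    char *currentLexem;
} LexInput;

static LexInput cInput;

/* Hypothetical extracted function: hash the NUL-terminated lexem at
   input and return a pointer just past its terminating NUL. */
static char *skipIdentifier(char *input, unsigned *hash) {
    char *tmpcc = input;
    *hash = 0;
    while (*tmpcc != '\0') {
        *hash = *hash * 31 + (unsigned char)*tmpcc;
        tmpcc++;
    }
    tmpcc++;                /* step past the terminating NUL */
    return tmpcc;
}

/* The call site, still using the adapter variable from step 2 */
static unsigned processLexem(void) {
    unsigned hash;
    assert(cInput.currentLexem != NULL);
    {
        char *input = cInput.currentLexem;
        // SkipIdentifier(input, hash);
        input = skipIdentifier(input, &hash);
        cInput.currentLexem = input;
    }
    return hash;
}
```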
We can, of course, do some final touch-up, like doing away with the linecount and input temporary variables by using the values directly as arguments to the function.
Lexem lexem;
int lineval, hash, val=0, length;
Position pos;
...
// PassLex(input, lexem, lineval, val, hash, pos, length, linecount);
cInput.currentLexem = passLex(cInput.currentLexem, lexem, pos, 1);
...
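The payoff is that the extracted code can now be tested on its own. Here is a minimal sketch, assuming a hypothetical skipIdentifier function like the simplified one used for illustration: a multiply-by-31 hash over a NUL-terminated lexem, returning a pointer just past the NUL. Testing the real passLex the same way would require setting up its own Lexem and Position arguments.

```c
#include <assert.h>

/* Hypothetical extracted function: hash the NUL-terminated lexem at input
   (assumed multiply-by-31 hash) and return a pointer just past its NUL. */
static char *skipIdentifier(char *input, unsigned *hash) {
    char *tmpcc = input;
    *hash = 0;
    while (*tmpcc != '\0') {
        *hash = *hash * 31 + (unsigned char)*tmpcc;
        tmpcc++;
    }
    return tmpcc + 1;   /* step past the terminating NUL */
}

/* Unit tests that need no lexer state at all */
static void testSkipIdentifierAdvancesPastTerminator(void) {
    char lexems[] = "ab\0c";
    unsigned hash;
    assert(skipIdentifier(lexems, &hash) == lexems + 3);
    assert(hash == 'a' * 31u + 'b');
}

static void testSkipIdentifierHandlesEmptyLexem(void) {
    char lexems[] = "\0x";
    unsigned hash;
    assert(skipIdentifier(lexems, &hash) == lexems + 1);
    assert(hash == 0);
}
```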
I was really lucky with this example because not only did I get a function instead of a macro, but also the argument list became shorter, and some local variables were no longer needed. I’m not sure this was because of the lousy macro in the first place, but I think this strategy could be of general use for cleaning up messy old C code.
Actually, I did this twice, for two different invocations, and extracted two different functions. The code of the two was identical (and all the tests passed). That made me confident the operation had succeeded, so I could proceed to simply replace the other calls following the established pattern.