Hi Peter,
That can be rather difficult because preprocessors, by their nature, require two passes. You may need to write an IParser that first resolves all the #defines and, as it loops through the document text, builds up a temporary ICodeDocument in which the defined items are replaced. In your case this temporary document would end up being:
char * str = "this is a string with semicolon";
if (10 < 20)
{
...
}
Then manually run your normal language parser on that document instead. The tricky part is that any AST nodes or syntax errors your parser returns will have the wrong offsets, since they are based on the temporary document rather than the original. So you would probably need to track which ranges were merged in and the offset delta each one introduces, then apply those deltas to the AST and syntax error results before returning the IParseData result from the parsing operation.
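
To make the offset tracking a bit more concrete, here is a rough sketch of the idea. It is written in Python purely for illustration and does not use the IParser or ICodeDocument API at all. The names expand and to_original are made up for this example. It also assumes only simple object-like #defines, with no function-like macros, no nested expansion, and no skipping of string literals or comments, so treat it as a starting point rather than a working preprocessor.

```python
import re
from bisect import bisect_right

# Hypothetical helpers for illustration only; not part of any real parser API.
DEFINE = re.compile(r"[ \t]*#define[ \t]+(\w+)[ \t]*(.*)")
WORD = re.compile(r"\w+")


def expand(source):
    """Pass 1: drop #define lines, substitute macro names, record segments.

    Returns (expanded_text, segments). Each segment is a tuple
    (expanded_start, original_start, length, copied), where copied is
    False for text that came from a macro body.
    """
    macros = {}
    out = []
    segments = []
    exp_pos = 0   # current offset in the expanded text
    orig_pos = 0  # offset of the current line in the original text

    def emit(text, orig_start, copied):
        nonlocal exp_pos
        if text:
            segments.append((exp_pos, orig_start, len(text), copied))
            out.append(text)
            exp_pos += len(text)

    for line in source.splitlines(keepends=True):
        m = DEFINE.match(line)
        if m:
            # Remember the macro; the line itself is left out of the output.
            macros[m.group(1)] = m.group(2).strip()
        else:
            last = 0
            for w in WORD.finditer(line):
                if w.group() in macros:
                    emit(line[last:w.start()], orig_pos + last, True)
                    emit(macros[w.group()], orig_pos + w.start(), False)
                    last = w.end()
            emit(line[last:], orig_pos + last, True)
        orig_pos += len(line)
    return "".join(out), segments


def to_original(offset, segments):
    """Map an offset in the expanded text back to the original text.

    Offsets inside a macro body map to the start of the macro name.
    """
    starts = [s[0] for s in segments]
    i = bisect_right(starts, offset) - 1
    if i < 0:
        return 0
    exp_start, orig_start, length, copied = segments[i]
    if copied:
        return orig_start + min(offset - exp_start, length)
    return orig_start


source = "#define LIMIT 20\nif (10 < LIMIT)\n{\n}\n"
expanded, segments = expand(source)
brace = to_original(expanded.index("{"), segments)
print(repr(expanded), brace, repr(source[brace]))
```

You would run your normal parser on the expanded text, then pass every offset in the resulting AST nodes and syntax errors through something like to_original before building the final parse data. An error reported inside expanded macro text ends up pointing at the macro name in the original document, which is usually where you want the squiggle anyway.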