On 18 Apr 2007, at 09:49, Dave Dodge wrote:
> On Wed, Apr 18, 2007 at 09:54:39AM +0100, David Jones wrote:
>>>> Phase 3 does not need a little bit of knowledge from Phase 4.
>>>
>>> Footnote 6 (admittedly non-normative, but read on) seems to explicitly state that it does.
>>
>> The problem you referred to earlier was that of differentiating the string "foo\nar" from the included file "foo\nar". In stage 3 there is no distinction, it's all just pp-tokens. You can create a problem for yourself if you decide that your frontmost lexer can distinguish strings from included files, but really the C standard says that strings don't become strings as we know them until stage 5.
>
> Right, one problem is if you're trying to categorize the pp-tokens before passing them to phase 4. The sequence "foo" is ambiguously either a header-name or string-literal when you don't have the phase 4 context available. As you say, one approach is to just consider it a generic pp-token and figure it out later. But there's a more difficult case I forgot about:
>
>     <x>
>
> which can be pp-tokenized two very different ways:
>
>     punctuator identifier punctuator
>     header-name
>
> Choosing the correct pp-token sequence here does require phase 4 context.
No, it just requires context. I don't have to have done any macro replacement, conditional preprocessing, or file inclusion (in other words I don't have to do any of the things that are actually in phase 4). I simply have to know whether I could be in a control-line #include reduction or not. This means that you can't naively separate your preprocessor lexer from the parser. But in fact the amount of context you have to maintain to decide whether the next preprocessing token is a header-name is next to trivial.
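To make "next to trivial" concrete, here is a rough sketch in Python (not C, purely for brevity) of a line-at-a-time pp-tokenizer that keeps only that one bit of context. The name pp_tokenize_line and the regular expressions are made up for illustration; the token patterns are heavily simplified (no comments, line splicing, digraphs, character constants, or exponent signs in pp-numbers), so treat it as a sketch under those assumptions rather than a conforming lexer.

```python
import re

# Simplified, illustrative patterns only; real C has more token forms.
HEADER_NAME = re.compile(r'<[^>\n]+>|"[^"\n]+"')
TOKEN = re.compile(r'''
    (?P<identifier>[A-Za-z_]\w*)
  | (?P<pp_number>\.?\d[\w.]*)
  | (?P<string_literal>"(?:\\.|[^"\\\n])*")
  | (?P<punctuator><<=|>>=|<<|>>|<=|>=|==|!=|\#\#|[^\s\w])
  | (?P<other>\S)
''', re.VERBOSE)


def pp_tokenize_line(line):
    """Split one logical source line into (kind, text) pp-tokens."""
    tokens = []
    pos = 0
    while pos < len(line):
        if line[pos].isspace():
            pos += 1
            continue
        # The only context kept: are the tokens so far exactly '#' 'include'?
        # No macro replacement, conditionals, or file inclusion is involved.
        in_include = [text for _, text in tokens] == ['#', 'include']
        m = HEADER_NAME.match(line, pos) if in_include else None
        if m:
            tokens.append(('header-name', m.group()))
        else:
            # The final \S alternative guarantees a match at any
            # non-whitespace position, so m is never None here.
            m = TOKEN.match(line, pos)
            tokens.append((m.lastgroup.replace('_', '-'), m.group()))
        pos = m.end()
    return tokens


if __name__ == '__main__':
    print(pp_tokenize_line('#include <x>'))
    print(pp_tokenize_line('a <x> b'))
```

With this, the same characters <x> come out as a single header-name after "#include" and as punctuator, identifier, punctuator anywhere else. If the header-name pattern doesn't match after "#include" (for example "#include MACRO"), the sketch just falls back to ordinary tokens and leaves the rest to later processing.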
I'm not sure what you mean when you say "phase 4 context". The C standard says that a program is decomposed into preprocessing tokens. 6.4p4 observes an ambiguity in the grammar and specifies how it is resolved.
When phase 3 says that "The source file is decomposed into preprocessing tokens" it doesn't say how this is done, and we can see (from 6.4p4) that we can't do this using a traditional context-free lexer. So don't do that then.
It seems that you're trying to take shortcuts in analysing a C program strictly according to the grammar specified in the standard, and then complaining that it's tricky. I agree.
Cheers, drj