- Subject: Re: question about Unicode
- From: Mike Pall <mikelu-0612@...>
- Date: Thu, 7 Dec 2006 23:55:07 +0100
Hi,
Rici Lake wrote:
> If I actually use the identifier código (say) in some file, and try to
> refer to it from another file, it might fail because the encodings are
> different. For example, one file might be in iso-8859-1, or both of
> them might be in utf-8 but one of them uses a composed ó and the other
> one uses an o and a combining accent. These differences may be
> completely invisible.
Well, there are also distinct characters that share the same glyph
shape, like 'a' and '\u0430' (Cyrillic a). Normalization won't help
you there, since no normalization form maps one to the other. There
is no perfect solution.
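A small Python sketch of both points, using the standard `unicodedata` module (Python is an illustration choice here, not anything Lua does). The composed and decomposed spellings of "código" differ at the byte level, and the ISO-8859-1 encoding differs again. NFC makes the two UTF-8 spellings compare equal, but no normalization form unifies Latin 'a' with Cyrillic U+0430:

```python
import unicodedata

# "código" spelled two ways: precomposed U+00F3 vs. 'o' + combining acute U+0301
composed = "c\u00f3digo"
decomposed = "co\u0301digo"

# Same visible text, different code points, different UTF-8 bytes
print(composed == decomposed)           # False
print(composed.encode("utf-8"))         # b'c\xc3\xb3digo'
print(decomposed.encode("utf-8"))       # b'co\xcc\x81digo'

# The same identifier saved as ISO-8859-1 is yet another byte sequence
print(composed.encode("iso-8859-1"))    # b'c\xf3digo'

# NFC composes 'o' + U+0301 into U+00F3, so the two spellings now match
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Homoglyphs survive every normalization form: Latin 'a' vs. Cyrillic U+0430
latin, cyrillic = "a", "\u0430"
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # Prints False for every form
    print(form, unicodedata.normalize(form, latin) == unicodedata.normalize(form, cyrillic))
```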
> I strongly agree that "locale-dependent lexing is bad"; however, robust
> lexing needs to be aware of unicode normalization forms. Unfortunately,
> that is by no means cheap.
IMHO simple is better. Even Java doesn't normalize in the lexer:
http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#40625
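To make the trade-off concrete, here is a toy identifier scanner in Python. Everything in it is hypothetical: `lex_identifiers` and the identifier rule (a letter or '_' to start, then letters, combining marks, decimal digits or connectors) are assumptions for illustration, not Lua's or Java's actual lexer. With `normalize=None` it compares code points as-is (the "simple" approach); passing a form such as "NFC" normalizes each identifier before interning, which costs an extra pass per token:

```python
import unicodedata

def is_ident_start(ch):
    # Hypothetical rule: any letter (Unicode category L*) or underscore
    return ch == "_" or unicodedata.category(ch).startswith("L")

def is_ident_part(ch):
    # Also allow combining marks (Mn, Mc), decimal digits (Nd), connectors (Pc)
    return is_ident_start(ch) or unicodedata.category(ch) in ("Mn", "Mc", "Nd", "Pc")

def lex_identifiers(source, normalize=None):
    """Return the set of distinct identifiers found in `source`."""
    names = set()
    i, n = 0, len(source)
    while i < n:
        if is_ident_start(source[i]):
            j = i + 1
            while j < n and is_ident_part(source[j]):
                j += 1
            name = source[i:j]
            if normalize is not None:
                # Optional extra work per token: normalize before interning
                name = unicodedata.normalize(normalize, name)
            names.add(name)
            i = j
        else:
            i += 1  # skip operators, whitespace, digits outside identifiers
    return names

# One file uses the composed ó, the other 'o' + combining accent
src = "c\u00f3digo = 1\nprint(co\u0301digo)"
print(len(lex_identifiers(src)))         # 3: two distinct "código" symbols plus print
print(len(lex_identifiers(src, "NFC")))  # 2: "código" and print
```

Without normalization the two visually identical names become separate symbols, which is exactly the failure described in the quoted message; with it, the lexer pays for a normalization call on every identifier.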
Bye,
Mike
- References:
- Re: question about Unicode, David Jones
- Re: question about Unicode, Roberto Ierusalimschy
- Re: question about Unicode, David Given
- Re: question about Unicode, Rici Lake
- Re: question about Unicode, Roberto Ierusalimschy
- Re: Re: question about Unicode, Ken Smith
- Re: question about Unicode, Adrian Perez
- Re: question about Unicode, David Given
- Re: question about Unicode, Mike Pall
- Re: question about Unicode, Rici Lake