Re: question about Unicode

lua-l archive


Subject: Re: question about Unicode
From: Rici Lake <lua@...>
Date: Thu, 7 Dec 2006 19:37:06 -0500


On 7-Dec-06, at 7:25 PM, David Given wrote:

It will also fail on any encoding that uses low-bit characters as part of an extended sequence. If there's an encoding that uses <high> <low1> <low2> as part of a single character, then <low1> and <low2> may potentially confuse the parser. This scheme would only work on encodings where *all* bytes of an extended character have the top bit set. I believe that includes Shift-JIS as well as UTF-8.

Actually, both Shift-JIS and Big5 use second bytes in the range 0x40-0xFE (or so; there are a few illegal codes, iirc), and so does GB18030-2000.
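To make the point concrete, here is a minimal sketch in Python (not Lua, and not anything from the thread) that checks whether an encoding ever emits a byte below 0x80 as part of a non-ASCII character. The helper name `low_bytes_in_extended_chars` is hypothetical, and the sketch relies on Python's built-in `shift_jis` and `utf-8` codecs. The sample character is the katakana ソ (U+30BD), which Shift-JIS encodes as 0x83 0x5C, so its second byte collides with the ASCII backslash.

```python
def low_bytes_in_extended_chars(text, encoding):
    """Return the bytes below 0x80 that occur inside the encoded form
    of any non-ASCII character in `text`.

    Hypothetical helper for illustration only. A non-empty result means
    a byte-oriented parser that treats every low byte as plain ASCII
    could misread the encoded text.
    """
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # plain ASCII characters are not the concern here
        for b in ch.encode(encoding):
            if b < 0x80:
                found.append(b)
    return found


if __name__ == "__main__":
    sample = "ソ"  # U+30BD KATAKANA LETTER SO
    for enc in ("shift_jis", "utf-8"):
        low = low_bytes_in_extended_chars(sample, enc)
        print(enc, [hex(b) for b in low])
```

With Shift-JIS the helper reports 0x5C, which is why a lexer that looks for backslash escapes byte by byte can break on such text. With UTF-8 it reports nothing, since every byte of a multi-byte UTF-8 sequence has the top bit set.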
