- Subject: Re: question about Unicode
- From: Rici Lake <lua@...>
- Date: Thu, 7 Dec 2006 19:37:06 -0500
On 7-Dec-06, at 7:25 PM, David Given wrote:
> It will also fail on any encoding that uses low-bit characters as part of an
> extended sequence. If there's an encoding that uses <high> <low1> <low2> as
> part of a single character, then <low1> and <low2> may potentially confuse the
> parser. This scheme would only work on encodings where *all* bytes of an
> extended character have the top bit set. I believe that includes Shift-JIS as
> well as UTF-8.
Actually, both Shift-JIS and Big5 use second bytes in the range
0x40-0xFE (or so; there are a few illegal codes, iirc), and so does
GB 18030-2000. So in all three, the second byte of a character can
fall in the ASCII range.
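
To make the overlap concrete, here is a minimal Python sketch (not anything from Lua itself) that encodes one non-ASCII character and lists the bytes that come out below 0x80. The helper name low_bytes and the choice of U+8868 as the sample character are mine, picked only for illustration; the codecs are the ones that ship with Python.

```python
def low_bytes(text, encoding):
    """Return the bytes below 0x80 in the encoded form of `text`."""
    return [b for b in text.encode(encoding) if b < 0x80]

# U+8868 is a single non-ASCII character; the sample choice is arbitrary.
ch = "\u8868"

for enc in ("utf-8", "shift_jis"):
    encoded = ch.encode(enc)
    # Show the raw bytes and any that a byte-oriented lexer would read as ASCII.
    print(enc, encoded.hex(), [hex(b) for b in low_bytes(ch, enc)])
```

In UTF-8 the character is three bytes, all with the top bit set, so the list is empty. In Shift-JIS it is 0x95 0x5C, and 0x5C is the ASCII backslash, which is exactly the kind of byte a lexer that only special-cases high bytes would misread inside a string literal.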