- Subject: Re: Matching multibyte alphabetical characters with LPeG
- From: Miles Bader <miles@...>
- Date: Mon, 18 Jun 2012 14:14:49 +0900
William Ahern <william@25thandClement.com> writes:
>> There's still the grapheme problem for å vs å; hopefully you can't
>> tell the second is "a".."␣̊". [1]
>>
>> How should lpeg match the one with a separate combining mark
>> version against character classes?
>
> Normalization. Wrap the lpeg API with routines to normalize input
> strings. Without normalization Unicode is almost useless, like
> comparing apples and oranges while the user sees plums.
Hardly; it very much depends on what you're doing with it -- and note
that in many cases (Apple, I'm looking at you...) normalization is
downright harmful.
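A minimal sketch of both sides of this exchange, written in Python's standard `unicodedata` module rather than LPeg (the helper `same_text` is hypothetical, for illustration only). The two spellings of "å" compare unequal as raw code points but equal after NFC normalization. Normalization also rewrites the underlying bytes, which is why it is not free when the exact byte sequence has to survive a round trip.

```python
import unicodedata

# "å" as one precomposed code point (U+00E5 LATIN SMALL LETTER A WITH RING ABOVE) ...
precomposed = "\u00e5"
# ... and as "a" followed by U+030A COMBINING RING ABOVE.
decomposed = "a\u030a"

def same_text(x: str, y: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)

print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: identical after NFC

# Normalizing changes the stored bytes, so data that must round-trip
# byte-for-byte can be altered by a "harmless" normalization step.
print(precomposed.encode("utf-8"))                                # b'\xc3\xa5'
print(unicodedata.normalize("NFD", precomposed).encode("utf-8"))  # b'a\xcc\x8a'
```

Whether the NFC comparison or the untouched bytes is the right answer depends on the application, which is the point of contention above.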
-Miles
--
"Don't just question authority,
Don't forget to question me."
-- Jello Biafra