Re: Matching multibyte alphabetical characters with LPeG

lua-l archive


Subject: Re: Matching multibyte alphabetical characters with LPeG
From: Miles Bader <miles@...>
Date: Mon, 18 Jun 2012 15:54:50 +0900

William Ahern <william@25thandclement.com> writes:
>> Hardly; it very much depends on what you're doing with it -- and note
>> that in many cases (Apple, I'm looking at you...) normalization is
>> downright harmful.
>
> I think my MTA truncated this message. Care to resend the explanation? =)

Go read the Git mailing list for the painful, painful details.

Basically, "normalization" rewrites a string: it changes the code points,
and hence the bytes, whenever the input wasn't already in normalized form.
If that change is persistent (i.e., not made temporarily during comparison
or whatever), then parts of your system that weren't expecting a change
(because nothing "real" changed) may get confused, or simply do excess
work because of the change.

[You might say "well then _always_ keep everything in normalized form,
then no problem!"  ... but one doesn't always have control over every
part of the system and every tool.]

"Temporary normalization" (only when sorting strings or whatever, and
not saving the result) is safer of course.
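Here is a minimal sketch of the "temporary normalization" idea, written against some assumptions. It uses Python's standard unicodedata module purely for illustration; that module is not part of Lua or LPeG. The helper name same_text is hypothetical. Both the comparison and the sort normalize throwaway copies and never write the normalized form back, so the stored strings keep their original bytes.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    # Hypothetical helper: normalize copies of both strings only for this
    # comparison; the caller's strings are never modified or saved.
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "caf\u00e9"     # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)           # raw comparison: different sequences
print(same_text(composed, decomposed))  # equal once both are normalized

# Sorting: normalize only inside the sort key, so the returned list still
# holds the original, un-normalized strings.
names = ["cafe\u0301", "cafa", "cafz"]
ordered = sorted(names, key=lambda s: unicodedata.normalize("NFC", s))
print(ordered)
```

The normalized sort key places the decomposed "café" after "cafz". This happens because U+00E9 sorts above 'z' in plain code-point order. A raw sorted(names) would put it between "cafa" and "cafz" instead. Either way, nothing is persisted. This is plain code-point ordering, not locale-aware collation.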

-miles

-- 
"Though they may have different meanings, the cries of 'Yeeeee-haw!' and
 'Allahu akbar!' are, in spirit, not actually all that different."

References:
- Matching multibyte alphabetical characters with LPeG, Hinrik Örn Sigurðsson
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader
- Re: Matching multibyte alphabetical characters with LPeG, Jay Carlson
- Re: Matching multibyte alphabetical characters with LPeG, William Ahern
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader
- Re: Matching multibyte alphabetical characters with LPeG, William Ahern
