Re: Matching multibyte alphabetical characters with LPeG

lua-l archive

Subject: Re: Matching multibyte alphabetical characters with LPeG
From: William Ahern <william@...>
Date: Sun, 17 Jun 2012 23:20:46 -0700

On Mon, Jun 18, 2012 at 02:14:49PM +0900, Miles Bader wrote:
> William Ahern <william@25thandClement.com> writes:
> >> There's still the grapheme problem for å vs å; hopefully you can't
> >> tell the second is "a".."␣̊". [1]
> >> 
> >> How should lpeg match the one with a separate combining mark
> >> version against character classes?
> >
> > Normalization. Wrap the lpeg API with routines to normalize input
> > strings.  Without normalization Unicode is almost useless, like
> > comparing apples and oranges while the user sees plums.
> 
> Hardly; it very much depends on what you're doing with it -- and note
> that in many cases (Apple, I'm looking at you...) normalization is
> downright harmful.
> 

I think my MTA truncated this message. Care to resend the explanation? =)
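
For anyone following the quoted å vs å point, here is a minimal sketch of the "normalize, then match" idea. It is in Python rather than Lua, only because Python's standard library ships `unicodedata.normalize`; an LPeG wrapper would need some Lua normalization routine, which is assumed here and not shown. The helper name `normalize_then_match` and the choice of NFC are illustrative assumptions, not anything from LPeG or from this thread.

```python
import re
import unicodedata

# The same visible letter, encoded two ways.
precomposed = "\u00e5"   # LATIN SMALL LETTER A WITH RING ABOVE (one code point)
decomposed = "a\u030a"   # "a" followed by COMBINING RING ABOVE (two code points)

# A "letters only" character class: word characters minus digits and underscore.
letters = r"[^\W\d_]+"


def normalize_then_match(pattern, text, form="NFC"):
    """Hypothetical wrapper: normalize the subject string, then match it.

    NFC composes base letter + combining mark into the precomposed form
    where one exists, so the class above sees a single letter.
    """
    return re.fullmatch(pattern, unicodedata.normalize(form, text)) is not None


# Without normalization the two spellings are different strings, and the
# combining mark is not a letter, so the decomposed form fails the class.
raw_equal = precomposed == decomposed
raw_match = re.fullmatch(letters, decomposed) is not None

# With normalization applied first, both spellings match.
wrapped_match = normalize_then_match(letters, decomposed)
```

Whether to normalize at all, and to which form, is exactly what the quoted reply disputes; the sketch only shows what the wrapper approach does to a character-class match, not that it is the right choice for every application.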

Follow-Ups:
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader

References:
- Matching multibyte alphabetical characters with LPeG, Hinrik Örn Sigurðsson
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader
- Re: Matching multibyte alphabetical characters with LPeG, Jay Carlson
- Re: Matching multibyte alphabetical characters with LPeG, William Ahern
- Re: Matching multibyte alphabetical characters with LPeG, Miles Bader
