Re: Matching multibyte alphabetical characters with LPeG

lua-l archive

Subject: Re: Matching multibyte alphabetical characters with LPeG
From: Miles Bader <miles@...>
Date: Sun, 17 Jun 2012 22:38:01 +0900

Hinrik Örn Sigurðsson <hinrik.sig@gmail.com> writes:
> I've been making a parser with LPeG and I've run into the issue that I can't
> match non-ASCII words even though I'm using a utf8 locale. It seems that
> "alpha" (and "alnum", etc) from lpeg.locale() don't match anything beyond ASCII.
> See the following code:
>
>     local lpeg = require 'lpeg'
>     local locale = lpeg.locale()
>     print(lpeg.match(lpeg.C(lpeg.P("æ")), "æ"))    --> æ
>     print(lpeg.match(lpeg.C(locale.alpha), "æ"))   --> nil
>
> Is there an easy way to match non-ASCII alphabetical characters with LPeG?

No -- parsing UTF-8 characters is not hard, but testing a property
like "alphabetic" requires Unicode tables, which are a huge and
bloated dependency.

[See the LPeg home page for a simple example of how to parse UTF-8
characters, though.  http://www.inf.puc-rio.br/~roberto/lpeg/ ]
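
For illustration, here is a minimal sketch along the lines of that
example. It recognizes one UTF-8 encoded character by its byte
structure and, as a deliberate simplification, treats every non-ASCII
character as a letter. That is an assumption, not a Unicode-correct
test: it also accepts non-ASCII punctuation and symbols. The names
cont, utf8_multibyte, letter and word are made up for this sketch, and
the source file is assumed to be saved as UTF-8.

```lua
local lpeg = require 'lpeg'
local R, C = lpeg.R, lpeg.C

-- Continuation byte of a multibyte sequence (binary 10xxxxxx).
local cont = R("\128\191")

-- One non-ASCII character: a lead byte followed by 1 to 3 continuation
-- bytes. This checks only the byte structure; it does not reject every
-- invalid sequence (for example some overlong encodings).
local utf8_multibyte = R("\194\223") * cont
                     + R("\224\239") * cont * cont
                     + R("\240\244") * cont * cont * cont

-- ASSUMPTION: any non-ASCII character counts as alphabetic.
local letter = R("az", "AZ") + utf8_multibyte
local word = C(letter^1)

print(lpeg.match(word, "æ"))                 --> æ
print(lpeg.match(word, "Sigurðsson rest"))   --> Sigurðsson
```

This is often good enough for identifiers or words in a parser, but
anything that needs a real "alphabetic" test still needs Unicode
property data from somewhere.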

> If not, can LPeG be patched to support it?

Very unlikely.

-miles

-- 
Egotist, n. A person of low taste, more interested in himself than in me.

Follow-Ups:
- Re: Matching multibyte alphabetical characters with LPeG, Jay Carlson

References:
- Matching multibyte alphabetical characters with LPeG, Hinrik Örn Sigurðsson
