- Subject: Re: Unicode?
- From: Tuomo Valkonen <tuomov@...>
- Date: Wed, 11 Jun 2003 22:29:29 +0300
On Wed, Jun 11, 2003 at 04:12:21PM -0300, Roberto Ierusalimschy wrote:
> > But two identical utf-8 characters can have different encoding, right?
> No. I mean, if they have the same unicode number, they must have the
> same utf-8 encoding.
If I recall correctly, however, the same glyph may have multiple encodings
(a precomposed character versus a base letter plus a combining mark, say)
unless you stick to a sensible subset of Unicode.
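
To make the glyph point concrete, here is a small sketch in Python (not Lua,
since it relies on Python's standard unicodedata module). The choice of "é" is
only an illustration: the two strings render identically but are different
code point sequences, so their UTF-8 bytes differ until both are normalized.

```python
import unicodedata

# Two ways to write the glyph "é" (illustrative choice of character).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# Different code point sequences give different UTF-8 byte strings.
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
same_bytes = precomposed_bytes == decomposed_bytes

# Normalizing both to NFC (canonical composition) makes them compare equal.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
same_after_nfc = nfc_a == nfc_b

print(precomposed_bytes, decomposed_bytes, same_bytes, same_after_nfc)
```

So each code point has exactly one UTF-8 encoding, as Roberto says, but
comparing text by glyph needs a normalization step first.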
It would be nice to have a UTF-8 replacement for the string library, and
writing versions of string.sub etc. that support UTF-8 should be a trivial
task. However, writing a UTF-8 version of the regular expression matcher
may be a bigger task. In the meantime, there are Unicode-aware and possibly
UTF-8-aware POSIX regular expression matchers, so maybe it would be
possible to convert one of the Lua POSIX regex libraries to use one of
those? See e.g.
<http://linuxselfhelp.com/HOWTO/Unicode-HOWTO-6.html>.
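
As a rough idea of why a UTF-8 string.sub should be simple, here is a sketch
in Python working on raw bytes. The names utf8_offsets and utf8_sub are
hypothetical, the index rules are my reading of Lua's string.sub (1-based,
inclusive, negative indices count from the end), and it assumes valid UTF-8
input. It counts code points, not glyphs, so combining sequences are still
split.

```python
def utf8_offsets(data: bytes) -> list:
    """Byte offsets where each UTF-8 character starts.

    Continuation bytes have the form 10xxxxxx; every other byte starts a character.
    """
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def utf8_sub(data: bytes, i: int, j: int = -1) -> bytes:
    """Character-indexed substring, mimicking string.sub(s, i, j) on UTF-8 bytes."""
    starts = utf8_offsets(data)
    n = len(starts)  # number of characters (code points)

    # Translate negative and out-of-range indices the way string.sub does.
    if i < 0:
        i = max(n + i + 1, 1)
    elif i == 0:
        i = 1
    if j < 0:
        j = n + j + 1
    elif j > n:
        j = n
    if i > j:
        return b""

    begin = starts[i - 1]
    # The end is the start of the character after j, or the end of the data.
    end = starts[j] if j < n else len(data)
    return data[begin:end]


s = "naïve café".encode("utf-8")
print(utf8_sub(s, 3, 5).decode("utf-8"))
print(utf8_sub(s, -4).decode("utf-8"))
```

Scanning the whole string for start bytes on every call is wasteful, but it
keeps the sketch short. A real library would probably walk only as far as it
needs to.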
--
Tuomo