Re: question about Unicode

lua-l archive


Subject: Re: question about Unicode
From: roberto@... (Roberto Ierusalimschy)
Date: Mon, 4 Dec 2006 16:36:50 -0200

> It depends on whether you want to use the encoding specified by the 
> current locale, or always use UTF-8.  The former is a more general 
> solution and is probably preferred on Unix; GNU/Linux distributions are 
> moving toward UTF-8 anyway.  However, it's problematic on Windows; 
> someone please correct me if I'm wrong, but I believe that UTF-8 is 
> never (or rarely) the encoding associated with the system locale on 
> Windows.  So if you always want to use UTF-8, it's probably better to 
> use a hand-written converter.

This is actually part of my question :) I guess I would prefer to use
the current locale. But I know nothing about other multibyte encodings,
so I have no idea whether my code would work with them. For
instance, may I assume that any zero byte ends the string? What if the
encoding is state-dependent? (It seems a nightmare to handle shift
states when doing backtracking and the like...) UTF-8 seems so
much simpler... But if it cannot be used on Windows, that would
be a strong limitation.

-- Roberto

Follow-Ups:
- Re: question about Unicode, David Jones

References:
- question about Unicode, Roberto Ierusalimschy
- Re: question about Unicode, Matt Campbell
