- Subject: Re: lua for unicode
- From: Roberto Ierusalimschy <roberto@...>
- Date: Tue, 03 Dec 2002 16:51:55 +0000
> An important consideration to be made is whether all strings are Unicode
> or whether a new Unicode type is to be added (as is done in Python).
I think we can do without either of these two options. Strings may
contain Unicode data or not (e.g. they may contain raw binary data, as
now). If you call a function from the new "utf8" library, it will assume
the string holds Unicode text encoded in UTF-8.
> It is essential that such byte patterns [non-valid Unicode character]
> do not exist in the internal encoding since this opens several
> security issues.
I think it would be easier to allow such patterns (among other reasons,
because strings may contain data other than Unicode text) and to check
for validity only when needed, that is, inside the functions of the
"utf8" library.
This is more or less what happens now. Strings may contain embedded
zeros, but some functions in the `string' library do not handle them,
because proper "ISO" strings cannot contain zeros. The important thing
is to ensure that every function behaves acceptably for any input (for
instance, by giving a polite error message).
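A minimal sketch of the "check only when needed" idea, written in Python purely for illustration. The names utf8_len and _seq_len are hypothetical and are not the API of any Lua library. Byte strings stay opaque and embedded zeros are ordinary bytes. Only the UTF-8-aware function validates its input, here against the well-formed UTF-8 byte-sequence table in chapter 3 of the Unicode Standard, which rejects overlong forms, surrogates, and code points above U+10FFFF. On bad input it raises an error that names the 0-based offset of the first invalid sequence rather than returning garbage.

```python
def _seq_len(data: bytes, i: int) -> int:
    """Return the length (1-4) of the well-formed UTF-8 sequence that starts
    at data[i], or 0 if the bytes there are not well-formed UTF-8."""
    lead = data[i]
    if lead < 0x80:
        return 1                      # ASCII, including the zero byte
    # Pick the sequence length and the allowed range for the second byte.
    if 0xC2 <= lead <= 0xDF:
        n, lo, hi = 2, 0x80, 0xBF
    elif lead == 0xE0:
        n, lo, hi = 3, 0xA0, 0xBF     # rejects overlong 3-byte forms
    elif lead == 0xED:
        n, lo, hi = 3, 0x80, 0x9F     # rejects surrogates U+D800..U+DFFF
    elif 0xE1 <= lead <= 0xEF:
        n, lo, hi = 3, 0x80, 0xBF
    elif lead == 0xF0:
        n, lo, hi = 4, 0x90, 0xBF     # rejects overlong 4-byte forms
    elif 0xF1 <= lead <= 0xF3:
        n, lo, hi = 4, 0x80, 0xBF
    elif lead == 0xF4:
        n, lo, hi = 4, 0x80, 0x8F     # rejects code points above U+10FFFF
    else:
        return 0                      # stray continuation byte, C0/C1, F5..FF
    if i + n > len(data):
        return 0                      # sequence cut off by end of string
    if not lo <= data[i + 1] <= hi:
        return 0
    if any(not 0x80 <= b <= 0xBF for b in data[i + 2:i + n]):
        return 0                      # remaining bytes must be continuations
    return n


def utf8_len(data: bytes) -> int:
    """Count the code points in data, validating it along the way.

    Raises ValueError with the 0-based offset of the first invalid
    sequence, so bad input gets a polite error instead of garbage."""
    i = count = 0
    while i < len(data):
        n = _seq_len(data, i)
        if n == 0:
            raise ValueError(f"invalid UTF-8 sequence at byte offset {i}")
        i += n
        count += 1
    return count


if __name__ == "__main__":
    print(utf8_len(b"h\xc3\xa9llo"))   # 5: the accented e takes two bytes
    print(utf8_len(b"a\x00b"))         # 3: an embedded zero is just a byte
    try:
        utf8_len(b"ab\xc0\xaf")        # overlong encoding of '/'
    except ValueError as err:
        print(err)                     # invalid UTF-8 sequence at byte offset 2
```

Plain byte operations still accept anything: len(b"\xff\x00") is 2, while utf8_len(b"\xff\x00") raises ValueError. The cost of validation is paid only by callers that actually ask for UTF-8 semantics.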
-- Roberto