- Subject: Re: Managing Unicode (UTF-8 and UTF-16) data in Lua
- From: Paul Merrell <marbux@...>
- Date: Sat, 6 Aug 2016 16:56:59 -0700
On Sat, Aug 6, 2016 at 9:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
> Why do that when the standard Lua string type is UTF-8 safe? Better
> surely to use UTF-8 via Lua strings, and only use UTF-16 for
> interfacing to the Windows APIs?
Lua strings are problematic if you are working with C libraries that
expect UTF-8 input. The offsets returned by the string library are byte
offsets, so once a string contains multi-byte UTF-8 characters they no
longer correspond to character positions and cannot be relied on.
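To illustrate the mismatch, here is a minimal sketch in Python (not Lua) that models a byte-oriented `find` in the style of Lua's `string.find` (1-based byte positions, `None` standing in for Lua's `nil`). It then converts that byte position into a character position by counting UTF-8 lead bytes. The helper names `byte_find` and `byte_to_char_pos` are hypothetical and are not part of any library.

```python
def byte_find(haystack: bytes, needle: bytes):
    """Return the 1-based BYTE position of needle, or None if absent.

    Models a byte-oriented string library such as Lua's string.find.
    """
    idx = haystack.find(needle)
    return None if idx < 0 else idx + 1


def byte_to_char_pos(data: bytes, byte_pos: int) -> int:
    """Convert a 1-based byte position into a 1-based CHARACTER position.

    Every UTF-8 character starts with exactly one byte that is not a
    continuation byte (continuation bytes have the form 0b10xxxxxx), so
    counting those lead bytes in the prefix counts whole characters.
    """
    prefix = data[: byte_pos - 1]
    return sum(1 for b in prefix if (b & 0xC0) != 0x80) + 1


text = "naïve café".encode("utf-8")  # 10 characters, 12 bytes
pos = byte_find(text, b"caf")
print(pos, byte_to_char_pos(text, pos))  # byte 8 is character 7
```

The two-byte "ï" earlier in the string is what pushes the byte position one past the character position. Every additional multi-byte character widens that gap.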
We worked around this issue in NoteCase Pro (which uses the GTK2
libraries) by embedding Xavier Wang's luautf8 library [1]. Among other
things, it provides UTF-8-aware equivalents of the Lua string library's
offset-based functions that count characters rather than bytes. We did
change its namespace, to avoid a naming collision between one of its
functions and one of the new Unicode functions in Lua 5.3. We have
encountered no problems on any supported OS (including Windows) in well
over a year.
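As a rough idea of what a character-indexed substring function has to do internally, here is a minimal Python sketch. It locates the byte offset where each UTF-8 character starts and slices on those boundaries, using index rules modeled on Lua's `string.sub`: 1-based, inclusive, negative indices counting from the end, and out-of-range values clamped. It is an assumption-based illustration, not luautf8's implementation. `char_offsets` and `char_sub` are hypothetical names.

```python
def char_offsets(data: bytes) -> list:
    """0-based byte offsets at which each UTF-8 character starts."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def char_sub(data: bytes, i: int, j: int = -1) -> bytes:
    """Return characters i..j (1-based, inclusive) of UTF-8 encoded data.

    Index handling mirrors Lua's string.sub, but counts characters
    instead of bytes, so a multi-byte character is never split.
    """
    starts = char_offsets(data)
    n = len(starts)
    # Normalize the start index: negative counts from the end, 0 means 1.
    if i < 0:
        i = max(n + i + 1, 1)
    elif i == 0:
        i = 1
    # Normalize the end index: negative counts from the end, clamp to n.
    if j < 0:
        j = n + j + 1
    elif j > n:
        j = n
    if i > j:
        return b""
    start = starts[i - 1]
    end = starts[j] if j < n else len(data)
    return data[start:end]


text = "naïve café".encode("utf-8")
print(char_sub(text, 3, 3).decode("utf-8"))  # the whole two-byte "ï"
print(text[2:3])                             # a byte slice gets only b'\xc3'
```

The byte slice returns half of a character, which is exactly the kind of invalid UTF-8 that causes trouble when the result is handed on to a C library expecting well-formed input.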
Best regards,
Paul
[1] https://github.com/starwing/luautf8 (also available via LuaRocks).
--
[Notice not included in the above original message: The U.S. National
Security Agency neither confirms nor denies that it intercepted this
message.]