- Subject: Managing Unicode (UTF-8 and UTF-16) data in Lua
- From: Paul Moore <p.f.moore@...>
- Date: Fri, 5 Aug 2016 17:45:16 +0100
I'm trying to embed Lua in a Windows program that needs to be "Unicode
clean". Unfortunately, the standard C runtime on Windows is basically
not a practical option for Unicode programs, essentially because the
console and GUI subsystems use different code pages, so there's no
consistent encoding you can use. Lua functions like print() and
os.getenv(), and the io module, have a distressing tendency to produce
mojibake at the drop of a hat :-(

That's a relatively minor problem in many ways - it's easy to
wrap the Windows Unicode APIs in Lua, and once you do that everything
works perfectly OK. However, there are a few parts of the process that
feel clumsy, and I'd like some advice on the best way of handling
them.

1. The print() function doesn't handle UTF-8 (because it uses the C
runtime, which uses the OEM codepage). Is the best way to replace
print() with a UTF-8 aware version simply to register my own
alternative implementation under the name "print"?
2. If I want to write a UTF-16 string userdata (the Windows APIs use
UTF-16) that interoperates seamlessly with Lua strings, I gather that
I need to give the type a "__tostring" metamethod. I presume that will
cover all of the Lua functions that take strings. I guess I'll also
need __concat and __len (and likely the comparison operations). Any
others that are important?
3. I'm probably going to have to replace a reasonable chunk of the os
and io libraries if I want to do a proper job - I'm not sure it's
worth doing that, but if I do, I guess I can just avoid calling
luaopen_io and luaopen_os and register my own replacements. But is it
possible to just selectively replace the parts that need better
Unicode handling? (There's no real need for me to reimplement
os.clock, for example, but os.getenv definitely needs rewriting).
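
As a rough illustration of the three points above, here is a Lua-side
sketch. It is only a sketch under assumptions, not a finished design:
write_utf8, getenv_utf8, wrap and Wide are hypothetical names, the two
helper functions stand in for C functions that would wrap the Windows
Unicode APIs, and a plain table stands in for the UTF-16 userdata. It
assumes Lua 5.3. From the C side, the same replacements could presumably
be made with lua_register and lua_setfield after the standard libraries
have been opened.

```lua
-- Hypothetical stand-ins: in the real program these would be C functions
-- wrapping the Windows Unicode APIs (for example WriteConsoleW and
-- GetEnvironmentVariableW). Plain Lua is used here so the sketch runs anywhere.
local function write_utf8(s)      -- assumption: writes UTF-8 text correctly
  io.stdout:write(s)
end

local function getenv_utf8(name)  -- assumption: returns the value as UTF-8
  return os.getenv(name)
end

-- (1) print is an ordinary global, so it can simply be reassigned.
local original_print = print
function print(...)
  local parts = {}
  for i = 1, select("#", ...) do
    parts[i] = tostring((select(i, ...)))  -- honours __tostring, as the stock print does
  end
  write_utf8(table.concat(parts, "\t") .. "\n")
end

-- (2) A table-based stand-in for the UTF-16 string userdata, carrying the
-- metamethods mentioned in the question.
local Wide = {}
local function wrap(text)
  return setmetatable({ text = text }, Wide)
end
Wide.__tostring = function(a) return a.text end
Wide.__len      = function(a) return #a.text end  -- bytes here; the unit is a design choice
Wide.__concat   = function(a, b) return wrap(tostring(a) .. tostring(b)) end
Wide.__lt       = function(a, b) return tostring(a) < tostring(b) end
Wide.__le       = function(a, b) return tostring(a) <= tostring(b) end
-- Note: == only consults __eq when both operands are tables or both are
-- full userdata, so comparing a wrapped value with a plain Lua string is
-- always false, whatever __eq says.
Wide.__eq       = function(a, b) return a.text == b.text end

-- (3) Replace a single entry of the os table and leave the rest alone.
local original_clock = os.clock
os.getenv = getenv_utf8

local greeting = wrap("héllo") .. " wörld"
print(greeting, #greeting)
```

The os table is an ordinary table once luaopen_os has run, so replacing
one field leaves os.clock and the rest as they were.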

I should note two things. First,
os.setlocale doesn't work ("If you provide a code page value of UTF-7
or UTF-8, setlocale will fail, returning NULL") so that's not an
option for me, sadly. Second, I don't intend this as any sort of
criticism of Lua's Unicode handling - it's not Lua's fault that
Windows' support for UTF-8 in the C runtime is insufficient.

Thanks for any suggestions or help.
Paul