- Subject: Re: Managing Unicode (UTF-8 and UTF-16) data in Lua
- From: Marc Balmer <marc@...>
- Date: Fri, 5 Aug 2016 19:42:44 +0200
Well, you use a bad stack.
We use Linux and Motif and everything is UTF-8. It works like a charm.
> On 05.08.2016 at 18:45, Paul Moore <p.f.moore@gmail.com> wrote:
>
> I'm trying to embed Lua in a Windows program that needs to be "Unicode
> clean". Unfortunately, the standard C runtime on Windows is basically
> not a practical option for Unicode programs, essentially because the
> console and GUI subsystems use different code pages, so there's no
> consistent encoding you can use. Lua functions like print() and
> os.getenv(), and the io module as a whole, have a distressing tendency
> to produce mojibake at the drop of a hat :-(
>
> However, that's a relatively minor problem in many ways - it's easy to
> wrap the Windows Unicode APIs in Lua, and once you do that everything
> works perfectly OK. However, there are a few parts of the process that
> feel clumsy, and I'd like some advice on the best way of handling
> them.
>
> 1. The print() function doesn't handle UTF-8 (because it uses the C
> runtime, which uses the OEM codepage). Is the best way to replace
> print() with a UTF-8 aware version simply to register my own
> alternative implementation under the name "print"?
> 2. If I want to write a UTF-16 string userdata (the Windows APIs use
> UTF-16) that interoperates seamlessly with Lua strings, I gather that
> I need to give the type a "__tostring" metamethod. I presume that will
> cover all of the Lua functions that take strings. I guess I'll also
> need __concat and __len (and likely the comparison operations). Any
> others that are important?
> 3. I'm probably going to have to replace a reasonable chunk of the os
> and io libraries if I want to do a proper job - I'm not sure it's
> worth doing that, but if I do, I guess I can just avoid calling
> luaopen_io and luaopen_os and register my own replacements. But is it
> possible to just selectively replace the parts that need better
> Unicode handling? (There's no real need for me to reimplement
> os.clock, for example, but os.getenv definitely needs rewriting).
>
> Thanks for any suggestions. I should note 2 things: First of all,
> os.setlocale doesn't work ("If you provide a code page value of UTF-7
> or UTF-8, setlocale will fail, returning NULL") so that's not an
> option for me, sadly. And second, I don't intend this as any sort of
> criticism of Lua's Unicode handling - it's not Lua's fault that
> Windows' support for UTF-8 in the C runtime is insufficient.
>
> Thanks for any suggestions or help.
> Paul
>
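
Whatever shape the wrappers take, the step they all share is turning Lua's UTF-8 byte strings into the UTF-16 that the Windows APIs expect (point 2 above). A minimal sketch of that conversion in portable C follows. The names utf8_to_utf16 and UTF8_INVALID are made up for illustration and are not part of Lua or Windows; on Windows itself, MultiByteToWideChar with CP_UTF8 would normally do this job.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error value: returned when the input is not valid UTF-8. */
#define UTF8_INVALID ((size_t)-1)

/*
 * Convert len bytes of UTF-8 at src into UTF-16 code units.
 * Returns the number of code units the full result needs. Units are only
 * stored while they fit in dst[0..cap), so calling with dst == NULL just
 * measures the required size. Returns UTF8_INVALID on malformed input
 * (bad lead/continuation byte, truncation, overlong form, surrogate).
 */
size_t utf8_to_utf16(const char *src, size_t len, uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for each number of continuation bytes. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = s[i++];
        uint32_t cp;
        size_t extra, k;

        /* The lead byte tells us how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return UTF8_INVALID;

        if (extra > len - i)
            return UTF8_INVALID;            /* sequence is truncated */

        for (k = 0; k < extra; k++) {
            unsigned char c = s[i++];
            if ((c & 0xC0) != 0x80)
                return UTF8_INVALID;        /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return UTF8_INVALID;

        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one code unit. */
            if (dst && n < cap)
                dst[n] = (uint16_t)cp;
            n += 1;
        } else {
            /* Supplementary plane: encode as a surrogate pair. */
            cp -= 0x10000;
            if (dst && n + 1 < cap) {
                dst[n]     = (uint16_t)(0xD800 | (cp >> 10));
                dst[n + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            }
            n += 2;
        }
    }
    return n;
}
```

A binding would typically call this twice: once with dst set to NULL to learn the size, then again into a buffer of that size (plus one unit for a terminating zero if the API wants one). The reverse direction, used by a "__tostring" metamethod, follows the same pattern.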