- Subject: Re: Managing Unicode (UTF-8 and UTF-16) data in Lua
- From: Ricardo Ramos Massaro <ricardo.massaro@...>
- Date: Tue, 9 Aug 2016 00:26:03 -0300
Paul Moore <p.f.moore@gmail.com> wrote:
> 1. The print() function doesn't handle UTF-8 (because it uses the C
> runtime, which uses the OEM codepage). Is the best way to replace
> print() with a UTF-8 aware version simply to register my own
> alternative implementation under the name "print"?
[...]
> I can display the "snowman" character in Powershell:
>
>>$x = (0x2603)
> PS 22:56 {00:00.000} C:\Work\Scratch
>>[char]$x
> ☃
I've done a little digging, and it seems that Lua's inability to
print UTF-8 to the Windows console is not exactly a problem of the
C runtime. For example, if you add this to the start of main() in
lua.c:

    FreeConsole();
    AllocConsole();
    SetConsoleOutputCP(CP_UTF8);

(and also add "#include <windows.h>" at the top of lua.c), then the
following Lua program prints a snowman character to the console,
just like PowerShell:

    print("snowman: '\xE2\x98\x83'")
    io.read()
Sadly, with this modification Lua will annoyingly open a new
console window even when you run it from an existing console, and
the window will be closed immediately when the program ends
(that's the reason for "io.read()" after the print() call).
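For anyone who wants to try this, here is a rough sketch of the same
three calls wrapped in a function with some error checking. The name
open_utf8_console() is made up for illustration (it is not part of
lua.c), and the #ifdef guard is only there so the file still compiles
on other platforms. It has the same drawback described above: a new
console window is opened.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of lua.c): detach from the current
   console, allocate a fresh one, and switch its output code page to
   UTF-8. Returns 1 on success and 0 on failure. On non-Windows builds
   it does nothing and returns 1. */
int open_utf8_console(void)
{
#ifdef _WIN32
    /* FreeConsole() fails if no console is attached; that is harmless
       here, so its result is ignored. */
    FreeConsole();

    if (!AllocConsole())
        return 0;

    if (!SetConsoleOutputCP(CP_UTF8))
        return 0;
#endif
    return 1;
}
```

The idea would be to call open_utf8_console() at the very start of
main(), before Lua writes anything to the console.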
I don't know why, but just adding

    SetConsoleOutputCP(CP_UTF8);

doesn't quite work. Or rather, it works when a new console is
opened fresh for Lua (e.g. if you double-click lua.exe from Windows
Explorer), but fails if you run lua from an existing console. By
"fails" I mean that 3 garbage characters are printed instead of the
snowman, indicating that the OEM code page is being used instead
of UTF-8.
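To see where the 3 garbage characters come from: the snowman is
U+2603, which UTF-8 encodes as the 3 bytes E2 98 83 (the same bytes
used in the Lua string above). A console that is still using an OEM
code page shows each byte as a separate character; in code page 437
those bytes correspond to Γ, ÿ and â. The sketch below is a generic
UTF-8 encoder written only to illustrate this; utf8_encode() is a
made-up name, not something from Lua or the Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper: encode one Unicode code point as UTF-8 into
   out (which must hold at least 4 bytes). Returns the number of bytes
   written, or 0 for an invalid code point (a surrogate or a value
   above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not valid */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

Calling utf8_encode(0x2603, buf) returns 3 and fills buf with
0xE2, 0x98, 0x83.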
By the way, this was all tested on Windows 10 with the code page
set to 437 (default for the US, I think). I have no idea if other
code pages or other versions of Windows work like that.
- Ricardo