I also ran into this problem a few weeks back. One portable solution is to escape the ISO-8859-1 characters with \x escapes, so that the source file encoding can be UTF-8 while the string literals remain ISO-8859-1. This keeps the tests passing.
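For example (a made-up snippet, not one taken from the test files; it assumes Lua 5.2 or later, where \x hexadecimal escapes are available):

  -- The byte 0xE1 is "á" in ISO-8859-1. Written as a \x escape, the source
  -- file stays pure ASCII (and therefore also valid UTF-8), but the string
  -- literal still holds the single ISO-8859-1 byte.
  local s = "\xE1"
  assert(#s == 1)            -- still one byte, so byte-counting tests keep passing
  print(string.byte(s, 1))   --> 225

The escaped form also survives editors that silently transcode files on save, since there is no non-ASCII byte left for them to transcode.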
It doesn't solve the deeper problem, though: the string length, search, replace, and manipulation functions don't work with multibyte encodings like UTF-8, which I suspect is the default encoding for pretty much everyone on Unix platforms nowadays, with other platforms having adopted Unicode well before that. Has moving the internal string representation to UTF-8 been considered? Or tagging strings with their encoding, so that they can be converted as needed into the appropriate encoding?
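To make the mismatch concrete, here is a rough sketch of the current byte-oriented behaviour (assuming Lua 5.3+ with the standard utf8 library and a UTF-8-encoded source file; not code from the test suite):

  local s = "héllo"          -- "é" is the two bytes 0xC3 0xA9 in UTF-8
  print(#s)                  --> 6: the # operator and string.len count bytes
  print(utf8.len(s))         --> 5: the utf8 library counts codepoints
  print(s:sub(1, 2))         -- slices mid-sequence, leaving an invalid UTF-8 fragment
  print(s:find("é"))         --> 2  3: positions are byte offsets, not character offsets

The utf8 library added in 5.3 covers length, iteration, and offsets, but string.find, string.sub and friends still operate purely on bytes.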
Kind regards,

Roger

From: Michael Lenaghan <michaell@dazzit.com>

Hello, all.

Five Lua test files are actually ISO-8859-1 encoded:
Two of the files have tests that count bytes, so you can't just convert them to UTF-8. Well, not if you want your tests to succeed. :-) Not fatal (the tests work as they are!) but unusual in an increasingly UTF-8 world. The real problem is that it's such an increasingly UTF-8 world that many editors don't try to auto-detect the encoding. Save any changes in such an editor (hello, VS Code!) and you corrupt the files.
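A hypothetical one-liner (not one of the actual tests) shows why a straight conversion breaks a byte-counting test:

  -- This holds while the file is saved as ISO-8859-1, where "á" is the single
  -- byte 0xE1; resave the file as UTF-8 and "á" becomes the two bytes 0xC3 0xA1,
  -- so the same assertion fails.
  assert(#"á" == 1)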