I also ran into this problem a few weeks back. One portable solution is to escape the ISO-8859-1 characters with \x escapes, so that the source file encoding can be UTF-8 while the string literals remain ISO-8859-1. This keeps the tests passing.
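For example (a made-up snippet, not one taken from the test files; it assumes Lua 5.2 or later, where \x hexadecimal escapes are available):

  -- The byte 0xE1 is "á" in ISO-8859-1. Written as a \x escape, the source
  -- file stays pure ASCII (and therefore also valid UTF-8), but the string
  -- literal still holds the single ISO-8859-1 byte.
  local s = "\xE1"
  assert(#s == 1)            -- still one byte, so byte-counting tests keep passing
  print(string.byte(s, 1))   --> 225

The escaped form also survives editors that silently transcode files on save, since there is no non-ASCII byte left for them to transcode.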
It doesn't solve the deeper problem, though: the string length, search, replace, and manipulation functions don't work with multibyte encodings like UTF-8, which I suspect is the default encoding for pretty much everyone on Unix platforms nowadays, with other platforms having adopted Unicode well before that. Has moving the internal string representation to UTF-8 been considered? Or tagging strings with their encoding, so that they can be converted as needed into the appropriate encoding?
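To make the mismatch concrete, here is a rough sketch of the current byte-oriented behaviour (assuming Lua 5.3+ with the standard utf8 library and a UTF-8-encoded source file; not code from the test suite):

  local s = "héllo"          -- "é" is the two bytes 0xC3 0xA9 in UTF-8
  print(#s)                  --> 6: the # operator and string.len count bytes
  print(utf8.len(s))         --> 5: the utf8 library counts codepoints
  print(s:sub(1, 2))         -- slices mid-sequence, leaving an invalid UTF-8 fragment
  print(s:find("é"))         --> 2  3: positions are byte offsets, not character offsets

The utf8 library added in 5.3 covers length, iteration, and offsets, but string.find, string.sub and friends still operate purely on bytes.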
Kind regards,

Roger

From: Michael Lenaghan <michaell@dazzit.com>

Hello, all.

Five Lua test files are actually ISO-8859-1 encoded:
Two of the files have tests that count bytes, so you can't just convert them to UTF-8. Well, not if you want your tests to succeed. :-) Not fatal (the tests work as they are!) but unusual in an increasingly UTF-8 world. The real problem is that it's such an increasingly UTF-8 world that many editors don't try to auto-detect the encoding. Save any changes in such an editor (hello, VS Code!) and you corrupt the files.
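A hypothetical one-liner (not one of the actual tests) shows why a straight conversion breaks a byte-counting test:

  -- This holds while the file is saved as ISO-8859-1, where "á" is the single
  -- byte 0xE1; resave the file as UTF-8 and "á" becomes the two bytes 0xC3 0xA1,
  -- so the same assertion fails.
  assert(#"á" == 1)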