- Subject: Re: Default UTF-8 encoding in strings
- From: Sean Conner <sean@...>
- Date: Mon, 22 Jul 2019 00:13:36 -0400
It was thus said that the Great Robert Virding once stated:
> This is for Lua 5.3.4
>
> When exactly are characters/bytes interpreted as UTF-8 in literal strings?
> It seems that when you write a literal string which includes a Unicode
> character, it is inserted into the string using its UTF-8 encoding,
> even if it is small enough to fit in one byte. For example, the string "aäb"
> has the bytes 97, 195, 164, 98 even though the ä character has the value
> 228, so it could fit in a byte. The same happens when printing a string: if
> there is a legal UTF-8 sequence, its Unicode character will be printed;
> however, a value of 228 will be printed as ?.
UTF-8 will always encode characters with code points above 127 as
multiple bytes (two to four). A good starting place (I think) is the
Wikipedia page on UTF-8 encoding:
https://en.wikipedia.org/wiki/UTF-8
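To make the bit layout concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Lua and not how Lua itself works internally). The function name utf8_encode is hypothetical. It hand-encodes a code point following the UTF-8 byte patterns and shows why ä (code point 228, 0xE4) comes out as the two bytes 195, 164. It also shows that a lone byte 228 is not valid UTF-8, which is consistent with a terminal displaying a placeholder such as "?" for it (exactly what a given terminal shows is an assumption that depends on its decoder).

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as UTF-8 by hand."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point <= 0x7F:    # up to 7 bits: 0xxxxxxx (plain ASCII, one byte)
        return bytes([code_point])
    if code_point <= 0x7FF:   # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:  # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


text = "a\u00e4b"  # "aäb"; 0xE4 = 228 = binary 11100100
encoded = b"".join(utf8_encode(ord(ch)) for ch in text)
print(list(encoded))                     # [97, 195, 164, 98]
print(encoded == text.encode("utf-8"))   # True

# A single byte 0xE4 has the form 1110xxxx, i.e. a lead byte that promises
# two continuation bytes. On its own it is invalid UTF-8, so a decoder
# using errors="replace" substitutes U+FFFD (the replacement character).
decoded = bytes([228]).decode("utf-8", errors="replace")
print(f"U+{ord(decoded):04X}")           # U+FFFD
```

The arithmetic for ä: 228 >> 6 = 3, so the lead byte is 0xC0 | 3 = 0xC3 (195); 228 & 0x3F = 36, so the continuation byte is 0x80 | 36 = 0xA4 (164). Lua 5.3's utf8.char(228) produces the same two bytes.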
-spc