It was thus said that the Great Robert Virding once stated:
> This is for Lua 5.3.4
> 
> When exactly are characters/bytes interpreted as UTF-8 in literal strings?
> It seems that when you write a literal string that includes a Unicode
> character, the character is inserted into the string as its UTF-8 encoding,
> even if its value is small enough to fit in one byte. For example, the
> string "aäb" has the bytes 97, 195, 164, 98 even though the ä character has
> the value 228, so it could fit in a byte. The same happens when printing a
> string: a legal UTF-8 sequence is printed as its Unicode character, whereas
> a lone byte with value 228 is printed as ?.

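  UTF-8 always encodes characters with code points above 127 as multiple
bytes (two or more), so ä (code point 228) becomes the two bytes 195, 164.
Lua itself does not interpret the literal; it copies the bytes found in the
source file, so a file saved as UTF-8 yields exactly those bytes.  A lone
byte 228 is not a valid UTF-8 sequence, which is why a UTF-8 terminal prints
it as ?.  A good starting place (I think) is the Wikipedia page on UTF-8
encoding: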

	https://en.wikipedia.org/wiki/UTF-8
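
  To make the byte counts concrete, here is a minimal sketch for Lua 5.3.
It uses escape sequences instead of a literal ä so that the result does not
depend on how the file is saved; the variable names s and raw are only for
illustration.

```lua
-- "\u{E4}" inserts the UTF-8 encoding of code point U+00E4 (ä): two bytes.
local s = "a\u{E4}b"
print(s:byte(1, -1))      --> 97 195 164 98
print(#s, utf8.len(s))    --> 4 3   (4 bytes, 3 characters)

-- "\228" inserts the single byte 228, which is not valid UTF-8 by itself.
local raw = "a\228b"
print(raw:byte(1, -1))    --> 97 228 98
print(utf8.len(raw))      --> nil 2 (invalid sequence at byte position 2)

-- utf8.char produces the same two bytes for code point 228.
print(utf8.char(228) == "\195\164")   --> true
```

  The # operator and string.byte count bytes, while utf8.len counts
characters and reports the position of the first invalid byte.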

  -spc