Re: Default UTF-8 encoding in strings

lua-l archive

Subject: Re: Default UTF-8 encoding in strings
From: Sean Conner <sean@...>
Date: Mon, 22 Jul 2019 00:13:36 -0400

It was thus said that the Great Robert Virding once stated:
> This is for Lua 5.3.4
> 
> When exactly are characters/bytes interpreted as UTF-8 in literal strings?
> It seems that when you write a literal string which includes a Unicode
> character, it is inserted into the string as its UTF-8 encoding, even if
> its value is small enough to fit in one byte. For example, the string "aäb"
> has the bytes 97, 195, 164, 98 even though the ä character has the value
> 228, so it could fit in a byte. The same happens when printing a string:
> a legal UTF-8 sequence is printed as its Unicode character, whereas a byte
> with the value 228 is printed as ?.

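  UTF-8 always encodes characters with code points above 127 as multiple
bytes (two or more), so ä (U+00E4, decimal 228) becomes the two bytes 195,
164.  A lone byte of 228 is not a valid UTF-8 sequence, which is why it
prints as ?.  Lua itself does no interpretation here: a string is just a
sequence of bytes, and a literal holds whatever bytes were saved in the
source file.  A good starting place (I think) is the Wikipedia page on
UTF-8 encoding: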

	https://en.wikipedia.org/wiki/UTF-8
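
  To see the encoding from within Lua, something like the following
sketch may help.  It assumes Lua 5.3 or later (for the utf8 library and
the \u{XXX} escape), and it uses escapes rather than a literal ä so the
result does not depend on how the source file was saved.  The variable
names are only for illustration.

```lua
-- Lua strings are plain byte sequences; "\u{E4}" emits the UTF-8
-- encoding of code point U+00E4 (ä), which is two bytes.
local s = "a\u{E4}b"

print(#s)                    -- length in bytes: 4
print(s:byte(1, -1))         -- 97  195  164  98
print(utf8.len(s))           -- length in characters: 3

-- Walk the string by code point instead of by byte.
for _, cp in utf8.codes(s) do
  io.write(cp, " ")          -- 97 228 98
end
print()

-- A single byte 228 (Latin-1 style) is not valid UTF-8, so utf8.len
-- returns nil plus the position of the first invalid byte.
local latin1 = "a\xE4b"
print(utf8.len(latin1))      -- nil  2
```

  The # operator and string.byte work on bytes, while the utf8 functions
decode code points, which is why the same string reports a length of 4
in one case and 3 in the other.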

  -spc

References:
- Default UTF-8 encoding in strings, Robert Virding
