Re: Clearing up misconceptions about characters vs bytes in the manual

Subject: Re: Clearing up misconceptions about characters vs bytes in the manual
From: spir <denis.spir@...>
Date: Mon, 05 Nov 2012 00:58:49 +0100

On 05/11/2012 00:10, William Ahern wrote:

On Fri, Nov 02, 2012 at 07:55:41PM +0100, spir wrote:
<snip>

There is, I guess, no hope to get back the ideal simplicity of 1 char <-->
1 repr (and even less representations of equal lengths) we lived with in
ascii & iso-latin times. There is an affordable way to get strings as
sequences of chars, with s[i] = ith char, exactly, and complete.

Perl6 does this with its homegrown "NFG" normalization form. Graphemes
which in Unicode are not assigned a single codepoint are assigned one
dynamically.

There's surprisingly little information about this available online. You
basically need to refer to the Parrot and Perl6 documentation--and sometimes
source code--to decipher the details.

See, e.g.
http://docs.parrot.org/parrot/devel/html/docs/pdds/pdd28_strings.pod.html

Thanks for the pointer, very interesting!

Denis
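
For readers who want the NFG idea quoted above in concrete form, here is a
minimal sketch in Python. It is not taken from Parrot or Perl6: the names
GraphemeTable, clusters and to_nfg are hypothetical, the use of negative
integers for the dynamically assigned ids is just a convenient choice for
this sketch, and the segmentation (a base character plus any following
combining marks) is a deliberate simplification of the real Unicode
grapheme cluster rules.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical helper: hands out synthetic ids for graphemes that
    have no single codepoint of their own."""

    def __init__(self):
        self._ids = {}       # cluster string -> synthetic (negative) id
        self._clusters = []  # position k holds the cluster with id -(k + 1)

    def intern(self, cluster):
        # A grapheme that is already one codepoint keeps that codepoint.
        if len(cluster) == 1:
            return ord(cluster)
        # Otherwise assign a new id the first time the cluster is seen.
        if cluster not in self._ids:
            self._clusters.append(cluster)
            self._ids[cluster] = -len(self._clusters)
        return self._ids[cluster]

    def expand(self, code):
        # Map an id back to the text it stands for.
        return chr(code) if code >= 0 else self._clusters[-code - 1]


def clusters(text):
    """Simplified segmentation: NFC-normalize, then group each base
    character with the combining marks that follow it."""
    current = ""
    for ch in unicodedata.normalize("NFC", text):
        if current and unicodedata.combining(ch) == 0:
            yield current
            current = ""
        current += ch
    if current:
        yield current


def to_nfg(text, table):
    """Return one integer per grapheme, so that codes[i] is the ith char."""
    return [table.intern(c) for c in clusters(text)]


table = GraphemeTable()
# "e" + acute composes to a single codepoint under NFC; "q" with two dots
# has no precomposed form, so it gets a synthetic id.
s = "e\u0301q\u0307\u0323x"
codes = to_nfg(s, table)
restored = "".join(table.expand(c) for c in codes)
```

With this representation, indexing and length work per grapheme rather than
per byte or per codepoint, at the cost of keeping a table that grows with the
number of distinct multi-codepoint graphemes seen.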
