On Fri, Nov 02, 2012 at 07:55:41PM +0100, spir wrote:
<snip>
There is, I guess, no hope to get back the ideal simplicity of 1 char <-->
1 repr (and even less representations of equal lengths) we lived with in
ascii & iso-latin times. There is an affordable way to get strings as
sequences of chars, with s[i] = ith char, exactly, and complete.
Perl6 does this with its homegrown "NFG" normalization form. Graphemes
which in Unicode are not assigned a single codepoint are assigned one
dynamically.
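
To make the idea concrete, here is a rough Python sketch of dynamic assignment. It is not the Parrot or Perl6 implementation: the NFGTable class, the use of negative integers for synthetic ids, and the "base char plus following combining marks" rule (a simplification of real grapheme cluster segmentation) are all assumptions made for illustration.

```python
import unicodedata


class NFGTable:
    """Hypothetical table handing out synthetic ids for multi-codepoint graphemes."""

    def __init__(self):
        self._ids = {}       # cluster string -> synthetic (negative) id
        self._clusters = []  # position (-id - 1) -> cluster string

    def encode(self, text):
        """Return one integer per grapheme, so units[i] is the ith char."""
        # Compose first, so graphemes that do have a single codepoint use it.
        text = unicodedata.normalize("NFC", text)
        clusters = []
        for ch in text:
            # Simplification: a combining mark joins the preceding cluster.
            if clusters and unicodedata.combining(ch) != 0:
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return [self._intern(c) for c in clusters]

    def _intern(self, cluster):
        if len(cluster) == 1:
            return ord(cluster)  # ordinary codepoint, no synthetic needed
        if cluster not in self._ids:
            self._clusters.append(cluster)
            # Assumption of this sketch: synthetic ids are negative integers.
            self._ids[cluster] = -len(self._clusters)
        return self._ids[cluster]

    def decode(self, units):
        """Expand synthetic ids back into their codepoint sequences."""
        return "".join(
            chr(u) if u >= 0 else self._clusters[-u - 1] for u in units
        )


table = NFGTable()
# "g" + COMBINING DIAERESIS has no precomposed codepoint;
# "e" + COMBINING ACUTE ACCENT composes to U+00E9 under NFC.
units = table.encode("ag\u0308e\u0301")
print(units)  # [97, -1, 233]
```

With this, units[1] is the whole "g with diaeresis" grapheme even though Unicode has no single codepoint for it, and the same grapheme gets the same id every time it shows up in that table.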
There's surprisingly little information about this available online. You
basically need to refer to the Parrot and Perl6 documentation--and sometimes
source code--to decipher the details.
See, e.g.
http://docs.parrot.org/parrot/devel/html/docs/pdds/pdd28_strings.pod.html