- Subject: Re: byteoffset() in lutf8lib.c from 5.3, work2
- From: Tim Hill <drtimhill@...>
- Date: Tue, 13 May 2014 16:52:36 -0700
On May 13, 2014, at 4:45 PM, Sean Conner <sean@conman.org> wrote:
> It was thus said that the Great Coroutines once stated:
>> On Tue, May 13, 2014 at 3:41 PM, Sean Conner <sean@conman.org> wrote:
>>
>>> If you are curious, check out the source code to joe (Joe's Editor),
>>> specifically, the files i18n.c and utf8.c, to see just the amount of code
>>> required to maybe, hopefully, handle UTF-8. I have no idea how well it
>>> deals with right-to-left languages.
>>
>> https://github.com/paul-schwendenman/joe-editor/blob/master/joe/i18n.c
>>
>> I am not a fan of the proliferation of wide characters :(
>
> Well, if you want to handle checking for control characters, spaces, upper
> case, lower case, numbers, combining characters or punctuation ...
>
> -spc (i18n.c and utf8.c compile to about 31k on a 32-bit system ... )
>
>
Just editorializing for a moment: when it first appeared, Unicode was supposed to clean up the mess of codepages and all the various odd multi-byte character hacks (Shift-JIS, anyone?), and to make multilingual applications far easier to code. Fast forward to today, and I’m not sure the “cure” is any better than the original problem. Any standard whose “normalized” form is in fact FOUR different forms (NFC, NFD, NFKC and NFKD) is in trouble, imho.
—Tim
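To make the “four normalized forms” point concrete, here is a small sketch using Python’s standard `unicodedata.normalize`. The sample string (the “ﬁ” ligature U+FB01 followed by “e” plus a combining acute accent U+0301) and the helper names `normal_forms` and `code_points` are illustrative choices, not anything from this thread.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def normal_forms(s: str) -> dict:
    """Return s under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}

def code_points(s: str) -> list:
    """List the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]

if __name__ == "__main__":
    # "ﬁ" ligature (U+FB01), then "e" followed by COMBINING ACUTE ACCENT (U+0301)
    sample = "\ufb01e\u0301"
    for form, text in normal_forms(sample).items():
        print(form, code_points(text))
```

The same three-character input yields four different code-point sequences: the canonical forms (NFC/NFD) keep the ligature and only compose or decompose the accented “e”, while the compatibility forms (NFKC/NFKD) also fold the ligature into plain “fi”. Which form a program picks therefore changes the results of comparison, searching and length counting.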
- References:
- byteoffset() in lutf8lib.c from 5.3, work2, Coroutines
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Roberto Ierusalimschy
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Coroutines
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Sean Conner
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Coroutines
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Sean Conner