- Subject: Re: use of string.sub generates SYN characters
- From: Coda Highland <chighland@...>
- Date: Fri, 15 Jan 2016 19:49:05 -0800
On Fri, Jan 15, 2016 at 7:44 PM, Todd Wegner <twwegner@gmail.com> wrote:
> I would like to understand why the following code produces SYN characters
> (0x16) in Lua53 on Linux.
> The SYN occur whenever split divides a multi-byte character in half.
> Why does string.sub return SYN rather than respective bytes.
>
>
> Code:
>
> local space = string.byte(' ')
> local text = utf8.char(0x92e,0x947,0x930,0x93e, space, 0x928, 0x93e, 0x92e,
> space, 0x932,0x942,0x905, space, 0x939,0x948, 0x964)
>
> local split = 1
> local lh = string.sub(text, 1, split)
> local rh = string.sub(text, split+1)
>
> print('text', text)
> print('lh', lh)
> print('rh', rh)
>
>
> Output:
>
> text मेरा नाम लूअ है।
> lh म SYN SYN
> rh SYN रा नाम लूअ है।
>
> Thanks
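string.sub is not UTF-8 aware. It operates on byte strings, not
Unicode character strings, so its indices are byte offsets. Each of
your Devanagari characters takes three bytes in UTF-8, and a split at
byte 1 cuts the first character apart, leaving an incomplete sequence
on each side.
Look at the utf8 module (utf8.offset, utf8.len, utf8.codes) for
Unicode-aware functionality.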
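
For what it's worth, here is a minimal sketch of splitting on a character
boundary rather than a byte boundary, assuming Lua 5.3's standard utf8
library. The helper name split_at_char is made up for illustration; it is
not part of Lua. It uses utf8.offset to turn a code point count into a
byte position and only then calls string.sub.

```lua
-- Sketch only: requires Lua 5.3 or later for the utf8 library.
local text = utf8.char(0x92e, 0x947, 0x930, 0x93e, 0x20, 0x928, 0x93e, 0x92e)

-- Hypothetical helper (not a standard function): split s after its
-- first n code points, where n >= 0.
local function split_at_char(s, n)
  -- utf8.offset(s, n + 1) is the byte position where code point n + 1
  -- starts; it is nil when s has fewer than n code points.
  local pos = utf8.offset(s, n + 1)
  if not pos then
    return s, ""
  end
  return string.sub(s, 1, pos - 1), string.sub(s, pos)
end

local lh, rh = split_at_char(text, 1)

print('text', text)
print('lh', lh)
print('rh', rh)
-- Byte length versus code point count of the same string.
print('bytes', #text, 'code points', utf8.len(text))
```

Note that this counts code points, not what a reader perceives as
characters: a split after one code point still separates the consonant
from its vowel sign. It does keep every byte sequence valid UTF-8, which
is what avoids the stray control characters.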
/s/ Adam