Re: UTF-8 validation

lua-l archive

Subject: Re: UTF-8 validation
From: "Cezary H. Noweta" <chn@...>
Date: Thu, 10 Dec 2015 00:32:42 +0100

On 2015-12-09 23:58, Coda Highland wrote:

utf8.len() will return false and the position of the first invalid
byte for an invalid UTF-8 string.
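
For reference, a minimal sketch of the behaviour described above, assuming Lua 5.3 or later (where the standard utf8 library exists). The sample string is made up for illustration; the byte 0xFF can never occur in UTF-8.

```lua
-- Requires Lua 5.3+ (standard utf8 library).
-- Illustrative input: "abc", then an invalid byte, then "def".
local bad = "abc" .. string.char(0xFF) .. "def"

-- On invalid input, utf8.len returns a false value plus the
-- 1-based position of the first invalid byte.
local n, pos = utf8.len(bad)
print(n, pos)            -- false value, then 4

-- On valid input it returns the number of characters.
print(utf8.len("abc"))   --> 3
```

Note that this only reports the problem; it does not produce a usable string.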

Indeed; however, my function's purpose is not to test whether a string is valid, but to implement the following flow:


[unknown string] => [black box] => [valid string].

in one simple step. This comes from a Unicode recommendation. After that, I know there are no 4/6-byte backslashes or quotes available for SQL injection, and no other fancy pitfalls.
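
A rough sketch of such a black box, in plain Lua. This is not the utf8.validate discussed in this thread; the names sanitize_utf8 and well_formed_len are hypothetical, and replacing every offending byte with U+FFFD is an assumption about the policy (Unicode's suggested practice groups some bytes differently). The accepted byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```lua
local REPLACEMENT = "\239\191\189"  -- U+FFFD encoded as EF BF BD

-- Length (1..4) of the well-formed sequence starting at byte i, or nil.
local function well_formed_len(s, i)
  local b1 = s:byte(i)
  if b1 < 0x80 then return 1 end
  local n, lo, hi = nil, 0x80, 0xBF     -- lo..hi: allowed range of the 2nd byte
  if b1 >= 0xC2 and b1 <= 0xDF then
    n = 2                               -- C0 and C1 (overlong) are excluded
  elseif b1 >= 0xE0 and b1 <= 0xEF then
    n = 3
    if b1 == 0xE0 then lo = 0xA0 end    -- rejects non-shortest 3-byte forms
    if b1 == 0xED then hi = 0x9F end    -- rejects surrogates D800..DFFF
  elseif b1 >= 0xF0 and b1 <= 0xF4 then
    n = 4
    if b1 == 0xF0 then lo = 0x90 end    -- rejects non-shortest 4-byte forms
    if b1 == 0xF4 then hi = 0x8F end    -- rejects values above U+10FFFF
  else
    return nil                          -- 80..C1 and F5..FF never start a sequence
  end
  for k = 1, n - 1 do
    local b = s:byte(i + k)
    if not b or b < lo or b > hi then return nil end
    lo, hi = 0x80, 0xBF                 -- later bytes are plain continuation bytes
  end
  return n
end

-- [unknown string] => [valid string]: copy well-formed sequences,
-- replace every other byte with U+FFFD.
local function sanitize_utf8(s)
  local out, i, len = {}, 1, #s
  while i <= len do
    local n = well_formed_len(s, i)
    if n then
      out[#out + 1] = s:sub(i, i + n - 1)
      i = i + n
    else
      out[#out + 1] = REPLACEMENT
      i = i + 1
    end
  end
  return table.concat(out)
end
```

For example, sanitize_utf8("a" .. string.char(0xC0, 0xA7) .. "b") yields "a", two U+FFFD characters, and "b", so the non-shortest quote never reaches later code. Valid input comes back unchanged.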

Today, non-shortest forms are very dangerous. Lua's utf8_decode is susceptible to them (there is no need to correct this as long as the string is valid). The conciseness of UTF-8 allows strings to be treated as plain ASCII ones; this is done frequently and can be very dangerous.
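
To illustrate why a non-shortest form is dangerous, here is a small sketch. naive_decode2 is a hypothetical decoder written for this example (it is not Lua's utf8_decode); it just does the bit arithmetic for a 2-byte sequence and performs no shortest-form check.

```lua
-- Hypothetical naive decoder for a 2-byte sequence, with no shortest-form check.
local function naive_decode2(s)
  local b1, b2 = s:byte(1, 2)
  -- 110xxxxx 10yyyyyy  ->  xxxxxyyyyyy
  return (b1 % 0x20) * 0x40 + (b2 % 0x40)
end

-- C0 A7 is a non-shortest encoding of U+0027 (the single quote).
local overlong_quote = string.char(0xC0, 0xA7)

-- A lenient decoder turns it into a real quote...
print(naive_decode2(overlong_quote) == ("'"):byte())   --> true

-- ...but a filter that treats the string as plain ASCII bytes never sees one.
print(overlong_quote:find("'", 1, true))               --> nil
```

So a byte-level escaping step can pass the string through untouched, while a later, more lenient decoding step produces the quote anyway.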

The first thing to do with an unknown string (just after its length is determined) is to validate it. Once a string has been processed by my utf8.validate, you can apply less secure but very efficient functions (like utf8_decode above, for example).


-- best regards

Cezary H. Noweta
