Also, if you use Base32 from RFC 4648 (letters A..Z and digits 2..7) to encode a 128-bit UUID, the 26 encoded characters actually carry 130 bits. That leaves 2 spare bits, which you can use to exclude some characters from the first and last positions of the 26.
If you want to avoid X or a digit in the first position (because a leading "X" is used in many legacy apps), it's simple: set the first spare bit to 0 and put it in the highest-order position of the first group of 5 bits, so the first encoded character will necessarily be a letter in A..P.
Likewise, you can keep the last character from being a digit in 2..7 by placing the second spare bit, also set to 0, in the highest-order position of the last group of 5 bits.
Now the 26 characters can be written in groups of 5 characters separated by hyphens (-).

You get header names in the range from "Aaaaa-aaaaa-aaaaa-aaaaa-aaaaa-a:" to "P7777-77777-77777-77777-77777-p:" (you can use any capitalization you want for the letters). They are still relatively easy to read, write, and possibly memorize (thanks to the grouping and the choice of unambiguous letters and digits), unique (randomly generated), and not much longer than some legacy "X-" header names like "X-Pepperfish-Transaction:" (31 characters with hyphens versus 24). None of these random header names will collide with legacy "X-" ones (they have a different form) or with a future standards-track RFC that obsoletes RFC 2822 (such RFCs use human-significant keywords), and they still look well-formed to very old mail processors.
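The bit layout described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `uuid_to_header_name` is hypothetical, and the choice to put the UUID's top 4 bits in the first character and its low 4 bits in the last one is one possible reading of "spare bit in the highest-order position".

```python
import uuid

# RFC 4648 Base32 alphabet: values 0..25 -> A..Z, values 26..31 -> 2..7
B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def uuid_to_header_name(u: uuid.UUID) -> str:
    """Encode a 128-bit UUID as 26 Base32 characters in hyphenated groups.

    Assumed layout of the 130 bits:
      [0 + top 4 UUID bits] [24 groups of 5 bits] [0 + low 4 UUID bits]
    so the first and last characters always fall in A..P.
    """
    n = u.int
    first = n >> 124                       # top 4 bits; spare bit is an implicit 0
    last = n & 0xF                         # low 4 bits; spare bit is an implicit 0
    middle = (n >> 4) & ((1 << 120) - 1)   # the remaining 120 bits
    values = [first]
    for shift in range(115, -1, -5):       # 24 groups of 5 bits, high to low
        values.append((middle >> shift) & 0x1F)
    values.append(last)
    chars = "".join(B32_ALPHABET[v] for v in values)
    groups = [chars[i:i + 5] for i in range(0, 26, 5)]
    # Capitalization is cosmetic: field names are case-insensitive.
    return "-".join(groups).capitalize()

if __name__ == "__main__":
    print(uuid_to_header_name(uuid.uuid4()) + ":")
```

With this layout the all-zero and all-one UUIDs give exactly the two endpoints quoted above.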




On Mon, 13 May 2019 at 05:51, Philippe Verdy <verdy_p@wanadoo.fr> wrote:
One way to make sure you will not conflict with any other standard RFC is to name your headers using a UUID (formatted as a syntactically conforming header name).
Generate one randomly (to avoid collisions), search for the generated GUID online if you want to make sure it is unique, and keep it in your records.
In fact any randomly generated 128-bit integer will do; convert it to ASCII using some set of safe digits (not just hexadecimal digits, and without the extra group separators and surrounding braces commonly seen) and it will not be very long.

RFC 2822 states that "A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon...". This would allow a set of 93 characters, but as field names are case-insensitive, you also need to remove the 26 letters that differ only in case to form the base, leaving a choice of 67 characters.
You can then use a Base-67 conversion.
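As a rough sketch of that conversion in Python: the alphabet below is built directly from the rule just stated (codes 33..126, minus the colon, minus lowercase letters). The digit order (ASCII order) and the names `ALPHABET67`, `to_base67`, and `from_base67` are assumptions made for illustration; any fixed ordering of the 67 characters would work.

```python
# Printable US-ASCII 33..126, minus ":", minus a..z (which duplicate A..Z
# because field names compare case-insensitively).
ALPHABET67 = "".join(
    chr(c) for c in range(33, 127)
    if chr(c) != ":" and not chr(c).islower()
)

def to_base67(n: int, width: int = 22) -> str:
    """Encode a non-negative integer as exactly `width` base-67 digits."""
    digits = []
    for _ in range(width):
        n, r = divmod(n, 67)
        digits.append(ALPHABET67[r])
    if n:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(digits))

def from_base67(s: str) -> int:
    """Decode a base-67 string, ignoring letter case."""
    n = 0
    for ch in s:
        n = n * 67 + ALPHABET67.index(ch.upper())
    return n

if __name__ == "__main__":
    import secrets
    value = secrets.randbits(128)
    name = to_base67(value)
    print(name, from_base67(name.lower()) == value)
```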

And in base 67, a 128-bit random integer (or UUID) needs just ceil(128/log2(67))=22 characters (some bits are left unused in the first or last character of the sequence; you may take that into account to avoid generating non-letters in these positions).

You may prefer a Base-64 conversion, because ceil(128/log2(64))=22 and the encoding will not be longer (you can still append some additional characters to make subheaders, or generate another 22-character header each time). A Base-64 conversion will work for you, but not with the two alphabets defined in RFC 4648 (because they are both case-sensitive).
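The digit counts quoted here can be checked with exact integer arithmetic, which avoids any floating-point rounding in log2. The helper name `digits_needed` is made up for this sketch.

```python
def digits_needed(bits: int, base: int) -> int:
    """Return the smallest k such that base**k >= 2**bits."""
    k = 0
    capacity = 1
    while capacity < (1 << bits):
        capacity *= base
        k += 1
    return k

if __name__ == "__main__":
    for base in (16, 32, 64, 67):
        print(base, digits_needed(128, base))
    # prints:
    # 16 32
    # 32 26
    # 64 22
    # 67 22
```

Base 32 leaves 26*5 - 128 = 2 spare bits and base 64 leaves 22*6 - 128 = 4, which is the slack that can be used to constrain the first and last characters.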

You can choose your 64-character alphabet so that it avoids "_" (useful as an additional group separator, to allow "visual" control of the length and better "readability"), as well as the double quote and backslash (so that you can embed your header in quoted strings, including constants in programming languages like C or Java, without needing any escaping).

I would suggest formatting the 22 digits in Base67 in groups of 4 or 5 digits, except for the first and last groups, which have only 1 digit.

So you don't need the "X-" followed by a "readable" header name, which is much more likely to collide with other apps (or with evolutions of RFC 2822 in its BCP standard track, or with the inclusion of RFC 2822 in a new standard protocol) than a randomly generated header.

And a header name like this one will work:

  "GE16Q$,18'(4<SG@HB.N5S" + ":" + "(some value here)" + CRLF

just like this one with 5 additional "formatting group separators" inserted, one after the first digit and then one after every 5 digits:

  "G_E16Q$_,18'(_4<SG@_HB.N5_S" + ":" + "(some value here)" + CRLF

which is also equivalent to:

  "G_e16q$_,18'(_4<sg@_hb.n5_s" + ":" + CRLF + SPACE +
  "(some value here)" + CRLF
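The grouping used in these examples (1 digit, then groups of 5, then 1 digit) is easy to reproduce. The Python sketch below is illustrative only; `group_name` and `same_field_name` are hypothetical helpers, and the comparison covers only letter case, since the underscores are ordinary characters of the field name.

```python
def group_name(name: str, sep: str = "_") -> str:
    """Insert `sep` after the first digit, after every 5 digits of the
    middle part, and before the last digit."""
    first, middle, last = name[0], name[1:-1], name[-1]
    groups = [middle[i:i + 5] for i in range(0, len(middle), 5)]
    return sep.join([first, *groups, last])

def same_field_name(a: str, b: str) -> bool:
    """Field names compare case-insensitively."""
    return a.upper() == b.upper()

if __name__ == "__main__":
    grouped = group_name("GE16Q$,18'(4<SG@HB.N5S")
    print(grouped)
    print(same_field_name(grouped, "G_e16q$_,18'(_4<sg@_hb.n5_s"))
```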

You may prepend an "X-" to this random header if you still want to make sure it "looks" like a legacy header, but in that case I suggest you use Base-32 from RFC 4648 (i.e. letters A to Z and digits 2 to 7, avoiding 0 and 1, which are easily confused with the letters O and I), without any padding or group separators (from a 128-bit UUID or random number, you need 26 digits in base 32, and with the "X-" prefix your header name will have 28 characters).


On Sun, 12 May 2019 at 23:44, Eduardo Ochs <eduardoochs@gmail.com> wrote:
Hi list,

Two questions:

1) Is there a standard header that I can put in my e-mails that means
   "this is _NOT GOING to be used in production code UNDER ANY
   CIRCUMSTANCES_, this is going to be a personal hack that I will
   only load into a Lua interpreter BY HAND for some VERY CONTROLLED
   tests, etc, etc"?

   (Can you please suppose that I started my e-mail with a header like
   this? I've been considering asking the question below here at the
   list for YEARS, but EVERY SINGLE TIME I predicted the probable
   reaction of the professional programmers in the list and gave
   up...)

   By the way, I am the author of the article "Bootstrapping a Forth
   in 40 lines of Lua code" that appeared in the Lua Gems book. One of
   its last paragraphs is this:

     I've met many people over the years who have been Forth
     enthusiasts in the past, and we often end up discussing what made
     Forth so thrilling to use at that time - and what we can do to
     adapt its ideas to the computers of today. My personal impression
     is that Forth's main points were not the ones that I listed at
     the beginning of this section, and that I said that were easy to
     quantify; rather, what was most important was that nothing was
     hidden, there were no complex data structures around with
     "don't-look-at-this" parts (think on garbage collection in Lua,
     for example, and Lua's tables - beginners need to be convinced to
     see these things abstractly, as the concrete details of the
     implementation are hard), and _everything_ - code, data,
     dictionaries, stacks - were just linear sequences of bytes, that
     could be read and modified directly if we wished to. We had total
     freedom, defining new words was quick, and experiments were quick
     to make; that gave us a sense of power that was totally different
     from, say, the one that a Python user feels today because he has
     huge libraries at his fingertips.

   The technical question that I want to ask is related to using Lua
   as Forths were used in the early 90's - there were LOTS of commands
   that if used wrongly could freeze the system and require a reboot,
   and we were perfectly happy with that.


2) Here is the idea; the question is below.

   The functions debug.getinfo, debug.getlocal and debug.setlocal are
   usually called with an integer argument that the manual refers to
   as "level", that is processed like this (I took the code from
   db_getinfo, in ldblib.c) to set the variable "ar" to an "activation
   record":

     if (lua_isnumber(L, arg+1)) {
       if (!lua_getstack(L1, (int)lua_tointeger(L, arg+1), &ar)) {
         lua_pushnil(L);  /* level out of range */
         return 1;
       }
     }

   I would like to have _variants_ of these functions, to be called
   debug.mygetinfo, debug.mygetlocal and debug.mysetlocal, that would
   accept an alternative to a numerical "level". Running

     ar = debug.mygetstack(2)

   would set ar to a string like

     "activation record: 0x125cf20"

   whose address part points to the "activation record" of a function
   in the call stack, like the pointer that

     lua_getstack(L1, (int)lua_tointeger(L, arg+1), &ar)

   puts into ar, and if we are super-ultra-careful then we can call
   debug.mygetinfo, debug.mygetlocal and debug.mysetlocal in either of
   these ways, the second one being equivalent to the first one:

     debug.mygetinfo (2,  "n")
     debug.mygetinfo (ar, "n")
     debug.mygetlocal(2,  3)
     debug.mygetlocal(ar, 3)
     debug.mysetlocal(2,  3, 42)
     debug.mysetlocal(ar, 3, 42)

   But OF COURSE if we set ar to a bad address, say,

     ar = "activation record: 0x12345678"

   then debug.mygetinfo, debug.mygetlocal and debug.mysetlocal WOULD
   NOT HESITATE to use that address and segfault (HAHAHA! DEAL WITH
   THIS, MODERN PROGRAMMERS!!!)...

   The question is: has anyone implemented something like this, or
   something that would cover a part of this? I haven't written any C
   code in ages... I think I can implement it myself, alone, but that
   would take me one or two full days just for a prototype in which I
   would just change ldblib.c... putting these new functions into a
   ".so" would take more.


Thanks in advance!!!
  Eduardo Ochs =)
  http://angg.twu.net/dednat6.html   <- (for lualatex users)