lua-users home
lua-l archive


On Friday 01, John Passaniti wrote:
> I'm looking for a little design inspiration from others.  I have a
> basic approach (which I'll describe, below), but chances are good that
> someone here has already implemented something like this or can see a
> better solution.
>
> I have a UDP-based communications protocol for controlling a device.
> The protocol is implemented in C and specified in a tedious Microsoft
> Word document.  The base protocol is simple and extensible and isn't
> interesting.  What is interesting is there are a large number of
> binary messages, each with a unique structure.  I'm bothered that when
> I change the C code to add a new message, I also have to update the
> Word document.  The two are separate things.
>
> I'd like to write a tool that could parse these messages and display
> them in human readable form.  And I'd like another tool that does the
> reverse-- specify a message in a human-readable form, and get back out
> a message.  And I'd like to be able to write a Wireshark dissector for
> these messages.  And I'd like... well, more stuff.
>
> The solution would seem to be to express the protocol in some
> abstract, high-level representation (like... a Lua table).  Once in
> this form, I can write a tool that will dump out a description of the
> protocol in HTML.  I could also have it generate optimized C code to
> implement the protocol parser (such as converting to a finite state
> machine).  The table would also be at the heart of the Wireshark
> dissector and the tools to display and create packets.  And so on.
>
> Now whenever I make changes to the protocol, I edit the Lua table that
> describes the messages, and I'm done.  Everything flows from that.
>
> So how about something like this:
>
> {
>     OPT_MUTE = {
>         option = 0x02,
>         read = true,
>         write = true,
>         message = {
>             {
>                 name = "type",
>                 type = OutputInput,
>                 desc = "Channel Type"
>             },
>             {
>                 name = "number",
>                 type = ChannelNumber,
>                 desc = "Channel Number"
>             },
>             {
>                 name = "mute",
>                 type = OffOn,
>                 desc = "Mute Status"
>             }
>         }
>     }
> }
>
> So I have a message named OPT_MUTE that has a binary value of 0x02,
> it's both readable and writable, and has three items in the payload.
> The three items are named type, number, and mute and are of specific
> types (OutputInput, ChannelNumber, and OffOn).  Each of these types is
> itself a table that describes them further:
>
> ChannelNumber = {
>     bytes = 1,
>     low = 0,
>     high = 23,
> }
>
> There are (potentially) a variety of other fields to describe messages
> (which devices implement them, numeric ranges), and the payload
> section can have repetitions of an item or a larger structure, or
> optional data at the end of a message.
>
> What I'm looking for is if anyone has done this kind of thing before
> and what kind of representation you came up with.  The biggest
> stumbling block I have is the representation of repeated and optional
> data in the payload.
>
> Suggestions?
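
To make the quoted idea concrete, here is a minimal sketch of a decoder driven by such a table, including one possible way to mark repeated and optional data.  The `count` and `optional` keys, the OPT_LEVELS message, the Count and Level types, the ranges of OutputInput and OffOn, and big-endian byte order are all assumptions made for illustration; none of them come from the original protocol.

```lua
-- Type tables in the style of ChannelNumber.  The ranges for OutputInput,
-- OffOn, Count and Level are assumptions made for this sketch.
local OutputInput   = { bytes = 1, low = 0, high = 1 }
local ChannelNumber = { bytes = 1, low = 0, high = 23 }
local OffOn         = { bytes = 1, low = 0, high = 1 }
local Count         = { bytes = 1, low = 0, high = 8 }
local Level         = { bytes = 2, low = 0, high = 1000 }

local messages = {
    OPT_MUTE = {
        option = 0x02, read = true, write = true,
        message = {
            { name = "type",   type = OutputInput,   desc = "Channel Type" },
            { name = "number", type = ChannelNumber, desc = "Channel Number" },
            { name = "mute",   type = OffOn,         desc = "Mute Status" },
        },
    },
    -- Hypothetical message: a count, that many levels, then an optional flag.
    OPT_LEVELS = {
        option = 0x10, read = true, write = false,
        message = {
            { name = "n",      type = Count, desc = "Number of levels" },
            { name = "levels", type = Level, count = "n", desc = "Levels" },
            { name = "flag",   type = OffOn, optional = true, desc = "Flag" },
        },
    },
}

-- Read one value of the field's type at `pos` (big-endian is assumed),
-- range-check it, and return the value plus the next read position.
local function decode_field(field, data, pos)
    local t = field.type
    if pos + t.bytes - 1 > #data then
        error("message too short for field " .. field.name)
    end
    local value = 0
    for i = pos, pos + t.bytes - 1 do
        value = value * 256 + data:byte(i)
    end
    if value < t.low or value > t.high then
        error(("field %s out of range: %d"):format(field.name, value))
    end
    return value, pos + t.bytes
end

-- Decode a payload (a Lua string) according to a message description.
local function decode(spec, data)
    local out, pos = {}, 1
    for _, field in ipairs(spec.message) do
        if field.optional and pos > #data then
            break -- optional trailing data is absent
        end
        local count = field.count
        if type(count) == "string" then
            count = out[count] -- repetition count taken from an earlier field
        end
        if count then
            local items = {}
            for i = 1, count do
                items[i], pos = decode_field(field, data, pos)
            end
            out[field.name] = items
        else
            out[field.name], pos = decode_field(field, data, pos)
        end
    end
    return out
end
```

For example, `decode(messages.OPT_MUTE, string.char(1, 5, 1))` yields a table with type = 1, number = 5 and mute = 1.  The same table could be walked by other tools (an encoder, an HTML dump) without changing the description.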

Take a look at how the XCB [1] (X protocol C-language Binding) project 
generates its C code for parsing the wire protocol.  They describe the 
protocol in XML [2] (not pretty but you can take some ideas from it).  The 
XML format is explained here [3].

Here are a few more protocol languages: Avro [4], Protocol Buffers [5], Thrift 
[6], and Etch [7].

There is also a message-based protocol [8] that is used over UDP.  I even wrote 
a Wireshark protocol dissector for it in Lua [9].

I have done a lot of research into this subject over the past year and even 
created my own protocol language where the definitions are written as Lua 
code (see the attached test_records.lua file for an example).  I wrote a code 
generator that takes a definition file and generates C code to handle 
encoding & decoding those records into binary data.  The main reason I used 
Lua code for the definition files is that I didn't want to create a full 
parser for a new definition language that would be changing a lot as I worked 
on it.  I can describe it in more detail if you are interested.
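
As a rough illustration of why no separate parser is needed: Lua lets a function be called with a single string or table argument without parentheses, so `field "uint32" "Test1" { ... }` is just a chain of ordinary calls.  The sketch below is not the actual generator; the function bodies, the `kind` key, the `c_types` map and `emit_c` are hypothetical and only show the general technique.

```lua
-- `field "uint32" "Test1" { default = 1 }` parses as
-- field("uint32")("Test1")({ default = 1 }), so each keyword can be a
-- plain function that returns another function until the table arrives.
local function field(type_name)
    return function(name)
        return function(opts)
            return { kind = "field", type = type_name, name = name,
                     default = opts.default }
        end
    end
end

local function struct(name)
    return function(members)
        return { kind = "struct", name = name, members = members }
    end
end

-- A definition written in the same style as the attached example.
local def = struct "NeighborBlock" {
    field "uint32" "Test0" { default = 0 },
    field "uint32" "Test1" { default = 1 },
    field "uint32" "Test2" { default = 2 },
}

-- Hypothetical mapping from definition types to C types.
local c_types = { uint32 = "uint32_t" }

-- Walk the definition and emit a C struct declaration as a string.
local function emit_c(s)
    local lines = { "struct " .. s.name .. " {" }
    for _, f in ipairs(s.members) do
        lines[#lines + 1] = ("\t%s %s;"):format(c_types[f.type], f.name)
    end
    lines[#lines + 1] = "};"
    return table.concat(lines, "\n")
end

print(emit_c(def))
```

Because the definition is ordinary Lua, adding a new keyword only means adding another function, which is convenient while the language is still changing.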

I also used parts of that code generator in a Lua bindings generator for all 
the C code in my main project.  You can see an example bindings definition in 
the attached gd_module.lua file.  The bindings generated from gd_module.lua 
would be used like this:

require("gd")
local img = gd.gdImage(200,200) -- create image 200x200
local red = img:color_allocate(255,0,0) -- create red color
img:line(10,10,50,50, red) -- draw red line
img:toPNG("line.png") -- output png image
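
The c_source block in the attached gd_module.lua uses `${name}` and `${this}` placeholders.  How the generator expands them is not shown here; one simple way to do that kind of substitution in Lua is string.gsub with a replacement function.  The `expand` helper and the variable values below are hypothetical.

```lua
-- Replace every ${key} in `template` with vars[key].
-- Unknown keys raise an error instead of silently producing bad C code.
local function expand(template, vars)
    local result = template:gsub("%$%{([%w_]+)%}", function(key)
        local value = vars[key]
        if value == nil then
            error("unknown template variable: " .. key)
        end
        return value
    end)
    return result
end

local c_line = expand("gdImagePng(${this}, pngout);", { this = "this_img" })
print(c_line)
```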

I have been thinking about releasing the bindings generator.  Right now I call 
it "Lua API Gen", but I am not sure if that is a good name or not.  If anyone 
is interested in it let me know.


1. http://xcb.freedesktop.org/
2. http://cgit.freedesktop.org/xcb/proto/tree/src/xproto.xml
3. http://cgit.freedesktop.org/xcb/proto/tree/doc/xml-xcb.txt
4. http://avro.apache.org/docs/current/
5. http://code.google.com/p/protobuf/
6. http://incubator.apache.org/thrift/
7. https://cwiki.apache.org/ETCH/index.html
8. http://wiki.secondlife.com/wiki/Message_Layout
9. http://opensimulator.org/wiki/LLUDP_Dissector
10. http://library.gnome.org/devel/glib/stable/

-- 
Robert G. Jakabosky

-- Attachment: test_records.lua
package "TestRecs" "ntest" {
-- test enum.
enum "TestEnum" {
	TYPE_INVALID = 0,
	TYPE_BOOL    = 1,
	TYPE_INT8    = 2,
	TYPE_INT16   = 3,
	TYPE_INT32   = 4,
	TYPE_INT64   = 5,
	TYPE_VAR32   = 6,
	TYPE_VAR64   = 7,
	TYPE_UINT8   = 8,
	TYPE_UINT16  = 9,
	TYPE_UINT32  = 10,
	TYPE_UINT64  = 11,
	TYPE_UVAR32  = 12,
	TYPE_UVAR64  = 13,
	TYPE_FLOAT   = 14,
	TYPE_DOUBLE  = 15,
},
-- test records
record "TestRecord" {
	version { 0, 1 },
	-- sub-type structure
	struct "TestBlock1" {
		field "uint32" "Test1" { default = 543 },
	},
	struct "NeighborBlock" {
		field "uint32" "Test0" { default = 0 },
		field "uint32" "Test1" { default = 1 },
		field "uint32" "Test2" { default = 2 },
	},
	-- example of usage of enum type.
	field "TestEnum" "enum_val" {},
	field "TestEnum" "enum_val_int32" { default = 'TYPE_INT32' },

	-- bool fields are packed together at the start of the record.
	field "bool" "bool_val",
	field "bool" "bool_val_true" { default = true },
	-- fixed length signed integers of length 8/16/32/64-bits
	field "int8" "int8_val" { default = -21 },
	field "int16" "int16_val" { default = -4321 },
	field "int32" "int32_val" { default = -4321 },
	field "int64" "int64_val" { default = -4321 },
	-- variable length signed 32-bit & 64-bit integers (encoded like protobufs)
	field "var32" "var32_val" { default = -4321 },
	field "var64" "var64_val" { default = -4321 },
	-- fixed length unsigned integers of length 8/16/32/64-bits
	field "uint8" "uint8_val" { default = 21 },
	field "uint16" "uint16_val" { default = 4321 },
	field "uint32" "uint32_val" { default = 4321 },
	field "uint64" "uint64_val" { default = 4321 },
	-- variable length unsigned 32-bit & 64-bit integers (encoded like protobufs)
	field "uvar32" "uvar32_val" { default = 4321 },
	field "uvar64" "uvar64_val" { default = 4321 },
	field "float" "float_val" { default = 4.321 },
	field "double" "double_val" { default = 4.321 },
	-- strings are encoded with the string length first, encoded as a uvar32 type.
	field "string" "string_val" { default = "this is a test" },

	-- embedded TestBlock1 struct (i.e. this is the same as a fixed length array
	--   of TestBlock1 structs with length 1)
	field "TestBlock1" "testblock1" {},

	-- Arrays can be fixed or variable length.  The length of a variable length array is
	--   encoded as a uvar32 type.

	-- fixed length array of 4 NeighborBlock structs
	array "NeighborBlock" "neighborblock" { 4 },
	-- variable length array of NeighborBlock structs
	array "NeighborBlock" "neighborblock_var" {},
	-- fixed length array of 4 bools (encoded as 4 bits, padded to 8 bits)
	array "bool" "bitarray_4" { 4 },
	-- variable length array of bools (padded to a multiple of 8 bits)
	array "bool" "bitarray_var" {},
	-- fixed length array of 4 strings
	array "string" "strarray_4" { 4 },
	-- variable length array of strings.
	array "string" "strarray_var" {},
}
}

-- Attachment: gd_module.lua

c_module "gd" {
hide_meta_info = true,
include "gd.h",
object "gdImage" {
  method_new {
    c_call "gdImage *" "gdImageCreate" { "int", "sx", "int", "sy" }
  },
  method_delete {
    c_call "void" "gdImageDestroy" {}
  },
  method "color_allocate" {
    c_call "int" "gdImageColorAllocate"
      { "int", "r", "int", "g", "int", "b" }
  },
  method "line" {
    c_call "void" "gdImageLine"
      { "int", "x1", "int", "y1", "int", "x2", "int", "y2", "int", "colour" }
  },
  method "toPNG" {
    var_in { "const char *", "name" },
    c_source [[
  FILE *pngout = fopen( ${name}, "wb");
  gdImagePng(${this}, pngout);
  fclose(pngout);
]]
  },
}
}