Re: [ANN] lua-pb Lua Protocol Buffers

lua-l archive


Subject: Re: [ANN] lua-pb Lua Protocol Buffers
From: "Robert G. Jakabosky" <bobby@...>
Date: Fri, 24 Jun 2011 22:03:28 -0700

On Friday 24, Josh Haberman wrote:
> Robert G. Jakabosky <bobby <at> sharedrealm.com> writes:
> > -- person.proto:
> > message Person {
> > 
> >   required string name = 1;
> >   required int32 id = 2;
> >   required string email = 3;
> > 
> > }
> > 
> > -- example_person.lua:
> > require"pb" -- first load lua-pb
> > require"person" -- load the above .proto file
> > msg = person.Person() -- create a Person message
> 
> I'm all for magic where there's a benefit, but what is
> the benefit over:

No real benefit, I just thought it would be cool.

>   person = pb.load("person.proto")

This is still supported as:
  person = pb.require"person"
or as an embedded .proto:
  person = pb.load_proto([[
    message Person { ... }
]])

> 
> This seems clearer to me.
> 
> If you're going the magical route, I think it makes more
> sense to load things into the top-level namespace, based
> on the package name specified in the .proto file:
> 
>   // person.proto
>   package foo.bar
>   message Person { ... }
> 
>   -- test.lua
>   require "person"
>   msg = foo.bar.Person()

Yes, this seems like a good idea.  If the .proto file declares a package name, 
then that name will be used instead of the module name passed to require().
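
To illustrate the package-based loading discussed above, here is a minimal
sketch of how a dotted package name could be mapped onto nested tables.
get_package_table() and the stand-in Person constructor are hypothetical
names for illustration, not part of lua-pb, and a private table is used in
place of the global table:

```lua
-- Hypothetical helper (not lua-pb API): walk a dotted package name and
-- create any missing nested tables under `root` (defaults to _G).
local function get_package_table(package_name, root)
  local tbl = root or _G
  for part in package_name:gmatch("[^.]+") do
    if tbl[part] == nil then
      tbl[part] = {}
    end
    tbl = tbl[part]
  end
  return tbl
end

-- Stand-in for a real message constructor.
local function Person()
  return { name = "", id = 0, email = "" }
end

-- Use a private table instead of the global table for this example.
local env = {}
get_package_table("foo.bar", env).Person = Person

local msg = env.foo.bar.Person()
```

With root left out, the same call would install foo.bar.Person into the
global namespace, which is what the test.lua example above expects.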

> > I am planning on adding these methods to the message interface:
> > msg:MergeFrom(msg1)
> > msg:CopyFrom(msg1)
> > msg:Clear()
> > msg:IsInitialized()
> > msg:MergeFromString(str)
> > msg:ParseFromString(str)
> > msg:SerializeToString()
> > msg:SerializePartialToString()
> > msg:ByteSize()
> 
> I'm thinking more and more that it makes sense to separate
> the in-memory representation of a protobuf from its
> serializations (text format, binary format, JSON, etc),
> both code-wise and API-wise.
> 
> I have plans to write a C-based protobuf extension for
> Lua, and my plans were to have something like:

I am very interested to see how you implement the interface to nested C data 
structures.  I have done this for a private project and it is an interesting 
problem (reference counting, nested structure, arrays of structures).

>   -- These are as you mentioned, because they are not
>   -- specific to any one serialization format.
>   msg:Clear()
>   msg:IsInitialized()
>   msg:CopyFrom(msg1)

This one should be included too:
msg:MergeFrom(msg1) -- CopyFrom() clears the message first.
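
The difference between the two can be sketched with plain tables.  This is
only an illustration that treats a message as a flat table of scalar fields;
the function bodies are assumptions, not lua-pb's implementation:

```lua
-- Illustrative only: a "message" here is a flat table of scalar fields.
local function Clear(msg)
  -- Setting existing fields to nil during traversal is allowed in Lua.
  for k in pairs(msg) do
    msg[k] = nil
  end
end

-- MergeFrom keeps fields already set in msg and overwrites/adds the
-- fields that are set in other.
local function MergeFrom(msg, other)
  for k, v in pairs(other) do
    msg[k] = v
  end
end

-- CopyFrom clears msg first, so only other's fields remain.
local function CopyFrom(msg, other)
  Clear(msg)
  MergeFrom(msg, other)
end

local update = { email = "bob@example.com" }

local a = { name = "Bob", id = 1 }
MergeFrom(a, update)   -- a keeps name and id, gains email

local c = { name = "Bob", id = 1 }
CopyFrom(c, update)    -- c now has only email
```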

>   -- These are specific to one or more serializations:
>   pb.Serialize(msg)
>   pb.SerializeText(msg)
>   pb.SerializeJSON(msg)
>   msg = pb.Parse(str)
>   msg = pb.ParseText(str)
>   msg = pb.ParseJSON(str)

Why not pass the format as a parameter?
msg:Serialize(format, ...)
msg:SerializePartial(format, ...) -- this doesn't check required fields.
msg:Parse(str, format, ...)

and use the binary format as the default if 'format' is not provided.  New 
formats can be 'registered' with the library.
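
A minimal sketch of what such a registry could look like.  register_format(),
the codec tables, and the toy "binary" and "text" encodings are all
hypothetical and only show the dispatch; they are not real wire formats:

```lua
local formats = {}
local default_format = "binary"

-- Hypothetical registration function: a codec is a table with
-- encode(msg, ...) and decode(msg, str, ...) functions.
local function register_format(name, codec)
  formats[name] = codec
end

local function lookup(format)
  local codec = formats[format or default_format]
  if not codec then
    error("unknown format: " .. tostring(format))
  end
  return codec
end

local Message = {}
Message.__index = Message

function Message:Serialize(format, ...)
  return lookup(format).encode(self, ...)
end

function Message:Parse(str, format, ...)
  return lookup(format).decode(self, str, ...)
end

-- Toy codecs for a message with a single 'name' field.
register_format("binary", {
  encode = function(msg) return "<binary:" .. msg.name .. ">" end,
  decode = function(msg, str)
    msg.name = str:match("^<binary:(.*)>$")
    return msg
  end,
})
register_format("text", {
  encode = function(msg) return 'name: "' .. msg.name .. '"' end,
  decode = function(msg, str)
    msg.name = str:match('^name: "(.*)"$')
    return msg
  end,
})

local msg = setmetatable({ name = "Bob" }, Message)
local as_default = msg:Serialize()        -- uses the "binary" codec
local as_text = msg:Serialize("text")
```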

> Don't worry, I don't plan to use the "pb" namespace --
> I'm planning to put everything under "upb", since that's
> the name of my project:
>   https://github.com/haberman/upb/wiki

I have been keeping an eye on your 'upb' project for a long time (I think for 
more than a year), waiting for it to become usable.  I still can't wait to see 
it finished.  One of the reasons I started lua-pb is that I wanted to see how 
close LuaJIT could get to the speed of your JIT'ed decoder (I haven't optimized 
the project for LuaJIT yet, so it is not even close right now).

> Some other things to consider:
> 
> - do you plan to allow reparenting of nested messages?  eg.
>   msg.foo = Foo().  I ask because you say you're emulating
>   Python proto, which does not allow this AFAIK and instead
>   uses the C++ convention of: msg.mutable_foo.  I've always
>   thought this was awkward for a dynamic language, so plan
>   to allow reparenting, but in that case you have to watch
>   out for cycles that the user may create.

I don't plan on emulating everything from the Python/C++ interface, and 
msg.mutable_foo is something I don't want to do.

I was planning on allowing a message to be referenced by multiple parent 
messages, but restricting messages to one parent will allow invalidating the 
cached "byte size" of the parent message when a field is changed.  Maybe I 
will just add a "msg:Duplicate()" method.
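
A rough sketch of the single-parent idea, to show why it makes cache
invalidation simple.  The field layout, set(), and the toy size calculation
are assumptions for illustration only, not how lua-pb stores messages:

```lua
local Message = {}
Message.__index = Message

local function new_message()
  return setmetatable({ fields = {}, parent = nil, cached_size = nil }, Message)
end

-- With exactly one parent per message, invalidation is a walk up a chain.
function Message:invalidate()
  local m = self
  while m do
    m.cached_size = nil
    m = m.parent
  end
end

function Message:set(name, value)
  local old = self.fields[name]
  if getmetatable(old) == Message then
    old.parent = nil              -- the replaced child is detached
  end
  if getmetatable(value) == Message then
    assert(value.parent == nil, "message already has a parent; Duplicate() it")
    value.parent = self
  end
  self.fields[name] = value
  self:invalidate()
end

function Message:ByteSize()
  if not self.cached_size then
    local size = 0
    for _, v in pairs(self.fields) do
      if getmetatable(v) == Message then
        size = size + v:ByteSize()
      else
        size = size + #tostring(v)  -- toy size, not the real wire size
      end
    end
    self.cached_size = size
  end
  return self.cached_size
end

-- Deep copy with no parent, so it can be attached to another message.
function Message:Duplicate()
  local copy = new_message()
  for k, v in pairs(self.fields) do
    if getmetatable(v) == Message then
      v = v:Duplicate()
    end
    copy:set(k, v)
  end
  return copy
end

local inner = new_message()
inner:set("name", "Bob")
local outer = new_message()
outer:set("person", inner)
local size_before = outer:ByteSize()
inner:set("name", "Robert")         -- also invalidates outer's cache
local size_after = outer:ByteSize()
```

Attaching inner to a second message fails the assertion in set(), while
attaching inner:Duplicate() works.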

As for cycles, for now I will just let Lua throw a stack overflow error.  
Cyclic references can only be made by the programmer using the Lua interface; 
a decoded message can't have cyclic references.  Later I might add a max depth 
setting for the encoder/decoder.
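
A minimal sketch of such a depth limit, using a toy text encoder over plain
tables.  MAX_DEPTH and encode() are hypothetical names, and the output is
not a protobuf format:

```lua
local MAX_DEPTH = 64  -- hypothetical setting

-- Toy encoder: nested tables stand in for nested messages.
local function encode(msg, depth)
  depth = depth or 1
  if depth > MAX_DEPTH then
    error("message nesting exceeds max depth (" .. MAX_DEPTH ..
          "); cyclic reference?")
  end
  local parts = {}
  for k, v in pairs(msg) do
    if type(v) == "table" then
      v = encode(v, depth + 1)
    end
    parts[#parts + 1] = tostring(k) .. "=" .. tostring(v)
  end
  table.sort(parts)  -- stable output order for the example
  return "{" .. table.concat(parts, ",") .. "}"
end

local ok_result = encode({ a = 1, b = { c = 2 } })

-- A cycle created from Lua code now fails with a clear error instead of
-- a stack overflow.
local cyclic = {}
cyclic.self = cyclic
local ok, err = pcall(encode, cyclic)
```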

> - watch out for 64-bit integers, which lua_Number can't
>   fully represent in its default configuration (since
>   a double only has 52 bits of precision).  Probably the best
>   you can do is just warn your users about the loss of
>   precision.

Yup, right now lua-pb doesn't round-trip large 64-bit integers (i.e. precision 
is lost).  I am not sure what would be the best way to handle this in standard 
Lua.  Even if a message internally stored 64-bit integers as binary 8-byte 
strings (packing/unpacking every time the field is accessed), Lua wouldn't be 
able to work on the full value.  I should at least add support for preserving 
64-bit integer fields that are not changed by the Lua code.
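
The precision loss, and one way unchanged fields could be preserved, can be
sketched as follows.  This assumes an unsigned little-endian fixed64 field;
decode_fixed64(), encode_fixed64(), and the raw/num/orig layout are
hypothetical and not lua-pb code:

```lua
-- With double-precision numbers, 2^53 + 1 cannot be represented.
local big = 2^53
local lost = (big + 1 == big)   -- true: the +1 is rounded away

-- Keep the raw 8 bytes next to the (possibly lossy) decoded number.
local function decode_fixed64(raw)
  local n = 0.0                 -- force floating-point arithmetic
  for i = 8, 1, -1 do
    n = n * 256 + raw:byte(i)
  end
  return { raw = raw, num = n, orig = n }
end

-- If Lua code did not change the number, emit the original bytes.
-- Otherwise re-pack from the number, which may have lost precision.
local function encode_fixed64(field)
  if field.num == field.orig then
    return field.raw
  end
  local n, bytes = field.num, {}
  for i = 1, 8 do
    local b = n % 256
    bytes[i] = string.char(b)
    n = (n - b) / 256
  end
  return table.concat(bytes)
end

-- 2^53 + 1 as little-endian bytes: low byte 1, byte 7 is 32 (32 * 2^48).
local raw = string.char(1, 0, 0, 0, 0, 0, 32, 0)
local field = decode_fixed64(raw)
local untouched = encode_fixed64(field)   -- identical to raw

field.num = 5                             -- changed by Lua code
local changed = encode_fixed64(field)
```

The decoded number is still only approximate, so this preserves the value
on the wire but does not let Lua code compute with the full 64 bits.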

-- 
Robert G. Jakabosky

Follow-Ups:
- Re: [ANN] lua-pb Lua Protocol Buffers, Josh Haberman

References:
- [ANN] lua-pb Lua Protocol Buffers, Robert G. Jakabosky
- Re: [ANN] lua-pb Lua Protocol Buffers, Robert G. Jakabosky
- Re: [ANN] lua-pb Lua Protocol Buffers, Josh Haberman
