Hi list,
I've just posted a new project called codepeg to GitHub.  It's a
PEG-based lexing/parsing library built on listlpeg.  The goal is to
build a generic, robust, and, more importantly, simple lexing/parsing
library that can also produce diagnostics when lexing and parsing
errors are encountered.  Some unique features compared to other
Lua-based PEG parsing projects:

- lexing and parsing are set up as a two-phase process
- parsing uses listlpeg over a token stream (array of tokens) and
tracks every rule that's visited for diagnostic purposes
- generic grammar specification format for plugging in your favorite
DSL, Lua, C, whatever ...
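
The two-phase split in the list above can be sketched roughly as follows.
This is plain Lua written for illustration only: it uses neither listlpeg
nor codepeg's actual API, and the token kinds, the rule names, and the
helpers lex, parse, expect, and rule are all made up for the sketch.

```lua
-- Token definitions for the sketch; kinds and patterns are hypothetical,
-- not taken from codepeg's Lua specification.
local token_defs = {
  {"NAME",   "^[%a_][%w_]*"},
  {"NUMBER", "^%d+"},
  {"EQ",     "^="},
}

-- Phase 1: turn source text into an array of tokens.
local function lex(src)
  local tokens, pos = {}, 1
  while pos <= #src do
    local _, ws_end = src:find("^%s+", pos)
    if ws_end then
      pos = ws_end + 1                      -- skip whitespace
    else
      local kind, text
      for _, def in ipairs(token_defs) do
        text = src:match(def[2], pos)       -- "^" anchors at pos
        if text then kind = def[1]; break end
      end
      if not kind then
        error(("malformed token at position %d"):format(pos))
      end
      tokens[#tokens + 1] = {kind = kind, text = text}
      pos = pos + #text
    end
  end
  return tokens
end

-- Phase 2: match rules against token kinds (not characters),
-- recording every rule that is visited.
local function parse(tokens)
  local i, visited = 1, {}

  local function expect(kind)
    local tok = tokens[i]
    if tok and tok.kind == kind then i = i + 1; return tok end
    return nil
  end

  local function rule(name, body)
    return function()
      visited[#visited + 1] = name
      return body()
    end
  end

  local term = rule("term", function()
    return expect("NAME") or expect("NUMBER")
  end)

  local assignment = rule("assignment", function()
    local target = expect("NAME")
    if not (target and expect("EQ")) then return nil end
    local value = term()
    if not value then return nil end
    return {target = target.text, value = value.text}
  end)

  return assignment(), visited
end

local ast, visited = parse(lex("x = 42"))
print(ast.target, ast.value, table.concat(visited, " > "))
```

Because the parser only ever sees token kinds, whitespace handling stays
entirely in the lexer, and the visited list is a by-product of wrapping
each rule.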


From the README:
codepeg is a generic lexing and parsing system that is agnostic to
any particular language. Lexers and parsers for a specific language
are generated by writing a specification file that describes at a
minimum the language's Tokens and Rules but can also include its
Comments. The lexer exclusively uses the Token and Comment definitions
while the parser also makes use of the Rules.

In addition, codepeg aims to provide accurate and pertinent diagnostic
information during both lexing and parsing. During the lexing process
codepeg provides hooks to signal malformed tokens. During the parsing
process, codepeg tracks the parser as it moves through the grammar
rules so that when a parsing error occurs, all the basic information
required to identify the error is available. Currently this includes
a rule stack and the last token in the stream that the parser
reached.
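
As a rough illustration of the diagnostic data described above, here is a
plain-Lua sketch that keeps a rule stack and remembers the furthest token
the parser reached. It is an assumption about how such tracking could
work, not codepeg's implementation; the token array, the rule names, and
the helpers expect and rule are hypothetical.

```lua
-- Hand-written token array standing in for lexer output (kinds are made up).
local tokens = {
  {kind = "NAME", text = "x"},
  {kind = "EQ",   text = "="},
  {kind = "EQ",   text = "="},   -- unexpected: a term should follow the first "="
}

local pos = 1
local stack = {}                          -- rules currently being tried
local furthest = {index = 0, stack = {}}  -- deepest token reached so far

local function expect(kind)
  -- Snapshot the rule stack whenever the parser reaches a new furthest token.
  if pos > furthest.index then
    furthest.index = pos
    furthest.stack = {}
    for i, name in ipairs(stack) do furthest.stack[i] = name end
  end
  local tok = tokens[pos]
  if tok and tok.kind == kind then pos = pos + 1; return tok end
  return nil
end

local function rule(name, body)
  return function()
    stack[#stack + 1] = name
    local start = pos
    local result = body()
    if not result then pos = start end    -- backtrack on failure
    stack[#stack] = nil
    return result
  end
end

local term = rule("term", function()
  return expect("NAME") or expect("NUMBER")
end)

local assignment = rule("assignment", function()
  return expect("NAME") and expect("EQ") and term()
end)

local ok = assignment()
local message
if not ok then
  local tok = tokens[furthest.index]
  message = ("parse error at token %d (%s) while in: %s"):format(
    furthest.index,
    tok and tok.text or "<end of input>",
    table.concat(furthest.stack, " > "))
  print(message)
end
```

With the input above, the parse fails at the third token while inside
term, which itself was entered from assignment, so the snapshot carries
both the position and the rule context needed for an error message.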


There are 2 examples in the repo.  One syntax-highlights Lua tokens
exactly like GitHub's browser highlighter.  The other extracts global
functions from a script.  It detects the following styles of
declaration:

function x() end
x = function() end
x, y = function() end, function() end

function o:x() end
o.x = function() end

function o.x.x() end
etc.


Every tool I've seen so far only does the first style.


This project is very new and has not been exhaustively tested.  Any
feedback appreciated.  In particular, codepeg.Parser's diagnostics are
shaky at best.  All the data is there, it just doesn't get utilized
right now.



The project has an example specification file for Lua:
https://github.com/weshoke/codepeg/blob/master/codepeg/specification/lua.lua


project page:
https://github.com/weshoke/codepeg

dependencies:
listlpeg: https://github.com/mascarenhas/lpeg-list


wes