- Subject: Re: [LPeg] How can I parse a subset of markdown?
- From: "Soni L." <fakedme@...>
- Date: Sat, 23 Jul 2016 00:17:56 -0300
On 22/07/16 08:59 PM, Sean Conner wrote:
It was thus said that the Great Soni L. once stated:
On 22/07/16 08:13 PM, Sean Conner wrote:
It was thus said that the Great Soni L. once stated:
http://stackoverflow.com/q/38514522/3691554
I'm trying to parse a subset of markdown into a tree with LPeg. The idea
is simple but I'm not sure what I'm doing. The whole spec for the thing
I'm doing is here[1] and yes, that's a master branch github link, there
are still some things I need to work out.
I'm not exactly sure what you want the resulting table to look like, but
going from this minimal example [1]:
#Tag
##Attribute
###Value
Content
Invalid. Instead:
Then I suggest you fix https://github.com/SoniEx2/MDXML/blob/master/README.md
as that's where I got the above.
Hmm well granted that doesn't specify how the content and tags go
together...
Second, I think I gave you enough to go on your own. *I* am not terribly
interested in writing the LPeg for this.
#Tag
##Attribute
###Value
Content
should produce:
{ -- document root
{ -- root tag
[tagname_idx] = "Tag",
["Attribute"] = "Value",
"Content"
}
}
Ah, nice that you finally gave an example of the output. So from here, I
could expect:
#book
##edition
###3
> #name
> Programming In Lua
> #ISBN
> 859037985X
Tags followed by a `>` line are nonempty tags (equivalent to
<tag></tag>), while tags not followed by a `>` line are empty tags
(equivalent to <tag />), so this wouldn't do what you expect.
To produce:
{
{
tagname_idx = "book",
edition = 3,
{ tagname_idx = "name" , "Programming In Lua" },
{ tagname_idx = "ISBN" , "859037985X" },
}
}
Except you'd get:
{
{
tagname_idx = "book",
edition = 3,
{ tagname_idx = "name" },
"Programming In Lua",
{ tagname_idx = "ISBN" },
"859037985X"
}
}
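A rough sketch of the rule above, for illustration only. It is not LPeg: it uses plain Lua string patterns to stay dependency-free, and it covers only what this thread states (tags, attribute/value pairs, `>` blocks, content lines). The names `parse_block` and `parse` are made up for this sketch, storing the tag name at index 0 is borrowed from the LuaXML convention mentioned elsewhere in the thread, attribute values are kept as strings, and raising an error on a `>` line with no tag before it is one reading of "a missing tag should be an error".

```lua
-- Assumption: the tag name lives at index 0, as LuaXML does.
local tagname_idx = 0

-- Parse a list of lines (with this level's "> " prefix already removed)
-- into a list of nodes: tables for tags, strings for content.
local function parse_block(lines)
  local nodes = {}
  local i = 1
  while i <= #lines do
    local line = lines[i]
    local name = line:match("^#([^#].*)$")
    if name then
      local node = { [tagname_idx] = name }
      i = i + 1
      -- Attributes: a "##key" line followed by a "###value" line.
      while lines[i] and lines[i]:match("^##[^#]") do
        local key = lines[i]:sub(3)
        local value = lines[i + 1] and lines[i + 1]:match("^###(.*)$")
        if not value then
          error("attribute '" .. key .. "' has no value")
        end
        node[key] = value
        i = i + 2
      end
      -- A run of ">" lines right after the tag is its body (nonempty tag).
      -- With no such run the tag stays empty and later lines are siblings.
      local inner = {}
      while lines[i] and lines[i]:sub(1, 1) == ">" do
        inner[#inner + 1] = (lines[i]:gsub("^> ?", ""))
        i = i + 1
      end
      for _, child in ipairs(parse_block(inner)) do
        node[#node + 1] = child
      end
      nodes[#nodes + 1] = node
    elseif line:sub(1, 1) == ">" then
      error("'>' block without a preceding tag")
    else
      nodes[#nodes + 1] = line
      i = i + 1
    end
  end
  return nodes
end

-- Split the text into non-blank lines and parse from the document root.
local function parse(text)
  local lines = {}
  for line in text:gmatch("[^\n]+") do
    lines[#lines + 1] = line
  end
  return parse_block(lines)
end
```

Fed the `#book` example, `parse` returns a root holding one `book` node with `edition = "3"` and four children: an empty `name` tag, the string "Programming In Lua", an empty `ISBN` tag, and the string "859037985X". To nest the text inside `name`, it would need its own deeper `> >` line.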
(I will say this---I liked RFC-7049 because it included a TON of encoding
examples)
an initial stab at the problem (untested):
[code]
I opted to store the "tag" as the [0]th element because that's what LuaXML
does when parsing XML documents. This should get you going though (other
things left as an exercise---what if there's a missing tag? Adding in
escape sequences. That odd 'raw' mode I didn't understand. Parsing nested
data)
A missing tag should be an error. A missing attribute value should be an
error. Raw mode means "disable the parser and treat everything as data"
like XML's <![CDATA[ ]]>. Note that missing tags can only happen when
you go in a `> ` block.
If you have any other LPeg questions, I'll be happy to answer them. I'm
not up to writing the code for you though.
[1] And I'm wondering why you even want this, when you could just use
Lua directly, or JSON, or YAML, or *any number of existing
half-documented markup languages masquerading as a "standard"* but
I'll take you at face value and not ask WTF?
I can use this for config files, because it's a clean config file format
unlike XML, and I can also use this to generate XML documents (e.g.
XHTML webpages) because I designed it that way.
At work, most of the components are configured using XML, except for the
one component I wrote in Lua. I use Lua as the configuration file for that.
And amazingly enough, the ops group has no problems with it. Neither does
the tester (or the rest of the development members in our department).
I don't really see what's wrong with:
version = "1.0"
encoding = "utf-8"
programming =
{
languages =
{
{ name = "Lua" , link = "http://www.lua.org/" },
{ name = "Python" , link = "https://www.python.org/" },
},
books =
{
{ name = "Programming in Lua" , edition = 3 , ISBN = "859037985X" },
},
}
as that is *way* more concise than https://raw.githubusercontent.com/SoniEx2/MDXML/master/example.md
and could just as easily be converted to XML. Another issue I see with your
format is the use of repeated ">" to indicate nesting level, and it's the
same issue I have with Python and its significant whitespace to indicate
nesting level---it makes reorganizing a bit more onerous.
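To make the "converted to XML" remark concrete, here is a minimal sketch, assuming Lua 5.2 or later (it relies on `load` accepting an environment table). The helper names `load_config`, `escape` and `to_xml` are hypothetical, and so is the mapping: scalar fields become attributes, named sub-tables become child elements, and array entries become `<item>` children. Nothing in this thread prescribes that mapping.

```lua
-- Run a Lua config chunk with a fresh table as its environment, so its
-- global assignments land in that table instead of the real globals.
local function load_config(text)
  local env = {}
  local chunk, err = load(text, "config", "t", env)
  if not chunk then error(err) end
  chunk()
  return env
end

-- Escape the characters that are unsafe in XML text and attribute values.
local function escape(value)
  local map = { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;" }
  return (tostring(value):gsub('[&<>"]', map))
end

-- Hypothetical mapping from a Lua table to an XML element.
local function to_xml(name, t)
  local attrs, children, keys = {}, {}, {}
  for k in pairs(t) do
    if type(k) == "string" then keys[#keys + 1] = k end
  end
  table.sort(keys) -- pairs() order is unspecified; sort for stable output
  for _, k in ipairs(keys) do
    local v = t[k]
    if type(v) == "table" then
      children[#children + 1] = to_xml(k, v)
    else
      attrs[#attrs + 1] = string.format(' %s="%s"', k, escape(v))
    end
  end
  for _, v in ipairs(t) do
    if type(v) == "table" then
      children[#children + 1] = to_xml("item", v)
    else
      children[#children + 1] = "<item>" .. escape(v) .. "</item>"
    end
  end
  local open = "<" .. name .. table.concat(attrs)
  if #children == 0 then return open .. " />" end
  return open .. ">" .. table.concat(children) .. "</" .. name .. ">"
end
```

Usage would be along the lines of `to_xml("config", load_config(text))`, where `text` holds the contents of a config file like the one above.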
But hey, it's your project---knock yourself out.
-spc (But hey, it's your project---knock yourself out)
I already have to do XHTML anyway. It's an excuse to try out my new
language. Also, at least `>` doesn't get stripped like whitespace.
--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.