- Subject: Coroutines to implement filters (Fwd: Filter Chains)
- From: Chris Babcock <cbabcock@...>
- Date: Tue, 1 Feb 2011 18:31:05 -0700
This is how I explained using coroutines in a filter application to
the hobby community surrounding a Linux-based game server. You can use
pipes to implement filters - and that's a good analogy for a Linux
Users Group - but I challenge anyone to construct a Linux command line
with standard GNU tools to grep, sed and awk an email implementing
features of RFC 5322 into one that can be parsed by a tool built in
1988.
I included it here because it may be of use to the OP in the
"Recommendations on my upcoming Lua presentation?" thread.
Chris
---------- Forwarded message ----------
From: Chris Babcock <cbabcock@asciiking.com>
Date: Wed, Sep 1, 2010 at 9:52 AM
Subject: Re: Filter Chains
How to document this is precisely the problem I'm looking at. This is
slightly different than yesterday because I realized that I'm passing
an object, not a message, and need to write that back in RFC 822
format rather than what I've been testing with. With that change made,
here's the intro. This isn't necessary for someone to just run the
judge, but is useful for someone who wants to work with the filters.
Please let me know how usable this is for the "average judge
keeper"...
--
Conceptually, filter chains are like Unix pipes. Source is a program
that creates data and sink is a program that uses it:
source | sink
You can add programs that modify the output to the pipeline:
source | filter1 | filter2 | sink
The order of filters matters. If sort puts items in alphabetical order
and uniq removes items from a list when it is a duplicate of the
previous item then these two chains are *not* equivalent:
source | sort | uniq | sink
source | uniq | sort | sink
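To see the difference concretely, here is a minimal Lua sketch. The
sort and uniq functions below are hypothetical stand-ins for the shell
tools, working on a small list instead of a stream of lines:

```lua
-- Hypothetical stand-in for sort: returns a sorted copy of the list.
local function sort(items)
  local out = {}
  for i, v in ipairs(items) do out[i] = v end
  table.sort(out)
  return out
end

-- Hypothetical stand-in for uniq: drops an item only when it repeats
-- the item immediately before it.
local function uniq(items)
  local out = {}
  for _, v in ipairs(items) do
    if out[#out] ~= v then out[#out + 1] = v end
  end
  return out
end

local source = { "b", "a", "b" }
local sorted_first = table.concat(uniq(sort(source)), " ")
local uniq_first = table.concat(sort(uniq(source)), " ")
print(sorted_first)  -- a b
print(uniq_first)    -- a b b
```

With uniq first, the two "b" items are not adjacent yet, so neither is
removed.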
The unit of work for most shell tools is the line but we need to
process messages, so we are going to use coroutines for our pipeline.
Coroutines are like backward functions. Normally programs have data
and call a function to do something to that data. We have something we
want to do and we're going to call a coroutine to get the data. Here,
we want to write a message out to the judge but we need to get a
message from the mbox file first. It's written in function notation so
the source is on the inside:
write(mbox())
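As a rough sketch of what "calling a coroutine to get the data" can
look like in Lua: the mbox() and write() below are hypothetical stubs,
not the real filters. The stub mbox() yields two fixed strings where a
real one would read messages from a file, and write() collects them in
a table instead of sending them to the judge:

```lua
-- Hypothetical source: hands out one message each time it is resumed.
local function mbox()
  return coroutine.wrap(function()
    coroutine.yield("first message")
    coroutine.yield("second message")
  end)
end

-- Hypothetical sink: keeps pulling until the source returns nil.
local written = {}
local function write(source)
  for message in source do
    written[#written + 1] = message
  end
end

write(mbox())
print(table.concat(written, "\n"))
```

The sink drives the loop. The source only runs when the sink asks for
the next message, which is the "backward" part.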
Before we can write the message from the mbox out to the judge,
though, we need to add the MIME filter:
write(demime(mbox()))
If we called our input "source", our output "sink", and the filter
"filter" then it would look like this:
sink(filter(source()))
If we want to list the elements in the order in which we think of the
processing happening then we could write it out the long way:
chain = source()
chain = filter(chain)
sink(chain)
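Here is one way the long form could work underneath, as a sketch
only. All three functions are hypothetical: the filter just upper-cases
each message so there is something visible to check. The point is that
a filter takes the chain so far and returns a new chain:

```lua
-- Hypothetical source: yields two fixed messages.
local function source()
  return coroutine.wrap(function()
    for _, message in ipairs({ "hello", "world" }) do
      coroutine.yield(message)
    end
  end)
end

-- Hypothetical filter: pulls from upstream, passes a changed copy on.
local function filter(upstream)
  return coroutine.wrap(function()
    for message in upstream do
      coroutine.yield(message:upper())
    end
  end)
end

-- Hypothetical sink: drains the chain into a list.
local function sink(upstream)
  local out = {}
  for message in upstream do out[#out + 1] = message end
  return out
end

local chain = source()
chain = filter(chain)
local result = sink(chain)
print(table.concat(result, " "))  -- HELLO WORLD
```

Because every filter has the same shape (chain in, chain out), any
number of them can be stacked between the source and the sink.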
We need to do two additional things before we can write a message out
to the judge. We need to save (or persist) information about the
original message for our output filter to use and we are going to do
content filtering on the body of the message. The persist step needs
to happen before we change any of the header fields:
write(demime(persist(mbox())))
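A small sketch of why that order matters. Everything here is an
assumption made for illustration: messages are plain tables, the
content_type field stands in for "a header field", and the stub
demime() does nothing except overwrite it:

```lua
-- All names and fields here are hypothetical stand-ins.
local saved = {}  -- what persist() remembers for the output filter

local function mbox()
  return coroutine.wrap(function()
    coroutine.yield({ content_type = "multipart/mixed" })
  end)
end

local function persist(upstream)
  return coroutine.wrap(function()
    for message in upstream do
      saved[#saved + 1] = message.content_type  -- record the original
      coroutine.yield(message)
    end
  end)
end

local function demime(upstream)
  return coroutine.wrap(function()
    for message in upstream do
      message.content_type = "text/plain"  -- rewrite the header in place
      coroutine.yield(message)
    end
  end)
end

local delivered
for message in demime(persist(mbox())) do
  delivered = message.content_type
end
print(saved[1], delivered)
```

Swap the two filters and persist() would only ever see the rewritten
header, so the original value would be lost.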
Our content filter needs to operate on messages already processed by
the MIME filter:
write(content(demime(persist(mbox()))))
Some judge keepers are going to want to add filters, change them or
maybe remove some. Let's make this easy to do and undo by writing it
out the long way:
-- Use two dashes to comment out a line.
chain = mbox()
-- Comment out the next line to use your old smail script:
chain = persist(chain)
chain = demime(chain)
-- Require an explicit signoff if signon or create are found:
chain = content(chain)
write(chain)
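To give a feel for what a content filter might look like, here is a
sketch of the signoff rule. It rests on assumptions: messages are plain
strings, the rule is a simple substring check, and a message that
breaks the rule is silently dropped. The real filter may well do
something else with it, such as bouncing it back to the sender:

```lua
-- Hypothetical source with three message bodies.
local function mbox()
  return coroutine.wrap(function()
    coroutine.yield("signon\nsignoff")
    coroutine.yield("create")
    coroutine.yield("list")
  end)
end

-- Hypothetical rule: a body containing "signon" or "create" must also
-- contain "signoff". Plain-text find (4th argument true), no patterns.
local function content(upstream)
  return coroutine.wrap(function()
    for body in upstream do
      local needs_signoff = body:find("signon", 1, true)
        or body:find("create", 1, true)
      if not needs_signoff or body:find("signoff", 1, true) then
        coroutine.yield(body)  -- rule satisfied: pass it along
      end
    end
  end)
end

local passed = {}
for body in content(mbox()) do
  passed[#passed + 1] = body
end
print(#passed)  -- 2
```

Commenting out the content line in the chain above means all three
messages would reach the sink unchanged.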
Adding filters to the chain and writing new filters will be covered later.
For a casual user, the experience is the same as using the standard
libraries of a "batteries included" language. The advantage for the
community is that the filters don't have to be black boxes, so those who
care that a given mail provider sends mail that isn't read by the
current filter can easily modify the filter and distribute the new
filter if they want.
Chris
On Wed, Sep 1, 2010 at 4:14 AM, David Norman <da...@...on.co.uk> wrote:
> Chris,
>
> I feel like I've stepped into the middle of a technical document without
> reading the introduction...
>
> David.
>
> At 01:10 01/09/2010 , you wrote:
>>
>> The input filter is taking the form of a filter chain. After defining
>> the filters, the chain is executed in one line, like this:
>>
>> io.write(content(demime(persist(mbox()))))
>>
>> Or maybe the long form:
>>
>> chain = mbox()
>> chain = persist(chain)
>> chain = demime(chain)
>> chain = content(chain)
>> io.write(chain)
>>
>> So instead of deleting a filter from the chain, it can be commented out:
>>
>> chain = mbox()
>> chain = persist(chain)
>> chain = demime(chain)
>> -- chain = content(chain)
>> io.write(chain)
>>
>> The mbox() function is an iterator that feeds one message at a time
>> into the chain. A judge keeper anxious to optimize the input chain can
>> change mbox() to io.read("*all").
>>
>> The persist() function is going to save information about the message
>> that our new smail can use to generate a valid reply.
>>
>> The demime() filter is where mail is converted into a message the
>> judge will understand.
>>
>> The content() function is the area where rules like "explicit signoff
>> required" will be enforced. This was in bugzilla, so it will be
>> enabled by default but easily disabled if anyone cares.
>>
>> The io.write() function is part of the standard library. As used here,
>> it just dumps the content on stdout. It may make sense to execute rdip
>> directly at some point, but there's no advantage to doing that yet.
>>
>> Chris