G'day,

Some background on what "would be nice" in testing, as opposed to any
discussion of a specific "good" framework.

First of all, most testing literature defines a "defect" as "any place
where the system fails to meet the Requirements Specification".
Sometimes the lack of a clear set of requirements is one of the major
causes of heartache further on:  If you ask, "What are your
requirements?", and the client answers, "I dunno... well, what do you
suggest?", then you need to tread very carefully.  In this situation,
I tend to recommend small, simple, common, low-level things to look
at, and use the time taken while doing these to discuss higher-level
ideas with the client -- in essence, defer decisions while they are
not on the critical path, as more relevant information may become
available before the decision has to be made (or may be made clearer
by the work on the low-level things).

This is perhaps a place where the easy-to-get-started nature of Lua --
being able to hack scripts together quickly, and then mutate them as
knowledge of the problem domain grows -- is vulnerable to problems:
there is no formal specification for the programmer to start from.

Early on, get into the habit of using version control tools (e.g. Git),
not only for code, but also for system administration.  Use these
tools to manage changes, and set up hooks so that any time a change
is submitted, an extended test suite (see below) is triggered.
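As a rough sketch of the kind of hook I mean (assuming a Git checkout,
and a hypothetical "run_tests.lua" entry point with a "--quick" flag),
a pre-commit hook can be any executable, including a Lua script:

    #!/usr/bin/env lua
    -- .git/hooks/pre-commit (must be executable)
    -- Hypothetical sketch: run the quick tests, block the commit on failure.
    local ok = os.execute("lua run_tests.lua --quick")
    -- os.execute returns true on success in Lua 5.2+, and 0 in Lua 5.1.
    if ok ~= true and ok ~= 0 then
      io.stderr:write("pre-commit: quick tests failed; commit aborted\n")
      os.exit(1)
    end

(The heavier, extended suite would hang off a server-side hook or the
Testing Bot mentioned further down, rather than blocking every commit.)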

Then, I'd suggest that a test rig run through a sample set of fairly
straightforward tests, to find out quickly if anything is drastically
broken.  If anything breaks here, stop.
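A minimal sketch of such a "smoke test" pass in Lua might look like the
following, where "mylib" and its functions stand in for whatever you
are actually testing:

    -- smoke.lua: quick sanity checks; bail out at the first failure.
    -- "mylib" and its contents are hypothetical placeholders.
    local mylib = require("mylib")

    local checks = {
      ["library loads and exposes add()"] = function()
        assert(type(mylib.add) == "function")
      end,
      ["trivial arithmetic works"] = function()
        assert(mylib.add(2, 2) == 4)
      end,
    }

    for name, check in pairs(checks) do
      local ok, err = pcall(check)
      if not ok then
        io.stderr:write("SMOKE FAILURE: ", name, ": ", tostring(err), "\n")
        os.exit(1)   -- anything drastically broken?  Stop right here.
      end
    end
    print("smoke tests passed")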

Next, tests might try to look "from the outside" at edge cases: For
example, in ASCII, "/" comes just before "0", and ":" comes just after
"9".  Write tests to see that simple edge cases are not violated by
off-by-one errors (or whatever).
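For instance, a check that a (hypothetical) is_digit_char() helper is
not fooled by the ASCII neighbours of the digits might be as small as:

    -- Hypothetical helper under test: true iff c is one ASCII digit.
    local function is_digit_char(c)
      return #c == 1 and c >= "0" and c <= "9"
    end

    -- The characters just outside the digit range in ASCII.
    assert(not is_digit_char("/"), "'/' (0x2F) wrongly accepted as a digit")
    assert(not is_digit_char(":"), "':' (0x3A) wrongly accepted as a digit")
    assert(is_digit_char("0") and is_digit_char("9"),
           "boundary digits must be accepted")
    print("edge-case checks passed")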

Then, tools such as Valgrind are available to look at whether critical
resources, such as memory, are being mishandled (e.g. illegal
references, or perhaps memory not freed at normal program termination).
This is especially valuable if you want to write modules that are
trustworthy enough to be reused.  [As a side note, see David
A. Wheeler's writings on "fuzzing" code, e.g. being able to coax a
valid JPEG image out of an image translator, given only the string
"Hello, World!" as a starting point!]

An obvious test category is regression testing:  Mandate a policy that,
for a patch fixing a bug, there must also be a change to the test rig
that fails if the bugfix is not applied, but passes if the bugfix is
applied.  For some programs, this can be quite tricky to set up.
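For a simple case, though, the regression test can be tiny.  A sketch,
assuming a hypothetical bug #123 where mylib.trim() used to drop the
final character of an already-trimmed string:

    -- Regression test for hypothetical bug #123: trim() used to drop
    -- the last character of an already-trimmed string.  This fails on
    -- the unfixed code and passes once the bugfix is applied.
    local mylib = require("mylib")

    assert(mylib.trim("abc") == "abc",
           "regression #123: trim() mangled an already-trimmed string")
    print("regression test #123 passed")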

At this point, having a diversity of systems, with varying OSes,
endianness, word sizes, etc., all controlled from a testing controller
such as BuildBot, may be valuable -- every time anyone checks in a
change to the code, a wave of test runs is triggered across all the
systems, and a "scoreboard" of machines and pass/fail results is
published, tied to the change-request activity.
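By "scoreboard" I mean nothing fancier than a grid of machines versus
changes; a toy Lua formatter (with entirely made-up machine and change
names) shows the shape of it:

    -- Sketch of the "scoreboard": one row per test machine, one
    -- column per change, PASS/FAIL in each cell.  All data is made up.
    local changes = { "change-101", "change-102" }
    local results = {
      ["linux-x86_64"] = { ["change-101"] = "PASS", ["change-102"] = "PASS" },
      ["linux-ppc-be"] = { ["change-101"] = "PASS", ["change-102"] = "FAIL" },
      ["windows-x86"]  = { ["change-101"] = "FAIL", ["change-102"] = "PASS" },
    }

    io.write(("%-16s"):format("machine"))
    for _, c in ipairs(changes) do io.write(("%12s"):format(c)) end
    io.write("\n")

    for machine, row in pairs(results) do
      io.write(("%-16s"):format(machine))
      for _, c in ipairs(changes) do io.write(("%12s"):format(row[c] or "-")) end
      io.write("\n")
    end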

You may have different run types/levels of test thoroughness, moving
from "pretend", through "regular", to "special/taxing" tests (depending
on resource usage, e.g. slow or large-memory cases); let the user
select which tests to run before submitting, and have the Testing Bot
work through the more demanding cases in off-peak periods (e.g.
overnight).
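One way to sketch the tiering in a Lua test runner is to tag each test
with a tier and filter on it (the test names, bodies and tier labels
below are illustrative only):

    -- runtests.lua: the user runs "lua runtests.lua regular" before
    -- submitting; the Testing Bot runs "lua runtests.lua special"
    -- overnight.  Tests and tiers here are hypothetical placeholders.
    local tests = {
      { name = "arithmetic sanity", tier = "pretend",
        run = function() assert(1 + 1 == 2) end },
      { name = "round-trip a small file", tier = "regular",
        run = function() --[[ ... ]] end },
      { name = "4 GiB data set", tier = "special",
        run = function() --[[ slow, large-memory ... ]] end },
    }

    -- Run everything up to and including the requested tier.
    local order = { pretend = 1, regular = 2, special = 3 }
    local wanted = order[arg[1] or "regular"] or order.regular

    for _, t in ipairs(tests) do
      if order[t.tier] <= wanted then
        local ok, err = pcall(t.run)
        print(("%-24s [%s] %s"):format(t.name, t.tier,
              ok and "PASS" or "FAIL: " .. tostring(err)))
      end
    end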

Another thing to consider is to have debug builds with an extra (Lua?)
scripting hook that allows for fault injection at various points, so
that you can check that the code correctly handles exceptional
conditions (signals?) such as out-of-memory, file not found, etc.
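In pure Lua, a cheap approximation is to monkey-patch the resource you
want to fail -- io.open here -- and check that the caller copes.  A
self-contained sketch (open_config() is just a stand-in for real code
under test):

    -- Fault-injection sketch: force io.open to fail and verify that
    -- the caller reports the failure cleanly instead of blowing up.
    local function open_config(name)
      local f, err = io.open(name, "r")
      if not f then return nil, "cannot read config: " .. err end
      local data = f:read("*a")
      f:close()
      return data
    end

    -- Inject the fault: every open now fails as if the file were missing.
    local real_open = io.open
    io.open = function(name, mode)
      return nil, name .. ": injected 'file not found'"
    end

    local data, err = open_config("settings.conf")
    io.open = real_open          -- always restore the real function

    assert(data == nil and err:match("injected"),
           "open_config() should report the injected failure cleanly")
    print("fault-injection check passed")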

And next-to-last, I'll mention peer review, moving from simple code
reviews up to fully-fledged Software Inspections.  Software
Inspections are very expensive (e.g. teams of five descending on a
module of code, with specific roles such as Moderator, Author, Peer 1,
Peer 2 and Journal/Minute Recorder).  Not only is the software
exposed to multiple eyeballs, but the hope is that the Moderator can
invoke a "Phantom Reviewer", such that the team is able to function more
effectively than the individuals would alone.  Rigorous statistics are
also kept on how quickly the teams churn through lines of code, and how
many defects per (k?)LOC are found.  Further, each defect found is
treated as a sentinel of a possible systematic fault in the way the
code is composed, and a flag is raised to look for similar errors
elsewhere.

And finally, have *adversarial* test teams outside the group that
prepared the software, who take the Requirements Specification and try
to come up with ways to break the proposed solution.

OK, so the above is beyond the scope of the original discussion.  Hope
it is useful.

cheers,

sur-behoffski (Brenton Hoff)
Programmer, Grouse Software