We *really* need a development model change!

Francois Gouget fgouget at free.fr
Wed Jan 2 06:36:14 CST 2002


On 30 Dec 2001, Alexandre Julliard wrote:
[...]
> In fact here's a 10-minute hack to add a make test target. With that
> all you have to do is create a test script in dlls/xxx/tests/foo.test,
> put the expected output in tests/foo.test.ref (presumably generated by
> running the test under Windows), add your script to the makefile and
> run make test.


   I think what we need along with this is some guidelines and
documentation for potential test writers, and maybe a couple of
extensions. The following is half proposed documentation that we could
put in the Wine Developer Guide, and half a proposed specification for
some possible extensions. As usual, comments and suggestions are
welcome.



What is a test
--------------

   A test unit is an executable or script. You can name it any way you
like (though please avoid spaces in the names, they are always
annoying). All test units should be non-interactive. A test unit called
xxx generates two outputs:
 * its exit code
 * text output on either or both of stdout and stderr, both of which are
normally redirected to a file called 'xxx.out'.

   A test succeeds if:
 * its exit code is 0
 * and its output, 'xxx.out', matches the reference output according to
the rules described later.

   Conversely, it fails if:
 * its exit code is non-zero, either because one aspect of the test
failed and the test unit returned a non-zero code to indicate failure,
or because it crashed and the parent got an exit code >= 128
 * or its output differs from the reference output established on
Windows.

   Under this model each test unit may actually consist of more than
one process (for instance to test CreateProcess, inter-process
messaging, inter-process DDE, etc.). All that counts is that the
original process does not exit until the testing is complete, so that
the testing framework knows when to check the test output and move on.
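
   For instance, here is a minimal sketch of a two-process test unit
where the original process waits for its child before exiting (the
child's name and the messages are made up):

#include <stdio.h>
#include <string.h>
#include <windows.h>

int main(void)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    DWORD code = 1;

    memset(&si, 0, sizeof(si));
    si.cb = sizeof(si);
    if (!CreateProcessA(NULL, "xxx_child.exe", NULL, NULL, FALSE, 0,
                        NULL, NULL, &si, &pi))
    {
        printf("CreateProcessA failed\n");
        return 1;
    }
    /* do not exit until the child is done: the framework relies on the
     * original process to know when the test unit is finished */
    WaitForSingleObject(pi.hProcess, INFINITE);
    GetExitCodeProcess(pi.hProcess, &code);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return code != 0;    /* propagate the child's failure if any */
}
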
   (There is no provision for hung tests. A timeout-based mechanism,
with a large timeout, like 5 minutes, could do the trick.)
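
   A rough POSIX sketch of what the framework side could look like,
assuming it forks each test unit (all names and the exact mechanism are
only illustrative, and this kills only the original process, not any
children it spawned):

#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TEST_TIMEOUT 300   /* a large timeout: 5 minutes */

/* Runs 'path'; returns its exit code, 128+signal if it crashed,
 * or -1 if it hung and was killed by the timeout. */
static int run_test_unit(const char* path)
{
    int status;
    pid_t pid = fork();

    if (pid == -1) return -1;
    if (pid == 0)
    {
        alarm(TEST_TIMEOUT);  /* SIGALRM terminates the test if it hangs */
        execl(path, path, (char*)NULL);
        _exit(127);           /* exec failed */
    }
    if (waitpid(pid, &status, 0) == -1) return -1;
    if (WIFSIGNALED(status))
    {
        if (WTERMSIG(status) == SIGALRM) return -1;   /* hung test */
        return 128 + WTERMSIG(status);                /* crashed */
    }
    return WEXITSTATUS(status);
}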


   A test unit can also exercise more than one aspect of one or more
APIs. But, as a rule of thumb, a specific test should not exercise more
than a handful of related APIs (or up to a dozen in extreme cases).
Also, more than one test may exercise different aspects of a given API.
   So when running the Wine regression tests, if we find that 3 tests
out of 50 failed, it means that three processes out of fifty had an
incorrect exit code or output. One should then analyze in more detail
what went wrong during the execution of these processes to determine
which specific API or aspect of an API misbehaved. This can be done
either by looking at their output, by running them again with Wine
traces, or even by running them in a debugger.



Test Output
-----------

   Wine tests can write their output in any form they like. The only
important criteria are:
 * it should be reproducible from one run to the next: don't print
pointer values, they are likely to change on the next run and thus
cannot be checked
 * it should be the same on a wide range of systems: don't print things
like the screen resolution!
 * it should be easy to correlate with the source of the test. For
instance if a check fails, it is a good idea to print a message that can
easily be grepped in the source code, or even the line number of that
check. But don't print line numbers in success messages: they would
change whenever someone modifies the test and thus require an update to
the reference files
 * the output should not be empty, just in case the process dies with a
0 return code or fails to start before writing anything to the output
 * finally, it should be easy to read for the people who will be
debugging the test when something goes wrong (see the sketch after this
list).
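
   For instance (a purely hypothetical check; the messages are just
illustrations of the style):

#include <stdio.h>
#include <windows.h>

int main(void)
{
    char buf[MAX_PATH];
    DWORD len = GetTempPathA(sizeof(buf), buf);

    /* each failure message is unique and thus easy to grep for, and no
     * pointer or system-dependent value ends up in the output */
    if (len == 0 || len >= sizeof(buf))
        printf("GetTempPathA returned a bad length\n");
    else if (buf[len - 1] != '\\')
        printf("GetTempPathA value lacks a trailing backslash\n");
    else
        printf("GetTempPathA ok\n");  /* no line number: stays stable */
    return 0;
}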


   To each test we associate a file containing the reference output for
that test. If the test's output consists of a single "Test Ok", then
that file may be omitted. (I am not sure whether this shortcut is
actually needed/useful.)

   Otherwise this file is called either:
 * 'xxx.ref'
 * or 'xxx.win95', 'xxx.win98', etc. if the output depends on the
Windows version being emulated. The winver-specific file takes
precedence over the '.ref' file, and the '.ref' file, which should
always exist, serves as a fallback.

   This second feature is probably best avoided as much as possible, as
multiple reference files are harder to maintain than a single reference
file. But it may be useful for some APIs (can anyone think of any?). In
any case I propose not to implement it until we actually find a need
for it.


   One may also create a file called 'xxx.ref.diff' (resp.
'xxx.win95.diff', etc.) which contains a diff between the test output on
Windows and the test output in Wine. The goals are:
 * to make it unnecessary to tweak tests so they don't report known Wine
shortcomings/bugs, or to remove such tests altogether
 * but also not to have hundreds of tests that systematically fail due
to these shortcomings (I can think of at least one case related to
command line passing), because then you could not tell whether more
aspects of these tests fail than usual.

   The criterion for success/failure of a test unit xxx then becomes
(expressed here as a runnable sh fragment; note that the file names and
timestamps in the 'diff -u' headers would have to be normalized before
the two diffs can be compared):

   ./xxx >xxx.out 2>&1 || exit 1       # non-zero exit code: failure
   diff -u xxx.ref xxx.out >xxx.diff
   if [ -f xxx.ref.diff ]; then
      # known differences: the diff must match them exactly
      cmp -s xxx.ref.diff xxx.diff || exit 1
   else
      # no known differences: any difference is a failure
      [ -s xxx.diff ] && exit 1
   fi
   exit 0                              # otherwise the test succeeded


   The test framework can then return three numbers:
 * the number of failed tests
 * the number of tests with known bugs
 * the total number of tests

   (and/or various other differences between these numbers)



Test coverage
-------------

   Each test should contain a section that looks something like:

# @START_TEST_COVERAGE@
# kernel32.CreateProcess
# kernel32.GetCommandLineA
# kernel32.GetCommandLineW
# __getmainargs
# __p__argc
# __p_argv
# __p_wargv
# @END_TEST_COVERAGE@


   The goal of this section is to identify which APIs are being tested
by a given test unit. Each API is specified as 'dll.api'. If the dll
name is omitted, then the API is assumed to be in the dll in which the
test unit is located (in the above example that would be msvcrt).

   Note that we cannot simply extract the list of APIs being called by a
given test. For instance most tests are likely to call APIs like printf.
And yet, printf should not be recorded as being tested by a thousand
tests.

   We can then write a script that uses this information to:
 * list the tests that cover a given API
 * build a list of the APIs that have no associated tests
 * build all sorts of fancy and maybe useful statistics

   (The above section would work just as well inside C-style comments;
one way to handle this is to strip the leading comment characters, as in
the sketch below. Note that simply skipping all leading non-alphanumeric
characters would also eat the underscores of names like __getmainargs.)
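
   A rough sketch of the extraction, in C for consistency with the other
examples although a perl script would be just as natural (the set of
comment characters handled is only one possibility):

#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    char line[256];
    int in_section = 0;
    FILE* f;

    if (argc != 2 || !(f = fopen(argv[1], "r"))) return 1;
    while (fgets(line, sizeof(line), f))
    {
        char* p = line;
        /* strip the comment leader ('#', '*', '//') and whitespace */
        while (*p == ' ' || *p == '\t' || *p == '#' || *p == '*' || *p == '/')
            p++;
        if (!strncmp(p, "@START_TEST_COVERAGE@", 21)) in_section = 1;
        else if (!strncmp(p, "@END_TEST_COVERAGE@", 19)) in_section = 0;
        else if (in_section) fputs(p, stdout);  /* one 'dll.api' per line */
    }
    fclose(f);
    return 0;
}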



Test environment
----------------

   A test may need to know things about its environment, although
hopefully this will be relatively rare. So I propose to store some
information in environment variables, as this seems the least intrusive
way to provide tests with information (a sketch of reading them follows
the list):

 * TEST_OUT
   The name of the test output file. This may be useful if a test needs
to create new processes and redirect their output to temporary files. If
the child processes need to output information to the test output, then
they can use this environment variable to open the right file.

 * TEST_WINVER
   This contains the value of the '-winver' Wine argument, or the
Windows version if the test is being run on Windows. We should mandate
the use of specific winver values so that tests don't have to recognize
all the synonyms of win2000 (nt2k...), etc.
   (Do we need to distinguish between Windows and Wine?)

 * TEST_BATCH
   If true (equal to 1) or unset, the test should assume that it is
being run from within the test framework and thus that it should be
non-interactive. If TEST_BATCH is set to 0, the test can assume that it
is being run in interactive mode, and thus may ask the user questions.
Of course most tests will simply behave identically in both cases, but
in some cases an interactive mode may be useful. For instance the test
concerning CommandLineToArgvW could have an interactive mode where it
asks the user for a command line and prints the corresponding
argc/argv. This would allow a developer to manually check how the API
behaves for specific command lines.
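
   In C a test could then do something like this (assuming the variable
names above are adopted):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char* winver = getenv("TEST_WINVER");
    const char* batch = getenv("TEST_BATCH");
    /* unset or "1" means batch mode; only an explicit "0" is interactive */
    int interactive = (batch && strcmp(batch, "0") == 0);

    if (winver && strcmp(winver, "win2000") == 0)
        printf("checking the win2000 behavior\n");
    if (interactive)
        printf("interactive mode: the test may prompt the user\n");
    return 0;
}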


   I thought about passing these values on the command line but:
 * it seems less practical, especially since 'main' would have to parse
them and store them somewhere
 * it may interfere with some tests (command line related tests,
although only child processes should care about that)
 * it seems less extensible and flexible.



Running tests
-------------

In Wine:

   'make tests' seems the best way to do things.
   But some tests may need to create windows. For instance I have a DIB
test that creates a window, draws a number of DIBs in it, checks the
bitmap bits of these DIBs and then exits. Thus it is a non-interactive
test. I am not really sure whether the window actually needs to be made
visible or not, but even if this particular example does not require it,
I suspect that others, checking for message sequences for instance, may
need to make the window visible.
   And if the Wine test suite contains many such tests, then there will
be windows popping up and down all the time, which would make it
impossible to use the computer while the tests are running.
   So we may:
 * have two test lists in the Makefiles: cui-tests (those that don't pop
up windows) and gui-tests (those that do)
 * add two corresponding targets: 'make cui-tests' runs only those tests
that do not pop up windows, and 'make gui-tests' runs only those tests
that do
 * and 'make tests' would simply be 'tests: cui-tests gui-tests'

   Of course, it should be noted that one way to deal with tests that
pop up windows is to run them inside a VNC X server, a second X server
or some other similar X trick.


In Windows:

   Hmmm, not sure how that is done. Run 'winetest.exe'?



Writing tests
-------------

   This is where we describe which APIs are available to a test
writer... if any. I believe that very little functionality is
necessary/useful (a possible implementation is sketched after the
list).

 * test_failed(message)
   Sets a global variable to 1 to indicate failure and prints the
specified message.

 * test_result()
   Returns 0 if 'test_failed' was never called, and 1 otherwise.

 * get_test_out()
   Returns the contents of $TEST_OUT.

 * get_test_batch()
   Returns true if the test is being run in batch mode and false
otherwise. Of course this is based on $TEST_BATCH.

 * get_test_winver()
   Returns the current Windows version, or the version being emulated if
in Wine. This could simply be based on $TEST_WINVER.
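
   Here is one possible C sketch of these helpers, assuming the proposed
names and environment variables are kept as-is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>

static int test_status = 0;

void test_failed(const char* format, ...)
{
    va_list args;

    test_status = 1;       /* remember that at least one check failed */
    va_start(args, format);
    vprintf(format, args);
    va_end(args);
}

int test_result(void)
{
    return test_status;    /* 0 if test_failed was never called */
}

const char* get_test_out(void)
{
    return getenv("TEST_OUT");
}

const char* get_test_winver(void)
{
    return getenv("TEST_WINVER");
}

int get_test_batch(void)
{
    const char* batch = getenv("TEST_BATCH");
    return !batch || strcmp(batch, "0") != 0;   /* unset means batch mode */
}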


   That's about all. As you can see this is pretty basic stuff and I am
not sure more is needed. But if you have ideas...

   What is needed most is a pair of sample tests:
 * one simple console-based test
 * another involving some GUI stuff

   Then the documentation could use them as examples and discuss the
interesting aspects.


   I believe that all of the above is fairly neutral as far as perl vs.
C is concerned. Except for the compilation issues, and maybe the exact
command used to invoke a test, whether a test is written in C or in perl
should not make any difference.


--
Francois Gouget         fgouget at free.fr        http://fgouget.free.fr/
                            1 + e ^ ( i * pi ) = 0




