Performance testing experiences

Wed Jul 11 15:35:42 CDT 2007

Hi,
Over the past few days I've been actively hacking on automated performance
testing. I have test scripts for Half Life 2, 3DMark2000 and 3DMark2001, and a
small z buffer test app. I am playing around with UT2004 and I want to take a
look at plain old Unreal Tournament. I have the tests running on my old laptop
and my AMD64 box. I want to extend that to my Mac too. To collect some
experience I've built a quick and dirty server "app", consisting of 200 lines
of PHP, to get a better idea about what I want.

Failed benchmark attempts are Half Life 1 and the Codecreatures engine
benchmark. Both failed because I did not find a way to write the results to a
file. In HL1 this may work by marking a bit of text using the right mouse
coordinates and copying it to the clipboard. With Codecreatures we're out of
luck, unless we use OCR. Final Fantasy XI has the same problem, but I knew
that already.

My server side graphs can be seen here:

http://84.112.174.163/~stefan/

The amd64/ and laptop/ folders contain one folder for each test, and each of
those contains a result.php which calls gnuplot to generate graphs and builds
a formatted table. It's all highly inflexible and not meant as a permanent
solution. Adding a new test or a new computer means major copy-paste and
find / replace work.
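
One way to avoid the copy-paste would be to generate the gnuplot input from a
list of machines and tests. The following is only a rough sketch in Python
(not the PHP the server uses); the test folder names and the results.dat and
graph.png file names are made up for illustration, only amd64/ and laptop/
come from the layout described above.

```python
# Sketch: build one gnuplot script per (machine, test) pair from two lists,
# so a new machine or test is a one-line change.
MACHINES = ["amd64", "laptop"]
TESTS = ["hl2", "3dmark2000", "3dmark2001"]  # hypothetical folder names


def gnuplot_script(machine, test):
    """Return gnuplot commands that plot one test's results to a PNG."""
    data = f"{machine}/{test}/results.dat"  # hypothetical: "run score" per line
    png = f"{machine}/{test}/graph.png"     # hypothetical output file
    lines = [
        "set terminal png",
        f'set output "{png}"',
        'set xlabel "run"',
        'set ylabel "score"',
        f'plot "{data}" using 1:2 with linespoints title "{test} on {machine}"',
    ]
    return "\n".join(lines) + "\n"


def all_scripts(machines=MACHINES, tests=TESTS):
    """Map every (machine, test) pair to its gnuplot script."""
    return {(m, t): gnuplot_script(m, t) for m in machines for t in tests}


scripts = all_scripts()
print(scripts[("laptop", "hl2")])
```

Each generated script could then be piped into gnuplot; adding a machine or a
test only means extending one of the two lists.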

One problem I've come across is recording cxtest scripts, which just does not
work for games that need 3D in VNC. So I've written them by hand. Games also
largely resist being controlled with the wait_window script. Some games create
their only window as soon as they are started, and then take another X seconds
to really start up and react to clicks or keypresses. wait_window causes a
1 second X server freeze every few seconds, which invalidates benchmark
results. For this reason I have put a sleep into the 3DMark2000 test script to
freeze cxtest while the benchmark is running.
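
The idea of staying quiet while the benchmark runs and only polling afterwards
can be sketched like this. This is not cxtest code, just an illustration in
Python; wait_quietly, is_finished and all the timing values are hypothetical
(is_finished could for example check whether a result file has appeared).

```python
import time


def wait_quietly(is_finished, quiet_seconds, poll_interval=5.0,
                 timeout=600.0, sleep=time.sleep):
    """Sleep through the benchmark without polling, then poll until done.

    Returns the total number of seconds waited. Raises TimeoutError if the
    benchmark has not finished once `timeout` seconds have been waited.
    """
    # No polling at all during the expected benchmark run time, so nothing
    # disturbs the X server while frames are being measured.
    sleep(quiet_seconds)
    waited = quiet_seconds
    while not is_finished():
        if waited >= timeout:
            raise TimeoutError("benchmark did not finish in time")
        sleep(poll_interval)
        waited += poll_interval
    return waited


# Demonstration with a fake sleep and a fake completion check, so that it
# runs instantly: the "benchmark" reports finished on the third poll.
sleep_calls = []
poll_state = {"polls": 0}


def fake_finished():
    poll_state["polls"] += 1
    return poll_state["polls"] >= 3


total_waited = wait_quietly(fake_finished, quiet_seconds=120.0,
                            sleep=sleep_calls.append)
print(total_waited, sleep_calls)
```

The quiet period has to be guessed per benchmark and per machine, which is the
main weakness of this approach.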

Another problem is that the performance can be quite different from run to run
with the same Wine version. I'm not sure what causes this, but I will try to
run the tests with everything else shut down. Failing that, we can still run
all tests repeatedly and take the average result.
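
Averaging over repeated runs could look roughly like the sketch below, which
also reports the spread so that noisy tests stand out. The numbers are made-up
example values, not real results, and summarize_runs is a hypothetical helper.

```python
import statistics


def summarize_runs(scores):
    """Return run count, mean, sample standard deviation and relative spread."""
    if not scores:
        raise ValueError("need at least one result")
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two runs.
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    rel_spread = stdev / mean if mean else 0.0
    return {
        "runs": len(scores),
        "mean": mean,
        "stdev": stdev,
        "rel_spread": rel_spread,
    }


# Made-up frames-per-second values from five runs of the same test.
fps_runs = [58.0, 61.0, 60.0, 59.0, 62.0]
summary = summarize_runs(fps_runs)
print(summary)
```

If the relative spread is larger than the difference between two Wine
versions, the average alone says little, and more runs or a quieter machine
are needed.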