sc2 performance (was: Wine and D3D)

Stefan Dösinger stefan at
Tue Jan 11 04:38:40 CST 2011

On 10.01.2011 at 22:03, Emanuele Oriani wrote:
> 2) Stefan, do you have any hint on where we should start optimizing the calls of D3D -> OpenGL functions?
> Where do we waste time/CPU cycles?
A few bugs that are probably easier to fix (there is no really easy bug; it would have been fixed already):
*) context_validate (context.c). The way it performs the check is expensive. We already hook the wndproc to intercept messages; we could just intercept the messages involved in window destruction and set a valid flag on the context.

*) stream declaration parsing (device_update_stream_info, device.c): This needs better data structures to either avoid re-parsing or make parsing faster. The current problem is that we have to do this every time the shader, the vertex declaration or a vertex buffer changes, which in practice means every frame.

*) FBO application. Currently we do this on every draw, which is unnecessary. Unfortunately there are multiple conditions under which it has to be done, among them:
-> A render target is changed
-> The depth stencil is changed
-> The contents of one of those surfaces have changed (e.g. Surface::Map)
-> There are many more; compiling a complete list is a good starting point for fixing this issue

*) The vertex shader is re-bound needlessly. The vertex shader depends on the vertex declaration, but only minimally (D3DCOLOR input type swizzling if GL_ARB_vertex_bgra isn't supported). The question, however, is how often this happens in real apps. It causes half of the fps problem in my test app, but probably only a minor hit in most real apps.

*) Render target and depth stencil dirtification in drawprimitive (directx.c). This isn't overly expensive, but it does add up, and like FBO reapplication it is rarely needed. There are, however, many situations in which it is needed.

*) Some global compile stuff: -fPIC costs quite a bit, and compiling out the debugging code improves performance too. This is something a user who wants fast games and doesn't care about the drawbacks currently has to do on their own.
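As an example only (check the macro names against the Wine tree you are building; this is not an officially supported configuration), the debug channels can be compiled out via the guards in include/wine/debug.h:

```shell
# Compile out TRACE/WARN messages; -fPIC can only be dropped where the
# toolchain/ABI permits it, so it is not shown here.
./configure CFLAGS="-O2 -DWINE_NO_TRACE_MSGS -DWINE_NO_DEBUG_MSGS"
make
```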

*) Various locking things are expensive: the wined3d lock, the X11 lock. You can probably compile them out for single-threaded games, but there's no general fix we can apply. This is also pretty specific to my test app, although you can see the cost of locking in real apps as well (e.g. 3DMark 2001).

I think those are the main ones. You can experiment with my test app; it is fairly hard to break with hacks. With a number of hacks I made it run faster than on Windows, but getting all those fixes in is highly unrealistic.

Some performance data (MacBook Pro, 2.8 GHz Core 2 Duo, GeForce 9600):
GL version, 64 bit, Linux: 3200 fps
GL version, 32 bit, Linux: 2400 fps
Windows GL version, Wine, locking hacked out: 1600 fps
GL version, 32 bit, Windows: 1400 fps
D3D version, Windows: 730 fps (has to run in fullscreen; sometimes the driver forces vsync)
Windows GL version, Wine: 500 fps
D3D version, Wine: 80 fps

Those numbers are from memory, so they may be wrong, but in general you get the idea. Note that you don't have to worry too much about the 500 fps for the GL version in Wine: due to the nature of the app (many, many tiny draws) it hits the locking overhead really hard.

> 3) Setting affinity is a "game changer".
I'm curious, have you done the same tests on Windows?

Note that you can also compile wined3d for Windows and test it with SC2. That helps separate 3D-related bugs from non-3D bugs. You cannot use this technique to split blame between wined3d and the driver, though; only between the 3D subsystem and the rest of the code.

More information about the wine-devel mailing list