Asynchronous Direct3D rendering design suggestion

Wed Jul 19 13:13:29 CDT 2006

Hi,
(warning: long mail)

Currently the wined3d code is doing more or less synchronous rendering, which
means that a Direct3D function call from the app results directly in the
equivalent opengl call(s). There are a few issues with that:

* Multithreaded Direct3D: Opengl calls can only be done from the thread that
owns the glX context. Direct3D calls can be done from any thread. Passing
around the context is only possible with hacks (SetThreadContext or
pthread_kill) and is prone to deadlocks.

* Performance: Applications expect the 3D calls to return immediately so they
can do other things while the gpu is rendering. GL works in the same way, so
our Direct3D rendering functions should return almost immediately too, but
the state changes and drawStridedSlow seem to cause gl to wait until the
pipeline is empty.

My suggestion is to create a per-device thread which does the rendering and
owns the gl context. The rendering calls only place some tokens into a queue
and return immediately. This way the app gets control back immediately, and
multithreaded direct3d is only a matter of locking the queue correctly. The
rendering thread and all rendering code would be in drawprim.c (and maybe a
new file, e.g. opengl_utils.c). The other files would contain no gl code.
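
A minimal sketch of the queue-plus-worker idea above, assuming pthreads and
a tiny fixed array of integer tokens. The names (cmd_queue, queue_push,
render_thread, TOKEN_*) are made up for illustration and are not existing
wined3d code; the place where gl calls would happen is only marked by a
comment.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_SLOTS 16

enum token { TOKEN_DRAW = 1, TOKEN_QUIT = 2 };  /* hypothetical tokens */

struct cmd_queue {
    pthread_mutex_t lock;
    pthread_cond_t  changed;    /* signalled whenever count changes */
    int tokens[QUEUE_SLOTS];
    unsigned head, count;       /* head is the oldest entry */
    unsigned executed;          /* written only by the render thread */
};

static void queue_init(struct cmd_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    q->head = q->count = q->executed = 0;
}

/* Called from any application thread: place a token and return. */
static void queue_push(struct cmd_queue *q, int token)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_SLOTS)             /* full: wait for the worker */
        pthread_cond_wait(&q->changed, &q->lock);
    q->tokens[(q->head + q->count) % QUEUE_SLOTS] = token;
    q->count++;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
}

/* The per-device thread; in the proposed design it would own the gl context. */
static void *render_thread(void *arg)
{
    struct cmd_queue *q = arg;
    for (;;) {
        int token;
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)                   /* empty: sleep until pushed */
            pthread_cond_wait(&q->changed, &q->lock);
        token = q->tokens[q->head];
        q->head = (q->head + 1) % QUEUE_SLOTS;
        q->count--;
        pthread_cond_broadcast(&q->changed);
        pthread_mutex_unlock(&q->lock);

        if (token == TOKEN_QUIT)
            return NULL;
        q->executed++;                          /* gl calls would go here */
    }
}
```

The application side only ever takes the queue lock, so calls from several
app threads serialize on the queue instead of fighting over the context.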

Here are some more concrete suggestions for implementing this:

The pipeline is a block of memory with a fixed size (e.g. 64k, whatever), and
the work orders that are placed in it consist of an opcode and any number of
arguments. A NULL opcode means that the place is empty. When a new operation
doesn't fit at the bottom of the pipeline we start again at the top. An
instruction pointer points at the opcode of the next instruction. If the next
opcode is NULL that means that the pipeline is empty. When an instruction has
been executed, the memory of the instruction is zeroed and the instruction
pointer is set to the next byte after the old instruction. A new instruction
doesn't fit into the pipeline if it would overwrite nonzero memory; in that
case we issue a warning and wait until some more space is free.
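
A single-threaded sketch of that memory layout, to make the bookkeeping
concrete. Everything here is an assumption for illustration: the block is
counted in 32-bit words, each instruction stores its argument count after
the opcode, and a reserved OP_WRAP marker tells the reader to jump back to
the start (the description above does not say how the reader notices a
wrap). Locking and memory barriers are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PIPE_WORDS 16   /* tiny on purpose; the text suggests e.g. 64k */

enum { OP_EMPTY = 0, OP_WRAP = 1, OP_SETRENDERSTATE = 2 };  /* hypothetical */

struct pipeline {
    uint32_t mem[PIPE_WORDS];
    unsigned write;     /* where the next instruction is placed */
    unsigned ip;        /* instruction pointer: next opcode to execute */
};

static int span_is_free(const struct pipeline *p, unsigned start, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        if (p->mem[start + i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if queued, 0 if it would overwrite nonzero memory (caller waits). */
static int pipe_put(struct pipeline *p, uint32_t op,
                    const uint32_t *args, uint32_t argc)
{
    unsigned need = 2 + argc;               /* opcode + arg count + args */
    assert(op > OP_WRAP && need <= PIPE_WORDS);

    if (p->write + need > PIPE_WORDS) {     /* does not fit at the end */
        if (!span_is_free(p, 0, need))
            return 0;
        if (p->write < PIPE_WORDS) {
            if (p->mem[p->write] != 0)
                return 0;
            p->mem[p->write] = OP_WRAP;     /* tell the reader to start over */
        }
        p->write = 0;
    } else if (!span_is_free(p, p->write, need)) {
        return 0;
    }
    p->mem[p->write] = op;
    p->mem[p->write + 1] = argc;
    if (argc)
        memcpy(&p->mem[p->write + 2], args, argc * sizeof(uint32_t));
    p->write += need;
    return 1;
}

/* Takes one instruction off the pipeline and zeroes its memory.
 * Returns the opcode, or OP_EMPTY if nothing is queued.
 * args_out must have room for PIPE_WORDS entries. */
static uint32_t pipe_step(struct pipeline *p, uint32_t *args_out)
{
    if (p->ip == PIPE_WORDS)
        p->ip = 0;
    if (p->mem[p->ip] == OP_WRAP) {
        p->mem[p->ip] = 0;
        p->ip = 0;
    }
    uint32_t op = p->mem[p->ip];
    if (op == OP_EMPTY)
        return OP_EMPTY;
    uint32_t argc = p->mem[p->ip + 1];
    memcpy(args_out, &p->mem[p->ip + 2], argc * sizeof(uint32_t));
    memset(&p->mem[p->ip], 0, (2 + argc) * sizeof(uint32_t));
    p->ip += 2 + argc;
    return op;
}
```

With a real second thread the opcode would have to be written last, after
the arguments, so the reader never sees a half-written instruction.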

A little modification would be to fix the number of arguments to an opcode.
Checking for empty instructions would be easier then, because when placing a
new instruction we only have to check whether the address holding the
operation code is NULL and not the whole memory. On the other hand this
wastes memory.
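
The fixed-size variant could look roughly like this. MAX_ARGS, NUM_SLOTS and
the function names are placeholders for this sketch; the point is that the
free check is a single comparison on the opcode, while every slot occupies
the full size regardless of how many arguments it really carries.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ARGS  4     /* hypothetical upper bound on arguments */
#define NUM_SLOTS 4     /* tiny on purpose */

struct slot {
    uint32_t opcode;            /* 0 means the slot is empty */
    uint32_t args[MAX_ARGS];    /* unused entries are the wasted memory */
};

struct slot_queue {
    struct slot slots[NUM_SLOTS];
    unsigned write, ip;
};

/* Returns 1 if queued, 0 if the next slot is still occupied. */
static int slot_put(struct slot_queue *q, uint32_t opcode,
                    const uint32_t *args, unsigned argc)
{
    struct slot *s = &q->slots[q->write];
    assert(opcode != 0 && argc <= MAX_ARGS);
    if (s->opcode != 0)
        return 0;               /* one comparison instead of a memory scan */
    for (unsigned i = 0; i < MAX_ARGS; i++)
        s->args[i] = i < argc ? args[i] : 0;
    s->opcode = opcode;
    q->write = (q->write + 1) % NUM_SLOTS;
    return 1;
}

/* Copies the next instruction to *out and frees its slot; 0 if empty. */
static int slot_take(struct slot_queue *q, struct slot *out)
{
    struct slot *s = &q->slots[q->ip];
    if (s->opcode == 0)
        return 0;
    *out = *s;
    s->opcode = 0;              /* only the opcode needs clearing */
    q->ip = (q->ip + 1) % NUM_SLOTS;
    return 1;
}
```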

Of course we can also HeapAlloc the instructions and place pointers to the
allocated memory in the pipeline. That doesn't waste memory (maybe, depends
on how HeapAlloc works), but imposes the overhead of regular HeapAlloc /
HeapFree calls.

So what instructions would we need? Everything that issues GL calls. Here are 
some I could think of and some implementation thoughts:

SetRenderState:
IWineD3DDevice::SetRenderState sets the update stateblock, and if not 
recording to a stateblock places a SETRENDERSTATE operation. Arguments are 
the render state to set and the value to set it to. When the instruction from
the pipeline is executed the value is set in the actual render stateblock
and the gl state is updated with the code that is in setrenderstate already.
IWineD3DDevice::GetRenderState returns the value from the update stateblock, 
so it is independent from the execution state of the pipe.
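
A rough sketch of the two-stateblock split described for SetRenderState.
The struct layout, the `recording` flag and the small command array are
simplifications invented for this example (in wined3d the update stateblock
is a separate object), so treat it as an illustration of the data flow only.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STATES 16   /* hypothetical */
#define QUEUE_LEN  8    /* hypothetical */

struct rs_cmd { unsigned state; uint32_t value; };

struct device {
    uint32_t update_state[NUM_STATES];  /* written by the app-facing setter */
    uint32_t render_state[NUM_STATES];  /* written only when a command runs */
    int recording;                      /* nonzero while recording a stateblock */
    struct rs_cmd queue[QUEUE_LEN];
    unsigned head, count;
};

/* App side: update the update stateblock, queue unless recording. */
static void set_render_state(struct device *d, unsigned state, uint32_t value)
{
    assert(state < NUM_STATES);
    d->update_state[state] = value;
    if (d->recording)
        return;
    assert(d->count < QUEUE_LEN);       /* a real queue would wait here */
    d->queue[(d->head + d->count) % QUEUE_LEN] = (struct rs_cmd){ state, value };
    d->count++;
}

/* App side: answered from the update stateblock, never from the queue. */
static uint32_t get_render_state(const struct device *d, unsigned state)
{
    assert(state < NUM_STATES);
    return d->update_state[state];
}

/* Render-thread side: returns 1 if a command was executed, 0 if none queued. */
static int execute_one(struct device *d)
{
    if (d->count == 0)
        return 0;
    struct rs_cmd c = d->queue[d->head];
    d->head = (d->head + 1) % QUEUE_LEN;
    d->count--;
    d->render_state[c.state] = c.value; /* existing gl state code would follow */
    return 1;
}
```

The getter sees the new value before the render thread has touched it, which
is the independence from the pipe's execution state mentioned above.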

SetTextureStageState:
Arguments are the stage, state and value, otherwise it is similar to
SetRenderState.

SetStreamSource, SetTexture, Set*Shader:
Update the update state block, update the refcounts and if not recording queue
a setting operation for the stream/texture/shader. This operation updates the
render state block, but does not necessarily change the gl setting (e.g.
SetTexture requires texture coords in the vertex too).
The Getters return the values stored in the update state block.

SetDisplayMode, GetDisplayMode: Not GL calls

SetClipPlane, SetMaterial, SetLight, SetLightEnable, SetTransform,
MultiplyTransform, SetViewport: Pretty simple, update the
updatestateblock, ...

SetFVF, SetVertexDeclaration: Updates the update stateblock and queues a 
SetDeclaration operation. The declaration is stored in the render stateblock 
and referenced for rendering. I'd suggest that the render thread should not 
deal with FVFs

Set*ShaderConstant: No idea

UpdateTexture, UpdateSurface: No idea either. Maybe relay to DirectDraw Blits

ApplyStateBlock:
Compare the stateblock contents against the updatestateblock, update the 
updatestateblock and queue Set* commands for different ones
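
As a sketch of the ApplyStateBlock comparison, assuming states are plain
32-bit values in an array and that "queueing" just means appending to an
output array. NUM_STATES, set_cmd and apply_stateblock are names made up
for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STATES 8    /* hypothetical; real stateblocks hold far more */

/* Hypothetical stand-in for a Set* command placed on the pipeline. */
struct set_cmd { unsigned state; uint32_t value; };

/* Copies `block` into `update` and records one command per state that
 * differs. Returns the number of commands written to `out`. */
static unsigned apply_stateblock(uint32_t update[NUM_STATES],
                                 const uint32_t block[NUM_STATES],
                                 struct set_cmd out[NUM_STATES])
{
    unsigned queued = 0;
    for (unsigned i = 0; i < NUM_STATES; i++) {
        if (update[i] == block[i])
            continue;                   /* unchanged: nothing to queue */
        update[i] = block[i];
        out[queued].state = i;
        out[queued].value = block[i];
        queued++;
    }
    return queued;
}
```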

Surface Locking:
Set up the local memory for the surface, and if necessary issue a command to 
read back the surface from gl. Wait for this command to be executed and wait 
until the last command referencing the surface is finished. If a surface is 
locked often keep the local memory copy to avoid flushing the whole pipeline 
for the readback command. When the surface memory is ready, return.

Surface unlocking:
If necessary start converting the surface e.g. for color keying in a separate
thread and return. If the surface is used for drawing before the conversion 
is complete the rendering thread has to wait until the conversion is 
finished. Uploading the surface to gl is done during drawprim when the 
surface is used.

Vertex Buffer locking:
Similar to surface locking. If neither the NOOVERWRITE nor the DISCARD locking
flag is provided, wait until all rendering with the buffer is done. Then return
the buffer data. We may have to give up the idea of mapping gl memory via 
glMapBuffer or we might have to wait for the whole pipeline to be executed to 
place a command for that.
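
The flag check for vertex buffer locking can be sketched as below. The flag
values are placeholders for this example, not the real D3DLOCK_* constants,
and `queued_draws` is a hypothetical per-buffer counter of queued commands
that still reference the buffer.

```c
#include <assert.h>

/* Placeholder values for this sketch only. */
#define LOCK_NOOVERWRITE 0x1u
#define LOCK_DISCARD     0x2u

struct vbuffer {
    unsigned queued_draws;  /* queued draws that still reference the buffer */
};

/* Returns 1 if Lock has to block until the render thread has caught up. */
static int lock_must_wait(const struct vbuffer *vb, unsigned flags)
{
    assert(vb != 0);
    /* NOOVERWRITE: the app will not touch data that is still in use.
     * DISCARD: the app does not need the old contents at all. */
    if (flags & (LOCK_NOOVERWRITE | LOCK_DISCARD))
        return 0;
    return vb->queued_draws != 0;
}
```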

Unlocking vertex buffers:
If the semantics of the data is known start fixing up vertices in a separate
thread. When done fixing up the buffer place a preload command into the
pipeline to load the buffer as early as possible, some gl implementations
seem to need that. Again if the buffer is used for drawing before the
conversion is done the drawing thread has to wait. Also convert buffers if no
vbos are available to get rid of drawStridedSlow completely.

Drawprim: This is the most complex thing:
First, check if all bound textures and vertex buffers are unlocked (Unit
test!). Then increase the rendering reference counter of all textures and
buffers (to count how often the object is used in the queue). Then queue a
drawprim command and return. If drawing from a user pointer we either have to
wait until drawing is done or create a copy before placing the call (this is
my favorite, we can fixup colors too while we're at it).
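
The rendering reference counter could be handled along these lines. All
names are invented for the sketch, and rejecting a draw while a bound
resource is locked is only an assumption here (that is exactly what the
unit test mentioned above would have to find out).

```c
#include <assert.h>

#define MAX_BOUND 4     /* hypothetical limit on bound resources */

struct resource {
    int locked;             /* nonzero between Lock and Unlock */
    unsigned queued_uses;   /* how often the queue references this object */
};

struct draw_cmd {
    struct resource *bound[MAX_BOUND];
    unsigned count;
};

/* App side: returns 1 and fills *cmd if the draw can be queued,
 * 0 if one of the bound resources is still locked. */
static int draw_prim(struct resource **bound, unsigned count,
                     struct draw_cmd *cmd)
{
    assert(count <= MAX_BOUND);
    for (unsigned i = 0; i < count; i++)
        if (bound[i]->locked)
            return 0;
    for (unsigned i = 0; i < count; i++) {
        bound[i]->queued_uses++;    /* now referenced by the queue */
        cmd->bound[i] = bound[i];
    }
    cmd->count = count;
    return 1;
}

/* Render-thread side: called after the gl draw to drop the references. */
static void draw_cmd_done(struct draw_cmd *cmd)
{
    for (unsigned i = 0; i < cmd->count; i++)
        cmd->bound[i]->queued_uses--;
    cmd->count = 0;
}
```

Lock and Release can then wait for `queued_uses` to reach zero instead of
waiting for the whole pipeline to drain. In a threaded version the counter
would need atomic updates or the queue lock.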

Blits: Find out if the blit can be handled in opengl and queue a blit
call (which will draw a textured quad). If gl can't handle that fall back to
the gdi code, which performs everything in software; from a gl perspective
this just looks like surface locks. This is slow, so we will want to handle
everything in gl.

GetDC: At the moment this is a LockRect from the gl pov, we may want to write  
a gl gdi driver which queues commands on the pipeline

Present:
Queue a FLIP command and wait for the pipeline to be emptied, then return. 
Ideally the rendering is done when Present is called and present returns 
immediately.

Destroying objects: Wait until they aren't needed in the pipeline anymore in 
Release.

Open issues:

SetRenderTarget:
Afaik those have their own gl context. Should we have a different pipeline or 
request the worker to switch to a different gl context? Synchronisation is an
issue.

Multiple swapchains: Similar issue

Anything I forgot?

How do we reference objects in the pipeline? For the start I'd suggest to use 
the implementation pointer, later we may want to replace it by handles(to 
avoid issues with the pointer size on 64 bit). See the roadmap.

My suggestion for the roadmap:
1) Start by protecting the ddraw, d3d8 and d3d9 objects with critical sections 
against race conditions(easy).
2) Move the code around in wined3d a bit, split up COM from GL stuff without 
actually changing the way rendering is done.
3) Add a stub pipeline and add code queuing the commands
4) Move the context into the worker thread and call the actual gl commands 
from there

Additional stuff that can be done if we feel like it:
* Get rid of COM in wined3d
* Move non-rendering things like Private Data, Stateblocks and Getters into 
ddraw, d3d8 and d3d9, leave only rendering code in wined3d. The software 
ddraw code has to stay in wined3d though
* Adopt ddi, ddentry or whatever is used in windows xp / vista(Potential legal 
issues as these interfaces aren't well documented).

Stefan