[Bug 11674] Dual-core unsupported in WoW and SC2

Tue Dec 25 11:36:46 CST 2012

http://bugs.winehq.org/show_bug.cgi?id=11674

--- Comment #272 from Pierre-Loup Griffais <bugs.winehq.org at plagman.net> 2012-12-25 11:36:46 CST ---
Please keep in mind all my input so far applies to my original point of
leveraging the potential performance gains that the NVIDIA threaded
optimizations can provide, not the general case, where I would expect both
updating methods to be roughly equivalent (with the small tradeoff of MapBuffer
requiring more address space but fewer copies).

Please see the "Threaded Optimizations" section in the link below for more
context:

ftp://download.nvidia.com/XFree86/Linux-x86/313.09/README/openglenvvariables.html

The goal here is to have _minimal_ CPU overhead in the main thread, so the
logic there doesn't have any knowledge of the GL state. This greatly improves
performance in both CPU-bound and GPU-bound use-cases (since it reduces
starvation problems and allows the driver to more easily perform optimizations
at the command-stream level, rather than dealing with a single command at a
time), at the expense of not interacting well with totally synchronous commands
such as all Gets and MapBuffer. Semi-synchronous APIs that were designed with
pipelining in mind such as queries are still a fast path.

Currently the only two threading modes exposed are "forced-off" (the default)
and "forced-on", which the __GL_THREADED_OPTIMIZATIONS environment variable
controls. In the future there will be an "auto" mode similar to Windows where
the driver will know to fall out of threaded mode if it detects that the
workload uses a large number of synchronous calls, to avoid impairing
performance in these cases.
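
As a rough illustration of the "forced-on" mode, here is a minimal C sketch of
a launcher that sets the variable and then execs the target program. The
function names are hypothetical, and the value "1" is taken from the README
linked above. Setting the variable before the exec, rather than from inside an
already running GL process, is an assumption about the safest way to make sure
the driver sees it when libGL is loaded.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Set the variable for this process and for anything it execs later.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
int enable_threaded_optimizations(void)
{
    return setenv("__GL_THREADED_OPTIMIZATIONS", "1", 1);
}

/* Hypothetical launcher: replace the current process with argv[0],
 * with the variable already present in its environment.
 * Only returns (with -1) if something failed. */
int launch_with_threaded_optimizations(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    if (enable_threaded_optimizations() != 0)
        return -1;

    execvp(argv[0], argv);
    return -1; /* reached only if execvp itself failed */
}
```

In practice the same effect is usually achieved by exporting the variable in
the shell that starts the application. The README also notes that the
application needs to be linked against pthreads for this mode to work.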

To summarize, I think that in the current state of things, MapBufferRange and
BufferSubData are both fast, valid approaches for the regular case, with
tradeoffs on each side. If address space is a concern, relying on buffer
mappings might be problematic, however.

But to get the best throughput, I recommend enabling the NVIDIA threaded
optimizations and using BufferSubData with the invalidation scheme I explained
(since it gives the driver similar information to what the MapBuffer path
specifies) to unlock further performance gains and be faster than D3D at
dynamic buffers.
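
For readers who have not followed the earlier comments, here is a minimal C
sketch of what such an update path could look like. It assumes the
invalidation step is the usual "orphaning" idiom, i.e. calling glBufferData
with a NULL pointer before glBufferSubData; the exact scheme is the one
described earlier in the bug, not necessarily this one. The types, constants,
struct, and function names are hypothetical stand-ins so the sketch builds
without GL headers; the two function pointers have the same signatures as
glBufferData and glBufferSubData.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for GL types and enum values (real code would use
 * GLenum, GLsizeiptr, GLintptr and the constants from the GL headers). */
typedef unsigned int gl_enum;
typedef ptrdiff_t gl_sizeiptr;
typedef ptrdiff_t gl_intptr;
#define SKETCH_GL_ARRAY_BUFFER 0x8892u
#define SKETCH_GL_STREAM_DRAW  0x88E0u

/* Hypothetical table of entry points, mirroring glBufferData and
 * glBufferSubData. */
struct buffer_ops {
    void (*buffer_data)(gl_enum target, gl_sizeiptr size,
                        const void *data, gl_enum usage);
    void (*buffer_sub_data)(gl_enum target, gl_intptr offset,
                            gl_sizeiptr size, const void *data);
};

/* Replace the contents of the buffer currently bound to `target`.
 * Step 1 re-specifies the store with NULL data (the assumed invalidation
 * step), signalling that the old contents are no longer needed.
 * Step 2 uploads the new data. Neither call hands anything back to the
 * caller, unlike a Get or a MapBuffer. */
void upload_discard(const struct buffer_ops *ops, gl_enum target,
                    gl_sizeiptr capacity, const void *src, gl_sizeiptr size)
{
    assert(ops != NULL && src != NULL && size <= capacity);
    ops->buffer_data(target, capacity, NULL, SKETCH_GL_STREAM_DRAW);
    ops->buffer_sub_data(target, 0, size, src);
}

/* Recording stubs so the call order can be checked without a GL context:
 * 1 = buffer_data with NULL data, 3 = buffer_data with data,
 * 2 = buffer_sub_data. */
int call_log[4];
int call_count;

void stub_buffer_data(gl_enum target, gl_sizeiptr size,
                      const void *data, gl_enum usage)
{
    (void)target; (void)size; (void)usage;
    if (call_count < 4)
        call_log[call_count++] = (data == NULL) ? 1 : 3;
}

void stub_buffer_sub_data(gl_enum target, gl_intptr offset,
                          gl_sizeiptr size, const void *data)
{
    (void)target; (void)offset; (void)size; (void)data;
    if (call_count < 4)
        call_log[call_count++] = 2;
}
```

With a real context, the two pointers would be filled with the driver's
glBufferData and glBufferSubData entry points, and the buffer would be bound
to the target before calling upload_discard.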

I hope you're having a great holiday season; best wishes!
