Fwd: Re: Fix catalyst brain damage to speed up Falcon BMS 2x

Stanisław Halik sthalik at misaki.pl
Sat Feb 16 10:44:23 CST 2013


Ben allowed me to forward this email.

Kudos to him for all the knowledge his brainbox contains!

-------- Original Message --------
Subject: Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Date: Sat, 16 Feb 2013 11:20:35 -0500
From: Ben Supnik <bsupnik at xsquawkbox.net>
To: Stanislaw Halik <sthalik at misaki.pl>

Hi Guys,

I'm afraid I don't know enough about the _specific_ situation you guys
are seeing.  I can tell you guys a few things from my GL work:

1. The ATI OpenGL Linux team is pretty accessible; do you guys have
anyone in the fglrx beta program?

2. What we found was that for stream-draw buffers that need to be
orphaned, mapped, unmapped and drawn, there was a fixed overhead in the
ATI drivers compared to NV; this 'performance gap' is cross-platform -
both NV and ATI use the same GL stack (more or less) for Windows and
Linux, and we saw the slow-down on both.

3. We originally were using glMapBuffer (not glMapBufferRange, "MBR"
below) after a glBufferData call with a NULL pointer to "orphan" the
buffer (the equivalent of d3d map-discard).  I think I tried MBR and it
didn't fix it - both were expensive because the fundamental memory
mapping operation was slow.

4. The slowness was in milliseconds, e.g. "this hits our fps by 20% or
30%" - but it wasn't "this is 3x slower because it stalled the GPU."  So
if you're seeing truly face-meltingly bad performance, like a total
pipeline stall, you have a different bug.

5. As a general statement, the original glMapBuffer is subject to a lot
of heuristic behavior in the drivers; app developers are very fast and
loose with how they use it, so the driver vendors tend to try to make it
do the fastest, most useful, least crash-y thing because the apps use it
like monkeys on typewriters.  By comparison, MBR came out later and has
much more specific semantics for particular optimizations; as a result,
the MBR implementation will often do exactly what you say, _even_ if
it's slower.  Getting even one flag wrong in MBR can cause it to hit a
face-meltingly slow path.
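
To make the "one flag wrong" point concrete, here is a minimal sketch in C
of checking a glMapBufferRange access mask before handing it to the driver.
It is not taken from X-Plane or Wine. The bit values are the ones defined by
GL_ARB_map_buffer_range, redefined locally so the sketch builds without GL
headers; the helper names are hypothetical. It only covers combinations the
extension spec rejects outright; whether a legal mask is fast on a given
driver is something only profiling will tell you.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as defined by GL_ARB_map_buffer_range, redefined locally
 * (without the GL_ prefix) so this compiles without GL headers. */
#define MAP_READ_BIT              0x0001u
#define MAP_WRITE_BIT             0x0002u
#define MAP_INVALIDATE_RANGE_BIT  0x0004u
#define MAP_INVALIDATE_BUFFER_BIT 0x0008u
#define MAP_FLUSH_EXPLICIT_BIT    0x0010u
#define MAP_UNSYNCHRONIZED_BIT    0x0020u

/* Hypothetical helper: true if the mask avoids the combinations for which
 * the extension spec says glMapBufferRange raises GL_INVALID_OPERATION. */
bool mbr_access_is_valid(unsigned access)
{
    /* At least one of READ or WRITE must be requested. */
    if (!(access & (MAP_READ_BIT | MAP_WRITE_BIT)))
        return false;

    /* READ may not be combined with invalidation or unsynchronized access. */
    if ((access & MAP_READ_BIT) &&
        (access & (MAP_INVALIDATE_RANGE_BIT |
                   MAP_INVALIDATE_BUFFER_BIT |
                   MAP_UNSYNCHRONIZED_BIT)))
        return false;

    /* Explicit flushing only makes sense for a write mapping. */
    if ((access & MAP_FLUSH_EXPLICIT_BIT) && !(access & MAP_WRITE_BIT))
        return false;

    return true;
}

/* Hypothetical helper: true if the mask asks for a write-only mapping that
 * discards the whole buffer, i.e. the MBR analogue of orphaning. */
bool mbr_access_is_discard(unsigned access)
{
    return mbr_access_is_valid(access) &&
           (access & MAP_WRITE_BIT) &&
           (access & MAP_INVALIDATE_BUFFER_BIT);
}

/* Hypothetical helper: the mask a stream-draw path might pass when it wants
 * orphan-style behavior from MBR. */
unsigned mbr_discard_write_access(void)
{
    unsigned access = MAP_WRITE_BIT | MAP_INVALIDATE_BUFFER_BIT;
    assert(mbr_access_is_valid(access));
    return access;
}
```

In a real renderer the value from mbr_discard_write_access() would be
passed as the access argument of glMapBufferRange, with the local names
swapped for the GL_-prefixed constants from the GL headers.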

We worked around the perf cost of mapping a buffer on ATI hw by using
pinned memory (but we do have a Linux-only bug where we get corrupt
geometry with pinned memory - it works on Windows); I have some todo
items to investigate the problem more thoroughly now.

Cheers
Ben

On 2/16/13 5:47 AM, Stanislaw Halik wrote:
> On 2013-02-16 09:04, Stefan Dösinger wrote:
>> What you really want to do is figure out why GL_ARB_map_buffer_range
>> is slow on fglrx, and make sure that the problem is really fglrx
>> specific. I fixed a number of dynamic buffer performance problems in
>> the past months, but there are still problems if we're falling back
>> to draw_strided_slow for some reason, like fixed function material
>> tracking.
>
> Thanks for reviewing this.
>
> Going to ask Ben Supnik from Laminar Research (X-Plane developer) and
> BCC him, since he has apparently run into the same issue. There's a lot
> of info on fglrx woes (not really Linux-specific, either) on
> http://developer.x-plane.com/
>
> He has said publicly that he is in contact with AMD directly, and he's
> been friendly to OSS by releasing a Linux version of X-Plane, as well
> as being an overall cool fellow.
>
> Ben, please help!
>
>> Other than being wrong conceptually, you're disabling dynamic buffers
>> the wrong way: The "proper" way would be to add a quirk to the
>> quirk_table in directx.c that removes ARB_map_buffer_range from the
>> list of supported extensions if the driver vendor is AMD.
>
> Like this? Patch attached.
>
> I've run into hard GPU hangs with fglrx 13.2, with no VT switch
> possible either. These settings help:
>
> [Software\\Wine\\Direct3D]
> "DirectDrawRenderer"="gdi"
> "Multisampling"="disabled"
> "OffscreenRenderingMode"="fbo"
> "UseGLSL"="enabled"
>
> Lack of GLSL disables HDR apparently.
>
> Without GDI, there's some nasty display corruption on FBOs.
>
> Also, Catalyst likes to hang the display when switching from 3D to 2D;
> a VT switch helps.
>
> But with all this busywork, performance is near-native. Catalyst at
> least supports indirect addressing (whatever that means) and doesn't
> choke on > 128 temps... FYI Mesa bug submitted:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=55420
>
> -sh
>

-- 
Scenery Home Page: http://scenery.x-plane.com/
Scenery blog: http://www.x-plane.com/blog/
Plugin SDK: http://www.xsquawkbox.net/xpsdk/
X-Plane Wiki: http://wiki.x-plane.com/
Scenery mailing list: x-plane-scenery at yahoogroups.com
Developer mailing list: x-plane-dev at yahoogroups.com





More information about the wine-devel mailing list