[Wine] Re: Slow performance with many 3D games

Wed Sep 3 03:37:53 CDT 2008

ulberon wrote:
> I didn't realize I was starting a flamewar when I told the OP to bind the process... 
> 
> The reason I believe there is a performance difference (warning: subjective) has more to do with the Linux scheduler. If I run several programs at once, for whatever reason, the Linux scheduler appears to pass the process around like a party favor (I have a quad-core machine). When I bind the process, almost all of the game stuttering stops (it is only occasional, and the same goes for the music stuttering). I think this is directly related to the cost of the context switch (and of flushing the cache), which binding the process reduces.
> 
> On an unrelated note: I really wish I understood DirectX better; it's not my field at all. However, with regard to the programs being CPU-dependent, I've often wondered about the implementation, for example math.c. It contains the following code:
>
> Code:
> D3DXPLANE* WINAPI D3DXPlaneTransformArray(
>     D3DXPLANE* out, UINT outstride, CONST D3DXPLANE* in, UINT instride,
>     CONST D3DXMATRIX* matrix, UINT elements)
> {
>     UINT i;
>     TRACE("\n");
>     for (i = 0; i < elements; ++i) {
>         D3DXPlaneTransform(
>             (D3DXPLANE*)((char*)out + outstride * i),
>             (CONST D3DXPLANE*)((const char*)in + instride * i),
>             matrix);
>     }
>     return out;
> }
>
> Which calls:
> 
> Code:
> D3DXPLANE* WINAPI D3DXPlaneTransform(D3DXPLANE *pout, CONST D3DXPLANE *pplane, CONST D3DXMATRIX *pm)
> {
>     CONST D3DXPLANE plane = *pplane;
>     pout->a = pm->u.m[0][0] * plane.a + pm->u.m[1][0] * plane.b + pm->u.m[2][0] * plane.c + pm->u.m[3][0] * plane.d;
>     pout->b = pm->u.m[0][1] * plane.a + pm->u.m[1][1] * plane.b + pm->u.m[2][1] * plane.c + pm->u.m[3][1] * plane.d;
>     pout->c = pm->u.m[0][2] * plane.a + pm->u.m[1][2] * plane.b + pm->u.m[2][2] * plane.c + pm->u.m[3][2] * plane.d;
>     pout->d = pm->u.m[0][3] * plane.a + pm->u.m[1][3] * plane.b + pm->u.m[2][3] * plane.c + pm->u.m[3][3] * plane.d;
>     return pout;
> }
>
> There must be a way to do this in parallel on the GPU instead of on the CPU. I have no expertise in this area, and I'm not even going to pretend I know what I'm talking about. I'm just curious whether many of these math functions could be parallelized on the GPU (even on older GPUs), and whether DirectX really does it this way.
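For what it's worth, the quoted functions are data-parallel: each output plane depends only on its own input plane and the shared matrix, so no loop iteration reads another's result. The sketch below is a portable C restatement of the same arithmetic, not Wine's or Microsoft's code. `plane4`, `mat4`, `plane_transform`, and `plane_transform_array` are hypothetical names standing in for D3DXPLANE, D3DXMATRIX, and the two d3dx9 functions, and the byte-stride parameters are dropped for brevity. It rewrites the four per-component formulas as a weighted sum of matrix rows, which is the form SIMD instructions or GPU threads could spread across lanes. It says nothing about whether real DirectX offloads these calls to the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for D3DXPLANE and D3DXMATRIX (not the real d3dx9 types). */
typedef struct { float a, b, c, d; } plane4;
typedef struct { float m[4][4]; } mat4;

/* out = (a, b, c, d) * M, treating the plane as a row vector; this matches the
 * per-component formulas in D3DXPlaneTransform. Each output lane j is computed
 * independently of the others. */
static void plane_transform(plane4 *out, const plane4 *p, const mat4 *mx)
{
    const float w[4] = { p->a, p->b, p->c, p->d }; /* copy first so out may alias p */
    float r[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; ++i)          /* one matrix row per plane coefficient */
        for (int j = 0; j < 4; ++j)      /* four independent output lanes */
            r[j] += w[i] * mx->m[i][j];
    out->a = r[0];
    out->b = r[1];
    out->c = r[2];
    out->d = r[3];
}

/* Contiguous arrays only (strides dropped). No iteration reads another
 * iteration's output, so the elements could be processed in any order. */
static void plane_transform_array(plane4 *out, const plane4 *in, size_t n,
                                  const mat4 *mx)
{
    assert(n == 0 || (out != NULL && in != NULL && mx != NULL));
    for (size_t k = 0; k < n; ++k)
        plane_transform(&out[k], &in[k], mx);
}
```

For example, with a matrix that scales x, y, and z by 2 and has m[3][3] = 1, the plane (1, 0, 0, -3) comes out as (2, 0, 0, -3). Whether handing such a loop to the GPU pays off depends on array size; for small arrays, moving the data to the card and back would likely cost more than the arithmetic itself.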


You didn't start a flamewar; some people just said some things that were incorrect.
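"Binding the process," as the quoted post puts it, means setting its CPU affinity. From the shell this is usually done at launch with util-linux's `taskset -c <cpu> <command>`. The sketch below shows the underlying Linux call from C, assuming glibc's `sched_setaffinity`; `pin_to_cpu` is a hypothetical helper name. It does not show whether pinning actually reduces stutter, which is the point being argued above.

```c
#define _GNU_SOURCE          /* exposes cpu_set_t, CPU_SET, sched_getcpu in glibc */
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to a single CPU (hypothetical helper).
 * On Linux, pid 0 means "the calling thread": threads created afterwards
 * inherit the mask, but threads that already exist keep their own.
 * Returns 0 on success, -1 on failure (errno set by sched_setaffinity). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Because the mask is inherited only by threads created after the call, pinning a multithreaded game from the outside, before it starts, is the more reliable route. That is why `taskset` at launch is the usual approach.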

I don't know much about DirectX either, but I'd use oprofile to see how much time is actually being spent in that function.
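oprofile answers the real question: what share of the game's run time that function accounts for. For a rough check of how expensive one call is in isolation, a crude microbenchmark can complement it. The following is a minimal sketch assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC`; `ns_per_call` and `busy_work` are hypothetical names, and `busy_work` merely stands in for whatever function you want to time.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* Call fn(arg) `reps` times and return the average wall-clock time per
 * call in nanoseconds, measured with the monotonic clock. */
static double ns_per_call(void (*fn)(void *), void *arg, long reps)
{
    struct timespec t0, t1;
    assert(reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < reps; ++i)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double elapsed = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                   + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed / (double)reps;
}

/* Hypothetical workload. It writes through a volatile pointer so the
 * compiler cannot optimize the loop away. */
static void busy_work(void *arg)
{
    volatile float *acc = arg;
    for (int i = 0; i < 1000; ++i)
        *acc += 1.0f;
}
```

Usage: `volatile float acc = 0.0f; double ns = ns_per_call(busy_work, (void *)&acc, 100);`. A number from a tight loop like this reflects warm caches and no competing work, so it can look much better than the same function inside a running game. That gap is one reason a system-wide profiler is the better first tool.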