D3D performance debugging report

Sun May 1 07:34:53 CDT 2011

Indeed, I've written a spinlock using GCC extensions and used it to 
replace the EnterCriticalSection call in the x11 drv file.
The catch is that the lock has to be recursive, so I implemented a quick 
(but incorrect) recursive spinlock just to run SC2, and the difference 
was negligible.
The biggest issue, imho, is that we still have to call a function here. 
It would be great to inline all that code, but again, the best thing is 
probably to limit the number of calls.
I can also try a spinlock for the wined3d lock, the BKL-like one. I hope 
that one doesn't have to be recursive, right?
I'm asking because with a recursive lock I perform an extra syscall on 
every acquisition:

static volatile pid_t    x11_lock = 0;
static volatile int        x11_lock_cnt = 0;

/***********************************************************************
  *        wine_tsx11_lock   (X11DRV.@)
  */
void CDECL wine_tsx11_lock(void)
{
     // This might be expensive!
     // I don't like recursive locks for this reason!
     pid_t        th_id = syscall(SYS_gettid);
     while (th_id != __sync_val_compare_and_swap(&x11_lock, 0, th_id));
     ++x11_lock_cnt;
     asm volatile("lfence" ::: "memory");
}

/***********************************************************************
  *        wine_tsx11_unlock   (X11DRV.@)
  */
void CDECL wine_tsx11_unlock(void)
{
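     // Only the owning thread gets here, so the plain decrement is safe.
     if(!--x11_lock_cnt)
     {
         // Barrier before the releasing store, so writes made under the
         // lock are visible before the lock appears free.
         __sync_synchronize();
         x11_lock=0;
     }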
}

Please keep in mind this is test code, but it appears to work.
Again, the performance gain in SC2 isn't much... but I should probably 
test more thoroughly and with other games?
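
The gettid syscall on every lock call could probably be avoided by 
caching the thread ID in thread-local storage. A minimal sketch of that 
idea, assuming GCC's `__thread` and `__sync` builtins on Linux; all the 
`demo_` names are hypothetical and not taken from the Wine tree, and the 
spin loop has no backoff:

```c
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration only. */
static volatile pid_t demo_lock_owner = 0;  /* 0 means "free" */
static int demo_lock_depth = 0;             /* touched only by the owner */
static __thread pid_t demo_cached_tid = 0;  /* per-thread cache of gettid */

/* Returns the caller's thread ID, doing the syscall once per thread. */
static pid_t demo_self(void)
{
    if (!demo_cached_tid)
        demo_cached_tid = (pid_t)syscall(SYS_gettid);
    return demo_cached_tid;
}

static void demo_lock(void)
{
    pid_t self = demo_self();

    /* Only this thread ever stores its own ID, so owner == self means
     * this is a recursive acquisition and no atomic operation is needed. */
    if (demo_lock_owner != self)
    {
        /* The builtin returns the previous value and is a full barrier;
         * a return value of 0 means the swap succeeded. */
        while (__sync_val_compare_and_swap(&demo_lock_owner, 0, self) != 0)
            ;  /* spin */
    }
    ++demo_lock_depth;
}

static void demo_unlock(void)
{
    /* Unlock must come from the current owner. */
    assert(demo_lock_owner == demo_self() && demo_lock_depth > 0);

    if (--demo_lock_depth == 0)
    {
        /* Full barrier before the releasing store. */
        __sync_synchronize();
        demo_lock_owner = 0;
    }
}
```

With this layout a nested demo_lock() costs one thread-local read and 
one compare, and only the outermost acquisition does the atomic swap.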

Let me know,
Cheers,

On 01/05/11 09:33, Stefan Dösinger wrote:
> On Saturday 30 April 2011 18:26:04 Emanuele Oriani wrote:
>> Hi Stefan,
>>
>> What do you think about using inline spinlocks (in asm code maybe) to
>> implement locks?
>> Clearly an optimized spinlock would mean different code for different
>> compilers/architectures, but shouldn't it be the best solution?
> I am usually pessimistic about hand-written assembler optimizations. You can
> give it a try, but compilers are pretty clever these days.
>
> I think trying to optimize the lock calls is a more promising way. We can't
> simply drop the ENTER_GL/LEAVE_GL calls, as you found out in SC2. We may be
> able to reduce the number of those calls by moving blocks of opengl calls
> closer together.
>
> There's also the wined3d lock, which is somewhat like the big kernel lock.
> There's room for improvement there as well, if we soften the "you must call
> wined3d under lock" rule. However the wined3d lock is the smaller problem
> compared to the X11 lock.
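
The idea of moving blocks of OpenGL calls closer together can be 
illustrated with a toy model. This is only a sketch: the `demo_` 
functions below are hypothetical stand-ins for ENTER_GL/LEAVE_GL and for 
a GL call, and they only count how often the lock is taken.

```c
#include <assert.h>

/* Hypothetical stand-ins for ENTER_GL/LEAVE_GL; they only count. */
static int demo_acquisitions = 0;  /* how many times the lock was taken */
static int demo_depth = 0;         /* non-zero while the lock is held */

static void demo_enter_gl(void) { ++demo_acquisitions; ++demo_depth; }
static void demo_leave_gl(void) { assert(demo_depth > 0); --demo_depth; }

/* Placeholder for a GL call; it must run with the lock held. */
static void demo_gl_call(int state) { assert(demo_depth > 0); (void)state; }

/* One lock round trip per GL call. Returns the number of acquisitions. */
static int demo_apply_unbatched(const int *states, int count)
{
    int before = demo_acquisitions;
    int i;

    for (i = 0; i < count; ++i)
    {
        demo_enter_gl();
        demo_gl_call(states[i]);
        demo_leave_gl();
    }
    return demo_acquisitions - before;
}

/* All GL calls grouped under a single acquisition. */
static int demo_apply_batched(const int *states, int count)
{
    int before = demo_acquisitions;
    int i;

    demo_enter_gl();
    for (i = 0; i < count; ++i)
        demo_gl_call(states[i]);
    demo_leave_gl();
    return demo_acquisitions - before;
}
```

For four state changes the unbatched version takes the lock four times 
and the batched one takes it once. The trade-off is that the batched 
version holds the lock for longer in one stretch.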