[PATCH 2/2] wined3d: Pass a wined3d_device pointer to wined3d_from_cs().

Rémi Bernon rbernon at codeweavers.com
Tue Jul 6 12:34:48 CDT 2021

On 7/6/21 6:43 PM, Henri Verbeet wrote:
> On Tue, 6 Jul 2021 at 15:32, Giovanni Mascellani
> <gmascellani at codeweavers.com> wrote:
>> Il 05/07/21 14:28, Henri Verbeet ha scritto:
>>> Well, yes, it doesn't. Are you saying we need those here though?
>> I think so.
>> My understanding is that we are relying on undefined behavior, and I
>> think we shouldn't do that, because it might break randomly at some
>> point after unrelated changes, or with a different compiler; and it
>> could result in a kind of bug that is very painful to debug.
> C before C11 doesn't specify a whole lot about concurrency, so in that
> sense we're already in trouble simply for using threads; the C
> standard isn't going to help us much here.
> As I understand it, we don't need a barrier here because this is an
> (aligned) CPU word sized variable that's only updated with
> InterlockedExchange(). I.e., the combination of InterlockedExchange()
> for stores and volatile for loads should be sufficient for preventing
> torn/fused/invented loads and stores, as well as any reordering we
> care about. I don't particularly mind being wrong about that, but I'd
> need something more concrete than (essentially) "volatile is not a
> full memory barrier".

AFAIK volatile accesses are only ordered by the compiler with respect to 
other volatile accesses, so it's generally not safe to use a volatile 
variable alone as a barrier for reading non-volatile data written by 
another thread. I think [1] describes this, for instance.

Then, on x86(_64), volatile accesses also happen to be safe with respect 
to execution order, because x86 has a very strong memory model: although 
out-of-order execution is possible, loads and stores are almost 
completely ordered with respect to each other, including as seen from 
another core's perspective.

This is not the case on other architectures, where you always need to 
use atomic operations (through compiler builtins) to get load / store 
ordering guarantees across threads, or to use synchronization primitives.

Then I don't know about this specific case and what exactly is shared, 
but even with a single variable I think it's possible that on some 
architectures (not x86) a core-local load cache could make even a 
volatile read return a stale value compared to what an atomic read 
would return.

[1] https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html
Rémi Bernon <rbernon at codeweavers.com>
