[PATCH 2/2] wined3d: Pass a wined3d_device pointer to wined3d_from_cs().

Tue Jul 6 14:24:27 CDT 2021

On 7/6/21 12:34 PM, Rémi Bernon wrote:
> On 7/6/21 6:43 PM, Henri Verbeet wrote:
>> On Tue, 6 Jul 2021 at 15:32, Giovanni Mascellani
>> <gmascellani at codeweavers.com> wrote:
>>> On 05/07/21 14:28, Henri Verbeet wrote:
>>>> Well, yes, it doesn't. Are you saying we need those here though?
>>>
>>> I think so.
>>>
>>> My understanding is that we are relying on undefined behavior, and I
>>> think we shouldn't do that, because it might break randomly at some
>>> point after unrelated changes, or with a different compiler; and it
>>> could result in a kind of bug that is very painful to debug.
>>>
>> C before C11 doesn't specify a whole lot about concurrency, so in that
>> sense we're already in trouble simply for using threads; the C
>> standard isn't going to help us much here.
>>
>> As I understand it, we don't need a barrier here because this is an
>> (aligned) CPU word sized variable that's only updated with
>> InterlockedExchange(). I.e., the combination of InterlockedExchange()
>> for stores and volatile for loads should be sufficient for preventing
>> torn/fused/invented loads and stores, as well as any reordering we
>> care about. I don't particularly mind being wrong about that, but I'd
>> need something more concrete than (essentially) "volatile is not a
>> full memory barrier".
>>
> 
> AFAIK volatile accesses are only compiler ordered wrt other volatile
> accesses, so it's generally not safe to only use a volatile variable as
> a barrier to read some non-volatile data written by another thread. I
> think [1] describes that for instance.
> 
> Then, on x86(_64), volatile accesses are also execution order safe, because x86
> has a very strong memory model, where although out of order execution is
> possible, the load and store operation ordering is almost completely
> ordered wrt each other, including from another core perspective.
> 
> This is not the case on other architectures, where you always need to
> use atomic operations (through compiler builtin) to get load / store
> order guarantee across threads, or use synchronization primitives.
> 
> 
> Then I don't know about this specific case and what exactly is shared,
> but even with a single variable I think it's possible that on some arch
> (not x86) you would have local load cache where even a volatile read
> could return stale value compared to what an atomic read would do.

To be fair, I think the real benefit of a strong memory model is that
even if the current code is safe, you don't have to convince yourself
it's safe; e.g. you don't have to reason about what ordering problems
there could be, or ask yourself whether it's safe on every
architecture we support.

Not that win32 actually has a strong memory model, or that we can depend 
on the C11 memory model being available, but we could always pull a 
Linux and roll our own. Or use Interlocked functions even for atomic 
read/write.
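
For what it's worth, here is a rough sketch of what "rolling our own"
might look like. It assumes GCC or Clang (for the __atomic builtins), and
the helper and struct names are made up for illustration; none of this
exists in wined3d:

```c
#include <stdbool.h>

/* Hypothetical helpers, not part of wined3d. Assumes GCC or Clang,
 * which provide the __atomic builtins. */
static inline long load_acquire(const long *p)
{
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

static inline void store_release(long *p, long v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

/* Hypothetical mailbox: 'payload' is plain data, 'ready' is the flag. */
struct mailbox
{
    int payload;
    long ready;
};

/* Producer: write the data, then publish the flag with release semantics,
 * so the payload store cannot be reordered after the flag store. */
static void mailbox_publish(struct mailbox *m, int value)
{
    m->payload = value;
    store_release(&m->ready, 1);
}

/* Consumer: the acquire load of the flag keeps the payload read after it. */
static bool mailbox_try_consume(struct mailbox *m, int *out)
{
    if (!load_acquire(&m->ready))
        return false;
    *out = m->payload;
    return true;
}
```

With Interlocked functions only, the load could presumably be spelled
InterlockedCompareExchange(&m->ready, 0, 0) instead, at the cost of a
full barrier on every read.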

Just food for thought.