[PATCH v2 1/5] winevulkan: Support prefixing function parameters.

Rémi Bernon rbernon at codeweavers.com
Fri Dec 10 08:36:09 CST 2021


On 12/10/21 15:22, Paul Gofman wrote:
> On 12/10/21 17:00, Rémi Bernon wrote:
>>
>>
>> I'm not completely aware of why the full FPU context needs to be saved 
>> and restored for instance, but if it's only for the debugging 
>> experience, could that simply be stripped in such builds?
>>
> If you simply strip saving the FPU context, NtGetContextThread will be 
> broken even without any debugger involved. I believe restoring the FPU 
> context is already optimized out for the (majority of) cases where it 
> is not needed due to setting the FPU context for the thread. Of course 
> one can hack something around and then enable it only for games which 
> really need it, after spending time finding out that they do.
> 
> If we talk about AVX (ymm) registers, with xsavec support those are only 
> actually saved if any YMM register is nonzero at the time of the call. 
> And given that those registers are volatile, compilers tend to often do 
> vzeroupper before function calls as far as I could see (probably exactly 
> to avoid the context saving overhead on syscalls that is otherwise 
> present both under Windows and Linux without Wine involved).
> 
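
To make the xsavec point concrete, here is a rough model (not Wine code) of
which components a single save would write. It assumes the simplified rule
that a component is recorded only if it was requested and is not in its
initial state; the helper names are made up for illustration, and real
hardware tracking of "in use" is more subtle than this.

```python
# Simplified model, not Wine code: which state components one XSAVEC writes.
XSTATE_X87 = 1 << 0
XSTATE_SSE = 1 << 1
XSTATE_AVX = 1 << 2  # upper 128 bits of YMM0-YMM15

AVX_COMPONENT_BYTES = 16 * 16  # 16 registers, 16 bytes of upper half each


def xsavec_xstate_bv(requested, in_use):
    """Components recorded as saved: requested ones not in their initial state."""
    return requested & in_use


def avx_bytes_written(requested, in_use):
    """Hypothetical helper: bytes of YMM upper-half data written by one save."""
    if xsavec_xstate_bv(requested, in_use) & XSTATE_AVX:
        return AVX_COMPONENT_BYTES
    return 0


requested = XSTATE_X87 | XSTATE_SSE | XSTATE_AVX
clean = avx_bytes_written(requested, XSTATE_X87 | XSTATE_SSE)  # after vzeroupper
dirty = avx_bytes_written(requested, requested)                # YMM uppers in use
print(clean, dirty)
```

In this model a caller that issued vzeroupper pays nothing for the AVX
component, while one with live upper halves pays for the full block on
every save.
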
> Then, there is the set of XMM registers which are non-volatile on ms_abi 
> but volatile on sysv_abi, and those are always saved in the compiler 
> generated prologue whenever an ms_abi function calls a sysv_abi one. So 
> going through the Wine->Unix gate just changes the place where those are 
> saved, and in general a clear PE / Unix separation should remove a 
> great amount of these saves across function calls.
> 
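
For reference, the register set in question can be written down as a small
sketch. It only encodes the XMM volatility rules of the two x86-64 calling
conventions; the variable names are illustrative and nothing here is taken
from the Wine tree.

```python
# XMM register classes per x86-64 calling convention.
MS_ABI_NONVOLATILE = set(range(6, 16))  # xmm6-xmm15 are callee-saved on Win64
SYSV_ABI_NONVOLATILE = set()            # every XMM register is caller-saved on SysV

# Registers an ms_abi function has to preserve itself before calling
# sysv_abi code, because the callee is free to clobber them.
must_save = sorted(MS_ABI_NONVOLATILE - SYSV_ABI_NONVOLATILE)
spill_bytes = len(must_save) * 16       # each XMM register is 128 bits wide

print(["xmm%d" % r for r in must_save], spill_bytes)
```

That is ten registers, or 160 bytes of spill space, for each ms_abi function
that calls into sysv_abi code.
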
> The part which stays excessive is the volatile XMM register saves, but 
> that is probably relatively minor and might be a candidate for a fine 
> but ugly optimization if we ever introduce a "lightweight" dispatcher 
> for non-blocking calls. But I'd expect this overhead to be less than 
> what we gain from the split by removing the extra non-volatile XMM 
> register saves.
> 

Well, as far as I could see, in all the measurements I've made for a long 
while, xsavec64 / xrstor64 were the highest hitters, with something like 
10% to 30% of CPU time spent on these two instructions.

I don't know if that should be part of the "optimized vast majority of 
cases", but it's the same with Proton or with current upstream Wine. In 
any case it's way worse than, and nowhere near, the overhead we had a 
year ago with SSE XMM register spilling from the ABI transitions 
generated by the compiler.
--
Rémi Bernon <rbernon at codeweavers.com>


