[PATCH v2 1/5] winevulkan: Support prefixing function parameters.

Rémi Bernon rbernon at codeweavers.com
Fri Dec 10 09:13:44 CST 2021


On 12/10/21 15:43, Paul Gofman wrote:
> On 12/10/21 17:36, Rémi Bernon wrote:
>>
>> Well, as far as I could see, in all the measurements I've made for a 
>> long while, xsavec64 / xrstor64 were the highest hitters, with 
>> something like 10% to 30% of CPU time spent on these two instructions.
>>
>> I don't know if that should be part of the "optimized vast majority of 
>> cases", but it's the same with Proton or with current upstream Wine. 
>> In any case it's way worse and nowhere near the overhead we had a year 
>> ago with SSE XMM register spilling from the ABI transitions generated 
>> by the compiler.
> 
> It is strange. Unlike the current Proton, xrstor64 should not appear at 
> all with upstream Wine (unless the app is setting the context for 
> threads constantly). Are you sure that is the case with upstream? In 
> that case maybe there is some genuine bug or easy optimization 
> opportunity involved.
> 
> Then, for xsavec64, it does in one instruction the saves which are 
> otherwise split across many compiler-generated register saves. Could 
> that affect its place in perf top? Overall I don't think perf can show 
> anything meaningful at such a fine measurement level, due to the 
> specifics of sampling measurements and of CPU instruction flow. A good 
> and simple way would be to measure the time over a large number of 
> calls with rdtsc, with and without xsavec, in the real game, by 
> instrumenting the syscall dispatcher with that measurement (fwiw I have 
> such instrumentation for relay stubs locally but not for the syscall 
> dispatcher atm).
> 

I'm sure perf is perfectly able to give us useful information. That it 
uses sampling doesn't matter here, as this is not something that happens 
randomly and spuriously but a constant overhead, and if 10% of the 
samples now fall into __wine_syscall_dispatcher, it's very likely that 
it now accounts for about 10% of the CPU time.
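
To illustrate why sampling is fine for a constant overhead, here is a toy 
model in C. It is only a sketch and not how perf works internally: the 
function name, the tick-based workload and the fixed sampling interval 
are all assumptions made up for the illustration.

```c
#include <assert.h>

/* Hypothetical model: the workload repeats every `period` ticks, and the
 * first `overhead` ticks of each period are spent in a fixed-cost routine
 * (think of the dispatcher prolog). A sampler fires every `interval`
 * ticks and records whether it landed inside that routine.
 * Returns the fraction of samples that hit the routine. */
double sampled_overhead_share(unsigned period, unsigned overhead,
                              unsigned interval, unsigned long total_ticks)
{
    unsigned long samples = 0, hits = 0;
    unsigned long t;

    assert(period > 0 && overhead <= period && interval > 0);

    for (t = 0; t < total_ticks; t += interval) {
        samples++;
        if (t % period < overhead)
            hits++;
    }
    return samples ? (double)hits / (double)samples : 0.0;
}
```

With a period of 100 ticks, 10 of them in the routine, and a sample every 
7 ticks, the sampled share converges on 10%, which is the real share. The 
model only breaks down when the sampling interval runs in lockstep with 
the workload (for instance an interval equal to the period), which is not 
the situation with a constant per-call cost hit at arbitrary times.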

Now, because I'm a bit annoyed that perf usage is always questioned, I 
also did the test:

1) With just this patch series applied, I get ~125fps, with ~10% of 
global CPU time spent in __wine_syscall_dispatcher_prolog_end (highest 
perf top hitter), of which 76% is attributed directly to the xsavec64 
instruction.

2) Commenting out xrstor64 in __wine_syscall_dispatcher, I get roughly 
the same thing, unsurprisingly.

3) Commenting out both xsavec64 and xrstor64, I get ~150fps, with ~3% of 
global CPU time spent in the dispatcher (still the highest hitter, but 
close to d3d12_desc_copy_range, which is the main one without this 
series), and that time is spread quite evenly over all the dispatcher 
instructions.
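
For reference, the arithmetic behind these numbers can be sketched in C as 
below. The helper names are made up for the illustration and are not part 
of Wine or perf; the inputs are just the figures quoted in the tests 
above.

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Share of global CPU time attributed to one instruction, given the
 * global share of its enclosing symbol and the instruction's share
 * within that symbol (both as fractions between 0 and 1). */
double global_share(double symbol_share, double share_within_symbol)
{
    return symbol_share * share_within_symbol;
}
```

frame_time_ms(125) is 8 ms and frame_time_ms(150) is about 6.67 ms, so 
roughly 1.33 ms per frame is recovered in test 3. global_share(0.10, 0.76) 
is about 0.076, that is around 7.6% of global CPU time attributed to 
xsavec64 alone in test 1.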

Can we put a little bit more trust in the tool now? I'm not saying it's 
always right, or that the CPU time wouldn't be spent or wasted by the 
game otherwise, but here it's clearly telling us that we are adding some 
overhead, and where it comes from.
-- 
Rémi Bernon <rbernon at codeweavers.com>