Hi,
I had a problem with Majesty Gold HD running at glacial speed on my machine (quad Core i5 2.9GHz, NVidia GeForce GT 750M 1GB, OS X 10.9.5). I'm using wine 1.7.35, but I've had similar problems with older wine versions. It's a DirectDraw7 game that has been partially remastered to run on modern systems, but still uses DirectDraw7.
This is a common problem on Windows too, but generally can be resolved by telling the game to use the DirectDraw blitter (instead of blitting itself). That didn't help on Wine though.
I profiled wine (Instruments time profile) and noticed that most time was spent in wined3d's convert_r5g6b5_x8r8g8b8. Replacing that routine with an optimized sse2 version from pixman did not make much of a difference.
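For readers unfamiliar with the conversion being profiled here, the following is a minimal sketch in C of a scalar r5g6b5 to x8r8g8b8 conversion. It is not wined3d's actual convert_r5g6b5_x8r8g8b8 and not the pixman routine; the function names, the pixel-unit pitches, and the choice to fill the unused x byte with 0xff are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand one r5g6b5 pixel to x8r8g8b8. The top bits of each channel are
 * replicated into the low bits so that full intensity (0x1f or 0x3f)
 * maps to 0xff rather than 0xf8 or 0xfc.
 * Assumption: the unused x byte is set to 0xff. */
static uint32_t r5g6b5_to_x8r8g8b8(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f;
    uint32_t g = (p >> 5) & 0x3f;
    uint32_t b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Convert a width x height block. Pitches are given in pixels (not
 * bytes) to keep the sketch free of pointer casts. */
static void convert_block(const uint16_t *src, size_t src_pitch,
                          uint32_t *dst, size_t dst_pitch,
                          size_t width, size_t height)
{
    assert(src != NULL && dst != NULL);
    for (size_t y = 0; y < height; y++)
    {
        const uint16_t *s = src + y * src_pitch;
        uint32_t *d = dst + y * dst_pitch;
        for (size_t x = 0; x < width; x++)
            d[x] = r5g6b5_to_x8r8g8b8(s[x]);
    }
}
```

At 1920x1080 a loop like this touches roughly two million pixels per frame, which is consistent with it showing up in a time profile even though speeding it up alone did not fix the slowness.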
Then I discovered that if I told the game to render to a 16 bit instead of to a 32 bit surface, wine would let OpenGL handle the colour conversion. Unfortunately, while this significantly reduced wine's cpu usage, it by no means made the game any faster. Looking at the OpenGL Driver Monitor stats, it seems the cpu was simply waiting all the time on the GPU, probably while it was blitting all of those 1920x1080 images to the screen. Reducing the resolution to the lowest supported by the game (800x600) made it slightly faster, but not much.
Next, I added the DirectDrawRenderer registry key and set it to gdi. Now, while it's still slow at 1920x1080, the game runs much faster at 800x600 and even at 1024x768. Profiling the gdi renderer shows that it has far higher cpu usage than the OpenGL renderer (virtually all in convert_to_8888), but for some reason it seems to cause much less traffic to the GPU.
The wiki's wording suggests the gdi renderer is deprecated though. Does that mean this qualifies as a bug in the OpenGL renderer (maybe it tries to update the screen more often than necessary, saturating the bus that way?), and are there any things I can try to narrow down why the gdi renderer is so much faster? I didn't immediately see where it blits things actually to the screen.
Thanks,
Jonas
On Feb 19, 2015, at 2:12 PM, Jonas Maebe wrote:
I had a problem with Majesty Gold HD running at glacial speed on my machine (quad Core i5 2.9GHz, NVidia GeForce GT 750M 1GB, OS X 10.9.5).
… are there any things I can try to narrow down why the gdi renderer is so much faster? I didn't immediately see where it blits things actually to the screen.
GDI32 rendering goes to an in-main-memory window surface (backbuffer). USER32 tells the Mac driver to flush that window surface to screen at certain points, but the Mac driver implements that asynchronously. Basically, the flush operation queues a request to the Cocoa main thread to call [window.contentView setNeedsDisplayInRect:theDirtyRect]. The Wine thread is allowed to go on as soon as that request is queued. It doesn't wait for the view to be actually marked as needing display, let alone for Cocoa to get around to actually displaying it.
When the Cocoa main thread does go to display the view, it locks the window surface and creates CGImages from its data and draws those to the window. Then it unlocks the surface. But those draw operations are not flushed immediately to screen while that lock is held. So, the surface lock should not be held while Cocoa waits for the GPU or the Window Server refresh cycle.
Basically, GDI32 and USER32 can overdrive the drawing loop without having to wait for the GPU or the refresh cycle.
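The effect described above can be modelled with a small sketch in C. This is a simplified, single-threaded model and not the Mac driver's code: the struct and function names are hypothetical, and the real driver hands the request to the Cocoa main thread rather than setting a flag. The point is only that several flushes arriving between two display passes collapse into one display of the merged dirty rectangle, so the producer never waits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rect { int left, top, right, bottom; };

struct window_surface
{
    struct rect dirty;       /* union of everything flushed since the last display */
    bool needs_display;      /* a display pass is pending */
    unsigned display_count;  /* number of display passes actually performed */
};

/* Producer side (the Wine thread in the description above): record the
 * damage and return immediately, without waiting for any display. */
static void surface_flush(struct window_surface *s, struct rect r)
{
    assert(s != NULL);
    if (!s->needs_display)
    {
        s->dirty = r;
        s->needs_display = true;
        return;
    }
    if (r.left < s->dirty.left) s->dirty.left = r.left;
    if (r.top < s->dirty.top) s->dirty.top = r.top;
    if (r.right > s->dirty.right) s->dirty.right = r.right;
    if (r.bottom > s->dirty.bottom) s->dirty.bottom = r.bottom;
}

/* Consumer side (the display pass): draw whatever accumulated, once.
 * Returns false if nothing was pending. */
static bool surface_display(struct window_surface *s, struct rect *drawn)
{
    assert(s != NULL && drawn != NULL);
    if (!s->needs_display) return false;
    *drawn = s->dirty;
    s->needs_display = false;
    s->display_count++;
    return true;
}
```

In this model, three calls to surface_flush followed by one surface_display result in a single display pass, which is the frame-dropping behaviour that lets the game loop run ahead of the screen.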
See if the game has a setting for disabling vsync. If it doesn't, try setting:
[HKEY_CURRENT_USER\Software\Wine\Mac Driver]
"AllowVerticalSync"="n"
Does that help?
Others would understand better why the OpenGL renderer is slow. Does your system support 16-bit display modes? Is the game/wined3d actually switching it into one of those?
-Ken
On 19/02/15 21:45, Ken Thomases wrote: [snip]
Basically, GDI32 and USER32 can overdrive the drawing loop without having to wait for the GPU or the refresh cycle.
Great, thanks for the explanation! That's indeed exactly what this game seems to need. It generates and tries to display a new screen per game step, but there's no need to display them all.
See if the game has a setting for disabling vsync. If it doesn't, try setting:
[HKEY_CURRENT_USER\Software\Wine\Mac Driver]
"AllowVerticalSync"="n"
Does that help?
The game has an option to disable it, but just to make sure I also added the registry key. It didn't make a (noticeable) difference.
Others would understand better why the OpenGL renderer is slow. Does your system support 16-bit display modes?
I don't know:
trace:wgl:init_pixel_formats renderer_properties 0:
trace:wgl:dump_renderer Renderer ID: 0x00022700
trace:wgl:dump_renderer Buffer modes:
trace:wgl:dump_renderer Monoscopic: YES
trace:wgl:dump_renderer Stereoscopic: NO
trace:wgl:dump_renderer Single buffer: YES
trace:wgl:dump_renderer Double buffer: YES
trace:wgl:dump_renderer Color buffer modes:
trace:wgl:dump_renderer Color size 15, Alpha size 0
trace:wgl:dump_renderer Color size 32, Alpha size 8
trace:wgl:dump_renderer Color size 64, Alpha size 16, Float
trace:wgl:dump_renderer Color size 128, Alpha size 32, Float
trace:wgl:dump_renderer Accumulation buffer sizes: { 128, }
trace:wgl:dump_renderer Depth buffer sizes: { 0, 16, 24, }
trace:wgl:dump_renderer Stencil buffer sizes: { 0, 8, }
trace:wgl:dump_renderer Max. Auxiliary Buffers: 2
trace:wgl:dump_renderer Max. Sample Buffers: 1
trace:wgl:dump_renderer Max. Samples: 8
trace:wgl:dump_renderer Offscreen: NO
trace:wgl:dump_renderer Accelerated: YES
trace:wgl:dump_renderer Backing store: YES
trace:wgl:dump_renderer Window: YES
trace:wgl:dump_renderer Online: YES
trace:d3d:wined3d_adapter_init_fb_cfgs iPixelFormat=1,
There are also several enumerated pixel formats with 16 as "color depth" (but still 32 as "color bits", so I'm not sure which one counts):
trace:wgl:enum_renderer_pixel_formats w/p/a 1/1/1 col 32/8 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 0/0 0000000000000804f remapped from w/p/a 1/0/1 col 15/0 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 0/0 0000000000000801d
trace:wgl:enum_renderer_pixel_formats w/p/a 1/1/1 col 32/8 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 1/2 0000000140000804f remapped from w/p/a 1/0/1 col 15/0 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 1/2 0000000140000801d
trace:wgl:enum_renderer_pixel_formats w/p/a 1/1/1 col 32/8 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 1/4 0000000240000804f remapped from w/p/a 1/0/1 col 15/0 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 1/4 0000000240000801d
trace:wgl:enum_renderer_pixel_formats w/p/a 1/1/1 col 32/8 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 1/8 0000000440000804f remapped from w/p/a 1/0/1 col 15/0 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 1/8 0000000440000801d
trace:wgl:enum_renderer_pixel_formats w/p/a 1/1/1 col 32/8 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 0/0 0000000000000804f remapped from w/p/a 1/1/1 col 15/0 dp/stn/ac/ax/b/db/str 16/0/0/0/0/0/0 samp 0/0 0000000000000801f (duplicate)
All of the "col" values are either 32, 64f or 128f.
Is the game/wined3d actually switching it into one of those?
I'm not sure:
trace:d3d:init_format_fbo_compat_info Checking if format WINED3DFMT_B5G6R5_UNORM is supported as FBO color attachment...
trace:d3d:check_fbo_compat Framebuffer format check call ok utils.c / 1344
trace:d3d:check_fbo_compat Format WINED3DFMT_B5G6R5_UNORM is supported as FBO color attachment.
trace:d3d:check_fbo_compat RB attachment call ok utils.c / 1420
trace:d3d:check_fbo_compat Post-pixelshader blending check call ok utils.c / 1468
trace:d3d:check_fbo_compat Format supports post-pixelshader blending.
trace:d3d:check_fbo_compat Color output: 0xff7b0000
trace:d3d:check_fbo_compat RB cleanup call ok utils.c / 1500
...
trace:d3d:wined3d_set_adapter_display_mode wined3d 0x11fe50, adapter_idx 0, mode 0x33fcfc.
trace:d3d:wined3d_set_adapter_display_mode mode 1024x768@0 WINED3DFMT_B5G6R5_UNORM 0.
...
trace:d3d:wined3d_get_adapter_display_mode wined3d 0x11fe50, adapter_idx 0, display_mode 0x33fc08, rotation 0x0.
warn:d3d:wined3d_get_adapter_display_mode Overriding format WINED3DFMT_B8G8R8X8_UNORM with stored format WINED3DFMT_B5G6R5_UNORM.
trace:d3d:wined3d_get_adapter_display_mode Returning 1024x768@60 WINED3DFMT_B5G6R5_UNORM 0x1.
...
trace:d3d:swapchain_update_render_to_fbo Single buffered rendering.
trace:d3d:context_create swapchain 0x161238, target 0x1618f0, window 0x3003a.
trace:d3d:context_choose_pixel_format device 0x131748, dc 0x90033, color_format WINED3DFMT_B5G6R5_UNORM, ds_format WINED3DFMT_D24_UNORM_S8_UINT, aux_buffers 0, find_compatible 0.
trace:d3d:getColorBits format WINED3DFMT_B5G6R5_UNORM.
trace:d3d:getColorBits Returning red: 5, green: 6, blue: 5, alpha: 0, total: 16 for format WINED3DFMT_B5G6R5_UNORM.
trace:d3d:getDepthStencilBits format WINED3DFMT_D24_UNORM_S8_UINT.
trace:d3d:getDepthStencilBits Returning depthSize: 24 and stencilSize: 8 for format WINED3DFMT_D24_UNORM_S8_UINT.
trace:d3d:context_choose_pixel_format Found iPixelFormat=69 for ColorFormat=WINED3DFMT_B5G6R5_UNORM, DepthStencilFormat=WINED3DFMT_D24_UNORM_S8_UINT
trace:d3d:context_enter Entering context 0x141f68, level 1.
trace:d3d:device_context_add Adding context 0x141f68.
trace:d3d:context_set_current Switching to D3D context 0x141f68, GL context 0x31001, device context 0x90033.
Jonas
On Feb 20, 2015, at 4:16 PM, Jonas Maebe wrote:
On 19/02/15 21:45, Ken Thomases wrote: [snip]
Basically, GDI32 and USER32 can overdrive the drawing loop without having to wait for the GPU or the refresh cycle.
Great, thanks for the explanation! That's indeed exactly what this game seems to need. It generates and tries to display a new screen per game step, but there's no need to display them all.
If that's the case, then maybe there are unnecessary calls to glFlush() or even glFinish() that can/should be skipped.
If Stefan has the inclination and the opportunity, I'd wait for him to get a chance to analyze its behavior.
Thanks for the log snippets in response to my questions about display modes, but I no longer think that's relevant.
-Ken
On Feb 20, 2015, at 5:49 PM, Ken Thomases wrote:
On Feb 20, 2015, at 4:16 PM, Jonas Maebe wrote:
On 19/02/15 21:45, Ken Thomases wrote: [snip]
Basically, GDI32 and USER32 can overdrive the drawing loop without having to wait for the GPU or the refresh cycle.
Great, thanks for the explanation! That's indeed exactly what this game seems to need. It generates and tries to display a new screen per game step, but there's no need to display them all.
If that's the case, then maybe there are unnecessary calls to glFlush() or even glFinish() that can/should be skipped.
There is one more thing you could try along these lines, but it's a long shot. Try setting:
[HKEY_CURRENT_USER\Software\Wine\Mac Driver]
"SkipSingleBufferFlushes"="y"
-Ken
It could very well be the case that this happens because we don't properly implement ddraw asynchronous blits. The gdi renderer might mitigate that with the behavior Ken described. If that's the case, the csmt patchset would in theory allow this to be fixed, although I suspect it currently doesn't.
Hi,
On 2015-02-19 at 21:12, Jonas Maebe wrote:
I profiled wine (Instruments time profile) and noticed that most time was spent in wined3d's convert_r5g6b5_x8r8g8b8. Replacing that routine with an optimized sse2 version from pixman did not make much of a difference.
Is there a demo version of this game somewhere? I am reworking the d3d blitting code at the moment, and it seems like this would be an interesting game to look at.
If the game does video memory r5g6b5 to video memory x8r8g8b8 blits, then we should be able to do this on the GPU. It's quite possible, however, that this is a system memory to system memory blit, in which case doing it on the CPU is the correct thing. If the game uploads from a system memory r5g6b5 surface to a video memory x8r8g8b8 texture, then we can in theory let OpenGL do the work via glTexSubImage2D, but it'll mostly mean that OpenGL converts the data using the CPU before sending it to the GPU.
This may be a game bug - you say the game has troubles on Windows too. It may also be a bug in our modesetting code, in the sense that the game sets the display format to r5g6b5, but we stay at x8r8g8b8 because X11 (and I think OSX, Ken correct me please) can't switch the color depth. Ideally OpenGL takes care of the resulting conversion, but that's not always the case.
Cheers, Stefan
On 20/02/15 10:19, Stefan Dösinger wrote:
On 2015-02-19 at 21:12, Jonas Maebe wrote:
I profiled wine (Instruments time profile) and noticed that most time was spent in wined3d's convert_r5g6b5_x8r8g8b8. Replacing that routine with an optimized sse2 version from pixman did not make much of a difference.
Is there a demo version of this game somewhere? I am reworking the d3d blitting code at the moment, and it seems like this would be an interesting game to look at.
There's a demo of the non-HD version of the game available at http://www.cyberlore.com/Majesty/demo.htm . At first sight it exhibits the same symptoms (OpenGL driver monitor shows a lot of cpu waiting time on the GPU). It's hard to verify though, because the demo doesn't allow you to change the in-game speed, unlike the full game (so I can't test whether it would go faster if I'd yank up the game speed).
I have actually 2 copies of Majesty Gold HD (one on Gamersgate and one on GOG), but since I've downloaded it already from both I don't think I can gift you one. It's currently also on sale at GOG till Tuesday, so if Codeweavers has $4,99/€4,49 to spare... :) (http://www.gog.com/game/majesty_gold_hd )
If the game does video memory r5g6b5 to video memory x8r8g8b8 blits, then we should be able to do this on the GPU. It's quite possible, however, that this is a system memory to system memory blit, in which case doing it on the CPU is the correct thing. If the game uploads from a system memory r5g6b5 surface to a video memory x8r8g8b8 texture, then we can in theory let OpenGL do the work via glTexSubImage2D, but it'll mostly mean that OpenGL converts the data using the CPU before sending it to the GPU.
This may be a game bug - you say the game has troubles on Windows too.
Yes. The most common workarounds for slowness under Windows appear to be:
a) Tell the game to use a 32 bit rather than a 16 bit mode (doesn't help in my case for Wine). Note that while the demo contains a gameopt.ini file in which you can change the depth to 32, doing so just crashes it. Changing this works fine in the full Majesty Gold HD version.
b) Tell the game to use DirectDraw for blitting rather than its internal routines (-useddblit command line parameter). This is already the default, and I don't know whether it does anything for the demo.
It may also be a bug in our modesetting code, in the sense that the game sets the display format to r5g6b5, but we stay at x8r8g8b8 because X11 (and I think OSX, Ken correct me please) can't switch the color depth. Ideally OpenGL takes care of the resulting conversion, but that's not always the case.
If you need any more/specific trace output other than what I pasted in my reply to Ken, let me know!
Thanks,
Jonas
On Feb 19, 2015, at 2:12 PM, Jonas Maebe wrote:
I had a problem with Majesty Gold HD running at glacial speed on my machine (quad Core i5 2.9GHz, NVidia GeForce GT 750M 1GB, OS X 10.9.5).
The wiki's wording suggests the gdi renderer is deprecated though. Does that mean this qualifies as a bug in the OpenGL renderer …?
For what it's worth, I think this does qualify for a bug report. It will be easier to keep track of there.
-Ken
On 21/02/15 00:46, Ken Thomases wrote:
On Feb 19, 2015, at 2:12 PM, Jonas Maebe wrote:
I had a problem with Majesty Gold HD running at glacial speed on my machine (quad Core i5 2.9GHz, NVidia GeForce GT 750M 1GB, OS X 10.9.5).
The wiki's wording suggests the gdi renderer is deprecated though. Does that mean this qualifies as a bug in the OpenGL renderer …?
For what it's worth, I think this does qualify for a bug report. It will be easier to keep track of there.
I finally got around to that, after figuring out how to solve it (using a hack: limiting the glFlush calls to 60Hz): https://bugs.winehq.org/show_bug.cgi?id=39421
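As an illustration of what "limiting the glFlush calls to 60Hz" can look like, here is a minimal sketch in C. It is not the patch attached to the bug report; the struct and function names are hypothetical, and the timestamp is passed in by the caller so that the logic stays independent of any particular clock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum time between two flushes: 1/60 s, in nanoseconds. */
#define FLUSH_INTERVAL_NS (UINT64_C(1000000000) / 60)

struct flush_limiter
{
    uint64_t last_flush_ns;  /* time of the last flush that was let through */
    bool flushed_once;       /* false until the first flush has happened */
};

/* Decide whether a flush requested at time now_ns should be issued.
 * The first request always passes; later ones pass only if at least
 * FLUSH_INTERVAL_NS has elapsed since the last one that passed. */
static bool flush_limiter_should_flush(struct flush_limiter *l, uint64_t now_ns)
{
    assert(l != NULL);
    if (l->flushed_once && now_ns - l->last_flush_ns < FLUSH_INTERVAL_NS)
        return false;
    l->last_flush_ns = now_ns;
    l->flushed_once = true;
    return true;
}
```

A caller would read now_ns from a monotonic clock and call glFlush() only when the function returns true. One caveat with this simple form: if the last frame of a burst is skipped, nothing flushes it until the next request arrives, so a real implementation would probably also need a deferred flush for the final pending frame.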
BTW: how come there is no ddraw or directdraw or similar component on bugzilla? Is there another component I can use?
Jonas
On 2015-10-11 at 19:22, Jonas Maebe wrote:
BTW: how come there is no ddraw or directdraw or similar component on bugzilla? Is there another component I can use?
d3d