[NDR] Implement NdrClientCall2 and NdrServerCall2

David Laight david at l8s.co.uk
Thu Dec 1 16:05:04 CST 2005


On Thu, Dec 01, 2005 at 11:09:29AM +0100, Alexandre Julliard wrote:
> Robert Shearman <rob at codeweavers.com> writes:
> 
> > +    "shrl $2, %ecx\n\t"         /* divide by 4 */
> > +    "rep movsl\n\t"             /* Copy dword blocks */
> > +    "movl %eax, %ecx\n\t"
> > +    "andl $3, %ecx\n\t"         /* modulus 4 */
> > +    "rep movsb\n\t"             /* Copy remainder */
> 
> If the argument size is not a multiple of 4 you are in serious
> trouble...

Not only that, but the code above is not very efficient!
The setup time for a 'rep movs' instruction is significant on many
modern CPUs, which makes the second 'rep movsb' (it only ever copies
0 to 3 bytes) particularly slow.
I'm not even sure what the break-even length for the first one is!
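For reference, here is a rough C model of what the quoted sequence does. It is a hypothetical illustration, not Rob's code, and `copy_two_pass` is a made-up name. The first pass moves `len >> 2` whole dwords (the 'rep movsl'). The second pass moves the remaining `len & 3` bytes (the 'rep movsb'), so that second string instruction pays its full start-up cost to move at most 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C model of the quoted asm (not Wine code):
 * whole dwords first, then the 0-3 leftover bytes. */
static void copy_two_pass(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = len >> 2;   /* shrl $2, %ecx: divide by 4 */
    size_t rest = len & 3;      /* andl $3, %ecx: modulus 4 */

    assert(dwords * 4 + rest == len);  /* the split covers every byte */

    memcpy(d, s, dwords * 4);   /* rep movsl: copy dword blocks */
    d += dwords * 4;
    s += dwords * 4;
    while (rest--)              /* rep movsb: copy the remainder */
        *d++ = *s++;
}
```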

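A sequence like (give or take assembler syntax):
	movl -4(%esi,%ecx), %eax	/* load last dword of source */
	movl %eax, -4(%edi,%ecx)	/* store it at end of destination */
	shrl $2, %ecx
	rep movsl
should be better, given a large enough %ecx (at least 4). The last,
possibly unaligned, dword is copied first, so the dword loop can
safely re-copy part of it instead of needing a byte loop.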
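A minimal portable-C sketch of the same idea follows. It is an illustration, not Wine code, and `copy_tail_first` is a hypothetical name. It assumes `len >= 4` and that `src` and `dst` do not overlap. The final 4 bytes are copied first with one possibly unaligned move. The whole-dword copy from the start then covers everything up to `4 * (len / 4)`, which is always past `len - 4`, so together the two copies cover every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the tail-first copy (not Wine code).
 * Assumes len >= 4 and non-overlapping src/dst: bytes near the end
 * may be written twice, always with the same value. */
static void copy_tail_first(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords;

    assert(len >= 4);
    /* movl -4(%esi,%ecx),%eax ; movl %eax,-4(%edi,%ecx) */
    memcpy(d + len - 4, s + len - 4, 4);
    dwords = len >> 2;              /* shrl $2, %ecx */
    memcpy(d, s, dwords * 4);       /* rep movsl: may overlap the tail */
}
```

For lengths below 4, a different path (for example a plain byte loop) would still be needed.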

	David

-- 
David Laight: david at l8s.co.uk



More information about the wine-devel mailing list