"optimized" assembly functions in wine
Rein Klazes
rklazes at xs4all.nl
Tue Sep 21 07:57:39 CDT 2004
Hi,
Just did not feel like chasing bugs the other day, so I decided to have
some fun with something I had been wondering about for a long time: the
usefulness of inline x86 assembly in string functions.
This is the test program as.c:
---------------------------------8<-------------------------------------
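#include <stdlib.h>   /* malloc */
#include <string.h>   /* memset */

typedef unsigned short WCHAR, *PWCHAR;

static inline WCHAR *strcpyW( WCHAR *dst, const WCHAR *src )
{
#ifdef ASM
    /* lodsw/stosw copy one WCHAR per iteration through %esi/%edi until
       the terminating 0 has been stored; the dummy outputs tell gcc that
       %esi, %edi and %eax are modified. long is pointer-sized on both
       i386 and x86-64 Linux, so the tied operands have matching widths. */
    long dummy1, dummy2, dummy3;
    __asm__ __volatile__( "cld\n"
                          "1:\tlodsw\n\t"
                          "stosw\n\t"
                          "testw %%ax,%%ax\n\t"
                          "jne 1b"
                          : "=&S" (dummy1), "=&D" (dummy2), "=&a" (dummy3)
                          : "0" (src), "1" (dst)
                          : "memory" );
#else
    WCHAR *p = dst;
    while ((*p++ = *src++));   /* copies the terminating 0 as well */
#endif
    return dst;
}

#define SZ 3000

int main(void)
{
    int i;
    PWCHAR s, d;
    s = malloc(SZ * sizeof(WCHAR));
    d = malloc(SZ * sizeof(WCHAR));
    if (!s || !d)
        return 1;
    memset(s, 'x', SZ * sizeof(WCHAR));   /* every WCHAR becomes 0x7878 */
    s[SZ-1] = 0;                          /* 2999 characters + terminator */
    for (i = 0; i < 1000000; i++)
        strcpyW(d, s);
    return 0;
}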
---------------------------------8<-------------------------------------
The function strcpyW is a copy from Wine with the #ifdef modified.
I used the following commands:
gcc-3.3 -O2 as.c -o as -DASM ; time ./as;time ./as; time ./as
and
gcc-3.3 -O2 as.c -o as ; time ./as;time ./as; time ./as
The resulting times, in seconds of user time, are:

test#      asm        C
------------------------
  1     15.970   15.899
  2     15.966   15.943
  3     15.959   15.941
        ------   ------
ave     15.965   15.928
Notes:
- tested on a PII 450 MHz;
- I tested with gcc 2.95 and 3.4.2 as well; the results are essentially
the same.
- the size of main() is 0x7a bytes (assembly) vs. 0x82 bytes (C code);
- I experimented with longer strings to see if there were any memory
cache hit/miss effects and found none.
Conclusions:
1. these routines are so fast that it is hard to imagine them becoming a
bottleneck that would justify such optimization;
2. nothing here shows that inline assembly brings any advantage.
Rein.
--
Rein Klazes
rklazes at xs4all.nl
More information about the wine-devel mailing list