Fast thread-local storage for OpenGL drivers

Sat Feb 22 12:32:05 CST 2003

Daniel Jacobowitz wrote:
> 
> Note the "always" in Roland's paragraph.

Note that he also said it would require one of the dynamic access models
(GD or LD), which need at least one function call to access thread-local
variables.  As I've said, this is an unacceptable hit on performance.
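
To make the cost concrete, here is a minimal sketch using GCC's tls_model
attribute.  The variable and function names are made up for illustration.
My understanding is that, built with -fPIC into a shared object, the
global-dynamic access goes through a call to __tls_get_addr(), while the
initial-exec access is a GOT load plus a thread-pointer-relative access;
the exact instructions depend on the target and toolchain.

```c
/* Hypothetical variables, named only for this illustration. */
__thread int gd_value __attribute__((tls_model("global-dynamic")));
__thread int ie_value __attribute__((tls_model("initial-exec")));

/* General dynamic: in PIC code this normally calls __tls_get_addr(). */
int bump_gd(void)
{
    return ++gd_value;
}

/* Initial exec: offset read from the GOT, no function call expected. */
int bump_ie(void)
{
    return ++ie_value;
}
```

Comparing the output of "gcc -O2 -fPIC -S" for the two functions should
show the difference, assuming the toolchain supports __thread.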

> When you say two or three, are these two or three pointers or two or
> three large tables?

Two or three pointers.  I'm pretty sure we use fewer than 8 pointers all up,
although many of those aren't performance critical.  Three of ours most
definitely are, and it would be nice if moving to a couple more didn't break
things.  We only ever use thread-local pointers, never whole structs or
anything like that.

> In any case, it sounds like you could:
>  - select the thread-local variables that you need fast access to
>  - Arrange for those variables to be tagged with an
>    __attribute__((tls_model("initial-exec"))), or something similar.
>  - Make sure the TLS_STATIC_SURPLUS is big enough to hold them.

Will this be okay, considering that two shared libraries will need access to
the variables (libGL.so itself and the driver backend)?  Can you use IE or
LE with variables that live in another shared library?
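
For concreteness, the arrangement in question would look roughly like the
sketch below.  Everything here is hypothetical: current_ctx stands in for
one of the performance-critical pointers, and the accessor names are
invented.  The definition would live in one library and the other library
would see only an extern declaration carrying the same attribute.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread pointer; in the real case the other shared
 * library would declare it as:
 *   extern __thread void *current_ctx
 *       __attribute__((tls_model("initial-exec")));
 */
__thread void *current_ctx __attribute__((tls_model("initial-exec")));

void set_current_ctx(void *ctx) { current_ctx = ctx; }
void *get_current_ctx(void)     { return current_ctx; }

/* Thread body: store the argument in this thread's copy, read it back. */
static void *worker(void *arg)
{
    set_current_ctx(arg);
    return get_current_ctx();
}

/* Returns 1 if a second thread sees its own value and the calling
 * thread's value is left untouched. */
int ctx_is_per_thread(void)
{
    int main_tag, worker_tag;
    pthread_t t;
    void *seen = NULL;
    int rc;

    set_current_ctx(&main_tag);
    rc = pthread_create(&t, NULL, worker, &worker_tag);
    assert(rc == 0);
    pthread_join(t, &seen);
    return seen == &worker_tag && get_current_ctx() == &main_tag;
}
```

This single-file version (built with -pthread) only shows that the
attribute is accepted and that each thread gets its own pointer.  Whether
the same accesses keep working once the backend is loaded with dlopen()
is, as far as I can tell, exactly the TLS_STATIC_SURPLUS point above.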

> I don't see a problem, but you'd have to do some serious reading of the
> TLS ABI documents.... they're quite thorough.

Sure, the code itself isn't hard to understand.  The problem is, at runtime,
how do I know what code to generate to access a given __thread variable?  Do
I have to disassemble a function that accesses the variable to know the
right model to use?  Fixed offsets make this trivial, but maybe this isn't a
real problem after all.

--
Gareth Hughes (gareth at nvidia.com)
OpenGL Developer, NVIDIA Corporation