Fast thread-local storage for OpenGL drivers
Daniel Jacobowitz
drow at false.org
Sat Feb 22 12:06:35 CST 2003
On Sat, Feb 22, 2003 at 09:51:26AM -0800, Gareth Hughes wrote:
> Roland McGrath wrote:
> >
> > These people clearly haven't read all of the TLS paper, or looked at the
> > GCC implementation of __thread long enough to notice -ftls-model and
> > __attribute__ ((tls_model)).
>
> This is what I was talking about. I've read the entire document several
> times, and still can't see a way that a dynamically loadable shared library
> can be guaranteed to use the single-instruction Local Exec access model. If
> I'm wrong, please explain why.
>
> > I think the TLS document intends to explain what the models mean in
> > practical terms on each architecture, but I can believe it's not all
> > that clear. The GCC manual doesn't explain the access models and code
> > sequences, just tells you how to tell the compiler what you want in the
> > terms that the TLS document defines.
> >
> > If you want maximal flexibility, i.e. to always work with dlopen, then
> > indeed you must use the "dynamic" TLS access models (GD or LD). You can
> > use the Initial Exec model if you want faster accesses at the cost of some
> > flexibility.
>
> libGL.so simply has to work with dlopen -- if for no other reason than
> essentially all major 3D games (Quake3, Doom3, UT2003 etc) dlopen libGL.so
> rather than linking with it. This is not going to change.
Note the "always" in Roland's paragraph.
> > In glibc, we actually allocate some excess space in the thread-local
> > storage area layout determined at startup time. This lets a dynamically
> > loaded module use static TLS if its PT_TLS segment fits in the available
> > surplus. (In sysdeps/generic/dl-tls.c, see TLS_STATIC_SURPLUS.) If there
> > is insufficient space preallocated, then loading the module will fail. In
> > fact, we put this feature there with GL in mind and can adjust the
> > preallocated surplus for what is most useful in practice.
>
> I think the set of performance critical thread-local variables is something
> like two or three (depending on the implementation). The libGL.so API
> dispatcher needs fast access to one or two of these (dispatch table
> pointers), while the driver backend needs fast access to all of them
> (context pointer and dispatch table pointers). The other thread-local
> variables are generally not accessed in performance-critical situations.
When you say two or three, are these two or three pointers or two or
three large tables?
In any case, it sounds like you could:
- Select the thread-local variables that you need fast access to.
- Arrange for those variables to be tagged with
  __attribute__((tls_model("initial-exec"))), or something similar.
- Make sure the TLS_STATIC_SURPLUS is big enough to hold them.
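A minimal sketch of the tagging step, in C with GCC's __thread extension. All
names here (dispatch_table, current_dispatch, current_context, make_current
and so on) are made up for illustration and are not taken from any real
libGL. The attribute only requests the Initial Exec model; what the compiler
and linker actually emit depends on the toolchain and on how the object is
built, so check the generated code for a real shared library.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dispatch table; a real driver's table has many entries. */
struct dispatch_table {
    int (*get_id)(void);
};

/* Hot variables: explicitly request the Initial Exec access model. */
static __thread struct dispatch_table *current_dispatch
    __attribute__((tls_model("initial-exec")));
static __thread void *current_context
    __attribute__((tls_model("initial-exec")));

/* Cold variable: left on whatever model the compiler picks by default. */
static __thread int last_error;

static int id_one(void) { return 1; }
static int id_two(void) { return 2; }

static struct dispatch_table table_one = { id_one };
static struct dispatch_table table_two = { id_two };

/* Bind a table and a context to the calling thread only. */
static void make_current(struct dispatch_table *table, void *context)
{
    current_dispatch = table;
    current_context = context;
    last_error = 0;
}

/* Roughly what an API entry point does: load the per-thread table, call. */
static int dispatch_get_id(void)
{
    if (current_dispatch == NULL) {
        last_error = 1;
        return -1;
    }
    return current_dispatch->get_id();
}

/* Thread body: bind the table passed in, then dispatch through it. */
static void *worker(void *arg)
{
    make_current(arg, NULL);
    return (void *)(intptr_t)dispatch_get_id();
}

/* Run worker() in a fresh thread and return the id it observed. */
static int run_in_new_thread(struct dispatch_table *table)
{
    pthread_t thread;
    void *result = NULL;

    if (pthread_create(&thread, NULL, worker, table) != 0)
        return -1;
    pthread_join(thread, &result);
    return (int)(intptr_t)result;
}
```

Only the two pointers on the hot path carry the attribute, which keeps the
amount of static TLS the module needs at load time small. Each thread sees
its own copy of current_dispatch, so binding a table in one thread does not
disturb another.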
> Another issue I forgot to mention, or forgot to make clear, is that we need
> to be able to access these thread-local variables in runtime generated code.
> A driver's top-level API functions are often generated at runtime, and need
> to be able to do things like switch dispatch tables (obviously, they'd have
> direct access to the context they were associated with, and so wouldn't need
> to go through the pointer in TLS). Are we guaranteed that the __thread
> variables aren't going to move around? How would we work out what code to
> generate to access a given __thread variable?
I don't see a problem, but you'd have to do some serious reading of the
TLS ABI documents. They're quite thorough.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer