A few years ago, memory management for the graphics card was performed as a single static allocation in the X server, which was then carved up into surfaces and used for the scanout and important pixmaps such as renderbuffers for its DRI clients. The upside of this simple scheme was that all locations were known, allocation was very fast and we could always tell the GPU where its surfaces were. The downside was that the amount of video memory was predetermined and could not be resized, not even if you added a second monitor and needed a new framebuffer, or if you were running a game and it wanted lots of textures. So X was relieved of its role in memory management and the task was given to the kernel under the guise of the Graphics Execution Manager (GEM). Now userspace has no idea where its surfaces are and so needs to ask the kernel to patch up its command buffers to insert the correct addresses. This relocation of command buffers is a bottleneck in the new design, and from the outset people were complaining about the performance loss going from XAA to GEM/UXA.
A few years have passed and we’ve been gradually tuning all the parties, but the question remains: have we managed to recover the speed which we threw away so long ago?
I compiled Xorg-1.5 and xf86-video-intel-2.6 for my 965GM (a ThinkPad T61), discarding anything that no longer compiled, and was left with a very light shell for investigating XAA in its heyday. I then ran x11perf under the ancient XAA and EXA, and the modern UXA and SNA, to see how things had changed.
By looking at the geometric mean of all the x11perf tests we can get a rough feel for the overall performance change against XAA:
EXA 1.117 (12% faster than XAA)
UXA -1.108 (11% slower than XAA)
SNA 2.284 (128% faster than XAA)
As with all averages of micro-benchmarks, take this with an extremely large pinch of salt.
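For anyone wanting to reproduce that kind of summary figure, here is a minimal sketch of how a geometric mean of per-test ratios against a baseline could be computed. It is not the script used for the numbers above. The function names and the sample rates are hypothetical, the inputs are assumed to be x11perf rates in operations per second (higher is better), and the signed form is only one plausible reading of how a slowdown such as -1.108 is written.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

def relative_speed(baseline, candidate):
    """Geometric mean of candidate/baseline rates over the tests both runs share.

    Both arguments map a test name to a rate in operations per second
    (assumed layout; parsing the x11perf output is left out).
    """
    common = sorted(set(baseline) & set(candidate))
    return geometric_mean([candidate[t] / baseline[t] for t in common])

def signed(ratio):
    """Assumed convention: speedups stay as-is, slowdowns become -1/ratio."""
    return ratio if ratio >= 1.0 else -1.0 / ratio

if __name__ == "__main__":
    # Made-up rates purely for illustration, not measured results.
    xaa = {"rect10": 100.0, "copy100": 200.0}
    other = {"rect10": 200.0, "copy100": 200.0}
    print(f"{signed(relative_speed(xaa, other)):.3f}")
```

The geometric mean is used rather than the arithmetic mean so that a test which doubles in speed and one which halves cancel out, and no single test with a huge ratio dominates the summary.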