After checking on the progress of Glamor for Intel chipsets, it is time to have a look and see what the state of play is for a Radeon HD5770. This card is now a few years old and is sitting in a Sandybridge i5-2500 desktop.
The baseline I have chosen here is the performance of the Intel DDX using SNA but with acceleration disabled, that is, rendering entirely on the i5-2500 CPU.
In comparison, we find that on average:
- NoAccel is 1.8x slower
- fglrx is 9.2x slower
- EXA is 2.9x slower
- Glamor is 2.0x slower
Or to put a positive spin on it, the new Glamor acceleration on this particular r600g device is about 50% faster than the existing EXA radeon driver. If you look closely, there are just a couple of traces where EXA performs better than Glamor; with those regressions fixed, Glamor would be a clear improvement for radeon. And almost as fast as not using any acceleration at all (NoAccel)! However, Glamor was not able to complete the benchmark run without crashing.
This particular set of benchmarks is based on Cairo traces taken from real applications. If we look at synthetic benchmarks, Glamor is significantly faster than EXA in several key metrics, and fglrx is much faster again. Always take benchmarks with a pinch of salt.
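The relative figures follow directly from the ratios of the slowdown factors listed above. As a quick sanity check, here is a small sketch; the dictionary and the `speedup` helper are illustrative names of my own, not part of any benchmark tool:

```python
# Average slowdown factors relative to the baseline (Intel DDX, SNA with
# acceleration disabled), copied from the list above.
slowdown = {"NoAccel": 1.8, "fglrx": 9.2, "EXA": 2.9, "Glamor": 2.0}


def speedup(faster: str, slower: str) -> float:
    """Return how many times faster `faster` is than `slower` (hypothetical helper)."""
    return slowdown[slower] / slowdown[faster]


if __name__ == "__main__":
    # A ratio above 1 means Glamor is ahead; below 1 means it is behind.
    for other in ("EXA", "fglrx", "NoAccel"):
        print(f"Glamor vs {other}: {speedup('Glamor', other):.2f}x")
```

The Glamor-to-EXA ratio works out to 2.9 / 2.0 = 1.45, which is where the "about 50% faster" comes from, while the Glamor-to-NoAccel ratio of 0.9 shows Glamor still trailing the unaccelerated radeon path slightly.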
So I have a new toy, an i7-4950hq processor. This little beast is one of the special Intel chips sporting an Iris Pro 5200, better known as Haswell GT3e. That GPU has 40 execution units and 128MiB of eDRAM to serve as a fourth-level cache for both the CPU and GPU.
Enough spiel, just how fast is it?
For context, here are some results comparing it with my old Sandybridge laptop (with an i5-2520m).
Comparing the processor using the single-threaded cairo-image:
and again comparing the GPUs, using SNA and cairo-xlib:
On the whole, the jump from a Sandybridge i5-2520m to a Haswell i7-4950hq gives a two-fold increase in both single-threaded CPU performance and GPU performance (for 2D graphics using cairo). In most cases SNA is limited by how fast the application can feed it commands, so the performance increase is mostly due to the same improvement in CPU speed. (This increase is above and beyond the expected improvements due to IPC, so it is more likely down to the ability of the Haswell chip to turbo higher and longer thanks to improved thermals and cooling.)
And we can compare the relative merits of using OpenGL and a specialised 2D driver by comparing the various rendering backends available for the DDX. The results are normalized to the cairo-image results, and we have
- none – a multithreaded CPU renderer inside the DDX
- blt – disable the render acceleration, but allow the DDX to use the BLT engine to move data about i.e. copies and fills
- sna – SNA render acceleration, default in xf86-video-intel-3.0
- uxa – UXA render acceleration, current default
- glamor – Glamor render acceleration, uses OpenGL to offload rendering operations onto the GPU
The summary here is that Glamor offers a meagre improvement over UXA. However, both are still much slower on average than cairo-image, i.e. the performance attainable by using a single CPU core. It takes multiple threads inside the DDX to match the performance of cairo-image; this is due to the inherent inefficiencies of the current Render protocol. However, if we then utilize the render acceleration on the GPU (using SNA), we can indeed outperform cairo-image: on average SNA is about 2x faster than cairo-image and about 4x faster than UXA and Glamor. Thus SNA does deliver hardware acceleration that succeeds in offloading work onto the GPU (letting the CPU get on with other tasks) and performs faster than rendering everything with the CPU.
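To make "normalized to the cairo-image results" concrete, here is a minimal sketch of how per-trace timings could be turned into relative speeds and then averaged. The trace names and timings are made up purely for illustration, and the geometric mean is an assumption on my part: nothing above says which average was actually used.

```python
from math import prod


def normalize(baseline: dict[str, float], backend: dict[str, float]) -> dict[str, float]:
    """Relative speed per trace: a value above 1 means the backend beat the baseline."""
    return {trace: baseline[trace] / backend[trace] for trace in baseline}


def geometric_mean(values) -> float:
    """Geometric mean, a common (assumed) way to average ratios."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, NOT measured results.
    image = {"trace-a": 10.0, "trace-b": 8.0}
    accel = {"trace-a": 5.0, "trace-b": 4.0}

    relative = normalize(image, accel)
    print(relative)
    print(f"average relative speed: {geometric_mean(relative.values()):.2f}x")
```

Averaging ratios with a geometric mean keeps one unusually fast or slow trace from dominating the summary, which is why it is often preferred over an arithmetic mean for this kind of comparison.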