
So I have a new toy, an i7-4950hq processor. This little beast is one of the special Intel chips sporting an Iris Pro 5200, better known as Haswell GT3e. That GPU has 40 execution units and 128MiB of eDRAM to serve as a fourth-level cache for both the CPU and GPU.

Enough spiel, just how fast is it?

For context, here are some results comparing it with my old Sandybridge laptop (with an i5-2520m).

Comparing the processor using the single-threaded cairo-image:

Comparison of i7-4950hq to i5-2520m

and again comparing the GPUs, using SNA and cairo-xlib:

Comparison of i7-4950hq to i5-2520m

On the whole, we see a two-fold increase in both single-threaded CPU performance and GPU performance (for 2D graphics using cairo) in the jump from a Sandybridge i5-2520m to a Haswell i7-4950hq. In most cases SNA is limited by how fast the application can feed it commands, so its performance increase is mostly due to the same improvement in CPU speed. (This increase is above and beyond the expected improvements due to IPC, so it is more likely down to the ability of the Haswell chip to turbo higher and longer thanks to improved thermals and cooling.)

And we can compare the relative merits of using OpenGL and a specialised 2D driver by comparing the various rendering backends available for the DDX. The results are normalized to the cairo-image results, and we have

  • none – a multithreaded CPU renderer inside the DDX
  • blt – disable the render acceleration, but allow the DDX to use the BLT engine to move data about i.e. copies and fills
  • sna – SNA render acceleration, default in xf86-video-intel-3.0
  • uxa – UXA render acceleration, current default
  • glamor – Glamor render acceleration, uses OpenGL to offload rendering operations onto the GPU

Comparison of DDX backends on an i5-2520m

Comparison of DDX backends on an i7-4950hq

The summary here is that Glamor offers a meagre improvement over UXA. However, both are still much slower on average than cairo-image, i.e. the performance attainable by using a single CPU core. It takes multiple threads inside the DDX to match the performance of cairo-image – this is due to the inherent inefficiencies of the current Render protocol. However, if we then utilize the render acceleration on the GPU (using SNA) we can indeed outperform cairo-image: on average SNA is about 2x faster than cairo-image and about 4x faster than UXA and Glamor. Thus SNA does deliver hardware acceleration that succeeds in offloading work onto the GPU (letting the CPU get on with other tasks) and performs faster than rendering everything with the CPU.
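To make the "normalized to cairo-image" comparison concrete, here is a minimal sketch of how per-trace timings could be turned into relative speedups and a single average per backend. Everything in it is an assumption for illustration: the trace names and timings are made up (chosen only so the ratios echo the rough 2x and 4x figures above), and the geometric mean is one reasonable way to average ratios, not necessarily the averaging used for the charts in this post.

```python
from math import exp, log

# Hypothetical wall-clock times in seconds per trace.
# These are NOT measurements from this post; they only illustrate the arithmetic.
timings = {
    "image": {"trace-a": 10.0, "trace-b": 20.0},
    "sna": {"trace-a": 5.0, "trace-b": 10.0},
    "uxa": {"trace-a": 20.0, "trace-b": 40.0},
}


def normalize(timings, baseline="image"):
    """Per-trace speedup of each backend relative to the baseline (>1 means faster)."""
    base = timings[baseline]
    return {
        backend: {trace: base[trace] / seconds for trace, seconds in results.items()}
        for backend, results in timings.items()
    }


def geometric_mean(values):
    """Geometric mean, which treats a 2x speedup and a 2x slowdown symmetrically."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def average_speedup(timings, baseline="image"):
    """Average speedup per backend relative to the baseline (assumed: geometric mean)."""
    return {
        backend: geometric_mean(ratios.values())
        for backend, ratios in normalize(timings, baseline).items()
    }


if __name__ == "__main__":
    for backend, speedup in average_speedup(timings).items():
        print(f"{backend}: {speedup:.2f}x relative to cairo-image")
```

With real data, the baseline row always comes out at 1.0 by construction, and a backend's speedup relative to another backend is simply the ratio of their two averages.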


  1. Is your i7-4950HQ from a shipping system (and if so, where can I buy such a beast), or is it an Intel dev system that’s not available for purchase?

    • It is an SDP that I had to beg for and promise favours in return. I am sorely tempted by the low-end iMac just to get the i5-4550R… But I think I can wait long enough to see what becomes available in the next couple of months.

  2. glamor is unfortunately hardly maintained, and not much work was put into optimizations. I just skimmed the code a little and already found some low hanging fruit and fixing those improved performance noticeably. And there are a couple more things that can be done which aren’t too hard.

    • We look forward to your patches.

        • Aaron W
        • Posted November 4, 2013 at 6:23 pm
        • Permalink

        Now that Grigori’s patches have been committed to upstream glamor, would you be able to refresh these benchmarks (I’m more interested in the effect on radeon, but any hardware works). Alternatively, if you could disclose/document your methodology I can try to recreate these benchmarks myself.

        • ickle
        • Posted November 4, 2013 at 9:56 pm
        • Permalink

        Sure, I’ll look into running the benchmarks again. IIRC the patches had very little impact for my Intel systems. The test I run is simply cairo-perf-trace (from using the benchmarks in, with the cairo-xlib backend obviously.

  3. Your post is excellent as usual and very educational. Thank you!!!
    Can you tell us what’s coming for xorg intel down the road, apart from supporting new GPUs?
    Is SNA considered complete at this point? And what about Wayland? What will be the preferred back end for rendering with cairo there?

    Thanks in advance..

    • Supporting new (and old) GPUs keeps me occupied for most of my time. SNA isn’t quite finished – optimisation is a never ending battle, but I have a few features planned to try and make DRI more efficient (to support pageflipping on a subset of connected monitors). The biggest crux I see at the moment is that Cairo does not meet the requirements of its users (i.e. there are gaps in our support of , SVG and PDF features) and that Render is inadequate for accelerating Cairo. In the post-X world, Cairo will need its own dispatch mechanism that would look very similar to a Render+. It’s too early to call where Cairo will end up – to go fast on a GPU, you have to design your entire interface around the limitations of the GPU – which is the opposite of Cairo’s hardware agnostic minimal interface. The tradeoff may simply be that Cairo remains best on the CPU with the “accelerated compositing” being done on the GPU. That still requires a lot of work in Mesa to make data transfer efficient, if we were to use Mesa for that task. Clear as mud!

  4. We want more of your benchmarks but against Keith’s glamor branch – !
