Having looked at the impact of the move from XAA to UMS/EXA, and then to KMS/UXA, on the performance of the core drawing operations, we can turn to the impact upon RENDER acceleration. One of the arguments for dropping XAA support and writing a new architecture was to address the needs of accelerating the new advanced rendering protocol, RENDER. Did the claim live up to reality? Did the switch to EXA actually help?

Comparison of acceleration architectures on 965gm

In short, the switch to EXA was catastrophic in the beginning. The new acceleration architecture both regressed performance of the core drawing operations and failed to deliver on the claim of improving RENDER acceleration. Fast forward a few years, through improvements to the RENDER protocol and many improvements to the driver, and we reach UXA, where we start to see some actual benefit from enabling GPU acceleration. But it is not until we look at SNA that we reach a level of maturity and consistency in the driver, with all-round performance that is finally at least as good as XAA again (effectively software rendering in these benchmarks).

An outstanding question that is regularly asked is: what are the design differences between EXA, UXA and SNA?

UXA was originally EXA with the pixmap migration removed. It was argued that, given a unified memory architecture such as that found on an IGP, there was no need to migrate pixmaps between the various GPU memory domains. Instead, all pixmap allocations were to be made in the single GPU domain, and if necessary the pixmap would be mapped and read by the CPU through the GTT (effectively an uncached readback, very, very slow).
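
To make that concrete, here is a minimal sketch of such a readback path, assuming libdrm's intel_bufmgr API; the helper name read_pixels_via_gtt is mine, not UXA's, and error handling is trimmed:

    #include <string.h>
    #include <intel_bufmgr.h>

    /* Read back a GPU pixmap by mapping it through the GTT aperture.
     * The GTT mapping is write-combined, so every load in the memcpy
     * below is an uncached read -- the "very, very slow" path. */
    static void read_pixels_via_gtt(drm_intel_bo *pixmap_bo,
                                    void *dst, size_t len)
    {
        if (drm_intel_gem_bo_map_gtt(pixmap_bo))
            return;

        memcpy(dst, pixmap_bo->virtual, len);

        drm_intel_gem_bo_unmap_gtt(pixmap_bo);
    }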

In hindsight, that decision was flawed. As it turns out, not only do we have both mappable and unmappable memory domains within the IGP, so we cannot simply map any GPU pixmap without cost, but we also have snoopable GPU memory (memory that can exist in the CPU cache). The single-GPU-memory-domain argument was a fallacy from the start, and even more so with the advent of the shared last-level cache between the CPU and GPU. It also tends to be much, much faster to copy from the GPU pixmap into system memory, perform the operation on the copy in system memory, and then copy it back, than it is to perform the operation through a GTT mapping of a GPU pixmap (more so if you take care to amortize the cost of the migration). Not to mention the asymmetry between upload and download speeds, and that we can exploit snoopable GPU memory to accelerate those transfers. So it turns out that a very efficient pixmap migration strategy is the core of an acceleration architecture. In this regard SNA is very much like EXA.
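
By way of illustration, here is a hedged sketch of that copy-operate-copy-back migration, again assuming libdrm's intel_bufmgr API; migrate_and_fallback and the fallback_op callback are illustrative names, not actual driver internals:

    #include <stdlib.h>
    #include <intel_bufmgr.h>

    typedef void (*fallback_op)(void *pixels, int width, int height,
                                int stride);

    static int migrate_and_fallback(drm_intel_bo *pixmap_bo,
                                    int width, int height, int stride,
                                    fallback_op op)
    {
        size_t len = (size_t)stride * height;
        void *staging = malloc(len);
        if (!staging)
            return -1;

        /* Download the pixmap into cacheable system memory in one
         * streaming copy (a pread, avoiding uncached GTT reads)... */
        if (drm_intel_bo_get_subdata(pixmap_bo, 0, len, staging)) {
            free(staging);
            return -1;
        }

        /* ...perform the whole operation at CPU cache speed... */
        op(staging, width, height, stride);

        /* ...then upload the result in one go, amortizing the cost
         * of the migration across the entire operation. */
        if (drm_intel_bo_subdata(pixmap_bo, 0, len, staging)) {
            free(staging);
            return -1;
        }

        free(staging);
        return 0;
    }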

From the design point of view, the only real difference between EXA and SNA is that EXA is a mere midlayer whose existence serves to hinder the driver from doing the right thing, whereas SNA is a complete implementation. Since the devil is in the details (fallbacks are slow; don't!), that difference turns out to be huge.

16 Comments

  1. Amazing post as usual! Thanks for your work!
    A couple of questions:
    1. Do you expect any further performance improvements in SNA, or is this the best we can get?
    2. Any plans to switch to/support glamor? What are the long-term plans for this driver?
    3. Do you plan to remove EXA/UXA support once SNA is the default acceleration method?

    • It is certainly not the end of the road for SNA, though I think it is getting to be almost as good as it can get with the current design. There are a few kernel patches pending to enable extra features and acceleration of a few more code paths (though insignificant for the cairo workload, they help to accelerate applications like Chromium), and I’ve just begun to overhaul the shaders used by SNA to accelerate the actual operations. That should produce some significant improvements for the RENDER workloads. Looking beyond that, I need to improve the protocol to enable greater offload from the client to the driver and GPU – the rate-limiting step now tends to be the application itself, and cairo is a big part of the bottleneck in the rendering process.

      2. No. My long-term plan for the driver is for it to be the best possible in terms of stability, performance and features (including platform enabling). Glamor offers none of those, instead regressing on all.

      3. Yes. Once SNA has had sufficient testing to be enabled by default, there remains no reason for UXA.

  2. Hi Chris,

    Fedora 17 just delivered a new xorg-x11-drv-intel-2.20.1-1.fc17.x86_64 update which I got installed, however Xorg.0.log still reports UXA being enabled as an acceleration method. How do I enable SNA instead? I’ve got a GM45 as a GPU.

    Thanks,

    -Ilyes

    • Add the following either as a snippet in /etc/X11/xorg.conf.d/ or in /etc/X11/xorg.conf itself:

      Section "Device"
          Identifier "Device0"
          Driver     "intel"
          Option     "AccelMethod" "sna"
      EndSection

    • Option "AccelMethod" "sna" in a /etc/X11/xorg.conf.d/20-intel.conf config file.

  3. Yes 🙂

    Chris, how can I benchmark the two acceleration methods on my laptop?

    Is there a dedicated test-suite, recommended benchmarks (maybe also cairo-based)?

    -Ilyes

    • The best benchmark is obviously the tasks you use every day. If you can find a way to measure those, I’ll be very happy to add that to the list of things to optimise for.

      The metrics I use are x11perf, cairo-perf-trace, glxgears and mplayer, along with watching top and perf top whilst working. If you want a demonstration of the render performance differences between the drivers, check out http://cgit.freedesktop.org/~ickle/cairo-demos

      • > I’ll be very happy to add that to the list of things to optimise for.

        Web browsers and HTML5 (canvas-2d, WebGL, gstreamer) are probably good use-case candidates for optimization.

        OK, thanks! I’ll get a few numbers.

        -Ilyes

  4. Chris, I enabled SNA on Fedora 17 and it works great. I will keep using it and report any problems I find.
    One question though: after the move to Wayland, rendering will be done by cairo without X11. What backend are you planning to use for cairo and Wayland? Will the performance be comparable to X11/SNA?
    Is there hope for Mac-like smooth graphics on the desktop with Wayland and cairo in the (hopefully) near future?

    Thanks in advance

    • We’d need Fedora to start packaging cairo/drm along with cairo/x11!

      I’ve also been testing on Fedora 17, and the entire desktop feels a lot snappier. The fish benchmark makes the session hang (I think that’s an SNA bug; Fedora 17 is still at 2.20.1). VTs still work though, and I could get another shell and do stuff.

      -Ilyes

        • ickle (August 5, 2012 at 8:21 am):

        Oh, that’s a cairo bug… It’s not hanging, just taking seconds per frame, as cairo is doing its own fallback instead of noticing that it has two Xlib surfaces it can render between. Fixed in cairo-1.12.

        The other possibility is that the fish-demo on UXA on gen2/3 is extremely, extremely slow…

    • Remember that Wayland is not a panacea; it is just an encapsulation of frame events into the core of the rendering protocol. It is technically possible to do everything Wayland does through X11, and we will get the “every pixel is perfect” desktop if and only if every single toolkit and application is rewritten to give that smooth experience.

      So far the evidence has been that cairo-gl is simply not performant on any libGL, least of all 965_dri.so and 915_dri.so. It is going to be a problem in the brave new future.

      • Hi Chris,

        > So far the evidence has been that cairo-gl is simply not performant on any libGL,
        > least of all 965_dri.so and 915_dri.so. It is going to be a problem in the brave new future.

        Is it because of a possibly high context-setup overhead induced by toolkit/cairo/GL for simple 2D drawing operations?

        If yes, would this still be an issue if Weston had been targeting a (straightforward) 2D drawing API instead of OpenGL (still with an interoperable surface-allocation backend)?

        -Ilyes

        • ickle (August 5, 2012 at 9:49 am):

        Nope, the issue lies in the overhead of the Mesa state tracker, inefficient command generation, and the driver’s poor buffer management.

        The tests I use only look at how well it renders a command stream, more or less without considering the complexities of toolkit interactions. (The actual trace does carry behavioural influences from the toolkit that created it, and those will undoubtedly change in light of the GL overhead – but only because of the significant increase in overhead compared to the existing stack.)

      • Chris,

        I do remember that there was a talk scheduled during FOSDEM 2012 about optimizing 2D rendering on modern graphics stacks (post-Wayland), or something similar; however, that talk didn’t happen. Do you still have that presentation, and would it be possible to share it on this blog?

        -Ilyes

        • ickle (August 5, 2012 at 10:01 am):

        Right, I had intended to go to FOSDEM to deconstruct the cairo-demos, describe what goes on underneath to put the pixels on the screen, and contrast the methods used by the various backends. Sounds like a worthwhile exercise…

