Skip navigation

Monthly Archives: March 2012

The only working Nvidia system I have at the moment is an antiquated, cheap little Ion netbook with a weak N270 Atom. So let see how it fares in the debate whether it is preferrable to use the image backend and push images to the server rather than send geometry via XRender.

Relative performance on Nvidia Ion

The white baseline in the centre is the performance of the image backend on the N270. Above that line and the driver is faster and more responsive, below it consumes more CPU, more power and lags more than if we were just to render it ourselves.

What we can see is that as a result of Nvidia investing engineering time in their driver, on the whole, it performs better than we could by just using the CPU. Better performance means more responsive user interfaces that sip less power, meaning happier users for longer.

Advertisement

18 months is a long time without an update. So why all the excitement now? Well it is a little story of one or two features and lots of performance…

First up, lets compare the relative performance of Cairo versions 1.10 and 1.12 on a recent SandyBridge machine, a i5-2520m, by looking at how long it takes to replay a selection of application traces.
Relative Cairo peformance on Sandybridge

The white line across the centre represents the baseline performance of cairo-1.10. Above that line and that backend is faster with cairo-1.12 for that application, below slower.

Across the board, with one or two exceptions, Cairo version 1.12 is much faster than 1.10. (Those one or two exceptions are interesting and quite unexpected. Analysing those regressions should help push Cairo further.) If we focus on the yellow bar, you can see that the performance of the baseline image backend has been improved consistently, peaking at almost 4 times faster, but on average around 50% faster. Note this excludes any performance gains also made by pixman within the same time frame. The other bars compare the other backends, the xlib backend using the SNA and UXA acceleration methods along with the pure CPU rasteriser (xvfb), and the OpenGL backend.

That tells us that we have made good improvements to Cairo’s performance, but a question that is often asked is whether using XRender is worthwhile and whether we would have better performance and more responsive user interfaces if we were just to use CPU rasterisation (using the image backend) and push those images to the X server, https://bugzilla.mozilla.org/show_bug.cgi?id=738937 for instance. To answer that we need to look beyond the relative performance of the same backend over time, and instead look at the relative performance of the backends against image. (Note this excludes any transfer overhead, which in practice is mostly negligible.)

Relative performance of X against the image backend on SandyBridge
 
The answer is then a resounding no, even on integrated graphics, provided you have a good driver. And that lead will be stretched further in favour of using X (and thus the GPU) as the capabilities of the GPU grow faster than the CPU. Whilst we remain GPU bound at least! On the contrary, though it does say that without any driver it would be preferrable to perform client side rasterisation into a shared memory image to avoid he performance loss due to trapezoid passing in the protocol.

Given that we are forced by the XRender protocol to use a sub-optimal rendering method, we should do even better if we cut out the indirect rendering via X and rendered directly onto the GPU using OpenGL. What happens when we try?

Relative performance of X against the image backend on SandyBridge

Not only is the current gl backend overshadowed by the xlib backend, but it fails to even match the pure CPU rasteriser. There are features of the GPU that we are not using yet and are being experimented with, such as using multi-sample buffers and relying on the GPU to perform low quality antialiasing, but as for today it is applying the same methodology as the X driver uses internally and yet performs much worse.

The story is more less the same across all the generations of Intel’s integrated graphics.

Going back a generation to IronLake integrated graphics:
Relative performance of X against the image backend on IronLake

And for the netbooks we use PineView integrated graphics:
Relative performance of X against the image backend on PineView

So the next time your machine feels slow or runs out of battery, it is more than likely to be the fault of your drivers.