
Category Archives: cairo

The only working Nvidia system I have at the moment is an antiquated, cheap little Ion netbook with a weak N270 Atom. So let's see how it fares in the debate over whether it is preferable to use the image backend and push images to the server rather than send geometry via XRender.

Relative performance on Nvidia Ion

The white baseline in the centre is the performance of the image backend on the N270. Above that line, the driver is faster and more responsive; below it, the driver consumes more CPU and more power, and lags more, than if we were just to render it ourselves.

What we can see is that as a result of Nvidia investing engineering time in their driver, on the whole, it performs better than we could by just using the CPU. Better performance means more responsive user interfaces that sip less power, meaning happier users for longer.


18 months is a long time without an update. So why all the excitement now? Well it is a little story of one or two features and lots of performance…

First up, let's compare the relative performance of Cairo versions 1.10 and 1.12 on a recent SandyBridge machine, an i5-2520m, by looking at how long it takes to replay a selection of application traces.
Relative Cairo performance on SandyBridge

The white line across the centre represents the baseline performance of cairo-1.10. Above that line, the backend is faster with cairo-1.12 for that application; below it, slower.

Across the board, with one or two exceptions, Cairo version 1.12 is much faster than 1.10. (Those one or two exceptions are interesting and quite unexpected. Analysing those regressions should help push Cairo further.) If we focus on the yellow bar, you can see that the performance of the baseline image backend has been improved consistently, peaking at almost 4 times faster, but on average around 50% faster. Note this excludes any performance gains also made by pixman within the same time frame. The other bars compare the other backends, the xlib backend using the SNA and UXA acceleration methods along with the pure CPU rasteriser (xvfb), and the OpenGL backend.

That tells us that we have made good improvements to Cairo’s performance, but a question that is often asked (see https://bugzilla.mozilla.org/show_bug.cgi?id=738937 for instance) is whether using XRender is worthwhile, or whether we would have better performance and more responsive user interfaces if we were just to use CPU rasterisation (using the image backend) and push those images to the X server. To answer that we need to look beyond the relative performance of the same backend over time, and instead look at the relative performance of the backends against image. (Note this excludes any transfer overhead, which in practice is mostly negligible.)

Relative performance of X against the image backend on SandyBridge
 
The answer is then a resounding no, even on integrated graphics, provided you have a good driver. And that lead will be stretched further in favour of using X (and thus the GPU) as the capabilities of the GPU grow faster than the CPU. Whilst we remain GPU bound at least! Conversely, though, it does say that without any driver it would be preferable to perform client-side rasterisation into a shared memory image to avoid the performance loss due to trapezoid passing in the protocol.

Given that we are forced by the XRender protocol to use a sub-optimal rendering method, we should do even better if we cut out the indirect rendering via X and rendered directly onto the GPU using OpenGL. What happens when we try?

Relative performance of X against the image backend on SandyBridge

Not only is the current gl backend overshadowed by the xlib backend, but it fails to even match the pure CPU rasteriser. There are features of the GPU that we are not using yet and that are being experimented with, such as using multi-sample buffers and relying on the GPU to perform low quality antialiasing, but as of today it applies the same methodology as the X driver uses internally and yet performs much worse.

The story is more or less the same across all the generations of Intel’s integrated graphics.

Going back a generation to IronLake integrated graphics:
Relative performance of X against the image backend on IronLake

And for the netbooks we use PineView integrated graphics:
Relative performance of X against the image backend on PineView

So the next time your machine feels slow or runs out of battery, it is more than likely to be the fault of your drivers.

Just an update incorporating suggestions into the presentation of the graphs – the principal change being to trim outliers from the relative performance graph. Let me know what you think.
cairo-perf-chart-absolute
cairo-perf-chart-relative

[graphs uploaded outside of WordPress to avoid rescaling]
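
The post doesn't describe how the outliers are trimmed. One simple possibility is to clamp each value to a fixed range before plotting and to mark the bars that were clamped; the following Python sketch shows that idea only. The function name `trim_outliers`, the limit of 100, and the sample values are all hypothetical and are not taken from cairo-perf-chart.

```python
def trim_outliers(values, limit):
    """Clamp each value to [-limit, +limit].

    Returns a list of (clamped_value, was_trimmed) pairs so that a chart
    can draw trimmed bars differently. This is an assumed approach, not
    necessarily what the charting tool does.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    result = []
    for v in values:
        clamped = max(-limit, min(limit, v))
        result.append((clamped, clamped != v))
    return result


# Hypothetical relative-performance figures (percent), with outliers.
samples = [5.5, -18.4, 1256.8, -500.0]
for value, trimmed in trim_outliers(samples, limit=100.0):
    marker = " (trimmed)" if trimmed else ""
    print(f"{value:8.1f}{marker}")
```

Clamping keeps the scale of the remaining bars readable, at the cost of hiding how far off the outliers really are, which is why the sketch keeps a flag for each trimmed value.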

My first hands-on with an i965. Two caveats: I am running from a live image as I haven’t got a spare hard drive to dual-boot yet, and cairo-drm on an i965 is completely unaccelerated (so the overhead of this backend is entirely down to buffer management). Caveat lector.
cairo-perf-chart

trace                    image     drm    xlib
evolution                  0.0  -497.9  -594.1
firefox-planet-gnome       0.0  -157.0   -98.7
firefox-talos-gfx          0.0    -6.1  -139.7
firefox-talos-svg          0.0   -12.0  -105.4
gnome-system-monitor       0.0    -7.9    -9.4
gnome-terminal-vim         0.0   -11.2  -579.6
gvim                       0.0  -658.0  -583.7
poppler                    0.0    -0.2  -579.0
swfdec-giant-steps         0.0   -13.1  -169.3
swfdec-youtube             0.0    -4.5   -39.1

[image] 1.9.2-457-g3bc00af.image.karmic-alpha4
[drm] 1.9.2-457-g3bc00af.drm.karmic-alpha4
[xlib] 1.9.2-457-g3bc00af.xlib.karmic-alpha4

Hmm, I think there is room for improvement here.

Or, “why we don’t use glitz anymore”. Adding a cairo-glitz run on tiny (an i915):
cairo-perf-chart

image xlib drm gl glitz
epiphany-20090810 0.0 -106.5 -19.8 -235.8 -741.1
evolution-20090607 0.0 -100.3 -130.6 -440.6 -4150.4
evolution-20090618 0.0 -61.9 -69.4 -380.3
firefox-20090601 0.0 -103.5
firefox-periodic-table 0.0 -92.3 20.6 -228.6 -927.3
firefox-talos-gfx-20090702 0.0 8.3 381.6 -207.1 -230.5
firefox-world-map 0.0 -186.2 27.0 -51.1 -259.6
gnome-terminal-20090601 0.0 -29.7 207.9 -287.3 -262.0
gnome-terminal-20090728 0.0 60.0 406.7 -70.5 -129.4
poppler-20090811 0.0 -70.2 132.4 -344.2 -322.4
poppler-bug-12266 0.0 -101.6 59.0 -146.5
swfdec-fill-rate 0.0 -112.9 36.3 30.1
swfdec-fill-rate-2xaa 0.0 -69.3 345.1 45.7 -5498.1
swfdec-fill-rate-4xaa 0.0 -244.4 0.6 -3.2
swfdec-giant-steps 0.0 -41.7 -66.6 -260.8 -275.6
swfdec-youtube 0.0 12.1 192.9 7.6 -81.4

[image] 1.9.2-505-g2e9cad3.tiny
[xlib] 1.9.2-505-g2e9cad3.xlib.tiny
[drm] 1.9.2-525-g8c7de80.drm.tiny
[gl] 1.9.2-525-g8c7de80.gl.tiny
[glitz] 1.9.2-564-g1b24626.glitz.tiny

And some performance results for a really old machine, this must be almost 5 years old. 😉
cairo-perf-chart

trace                    image    xlib
evolution                  0.0  -179.1
firefox-planet-gnome       0.0   -67.9
firefox-talos-gfx          0.0     9.1
firefox-talos-svg          0.0  -376.1
gnome-system-monitor       0.0  -112.3
gvim                       0.0   140.1
poppler                    0.0   -84.3
swfdec-giant-steps         0.0  -151.3
swfdec-youtube             0.0   -88.6
vim                        0.0   -12.1

[image] 1.9.2-560-g221285f.image.inspired
[xlib] 1.9.2-560-g221285f.xlib.inspired

Another snapshot of performance this time from minime (a G4 mini-mac with a Radeon 200 GPU):
cairo-perf-chart

trace                           image     xlib       gl
epiphany-20090810                 0.0    -94.1  -1777.1
evolution-20090607                0.0   -704.5  -4270.2
evolution-20090618                0.0   -146.2  -3470.6
firefox-20090601                  0.0   -142.3  -1326.4
firefox-periodic-table            0.0    -26.2   -819.2
firefox-talos-gfx-20090702        0.0     32.7   -358.7
firefox-talos-svg-20090702        0.0    -49.7   -280.5
firefox-world-map                 0.0   -107.9  -1138.3
gnome-terminal-20090601           0.0     20.9  -1374.8
gnome-terminal-20090728           0.0     39.8  -1941.1
poppler-20090811                  0.0    -10.7  -1192.6
poppler-bug-12266                 0.0    -30.3   -416.5
swfdec-fill-rate                  0.0    -50.2   -843.3
swfdec-fill-rate-2xaa             0.0    -45.7  -1569.2
swfdec-fill-rate-4xaa             0.0    -84.0  -1724.1
swfdec-giant-steps                0.0    -18.3  -3232.6
swfdec-youtube                    0.0    -14.0  -1100.5

[image] 1.9.2-507-g0136989.image.minime
[xlib] 1.9.2-507-g0136989.xlib.minime
[gl] 1.9.2-507-g0136989.gl.minime

Oh dear, the performance is truly dire.

This time comparing different backends using the ‘fast’ wip/stroke-to-path branch:
cairo-perf-chart

image xlib drm gl
epiphany-20090810 0.0 -106.5 -19.8 -235.8
evolution-20090607 0.0 -100.3 -130.6 -440.6
evolution-20090618 0.0 -61.9 -69.4 -380.3
firefox-20090601 0.0 -103.5
firefox-periodic-table 0.0 -92.3 20.6 -228.6
firefox-talos-gfx-20090702 0.0 8.3 381.6 -207.1
firefox-world-map 0.0 -186.2 27.0 -51.1
gnome-terminal-20090601 0.0 -29.7 207.9 -287.3
gnome-terminal-20090728 0.0 60.0 406.7 -70.5
poppler-20090811 0.0 -70.2 132.4 -344.2
poppler-bug-12266 0.0 -101.6 59.0
swfdec-fill-rate 0.0 -112.9 36.3 30.1
swfdec-fill-rate-2xaa 0.0 -69.3 345.1 45.7
swfdec-fill-rate-4xaa 0.0 -244.4 0.6 -3.2
swfdec-giant-steps 0.0 -41.7 -66.6 -260.8
swfdec-youtube 0.0 12.1 192.9 7.6

[image] 1.9.2-505-g2e9cad3.tiny
[xlib] 1.9.2-505-g2e9cad3.xlib.tiny
[drm] 1.9.2-525-g8c7de80.drm.tiny
[gl] 1.9.2-525-g8c7de80.gl.tiny

As always there is more work to do.

This time I’m comparing the performance of cairo-xlib using the latest intel-gfx drivers for i915. The big news here is that the use of server-side gradients causes a performance regression rather than the benefit they were presumed to bring. In contrast to the previous table, this shows performance as a percentage speedup relative to the first result (in this case normally 1.8.8). This makes the regressions much clearer.
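
As a rough illustration of what a signed percentage relative to a baseline can mean, here is a minimal Python sketch. The function name `relative_percent`, the sample timings, and the symmetric convention (twice as fast is +100, twice as slow is -100) are assumptions made for illustration; the exact formula used by cairo-perf-chart may differ.

```python
def relative_percent(baseline_time, test_time):
    """Signed percentage change of a test run against a baseline run.

    Both arguments are elapsed times (smaller is better).
    Positive result: the test run is faster than the baseline.
    Negative result: the test run is slower than the baseline.
    Assumed symmetric convention: 2x faster -> +100, 2x slower -> -100.
    """
    if baseline_time <= 0 or test_time <= 0:
        raise ValueError("timings must be positive")
    if test_time <= baseline_time:
        return (baseline_time / test_time - 1.0) * 100.0
    return -(test_time / baseline_time - 1.0) * 100.0


# Hypothetical timings in seconds, not taken from any table in this post.
timings = {"baseline": 10.0, "faster": 5.0, "slower": 40.0}
for name, elapsed in timings.items():
    change = relative_percent(timings["baseline"], elapsed)
    print(f"{name:10s} {change:8.1f}")
```

With this convention the baseline always sits at 0.0, so a regression shows up as a negative number no matter which runs happen to be in the chart.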

cairo-perf-chart

epiphany-20090810 0.0 5.5 -18.4
evolution-20090607 0.0 6.3 4.6
evolution-20090618 0.0 4.0
firefox-20090601 0.0 1.6 -21.5
firefox-periodic-table 0.0 -15.9 -12.6
firefox-talos-gfx-20090702 0.0 1.8 0.3
firefox-world-map 0.0 3.6 16.5
gnome-terminal-20090601 0.0 7.0 7.2
gnome-terminal-20090728 0.0 2.1 0.3
poppler-20090811 0.0 1.5 1.6
poppler-bug-12266 0.0 -0.9 1256.8
swfdec-fill-rate 0.0 -0.4 0.9
swfdec-fill-rate-2xaa 0.0 -0.1 11.1
swfdec-fill-rate-4xaa 0.0 -1.3 3.3
swfdec-giant-steps 0.0 7.8 11.3
swfdec-youtube 0.0 -3.5 -15.3

[0] 1.8.8.xlib.tiny
[1] 1.9.2.xlib.tiny
[2] 1.9.2-505-g2e9cad3.xlib.tiny

Performance comparison of cairo-image from 1.8.8 to current on a slow netbook:
performance-1.8.8..

trace                           [0]    [1]    [2]
epiphany-20090810              1.00   1.02   1.48
evolution-20090607             1.02   1.00   4.33
evolution-20090618             1.00   1.02   2.15
firefox-20090601               1.00   1.02   1.83
firefox-periodic-table         1.19   1.00   1.86
firefox-talos-gfx-20090702     1.12   1.00   1.65
firefox-world-map              1.00   2.51   2.27
gnome-terminal-20090601        1.01   1.00   1.27
gnome-terminal-20090728        1.09   1.00   1.45
poppler-20090811               1.00   1.00   1.09
poppler-bug-12266              1.32   1.00  13.16
swfdec-fill-rate               1.00   1.03   1.09
swfdec-fill-rate-2xaa          1.00   1.00   1.15
swfdec-fill-rate-4xaa          1.00   1.00   1.16
swfdec-giant-steps             1.00   1.23   1.31
swfdec-youtube                 1.00   1.00   1.14

[0] 1.8.8.tiny, image
[1] 1.9.2.tiny, image
[2] 1.9.2-505-g2e9cad3.tiny, image
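
One way to boil a table like the one above down to a single figure is the geometric mean of the per-trace ratios. The Python sketch below assumes that each entry is a speed-up factor relative to the slowest run for that trace (every row contains a 1.00, but the post doesn't state this), and the choice of a geometric mean is likewise an assumption rather than something cairo-perf-chart reports. Only three rows are copied in to keep the example short.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    ratios = list(ratios)
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("need at least one ratio, all positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Columns [0] (1.8.8) and [2] (1.9.2-505-g2e9cad3) for three traces,
# copied from the table above.
rows = {
    "epiphany-20090810": (1.00, 1.48),
    "evolution-20090607": (1.02, 4.33),
    "poppler-20090811": (1.00, 1.09),
}

# Assumed interpretation: larger entries mean faster, so new / old is
# the speed-up of [2] over [0] for each trace.
speedups = [new / old for old, new in rows.values()]
print(f"geometric mean speed-up: {geometric_mean(speedups):.2f}x")
```

A geometric mean is usually preferred over an arithmetic mean for ratios, because a single large outlier such as poppler-bug-12266 then has far less influence on the summary.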