The introduction of KMS and GEM into the i915 driver broke the i830/i845 chipsets, and a lot of hearts. But fear not! A decade after its introduction, we finally have a driver that is not only stable, but capable of accelerating Firefox.
The problem?
The problem was, simply, that we could not find a way to enable dynamic video memory on the ancient i830/i845 chipsets without the GPU eventually eating garbage. Since dynamic memory management was the raison d'être of GEM and critical for acceleration, it is a requirement of the current driver stack. The first cunning solution was simply never to reuse batch buffers, and to keep a small amount of memory reserved for our own usage. This stopped the command streamer from seeing the garbage, and my system has remained stable for many hours of thrashing. Daniel Vetter extended my solution into a kernel workaround whereby every batch would be copied into a reserved area before execution. In the end, we compromised: I could avoid that extra copy by assuming responsibility in the driver for ensuring the batch was coherent, while the kernel would still intervene for any non-cooperative driver.
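As a rough illustration of that compromise, here is a toy model in Python. It is only a sketch of the policy described above, not the i915 code: `ReservedArea`, `submit` and the `driver_guarantees_coherency` flag are hypothetical names invented for this example.

```python
# Toy model of the batch workaround. All names here are hypothetical;
# this is not the kernel interface.

class ReservedArea:
    """A fixed region that is never handed back to the dynamic allocator."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def stage(self, batch):
        """Copy a batch into the reserved region and return a view of it."""
        if len(batch) > len(self.buf):
            raise ValueError("batch larger than reserved area")
        self.buf[:len(batch)] = batch
        return memoryview(self.buf)[:len(batch)]


def submit(batch, reserved, driver_guarantees_coherency=False):
    """Return (memory the command streamer would read, whether a copy was made).

    A cooperative driver opts out of the copy and takes responsibility for
    coherency itself; every other batch is staged in the reserved area.
    """
    if driver_guarantees_coherency:
        return memoryview(batch), False
    return reserved.stage(batch), True


reserved = ReservedArea(4096)
staged, copied = submit(b"\x01\x02\x03\x04", reserved)
direct, skipped_copy = submit(b"\x05\x06", reserved, driver_guarantees_coherency=True)
```

The opt-out keeps the safe path as the default: a driver that says nothing always gets the copy.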
With these workarounds in place, we are finally able to run through the test suites. Which brought us to the next problem:
The sad fact is that UXA is inadequate for the challenge of accelerating the Render protocol.
If we compare with an architecture that was designed to accelerate cairo, SNA:
We find a much happier result. In all cases the performance is at least as good as using a software rasteriser in the X server, and often much better than if we avoided the Render protocol entirely and did the rasterisation in the client. With a little more tuning, we may be able to achieve parity even in the worst case – if we can win on an old GPU with an ancient CPU (single core, virtually no cache and even less memory bandwidth) we should be able to excel on more recent GPUs and CPUs, and be more efficient in the process.
Yet, the Render protocol is not the be-all-and-end-all of acceleration. We need to keep an eye on the basics as well, the copies, the fills and the uploads, to know if we are achieving our goals. The basic premise is that using the driver (and thus the GPU) is faster than just using the CPU for everything. (In reality, the choice is more complicated because we have to consider the efficacy of GPU offload for enabling the CPU to get on with other tasks and overall power efficiency.)
1: Baseline performance of Xvfb
2: SNA with acceleration disabled (shadow)
3: UXA
4: SNA
       1       2       3       4  Operation
--------  ------  ------  ------  ---------
277000.0    2.04    1.06    4.58  Char in 80-char aa line (Charter 10)
265000.0    2.15    1.11    4.83  Char in 80-char rgb line (Charter 10)
312000.0    0.66    0.15    1.38  Copy 10x10 from window to window
  6740.0    0.90    1.56    1.75  Copy 100x100 from window to window
   382.0    0.92    1.30    1.36  Copy 500x500 from window to window
268000.0    0.74    0.17    1.50  Copy 10x10 from window to pixmap
  7260.0    0.87    1.43    1.85  Copy 100x100 from window to pixmap
   376.0    0.94    1.28    1.37  Copy 500x500 from window to pixmap
154000.0    0.74    0.69    0.86  PutImage 10x10 square
  1880.0    1.04    1.05    1.04  PutImage 100x100 square
    87.1    1.03    1.02    1.01  PutImage 500x500 square
308000.0    0.58    0.46    0.66  ShmPutImage 10x10 square
  6500.0    1.02    1.16    1.24  ShmPutImage 100x100 square
   380.0    1.00    1.24    1.28  ShmPutImage 500x500 square
So it appears that using the GPU for basic operations such as moving windows about is at most a marginal win over using a shadow buffer (and often UXA fails at even that). Overall, then, it seems that enabling UXA should bring nothing but misery.
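To make that reading of the table concrete, here is a small Python sketch that tallies the rows. It assumes that column 1 is the Xvfb rate in operations per second and that columns 2 to 4 are rates relative to column 1; the helper names are made up for this example.

```python
# Rows copied from the table above:
# (Xvfb ops/sec, shadow, UXA, SNA, operation).
# Assumption: the shadow/UXA/SNA columns are ratios relative to Xvfb.
ROWS = [
    (277000.0, 2.04, 1.06, 4.58, "Char in 80-char aa line (Charter 10)"),
    (265000.0, 2.15, 1.11, 4.83, "Char in 80-char rgb line (Charter 10)"),
    (312000.0, 0.66, 0.15, 1.38, "Copy 10x10 from window to window"),
    (6740.0, 0.90, 1.56, 1.75, "Copy 100x100 from window to window"),
    (382.0, 0.92, 1.30, 1.36, "Copy 500x500 from window to window"),
    (268000.0, 0.74, 0.17, 1.50, "Copy 10x10 from window to pixmap"),
    (7260.0, 0.87, 1.43, 1.85, "Copy 100x100 from window to pixmap"),
    (376.0, 0.94, 1.28, 1.37, "Copy 500x500 from window to pixmap"),
    (154000.0, 0.74, 0.69, 0.86, "PutImage 10x10 square"),
    (1880.0, 1.04, 1.05, 1.04, "PutImage 100x100 square"),
    (87.1, 1.03, 1.02, 1.01, "PutImage 500x500 square"),
    (308000.0, 0.58, 0.46, 0.66, "ShmPutImage 10x10 square"),
    (6500.0, 1.02, 1.16, 1.24, "ShmPutImage 100x100 square"),
    (380.0, 1.00, 1.24, 1.28, "ShmPutImage 500x500 square"),
]

SHADOW, UXA, SNA = 1, 2, 3  # column indices into each row


def slower_than(rows, col, ref):
    """Operations where column `col` is strictly slower than column `ref`."""
    return [row[4] for row in rows if row[col] < row[ref]]


def absolute_rate(row, col):
    """Convert a relative figure back to ops/sec (under the ratio assumption)."""
    return row[0] * row[col]


uxa_losses = slower_than(ROWS, UXA, SHADOW)
sna_losses = slower_than(ROWS, SNA, SHADOW)

print(f"UXA slower than shadow in {len(uxa_losses)} of {len(ROWS)} rows")
print(f"SNA slower than shadow in {len(sna_losses)} of {len(ROWS)} rows")
```

On these 14 rows, UXA falls behind the shadow buffer in 7 and SNA in 1 (PutImage 500x500).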
8 Comments
Hi!
Sorry for my English.
Please tell me which versions of xf86-video-intel and the kernel this applies to.
There are two patches which can either be used separately or together to bring stability to the system. In the kernel, we expect to land the patch first in 3.8-rc and then push it back through the stable trees, so hopefully stable+1 (3.7.1 and 3.6.11). In the ddx, SNA is stable on 830/845 with 2.20.16 and I’ll push 2.20.17 to take advantage of the opt-out as soon as the kernel patch goes upstream.
Thank you very much.
Thank you very much. I have an actively used D845GVAD2 board and I get the following message almost every time I boot up (Ubuntu 12.04):
[ 25.522002] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 25.524021] render error detected, EIR: 0x00000010
[ 25.524021] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
[ 25.524021] render error detected, EIR: 0x00000010
And sometimes I get “GPU hung” error after which the screen gets really scary weird colors. Thank you very much for supporting the old Intel hardware.
The first error is from a conflict with the BIOS whilst setting up KMS. Eventually we will get the ordering robust against whatever else is going on. It is the second class of error that we hope to have finally fixed. Packages for you to test should be available in xorg-edgers, but I guess those are only based on 12.10/13.04.
Wow, I am running Ubuntu 12.04 Precise with the xorg-edgers packages and I don’t get the above error and neither do I get GPU hangs or screen corruptions. “Works like a charm.” Thanks a very very lot for reviving this chipset! Yay!!!!!
Out of curiosity, what are the specs of the system used to run these benchmarks (the exact CPU and GPU model)?
A P4 Celeron, with a Brookdale (845G):
i845:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Celeron(R) CPU 2.40GHz
stepping : 9
microcode : 0x1a
cpu MHz : 2392.236
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 4784.47
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 32 bits virtual
power management:
i845:~$ lspci -v -s 0:0:2.0
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03) (prog-if 00 [VGA controller])
Subsystem: Dell Device 0149
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at e0000000 (32-bit, prefetchable) [size=128M]
Memory at f6f80000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at [disabled]
Capabilities:
Kernel driver in use: i915