The introduction of KMS and GEM into the i915 driver broke the i830/i845 chipsets, and a lot of hearts. But fear not! A decade after its introduction, we finally have a driver that is not only stable, but capable of accelerating Firefox.

The problem?

The problem was, simply, that we could not find a way to enable dynamic video memory on the ancient i830/i845 chipsets without it eventually eating garbage. Since dynamic memory management was the raison d'être of GEM and is critical for acceleration, it is a requirement of the current driver stack. The first cunning solution was simply never to reuse batch buffers, and to keep a small amount of memory reserved for our own usage. This stopped the command streamer from seeing the garbage, and my system has remained stable through many hours of thrashing. Daniel Vetter extended my solution to implement a kernel workaround whereby every batch would be copied into a reserved area before execution. In the end, we compromised: I could avoid that extra copy by assuming responsibility in the driver for ensuring the batch was coherent, while the kernel would intervene for any non-cooperative driver.
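As a rough illustration of that compromise, here is a toy model in Python. It is not the actual i915 code: `ToyKernel`, `RESERVED_SIZE`, and the `driver_is_coherent` flag are names invented here purely for the sketch, and a real batch is of course not a Python `bytes` object.

```python
from dataclasses import dataclass, field

RESERVED_SIZE = 4096  # hypothetical size of the reserved area, in bytes


@dataclass
class ToyKernel:
    """Toy stand-in for the kernel side of batch submission (not real i915 code)."""

    reserved: bytearray = field(default_factory=lambda: bytearray(RESERVED_SIZE))
    copies: int = 0  # how many times the fallback copy was taken

    def submit(self, batch: bytes, driver_is_coherent: bool = False) -> bytes:
        """Return the bytes the command streamer would read in this model."""
        if driver_is_coherent:
            # Cooperative driver: it takes responsibility for coherency,
            # so the extra copy is skipped.
            return batch
        if len(batch) > RESERVED_SIZE:
            raise ValueError("batch does not fit in the reserved area")
        # Non-cooperative driver: copy the batch into the reserved area
        # and execute from there instead.
        self.reserved[: len(batch)] = batch
        self.copies += 1
        return bytes(self.reserved[: len(batch)])


if __name__ == "__main__":
    kernel = ToyKernel()
    kernel.submit(b"\x01\x02", driver_is_coherent=True)  # no copy
    kernel.submit(b"\x03\x04\x05")                       # copied first
    print("fallback copies taken:", kernel.copies)
```

The point of the opt-out is only that the copy becomes a fallback rather than a tax on every submission.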

With these workarounds in place, we are finally able to run through the test suites. Which brought us to the next problem:

[Figure: UXA vs software rasterisation on 845g]

The sad fact is that UXA is inadequate for the challenge of accelerating the Render protocol.

If we compare with an architecture that was designed to accelerate cairo, SNA:

[Figure: SNA vs software rasterisation on 845g]

We find a much happier result. In all cases the performance is at least as good as using a software rasteriser in the X server, and often much better than if we avoided the Render protocol entirely and did the rasterisation in the client. With a little more tuning, we may be able to achieve parity even in the worst case – if we can win on an old GPU with an ancient CPU (single core, virtually no cache and even less memory bandwidth) we should be able to excel on more recent GPUs and CPUs, and be more efficient in the process.

Yet, the Render protocol is not the be-all-and-end-all of acceleration. We need to keep an eye on the basics as well, the copies, the fills and the uploads, to know if we are achieving our goals. The basic premise is that using the driver (and thus the GPU) is faster than just using the CPU for everything. (In reality, the choice is more complicated because we have to consider the efficacy of GPU offload for enabling the CPU to get on with other tasks and overall power efficiency.)

1: Baseline performance of Xvfb
2: SNA with acceleration disabled (shadow)
3: UXA
4: SNA

     1        2       3       4    Operation
--------  ------  ------  ------   ---------
277000.0    2.04    1.06    4.58   Char in 80-char aa line (Charter 10) 
265000.0    2.15    1.11    4.83   Char in 80-char rgb line (Charter 10) 
312000.0    0.66    0.15    1.38   Copy 10x10 from window to window 
  6740.0    0.90    1.56    1.75   Copy 100x100 from window to window 
   382.0    0.92    1.30    1.36   Copy 500x500 from window to window 
268000.0    0.74    0.17    1.50   Copy 10x10 from window to pixmap 
  7260.0    0.87    1.43    1.85   Copy 100x100 from window to pixmap 
   376.0    0.94    1.28    1.37   Copy 500x500 from window to pixmap 
154000.0    0.74    0.69    0.86   PutImage 10x10 square 
  1880.0    1.04    1.05    1.04   PutImage 100x100 square 
    87.1    1.03    1.02    1.01   PutImage 500x500 square 
308000.0    0.58    0.46    0.66   ShmPutImage 10x10 square 
  6500.0    1.02    1.16    1.24   ShmPutImage 100x100 square 
   380.0    1.00    1.24    1.28   ShmPutImage 500x500 square 

So it appears that using the GPU for basic operations such as moving windows about is at most a marginal win over using a shadow buffer (and often UXA fails at even that). Overall, then, it seems that enabling UXA should bring nothing but misery.
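To make that comparison a little more concrete, here is a small Python sketch that tallies the table. It assumes that columns 2 to 4 are ratios relative to the Xvfb baseline in column 1 (so a value below 1 means slower than Xvfb); the helper names are made up for this example.

```python
# Rows copied from the table above:
# (Xvfb baseline, shadow, UXA, SNA, operation).
# Assumption: columns 2-4 are ratios relative to the Xvfb baseline.
ROWS = [
    (277000.0, 2.04, 1.06, 4.58, "Char in 80-char aa line (Charter 10)"),
    (265000.0, 2.15, 1.11, 4.83, "Char in 80-char rgb line (Charter 10)"),
    (312000.0, 0.66, 0.15, 1.38, "Copy 10x10 from window to window"),
    (6740.0, 0.90, 1.56, 1.75, "Copy 100x100 from window to window"),
    (382.0, 0.92, 1.30, 1.36, "Copy 500x500 from window to window"),
    (268000.0, 0.74, 0.17, 1.50, "Copy 10x10 from window to pixmap"),
    (7260.0, 0.87, 1.43, 1.85, "Copy 100x100 from window to pixmap"),
    (376.0, 0.94, 1.28, 1.37, "Copy 500x500 from window to pixmap"),
    (154000.0, 0.74, 0.69, 0.86, "PutImage 10x10 square"),
    (1880.0, 1.04, 1.05, 1.04, "PutImage 100x100 square"),
    (87.1, 1.03, 1.02, 1.01, "PutImage 500x500 square"),
    (308000.0, 0.58, 0.46, 0.66, "ShmPutImage 10x10 square"),
    (6500.0, 1.02, 1.16, 1.24, "ShmPutImage 100x100 square"),
    (380.0, 1.00, 1.24, 1.28, "ShmPutImage 500x500 square"),
]

SHADOW, UXA, SNA, NAME = 1, 2, 3, 4  # column indices into each row


def slower_than(rows, column, reference):
    """Names of operations where `column` scores below `reference`."""
    return [row[NAME] for row in rows if row[column] < row[reference]]


def below_baseline(rows, column):
    """Names of operations where `column` is slower than Xvfb (ratio < 1)."""
    return [row[NAME] for row in rows if row[column] < 1.0]


if __name__ == "__main__":
    total = len(ROWS)
    print("UXA slower than shadow:", len(slower_than(ROWS, UXA, SHADOW)), "of", total)
    print("SNA slower than shadow:", len(slower_than(ROWS, SNA, SHADOW)), "of", total)
    print("UXA slower than Xvfb:  ", len(below_baseline(ROWS, UXA)), "of", total)
    print("SNA slower than Xvfb:  ", len(below_baseline(ROWS, SNA)), "of", total)
```

Under that reading of the columns, UXA trails the shadow buffer in 7 of the 14 operations, whereas SNA trails it in only one (PutImage 500x500).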

8 Comments

    • Eugeny Shkrigunov
    • Posted December 18, 2012 at 9:21 am

    Hi!
    Sorry for my English.
    Please tell me, for which versions of xf86-video-intel and kernel this is relevant.

    • There are two patches which can either be used separately or together to bring stability to the system. In the kernel, we expect to land the patch first in 3.8-rc and then push it back through the stable trees, so hopefully stable+1 (3.7.1 and 3.6.11). In the ddx, SNA is stable on 830/845 with 2.20.16 and I’ll push 2.20.17 to take advantage of the opt-out as soon as the kernel patch goes upstream.

        • Eugeny Shkrigunov
        • Posted December 18, 2012 at 10:31 am

        Thank you very much.

  1. Thank you very much. I have an actively used D845GVAD2 board and I get the following message almost every time I boot up (Ubuntu 12.04):

    [ 25.522002] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
    [ 25.524021] render error detected, EIR: 0x00000010
    [ 25.524021] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking
    [ 25.524021] render error detected, EIR: 0x00000010

    And sometimes I get “GPU hung” error after which the screen gets really scary weird colors. Thank you very much for supporting the old Intel hardware.

    • The first error is from a conflict with the BIOS whilst setting up KMS. Eventually we will get the ordering robust against whatever else is going on. It is the second class of error that we hope to have finally fixed. Packages for you to test should be available in xorg-edgers, but I guess those are only based on 12.10/13.04.

        • Aditya
        • Posted December 28, 2012 at 1:00 pm

        Wow, I am running Ubuntu 12.04 Precise with the xorg-edgers packages and I don’t get the above error and neither do I get GPU hangs or screen corruptions. “Works like a charm.” Thanks a very very lot for reviving this chipset! Yay!!!!!

  2. Out of curiosity, what are the specs of the system used to run these benchmarks (the exact CPU and GPU model)?

    • A P4 celeron, with a Brookdale (845g):

      i845:~$ cat /proc/cpuinfo
      processor : 0
      vendor_id : GenuineIntel
      cpu family : 15
      model : 2
      model name : Intel(R) Celeron(R) CPU 2.40GHz
      stepping : 9
      microcode : 0x1a
      cpu MHz : 2392.236
      cache size : 128 KB
      fdiv_bug : no
      hlt_bug : no
      f00f_bug : no
      coma_bug : no
      fpu : yes
      fpu_exception : yes
      cpuid level : 2
      wp : yes
      flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
      bogomips : 4784.47
      clflush size : 64
      cache_alignment : 128
      address sizes : 36 bits physical, 32 bits virtual
      power management:

      i845:~$ lspci -v -s 0:0:2.0
      00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03) (prog-if 00 [VGA controller])
      Subsystem: Dell Device 0149
      Flags: bus master, fast devsel, latency 0, IRQ 11
      Memory at e0000000 (32-bit, prefetchable) [size=128M]
      Memory at f6f80000 (32-bit, non-prefetchable) [size=512K]
      Expansion ROM at [disabled]
      Capabilities:
      Kernel driver in use: i915

