Commit Graph

28 Commits

Author SHA1 Message Date
Chris Wilson c6dc27562a uxa: Only recreate the glyph cache on *generational* updates
The screen resources are recreated when the screen is rotated as well,
without being finalized. In this case, we do not need to reconstruct the
cache (or if we did, we would need to tear it down first).

Reported-by: Till Matthiesen <entropy@everymail.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33412
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-01-24 20:29:57 +00:00
Fernando Carrijo 6e08b0f48f Purge macro NEED_EVENTS
Signed-off-by: Fernando Carrijo <fcarrijo@yahoo.com.br>
Acked-by: Tiago Vignatti <tiago.vignatti@nokia.com>
Reviewed-by: Alan Coopersmith <alan.coopersmith@oracle.com>
2010-07-09 20:49:13 -07:00
Chris Wilson 4b7142baa0 uxa: Enable SHM pixmaps
Now with streaming uploads and downloads for composite operations in
place, shared memory pixmaps are no longer that dire performance-wise.
With careful use these can in fact be the most efficient means of
transfer between a wholly software renderer in the client and a backing
store. For instance, Chromium renders internally to an ARGB32 image
buffer and uses a shared pixmap to composite dirty regions into the
backing store, thereby using the GPU to perform either the blit or the
format conversion. Enabling shared pixmaps reduces our CPU overhead
whilst scrolling by a factor of 5 or so.

And this is achieved simply by deleting obsolete code!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-19 13:39:48 +01:00
Chris Wilson d56ea7a852 Use the direct dixGetPrivate() API when available
This is quicker and smaller than the old indirect function call to
dixLookupPrivate().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-07 00:20:35 +01:00
Keith Packard 42ddc39430 Adapt to DevPrivate API changes
This allows the driver to be built against either the old or new
DevPrivate API.

Signed-off-by: Keith Packard <keithp@keithp.com>
2010-06-06 16:00:12 -07:00
Chris Wilson cd38b705be Disable acceleration if we detect a hardware error.
This is wildly optimistic, but it should work in a surprising number of
error situations and some output in those cases will hopefully be
better than none...

If we submit a batchbuffer and the kernel reports the GPU is hung (which
will be caused by an earlier execbuffer, and so the kernel should have
had enough time to determine whether or not it could reset the GPU) then
disable any further attempt to accelerate gfx and force fallbacks to map
the buffers and use the CPU. We cannot normally map any more buffers if
the GPU is hung, so only those already mapped prior to the hang can be
written to, or those allocated in system memory. However, we can expect
that the framebuffer is already mapped, and so have a reasonable
expectation to continue to see the display update.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 18:00:11 +01:00
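The fallback policy described in cd38b705be can be sketched as a sticky flag that is set once a submission reports a hang and is consulted before every later operation. This is only an illustration under stated assumptions: the names `accel_state`, `note_submit_result` and `choose_path` are hypothetical, and treating -EIO as the hang indication is an assumption rather than something the commit message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state: one sticky "GPU is hung" flag. */
struct accel_state {
	bool gpu_wedged;
};

enum render_path { PATH_GPU, PATH_CPU_FALLBACK };

/* Record the result of a batch submission. ret is 0 or a negative
 * errno; using -EIO to mean "GPU hung" is an assumption. The flag is
 * never cleared, so acceleration stays off after the first hang. */
static void note_submit_result(struct accel_state *s, int ret)
{
	if (ret == -EIO)
		s->gpu_wedged = true;
}

/* Every rendering entry point would ask this before touching the GPU. */
static enum render_path choose_path(const struct accel_state *s)
{
	return s->gpu_wedged ? PATH_CPU_FALLBACK : PATH_GPU;
}
```

Keeping the flag sticky matches the message's intent of disabling "any further attempt to accelerate", at the cost of never recovering acceleration within the same session.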
Chris Wilson 5fff430046 uxa: Mega-Glyphs!
Rewrite glyph rendering to avoid the intermediate buffer, accumulating
the glyph rectangles directly in the backend composite routines. And
modify the glyph cache routines to fully utilise the allocated size of
the tiled buffer on older hardware. To do this we alias all glyph sizes
into the same texture using a technique suggested by Keith Packard.

PineView:
  885/856 -> 1150/1110 kglyph/s (aa/rgb)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 14:03:42 +01:00
Chris Wilson 5b2254838e uxa: Make the glyph caches' fixed size explicit.
Until we actually resize the glyph cache dynamically, make it obvious to
the reader and the compiler that the size is fixed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-28 12:47:26 +01:00
Chris Wilson 11581dda99 uxa: Use a glyph private rather than a hash table.
Store the cache position directly on the glyph using a devPrivate rather
than through an auxiliary hash table.

x11perf on PineView:
650/638 kglyphs/s -> 701/686 kglyphs/s [aa/rgb]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-28 12:44:34 +01:00
Chris Wilson 91f560034f uxa: Composite glyphs directly onto dst when possible.
Compositing directly onto the destination, without using a mask,
takes us from 580 kglyphs/s to 850 kglyphs/s on i945 [x11perf -aa10text].

However, the extra intersection check almost entirely cancels out the
speed up and we discover that the glyphs in x11perf are always
overlapping. Nothing is ever easy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 18:31:15 +01:00
Chris Wilson e5c971e763 uxa: Spans! OMG!
Use composite rather than solid blits in order to bring performance on
a par with the CPU when using GEM and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-23 18:43:29 +01:00
Chris Wilson cb887cfc67 uxa: solid rects
The cost of performing relocations outweighs the advantages of using the
blitter for solids with lots of rectangles.

References:

  Bug 22127 - [UXA] 50% performance regression for XRenderFillRectangles
  https://bugs.freedesktop.org/show_bug.cgi?id=22127

By using the 3D pipeline we improve our performance by around 4x on
i945, measured by the jxbench microbenchmark, and a factor of 10x by
short-cutting to the 3D pipeline for blended rectangles.

Before, on an i945GME:
  19982.412060 Ops/s; rects (!); 15x15
  9599.131693 Ops/s; rects (!); 75x75
  3803.654743 Ops/s; rects (!); 250x250
  6836.743772 Ops/s; rects blended; 15x15
  1443.750000 Ops/s; rects blended; 75x75
  495.335821 Ops/s; rects blended; 250x250
  23247.933884 Ops/s; rects composition (!); 15x15
  10993.073048 Ops/s; rects composition (!); 75x75
  3595.905172 Ops/s; rects composition (!); 250x250

After:
  87271.145975 Ops/s; rects (!); 15x15
  32347.744361 Ops/s; rects (!); 75x75
  5884.177215 Ops/s; rects (!); 250x250
  73500.000000 Ops/s; rects blended; 15x15
  33580.882353 Ops/s; rects blended; 75x75
  5858.811749 Ops/s; rects blended; 250x250
  25582.317073 Ops/s; rects composition (!); 15x15
  6664.728682 Ops/s; rects composition (!); 75x75
  14965.909091 Ops/s; rects composition (!); 250x250 [suspicious]

This has no impact on Cairo, but I have a suspicion from watching xtrace
that Qt likes to blit thousands of 1x1 rectangles with the same colour.
However, we are still around 2-3x slower than the reported figures for
EXA!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson c1afc831c8 uxa: Cache solid fills.
Maintain a small cache of pixmaps to hold SolidFill pictures. Currently
we create a pixmap the size of the damaged region and fill that using
pixman before downloading it to the GPU and compositing. Needless to say
this is extremely expensive compared to simply emitting the solid
colour. To mitigate this cost, we maintain a small cache of 1x1R
pictures which are recognised by the driver as being solid, and which
at the very least are maintained as GPU-ready pixmaps.

This gives a good boost to cairo-xcb (which uses solid fills) on a gm45:

Before:
  gnome-terminal-vim: 41.9s
After:
  gnome-terminal-vim: 31.7s

Compare with using a cache of 1x1R pixmaps in cairo-xcb:
  gnome-terminal-vim: 31.6s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2009-12-07 21:37:31 +00:00
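The caching idea in c1afc831c8 can be sketched as a small colour-keyed table. This is a minimal illustration, not the driver's code: `solid_cache`, `solid_cache_get`, the size of 4 and the round-robin replacement are all assumptions (the commit only says the cache is "small"), and an integer handle stands in for a 1x1 GPU-ready pixmap.

```c
#include <stdint.h>

#define SOLID_CACHE_SIZE 4	/* hypothetical size */

struct solid_cache {
	uint32_t color[SOLID_CACHE_SIZE];
	int handle[SOLID_CACHE_SIZE];	/* stand-in for a 1x1 pixmap */
	int used;			/* number of valid entries */
	int next_evict;			/* round-robin replacement cursor */
	int next_handle;		/* counts simulated pixmap creations */
};

/* Return the handle for a solid colour, creating one only on a miss. */
static int solid_cache_get(struct solid_cache *c, uint32_t color)
{
	int i;

	for (i = 0; i < c->used; i++)
		if (c->color[i] == color)
			return c->handle[i];	/* hit: reuse existing entry */

	if (c->used < SOLID_CACHE_SIZE) {
		i = c->used++;			/* free slot available */
	} else {
		i = c->next_evict;		/* replace the oldest slot */
		c->next_evict = (c->next_evict + 1) % SOLID_CACHE_SIZE;
	}
	c->color[i] = color;
	c->handle[i] = c->next_handle++;
	return c->handle[i];
}
```

A zero-initialised `struct solid_cache` is a valid empty cache; repeated requests for the same colour then return the same handle without a new allocation.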
Eric Anholt 8ae0e44e42 Move to kernel coding style.
We've talked about doing this since the start of the project, putting it off
until "some convenient time".  Just after removing a third of the driver seems
like a convenient time, when backporting's probably not happening much anyway.
2009-10-06 17:10:31 -07:00
Keith Packard 6361c3b9af Fix SHM functions to work with server after 1.6.0
Signed-off-by: Keith Packard <keithp@keithp.com>
2009-08-25 19:33:25 -07:00
Peter Hutterer 0a4c4c5fe8 Update to xextproto 7.1 support.
DPMS header was split into dpms.h (client) and dpmsconst.h (server). Drivers
need to include dpmsconst.h if xextproto 7.1 is available.

SHM is now shm.h instead of shmstr. Requires definition of ShmFuncs that's
not exported by the server.

Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
2009-07-18 12:10:18 +10:00
Eric Anholt 47591334a1 Remove pre-server-1.5 support. 2009-04-27 16:50:34 -07:00
Alan Coopersmith b8ca146b06 Fix UXA to build with Sun compilers (use __func__ instead of __FUNCTION__)
Signed-off-by: Alan Coopersmith <alan.coopersmith@sun.com>
2009-04-24 16:04:13 -07:00
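The portability point behind b8ca146b06 is that `__func__` is the predefined identifier standardised in C99, whereas `__FUNCTION__` is a compiler extension that not every compiler accepts. A minimal sketch, with a hypothetical helper name that is not taken from the driver:

```c
#include <string.h>

/* Hypothetical helper: returns the name of the enclosing function.
 * __func__ behaves like a static const char array declared at the
 * start of the function body, so returning a pointer to it is safe. */
static const char *current_function_name(void)
{
	return __func__;
}
```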
Eric Anholt 22dc9a5580 Fix UXA for server 1.4. 2009-02-26 14:20:42 -08:00
Eric Anholt 3012d85cc5 uxa: Fix breakage from UXA_FALLBACK conversion from "do {} while (0)" construct.
Thanks to keithp for post-commit review.
2009-02-10 18:47:28 -08:00
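The message for 3012d85cc5 does not say what broke, so the following only illustrates the general reason a multi-statement macro is wrapped in `do {} while (0)`: it expands to exactly one statement and therefore composes with an unbraced if/else. The macro `RECORD_FALLBACK` and the counters are hypothetical stand-ins, not the real `UXA_FALLBACK` definition.

```c
/* Hypothetical counters standing in for fallback debug output. */
static int fallback_count;
static int fallback_last_code;

/* Wrapped in do { } while (0) so the expansion is a single statement
 * that still requires the trailing semicolon at the call site. */
#define RECORD_FALLBACK(code) \
	do { \
		fallback_count++; \
		fallback_last_code = (code); \
	} while (0)

static int try_accel(int can_accelerate)
{
	if (!can_accelerate)
		RECORD_FALLBACK(42);	/* one statement, so the else binds */
	else
		return 1;		/* accelerated path taken */
	return 0;			/* fell back to software */
}
```

With a bare `{ ... }` body instead, the semicolon after the macro call would terminate the `if` and leave the `else` without a matching `if`, which is a compile error.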
Eric Anholt 5009127de7 uxa: Fix driver against fbDoCopy -> miDoCopy change in the server. 2009-02-10 18:23:35 -08:00
Eric Anholt 5212ec6515 uxa: hook up the fallback debug to the driver's fallback debug option. 2009-02-10 15:35:20 -08:00
Keith Packard 632f816c72 uxa: handle uxa_prepare_access failure
uxa_prepare_access may fail to map the pixmap into user space. Recover from
this without crashing.

Signed-off-by: Keith Packard <keithp@keithp.com>
2009-01-06 09:31:39 -08:00
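The recovery pattern named in 632f816c72 amounts to checking the result of the mapping step and skipping the software path when it fails. A minimal sketch under stated assumptions: `fake_pixmap`, `fake_prepare_access` and `software_fill` are hypothetical, and a NULL mapping is used here to simulate a failed map into user space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pixmap: mapping is NULL when it could not be mapped. */
struct fake_pixmap {
	unsigned char *mapping;
	size_t size;
};

/* Stand-in for a prepare-access step that can fail. */
static bool fake_prepare_access(struct fake_pixmap *p)
{
	return p->mapping != NULL;
}

/* Software fallback that gives up cleanly instead of dereferencing
 * an unmapped pixmap. */
static bool software_fill(struct fake_pixmap *p, unsigned char value)
{
	if (!fake_prepare_access(p))
		return false;	/* skip the operation rather than crash */
	memset(p->mapping, value, p->size);
	return true;
}
```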
Eric Anholt 261c20a479 uxa: Add in EnableDisableFBAccess handling like examodule.c did.
This fixes assertion failures when rendering text while VT switched.
2008-12-05 12:13:26 -08:00
Eamon Walsh 808b72f814 Change uxa private keys to integer variables.
Prepares for a devPrivates system that will store an index.
2008-08-26 22:34:05 -04:00
Keith Packard b2d058d80c Rename uxa using _ instead of caps 2008-08-05 15:41:52 -07:00
Keith Packard fc4d9c55a7 Change PrepareAccess to take access mode rather than index 2008-08-05 15:41:51 -07:00
Keith Packard 59774e9aca Add UXA - the unified memory acceleration architecture.
This eliminates the cost of EXA migration management while providing full
pixmap allocation control to the driver. The goal is to make something
useful for UMA drivers.
2008-08-05 15:29:50 -07:00