Carl Worth did the hard work in identifying that the regression in
cairo between X.org 1.6 and 1.7 was caused by cairo sending an a1
mask to the server in 1.7 whereas in 1.6 cairo used local fallbacks
(as the source was using RepeatPad, which triggers cairo's
'buggy_pad_reflect' fallback for X.org 1.6). This was causing the driver
to do a fallback to handle the a1 mask instead, which due to the GPU
pipeline stall is much more expensive than the equivalent fallback in
cairo.
Reference:
cairo's performance downgrades 4X with server master than server-1.6.
https://bugs.freedesktop.org/show_bug.cgi?id=23184
The fix is a relatively simple extension of the current
uxa_picture_from_pixman_image() to use CompositePicture() instead of
CopyArea() when we need to convert to a new format.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently, when asked to composite using a gradient source or mask, we
fall back to using fbComposite(). This has the side effect of causing a
readback on the destination surface, stalling the GPU pipeline. Instead,
like uxa_trapezoids(), we can use pixman to fill a scratch pixmap and then
copy that to an offscreen pixmap for use with uxa_composite().
Speedups on i915:
firefox-talos-svg: 710378.14 -> 549262.96: 1.29x speedup
No slowdowns.
Thanks to Søren Sandmann Pedersen for spotting the missing
ValidatePicture().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This lets the driver allocate a nice idle buffer object instead of a
busy one, reducing runtime of firefox-20090601 on my G45 from 50.7 (+/- .41%)
to 48.4 (+/- 1.1%) seconds.
This was needed when we were doing the mask computations in this pixmap,
but now they're done in a temporary and then uploaded later.
This reduces runtime of firefox-20090601 from 52.6 (+/- .96%) to 50.7
(+/- .41%) seconds on my G45.
The DPMS header was split into dpms.h (client) and dpmsconst.h (server).
Drivers need to include dpmsconst.h if xextproto 7.1 is available.
The SHM header is now shm.h instead of shmstr.h. This requires a definition
of ShmFuncs, which is not exported by the server.
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Don't do it; treat this the same as every other prepare access call in uxa.
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Owain Ainsworth <zerooa@googlemail.com>
Since we're only doing software rasterization right now, anyway, it
makes more sense to just rasterize to system memory and then upload
to a pixmap once complete. This avoids expensive read-modify-write
cycles.
This results in a 2.4x speedup for a real-world test case that's
heavy on trapezoids, which is swfdec running on the following file:
http://michalevy.com/wp-content/uploads/Giant%20Steps%202007.swf
Many thanks to Chris Wilson for his cairo-traces repository and
cairo-perf-trace tool, which make it so easy to measure things
like this.
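The approach above can be sketched roughly as follows: accumulate coverage in an ordinary malloc'ed buffer, then copy the finished result to the destination in a single pass. This is an illustrative model only. struct fake_pixmap, add_span() and rasterize_and_upload() are hypothetical names, and the memcpy() stands in for whatever upload path the driver really uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pixmap whose storage is costly to read. */
struct fake_pixmap {
    uint8_t *data;
    int width, height;
    int uploads;    /* number of whole-surface uploads performed */
};

/* Add coverage to a horizontal span of an a8 buffer, saturating at 255. */
static void add_span(uint8_t *buf, int stride, int y, int x0, int x1,
                     uint8_t cov)
{
    int x;

    for (x = x0; x < x1; x++) {
        unsigned v = buf[y * stride + x] + cov;

        buf[y * stride + x] = v > 255 ? 255 : (uint8_t)v;
    }
}

/*
 * Rasterize into system memory, then upload once. All read-modify-write
 * traffic hits the scratch buffer; the pixmap is only written.
 * Returns 0 on success, -1 if the scratch allocation fails.
 */
int rasterize_and_upload(struct fake_pixmap *dst)
{
    size_t size;
    uint8_t *scratch;
    int y;

    assert(dst != NULL && dst->width > 0 && dst->height > 0);
    size = (size_t)dst->width * (size_t)dst->height;
    scratch = calloc(size, 1);
    if (scratch == NULL)
        return -1;

    for (y = 0; y < dst->height; y++) {
        /* Two overlapping spans force a read-modify-write per pixel. */
        add_span(scratch, dst->width, y, 0, dst->width, 200);
        add_span(scratch, dst->width, y, 0, dst->width / 2, 100);
    }

    memcpy(dst->data, scratch, size);   /* the single "upload" */
    dst->uploads++;
    free(scratch);
    return 0;
}
```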
With glyphs sitting in per-glyph pixmaps, there's no reason to use the CPU
to move them to the cache pixmap, and lots of reasons to use the accelerator.
Signed-off-by: Keith Packard <keithp@keithp.com>
Without this, rendering component-alpha glyphs may break without a mask.
Bug #19534. Ported from fix by Michel Dänzer <daenzer@vmware.com> in
xserver commit 639f289dcdbe00a516820f573c01a8339e120ed4
This avoids prepare/finish_access_gc overhead when we're not changing things
(since GCTile is already handled) and gets us the RW flag for the prepare
of the stipple pixmap so things will be synced correctly.
We can get a case with gnome-terminal + links where we get two arrays
of glyphs, all with zero width and zero height. If this happens
we manage to get to this case without any buffer set up, and segfault.
(cherry picked from commit 717c7492a0f6ba3fb3eabda33515881eef314155)
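One way to guard against this kind of input is to check up front whether any glyph covers a non-zero area and return early if none does. The sketch below is illustrative only; struct glyph_size and glyphs_have_area() are hypothetical names, not the driver's actual types or the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-glyph size information. */
struct glyph_size {
    int width;
    int height;
};

/*
 * Return 1 if at least one glyph covers a non-zero area, 0 otherwise.
 * A caller would bail out on 0 rather than touch a buffer that was
 * never set up.
 */
int glyphs_have_area(const struct glyph_size *glyphs, size_t n)
{
    size_t i;

    assert(glyphs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (glyphs[i].width > 0 && glyphs[i].height > 0)
            return 1;
    }
    return 0;
}
```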
GCC isn't smart enough to analyze the control flow and figure out that
these are false positives, but initializing them shouldn't hurt, so work
around it.
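The pattern looks roughly like the sketch below: variables that are only written on one path and only read after a success check get a dummy initial value. The function names are hypothetical, and whether a given GCC version warns about this exact code without the initializers is not something this sketch claims.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes *x and *y only when it returns 1. */
static int get_offsets(int have_pixmap, int *x, int *y)
{
    assert(x != NULL && y != NULL);
    if (!have_pixmap)
        return 0;
    *x = 3;
    *y = 4;
    return 1;
}

/* Return x + y when offsets are available, -1 otherwise. */
int sum_offsets(int have_pixmap)
{
    /*
     * Initialized only to quiet the compiler. The values are never
     * read unless get_offsets() has overwritten them.
     */
    int x = 0, y = 0;

    if (!get_offsets(have_pixmap, &x, &y))
        return -1;
    return x + y;
}
```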
This eliminates the cost of EXA migration management while providing full
pixmap allocation control to the driver. The goal is to make something
useful for UMA drivers.