Set the correct offset for the gradients patterns after rendering to a
local Picture.
Fixes cairo/test/huge-radial and friends
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the source may not cover the extents, we need to represent those
areas as transparent in the fallback picture, ergo we need an alpha
channel. We could be smarter and force a format conversion when
necessary, and we could let the backend choose the most appropriate
format.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References:
Bug 28098 Compiz renders shadows wrong, garbage line of pixels along left
and top edge of windows
https://bugs.freedesktop.org/show_bug.cgi?id=28098
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm seeing garbage alpha for rendercheck blend:
x8r8g8b8a 10x10 SRC ar8g8b8a
so disable blitting until I work out if we can fast-path it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the change to prevent blitting between incompatible sources, we also
prevented 1x1R pixmaps from being used for solid fills. Reorder the
sequence of conditions to enable this fast path again.
If we are destined to target an !offscreen drawable, then uploading the
trapezoid mask to a bo is the last thing we actually want to do...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The cost of performing relocations outweigh the advantages of using the
blitter for solids with lots of rectangles.
References:
Bug 22127 - [UXA] 50% performance regression for XRenderFillRectangles
https://bugs.freedesktop.org/show_bug.cgi?id=22127
By using the 3D pipeline we improve our performance by around 4x on
i945, measured by the jxbench microbenchmark, and a factor of 10x by
short-cutting to the 3D pipeline for blended rectangles.
Before, on a i945GME:
19982.412060 Ops/s; rects (!); 15x15
9599.131693 Ops/s; rects (!); 75x75
3803.654743 Ops/s; rects (!); 250x250
6836.743772 Ops/s; rects blended; 15x15
1443.750000 Ops/s; rects blended; 75x75
495.335821 Ops/s; rects blended; 250x250
23247.933884 Ops/s; rects composition (!); 15x15
10993.073048 Ops/s; rects composition (!); 75x75
3595.905172 Ops/s; rects composition (!); 250x250
After:
87271.145975 Ops/s; rects (!); 15x15
32347.744361 Ops/s; rects (!); 75x75
5884.177215 Ops/s; rects (!); 250x250
73500.000000 Ops/s; rects blended; 15x15
33580.882353 Ops/s; rects blended; 75x75
5858.811749 Ops/s; rects blended; 250x250
25582.317073 Ops/s; rects composition (!); 15x15
6664.728682 Ops/s; rects composition (!); 75x75
14965.909091 Ops/s; rects composition (!); 250x250 [suspicious]
This has no impact on Cairo, but I have a suspicion from watching xtrace
that Qt likes to blit thousands of 1x1 rectangles with the same colour.
However, we are still around 2-3x slower than the reported figures for
EXA!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the first step to handling unsupported texture formats, double check
that the converted pattern can be used as a texture by the card.
Fixes: rendercheck -t repeat
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513f
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyph/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More warning is the extremely low performance,
however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We can also convert a composite with an integer translation into a
blit, so long as the sample extents remains within the source.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the buffer is too large or not suitable for a GPU operation, we
currently fallback and perform the composite on the CPU. An alternative
is too extract the small region out of the source (as usually the
sample extents are much smaller than the actual surface size) and try
the composite with the new surface.
The effect is particularly noticeable on pathological websites that use
very large background images. For example, http://www.woodtv.com/ uses a
1299x15000 pattern that is obscured by another opaque pattern.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 6d50553e8f.
Now we have taught the fallback path not to infinitely recurse,
re-enable the accelerated path for ShmPutImage and friends.
Often, for example in the fallback for ShmPutImage, we will attempt to
use uxa_copy_area() copying to a normal pixmap from a memory buffer.
This triggers a fallback, and maps the destination pixmap back into the
GTT. The accelerated put_image path will attempt to stream a blit to the
destination pixmap if it is currently active, avoiding the stall.
Do not try to fixup the alpha in the ff/shaders as this has the
side-effect of overriding the alpha value of the border color, causing
images to be padded with black rather than transparent. This can
generate large and obnoxious visual artefacts.
Fixes:
Bug 17933 - x8r8g8b8 doesn't sample alpha=0 outside surface bounds
http://bugs.freedesktop.org/show_bug.cgi?id=17933
and many related cairo test suite failures.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The code was using uint32_t where an XID (currently an unsigned long)
was specified in the prototype. Use XID to avoid both the warning and
any potential problem.
In separating the boolean logic out into a separate function, dc6522dd,
I reversed the sense of one particular test:
src->format == dst->format
The OVER optimisation is only valid if the src and dst formats match,
but not always.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On failing to extract the pixel value for an alpha-only solid we
actually triggered a fallback. Since this path is commonly hitting
whilst fading in images, for example cairo_paint_with_alpha(), the
fallback was detected during the Moblin boot sequence where it was
adding a second to the overall boot time.
See
fallback intel: Moblin startup is hitting a composite fallback, costing
a ton of performance
https://bugs.freedesktop.org/show_bug.cgi?id=26189
Based on the initial patch by Arjan van de Van.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The prototype says this function returns a Bool and not just an int, so
be pedantic and return TRUE/FALSE.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
One of the convoluted if branches dereferenced Drawable when it is
potentially NULL. Avoid this by explicitly handling the NULL Drawable
cases earlier, and enabling solid fills for solid sources.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I added a jump if there was no src or mask Drawable, but we do actually
need to check for useless src repeats even if we have a source-only
mask.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The fallback log for http://bugs.freedesktop.org/show_bug.cgi?id=26189
does not actually state the reason why we actually fallback. This is
possibly because we need to fallback for reasons other than the
operation cannot be performed in hardware -- such as using an alpha map
or the screen is swapped out, so add this information to the fallback
log.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 3f11bbec42.
For unknown reasons, enabling tiling for the glyph cache is causing
glyph corruption both across suspend and resume and VT switching, on a
wide range of chipsets (reports include both i8xx and gm45)
This strongly suggests that we are handling tiling, or updates to tiled
buffers, incorrectly across i915_gem_idle(). However, until we can find
the root cause, we want to fix this regression before the next stable
release, so simply revert this patch. :(
Fixes:
[Bug 25406] fonts garbled after resuming from suspend since 6729b508http://bugs.freedesktop.org/show_bug.cgi?id=25406
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Maintain a small cache of pixmaps to hold SolidFill pictures. Currently
we create a pixmap the size of the damaged region and fill that using
pixman before downloading it to the GPU and compositing. Needless to say
this is extremely expensive compared to simply emitting the solid
colour. To mitigate this cost, we maintain a small cache of 1x1R
pictures which is recognised by the driver as being a solid, but at the
very least is maintained as a GPU ready pixmap.
This gives a good boost to cairo-xcb (which uses solid fills) on a gm45:
Before:
gnome-terminal-vim: 41.9s
After:
gnome-terminal-vim: 31.7s
Compare with using a cache of 1x1R pixmaps in cairo-xcb:
gnome-terminal-vim: 31.6s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Around a call to uxa_put_image() it is possible to mix both accelerated
and fallback paths, with the fallback code making the presumed
optimisation of only trying to call uxa_prepare_access() once. This
fails if the accelerated path also uses prepare/finish access on the
same drawable and then later fallback to the fallback path. This can
happen currently if an error is reported whilst attempting to accelerate
PutImage.
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:162
#1 0x00007ffff43ce4bd in fbBlt (srcLine=<value optimized out>, srcStride=40, srcX=<value optimized out>, dstLine=0xffffffffffffffff, dstStride=64, dstX=0, width=<value optimized out>, height=8, alu=3, pm=4294967295, bpp=8, reverse=0, upsidedown=0) at fbblt.c:93
#2 0x00007ffff43ce740 in fbBltStip (src=0xffffffffffffffff, srcStride=156555204, srcX=34, dst=0xfffffffc, dstStride=64, dstX=40, width=304, height=8, alu=3, pm=4294967295, bpp=8) at fbblt.c:944
#3 0x00007ffff4c32c53 in uxa_do_put_image (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, bits=0x954d7c4 "") at uxa-accel.c:196 #4 uxa_do_shm_put_image (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, bits=0x954d7c4 "") at uxa-accel.c:223
#5 uxa_put_image (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, bits=0x954d7c4 "") at uxa-accel.c:289
#6 0x00000000004d574f in damagePutImage (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, pImage=0x954d7c4 "") at damage.c:905
#7 0x00000000004287db in ProcPutImage (client=0x47ca72d0) at dispatch.c:2073
#8 0x000000000042bd94 in Dispatch () at dispatch.c:445
#9 0x000000000042513a in main (argc=4, argv=0x7fffffffe2a8, envp=<value optimized out>) at main.c:285
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid mapping the glyph cache back to the cpu by allocating temporary
buffer objects to store the glyph pixmap and blit to the cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Particularly noting to route alpha to the green channel when blending
with a8 destinations.
Fixes:
rendercheck/repeat/triangles regressed
http://bugs.freedesktop.org/show_bug.cgi?id=25047
introduced with commit 14109a.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the case of x8r8g8b8 and similar where the alpha channel is ignored,
but should be interpreted as being 1, then it is convenient if those bits
are set appropriately in the colour. In order to do so for these formats,
where PIXMAN_FORMAT_A() returns 0 we need to compute the alpha channel
width as the remaining bits instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This avoids a crash when an XRenderComposite call is made with a
-1 value for width/height, (which apparently compiz's gtk-window-
decorator likes to do). Fixes bug:
X crashes in uxa_acquire_pattern when logging in (gdm)
http://bugs.freedesktop.org/show_bug.cgi?id=24724
Signed-off-by: Albert Damen <albrt@gmx.net>
Reviewed-by: Carl Worth <cworth@cworth.org>
Pull the common methods for creating a Picture given a pixman format
into its own method, and tidy the surrounding code. The benefit is that
we can now composite directly to the Picture and so save an intermediate
copy when creating patterns for gradients.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We've talked about doing this since the start of the project, putting it off
until "some convenient time". Just after removing a third of the driver seems
like a convenient time, when backporting's probably not happening much anyway.
Carl Worth did the hard work in identifying that the regression in
cairo between X.org 1.6 and 1.7 was caused by cairo sending an a1
mask to the server in 1.7 whereas in 1.6 cairo used local fallbacks
(as the source was using RepeatPad, which triggers cairo's
'buggy_pad_reflect' fallback for X.org 1.6). This was causing the driver
to do a fallback to handle the a1 mask instead, which due to the GPU
pipeline stall is much more expensive than the equivalent fallback in
cairo.
Reference:
cairo's performance downgrades 4X with server master than server-1.6.
https://bugs.freedesktop.org/show_bug.cgi?id=23184
The fix is a relatively simple extension of the current
uxa_picture_from_pixman_image() to use CompositePicture() instead of
CopyArea() when we need to convert to a new format.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Currently when asked to composite using a gradient source or mask, we
fallback to using fbComposite(). This has the side-effect of causing a
readback on the destination surface, stalling the GPU pipeline. Instead,
like uxa_trapezoids(), we can use pixman to fill a scratch pixmap and then
copy that to an offscreen pixmap for use with uxa_composite().
Speedups on i915:
firefox-talos-svg: 710378.14 -> 549262.96: 1.29x speedup
No slowdowns.
Thanks to Søeren Sandmann Pedersen for spotting the missing
ValidatePicture().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This lets the driver allocate a nice idle buffer object instead of a
busy one, reducing runtime of firefox-20090601 on my G45 from 50.7 (+/- .41%)
to 48.4 (+/- 1.1%).
This was needed when we were doing the mask computations in this pixmap,
but now they're done in a temporary and then uploaded later.
This reduces runtime of firefox-20090601 from 52.6 (+/- .96%) to 50.7
(+/- .41%) seconds on my G45.
DPMS header was split into dpms.h (client) and dpmsconst.h (server). Drivers
need to include dpmsconst.h if xextproto 7.1 is available.
SHM is now shm.h instead of shmstr. Requires definition of ShmFuncs that's
not exported by the server.
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Don't do it, treat this the same as every other prepare access call in uxa.
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Owain Ainsworth <zerooa@googlemail.com>
Since we're only doing software rasterization right now, anyway, it
makes more sense to just rasterize to system memory and then upload
to a pixmap once complete. This avoids expensive read-modify-write
cycles.
This results in a 2.4x speedup for a real-world test case that's
heavy on trapezoids, which is swfdec running on the following file:
http://michalevy.com/wp-content/uploads/Giant%20Steps%202007.swf
Many thanks to Chris Wilson for his cairo-traces repository and
cairo-perf-trace tool which makes it so easy to measure things
like this.
With glyphs sitting in per-glyph pixmaps, there's no reason to use the CPU
to move them to the cache pixmap, and lots of reasons to use the accelerator.
Signed-off-by: Keith Packard <keithp@keithp.com>
Without this, rendering component-alpha glyphs may break without a mask.
Bug #19534. Ported from fix by Michel Dänzer <daenzer@vmware.com> in
xserver commit 639f289dcdbe00a516820f573c01a8339e120ed4
This avoids prepare/finish_access_gc overhead when we're not changing things
(since GCTile is already handled) and get us the RW flag for the prepare on
of the stipple pixmap so thing will be synced correctly.
We can get a case with gnome-terminal + links, where we get two arrays
of glyphs all with 0 width and 0 heights in them. If this happens
we manage to get to this case without any buffer setup and segfault.
(cherry picked from commit 717c7492a0f6ba3fb3eabda33515881eef314155)
GCC isn't smart enough to analyze the control flow and figure out that
these are false positives, but initializing them shouldn't hurt, so work
around it.
This eliminates the cost of EXA migration management while providing full
pixmap allocation control to the driver. The goal is to make something
useful for UMA drivers.