If the render operation requires a temporary source Picture and the
operation is large, larger than the maximum permitted bo, then we will
fail to allocate the bo. In this case, we need to fallback and perform
the operation on the CPU rather than dereference a NULL bo.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34399
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
uxa_acquire_solid returns NULL under OOM. Thus the value of solid
must be checked before dereferencing it in the uxa_get_offscreen()
call.
Signed-off-by: Bryce Harrington <bryce@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The region is used to paint onto the backing pixmap (and thus
translated) prior to being passed to the damage layer (wrt to the
drawable). So the local translation needs to be undone first.
Identified by Christopher James Halse Rogers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33650
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the source is outside the drawable, then CopyArea will fail to
initialise the source correctly. The simplest fix in this case is to
fallback to pixman to generate the source texture.
Fixes:
Bug 28497 - Graphics corruption after opening a specific website
https://bugs.freedesktop.org/show_bug.cgi?id=28497
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is wildly optimistic, but it should work in a surprising number of
error situations and some output in those cases will be hopefully be
better than none...
If we submit a batchbuffer and the kernel reports the GPU is hung (which
will be caused by an earlier execbuffer, and so the kernel should have
had enough time to determine whether or not it could reset the GPU) then
disable any further attempt to accelerate gfx and force fallbacks to map
the buffers and use the CPU. We cannot normally map any more buffers if
the GPU is hung, so only those already mapped prior to the hang can be
written to, or those allocated in system memory. However, we can expect
that the framebuffer is already mapped, and so have a reasonable
expectation to continue to see the display update.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[xserver-1.8] The damage layer doesn't wrap CompositeRects, so we need to
manually append the damaged region ourselves. This works for
miCompsiteRects since that translates the call into multiple invocations
of either PolyFillRectangle or Composite, which themselves cause damage.
Fixes:
Bug 28120 - Tint2's tooltip borders end up at 0,0 and do not disappear
https://bugs.freedesktop.org/show_bug.cgi?id=28120
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since we have at most 8 bits of alpha, we treat >= 0xff00 as opaque.
However, being paranoid we should set the alpha value to 0xfff in case
something unexpected happens when converting from the xRenderColor to
the pixel value.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the destination cannot fit into the 3D pipeline when we need to
composite, we fallback to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.
An alternative is to extract the extents of the operation from the
destination using the blitter which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.
For x11perf this boosts glyph rendering on PineView, from 38kglyphs/s to
480kglyphs/s. Just a little shy of the native performance of 601kglyphs/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Without using a mask and compositing directly onto the destination,
takes us from 580 kglyphs/s to 850 kglyphs/s on i945 [x11perf -aa10text].
However, the extra intersection check almost entirely cancels out the
speed up and we discover that the glyphs in x11perf are always
overlapping. Nothing is ever easy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When compositing, we need to convert the box into a rect and so the
advantages of using REGION_TRANSLATE are lost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Use composite rather than solid blits in order to bring performance on
a par with the CPU when using GEM and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Set the correct offset for the gradients patterns after rendering to a
local Picture.
Fixes cairo/test/huge-radial and friends
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the source may not cover the extents, we need to represent those
areas as transparent in the fallback picture, ergo we need an alpha
channel. We could be smarter and force a format conversion when
necessary, and we could let the backend choose the most appropriate
format.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References:
Bug 28098 Compiz renders shadows wrong, garbage line of pixels along left
and top edge of windows
https://bugs.freedesktop.org/show_bug.cgi?id=28098
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm seeing garbage alpha for rendercheck blend:
x8r8g8b8a 10x10 SRC ar8g8b8a
so disable blitting until I work out if we can fast-path it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the change to prevent blitting between incompatible sources, we also
prevented 1x1R pixmaps from being used for solid fills. Reorder the
sequence of conditions to enable this fast path again.
If we are destined to target an !offscreen drawable, then uploading the
trapezoid mask to a bo is the last thing we actually want to do...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The cost of performing relocations outweigh the advantages of using the
blitter for solids with lots of rectangles.
References:
Bug 22127 - [UXA] 50% performance regression for XRenderFillRectangles
https://bugs.freedesktop.org/show_bug.cgi?id=22127
By using the 3D pipeline we improve our performance by around 4x on
i945, measured by the jxbench microbenchmark, and a factor of 10x by
short-cutting to the 3D pipeline for blended rectangles.
Before, on a i945GME:
19982.412060 Ops/s; rects (!); 15x15
9599.131693 Ops/s; rects (!); 75x75
3803.654743 Ops/s; rects (!); 250x250
6836.743772 Ops/s; rects blended; 15x15
1443.750000 Ops/s; rects blended; 75x75
495.335821 Ops/s; rects blended; 250x250
23247.933884 Ops/s; rects composition (!); 15x15
10993.073048 Ops/s; rects composition (!); 75x75
3595.905172 Ops/s; rects composition (!); 250x250
After:
87271.145975 Ops/s; rects (!); 15x15
32347.744361 Ops/s; rects (!); 75x75
5884.177215 Ops/s; rects (!); 250x250
73500.000000 Ops/s; rects blended; 15x15
33580.882353 Ops/s; rects blended; 75x75
5858.811749 Ops/s; rects blended; 250x250
25582.317073 Ops/s; rects composition (!); 15x15
6664.728682 Ops/s; rects composition (!); 75x75
14965.909091 Ops/s; rects composition (!); 250x250 [suspicious]
This has no impact on Cairo, but I have a suspicion from watching xtrace
that Qt likes to blit thousands of 1x1 rectangles with the same colour.
However, we are still around 2-3x slower than the reported figures for
EXA!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the first step to handling unsupported texture formats, double check
that the converted pattern can be used as a texture by the card.
Fixes: rendercheck -t repeat
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513f
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyph/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More warning is the extremely low performance,
however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We can also convert a composite with an integer translation into a
blit, so long as the sample extents remains within the source.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the buffer is too large or not suitable for a GPU operation, we
currently fallback and perform the composite on the CPU. An alternative
is too extract the small region out of the source (as usually the
sample extents are much smaller than the actual surface size) and try
the composite with the new surface.
The effect is particularly noticeable on pathological websites that use
very large background images. For example, http://www.woodtv.com/ uses a
1299x15000 pattern that is obscured by another opaque pattern.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>