If the destination cannot fit into the 3D pipeline when we need to
composite, we fall back to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.
An alternative is to extract the extents of the operation from the
destination using the blitter, which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.
For x11perf this boosts glyph rendering on PineView from 38 kglyphs/s to
480 kglyphs/s, just a little shy of the native performance of 601 kglyphs/s.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
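The path selection described in the commit above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the names
(struct extents, choose_path, MAX_3D_SIZE) and the limit of 2048 are
hypothetical and are not taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the operation extents; x2/y2 are exclusive. */
struct extents { int x1, y1, x2, y2; };

/* Assumed 3D pipeline size limit, chosen for illustration only. */
#define MAX_3D_SIZE 2048

enum composite_path { PATH_DIRECT, PATH_TEMPORARY, PATH_CPU };

static bool fits_3d(int width, int height)
{
	return width <= MAX_3D_SIZE && height <= MAX_3D_SIZE;
}

/*
 * Pick how to composite: directly onto the destination if it fits,
 * via a temporary target covering only the operation extents if those
 * fit, and on the CPU as the last resort.
 */
static enum composite_path
choose_path(int dst_w, int dst_h, const struct extents *op)
{
	if (fits_3d(dst_w, dst_h))
		return PATH_DIRECT;
	if (fits_3d(op->x2 - op->x1, op->y2 - op->y1))
		return PATH_TEMPORARY;
	return PATH_CPU;
}
```

In the PATH_TEMPORARY case the blitter would copy the extents out of the
destination, the 3D pipeline would render into the copy, and the blitter
would copy the result back.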
Compositing directly onto the destination without using a mask takes us
from 580 kglyphs/s to 850 kglyphs/s on i945 [x11perf -aa10text].
However, the extra intersection check almost entirely cancels out the
speed up and we discover that the glyphs in x11perf are always
overlapping. Nothing is ever easy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
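For reference, a naive form of the intersection check mentioned in the
commit above might look like the sketch below. The struct and function
names are hypothetical, and the check actually used may differ. The
pairwise loop gives a feel for why the check is not free.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical glyph bounding box; x2/y2 are exclusive. */
struct box { int x1, y1, x2, y2; };

/* Two boxes intersect only if they overlap on both axes. */
static bool boxes_intersect(const struct box *a, const struct box *b)
{
	return a->x1 < b->x2 && b->x1 < a->x2 &&
	       a->y1 < b->y2 && b->y1 < a->y2;
}

/*
 * Return true if any pair of glyph boxes overlaps, in which case the
 * glyphs cannot be composited directly without a mask.
 */
static bool glyphs_overlap(const struct box *boxes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (boxes_intersect(&boxes[i], &boxes[j]))
				return true;
	return false;
}
```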
When compositing, we need to convert the box into a rect and so the
advantages of using REGION_TRANSLATE are lost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
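To illustrate the commit above: since each box has to be turned into a
rect anyway, the offset can be applied during that conversion instead of
translating the whole region first. The types below are simplified
stand-ins, not the X server's own definitions.

```c
/* Simplified stand-ins for a box (two corners) and a rect (origin + size). */
struct box { short x1, y1, x2, y2; };
struct rect { short x, y; unsigned short width, height; };

/* Convert a box to a rect, folding the translation (dx, dy) into the step. */
static struct rect box_to_rect(const struct box *b, int dx, int dy)
{
	struct rect r;

	r.x = (short)(b->x1 + dx);
	r.y = (short)(b->y1 + dy);
	r.width = (unsigned short)(b->x2 - b->x1);
	r.height = (unsigned short)(b->y2 - b->y1);
	return r;
}
```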
In theory this should allow us to pack far more operations into a single
batch buffer, and reduce our overheads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By using pwrite() instead of dri_bo_map() we can write to the batch buffer
through the GTT and not be forced to map it back into the CPU domain and
out again, eliminating a double clflush.
Measuring x11perf text performance on PineView:
Before:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0019 msec (532000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0020 msec (496000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On my PineView box these represent ~5% overhead on x11perf text:
Before:
16000000 trep @ 0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @ 0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Combine all the calls to composite between prepare_composite and
done_composite into a single primitive list, rather than a primitive
call per composite().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
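The batching in the commit above can be pictured with the following
sketch. It only counts primitives instead of emitting hardware commands,
and the state structure, MAX_RECTS and flush_primitive are hypothetical.

```c
#include <stddef.h>

#define MAX_RECTS 64	/* assumed capacity of one primitive list */

struct composite_state {
	int rects[MAX_RECTS][4];	/* x, y, w, h per queued rectangle */
	size_t n_rects;
	unsigned primitives_emitted;
};

/* Emit every queued rectangle as a single primitive. */
static void flush_primitive(struct composite_state *s)
{
	if (s->n_rects == 0)
		return;
	s->primitives_emitted++;
	s->n_rects = 0;
}

static void prepare_composite(struct composite_state *s)
{
	s->n_rects = 0;
}

/* Queue a rectangle; only flush early if the list is full. */
static void composite(struct composite_state *s, int x, int y, int w, int h)
{
	if (s->n_rects == MAX_RECTS)
		flush_primitive(s);
	int *r = s->rects[s->n_rects++];
	r[0] = x; r[1] = y; r[2] = w; r[3] = h;
}

static void done_composite(struct composite_state *s)
{
	flush_primitive(s);
}
```

With this shape, any number of composite() calls up to the list capacity
costs one primitive rather than one per call.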
Use composite rather than solid blits in order to bring performance on
a par with the CPU when using GEM and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References:
Bug 28135 - [855GM] Slowdown/High CPU-Usage after Git-Commit 926fbc7d90
https://bugs.freedesktop.org/show_bug.cgi?id=28135
The simple answer is that I had assumed that 0 was a reserved value.
However, without the bpp encoded into the format, 0 was used for a8r8g8b8
and r5g6b5, which are very common formats!
The other possibility for the slowdown is that gtkperf is using one of the
now verboten xrgb formats -- which would in fact be valid if the source
covers the clip and we could fix up the alpha value in the fixed-function
combine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Set the correct offset for the gradients patterns after rendering to a
local Picture.
Fixes cairo/test/huge-radial and friends
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the source may not cover the extents, we need to represent those
areas as transparent in the fallback picture, ergo we need an alpha
channel. We could be smarter and force a format conversion when
necessary, and we could let the backend choose the most appropriate
format.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References:
Bug 28098 Compiz renders shadows wrong, garbage line of pixels along left
and top edge of windows
https://bugs.freedesktop.org/show_bug.cgi?id=28098
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All textures are now properly declared so that the alpha swizzling
occurs in the sampler or not at all. The downside is that for quite a
few composite operations we have to fall back to software on older
hardware. There is scope for performing more of the alpha expansion in
shaders or combiners when we know the picture covers the clip, which is
almost all of the time for normal operations, especially those
constructed by Cairo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We no longer work around the lack of alpha expansion for xrgb textures as
this interferes with EXTEND_NONE, though we could if we know the source
covers the clip...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm seeing garbage alpha for rendercheck blend:
x8r8g8b8a 10x10 SRC ar8g8b8a
so disable blitting until I work out if we can fast-path it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I was blindly fixing rendercheck without thinking. We need to force the
alpha value to be in the blend unit and not before -- otherwise we
generate the incorrect result whilst blending. D'oh.
GEM handles serialisation of the new front buffer with respect to page
flipping and rendering and reports back when the flip is complete.
Adding a sync point here is then redundant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we schedule swaps for some time in the future and may process a
detachment prior to receiving the vblank notification from the kernel,
we need to hold a reference to the buffers for our swap event handler.
Fixes:
Bug 28080 - "glresize" causes X server segfault with indirect rendering.
https://bugs.freedesktop.org/show_bug.cgi?id=28080
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
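The lifetime rule from the commit above, sketched with a plain reference
count. All names here are hypothetical, and a real implementation would
free the buffer rather than set a flag.

```c
#include <stddef.h>

/* Hypothetical refcounted buffer; 'destroyed' stands in for freeing it. */
struct buffer { int refcnt; int destroyed; };

static void buffer_reference(struct buffer *b)
{
	b->refcnt++;
}

static void buffer_unreference(struct buffer *b)
{
	if (--b->refcnt == 0)
		b->destroyed = 1;
}

struct swap_event { struct buffer *front, *back; };

/* Scheduling a swap takes its own references on both buffers. */
static void schedule_swap(struct swap_event *ev,
			  struct buffer *front, struct buffer *back)
{
	buffer_reference(front);
	buffer_reference(back);
	ev->front = front;
	ev->back = back;
}

/* Runs on vblank notification; the buffers are still alive here. */
static void swap_event_handler(struct swap_event *ev)
{
	buffer_unreference(ev->front);
	buffer_unreference(ev->back);
	ev->front = ev->back = NULL;
}
```

If the drawable is detached and drops its references before the vblank
arrives, the references held by the event keep both buffers valid until
the handler has run.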
In the change to prevent blitting between incompatible sources, we also
prevented 1x1R pixmaps from being used for solid fills. Reorder the
sequence of conditions to enable this fast path again.
Ensure that garbage is not stored in the unused alpha channel so that
we can rely on it being correctly initialised when used as a source or
returned via GetImage.
Partial fix for rendercheck -t blend
1. Instead of swapping bos, swap the entire private structure.
2. If we update the pixmap bo for the Screen, make sure we update the
reference inside intel->front_buffer so that xrandr still functions.
Fixes:
Bug 27922 - i965: Rapidly resizing OpenGL window causes GPU to hang.
https://bugs.freedesktop.org/show_bug.cgi?id=27922
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are destined to target an !offscreen drawable, then uploading the
trapezoid mask to a bo is the last thing we actually want to do...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>