References:
Bug 28098 Compiz renders shadows wrong, garbage line of pixels along left
and top edge of windows
https://bugs.freedesktop.org/show_bug.cgi?id=28098
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All textures are now properly declared so that the alpha swizzling
occurs in the sampler or not at all. The downside is that for quite a
few composite operations we have to fallback to software on older
hardware. There is scope for more performing the alpha expansion in
shaders or combiners when we know the picture covers the clip - which is
almost all of the time for normal operations especially those
constructed by Cairo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We no longer workaround the lack of alpha expansion for xrgb textures as
this interferes with EXTEND_NONE, though we could if we know the source
covers the clip...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I'm seeing garbage alpha for rendercheck blend:
x8r8g8b8a 10x10 SRC ar8g8b8a
so disable blitting until I work out if we can fast-path it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I was blindly fixing rendercheck without thinking. We need to force the
alpha value to be in the blend unit and not before -- otherwise we
generate the incorrect result whilst blending. D'oh.
GEM handles serialisation of the new front buffer with respect to page
flipping and rendering and reports back when the flip is complete.
Adding a sync point here is then redundant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we schedule swaps for some time in the future and may process a
detachment prior to receiving the vblank notification from the kernel,
we need to hold a reference to the buffers for our swap event handler.
Fixes:
Bug 28080 - "glresize" causes X server segfault with indirect rendering.
https://bugs.freedesktop.org/show_bug.cgi?id=28080
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the change to prevent blitting between incompatible sources, we also
prevented 1x1R pixmaps from being used for solid fills. Reorder the
sequence of conditions to enable this fast path again.
Ensure that garbage is not stored in the unused alpha channel so that
we can rely on it being currently initialiased when used as a source or
returning via GetImage.
Partial fix for rendercheck -t blend
1. Instead of swapping bos, swap the entire private structure.
2. If we update the pixmap bo for the Screen, make sure we update the
reference inside intel->front_buffer so that xrandr still functions.
Fixes:
Bug 27922 - i965: Rapidly resizing OpenGL window causes GPU to hang.
https://bugs.freedesktop.org/show_bug.cgi?id=27922
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are destined to target an !offscreen drawable, then uploading the
trapezoid mask to a bo is the last thing we actually want to do...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to prevent overcommitting the aperture, and in particular if we
allocate a buffer larger than available space we will fail to mmap it in
and rendering will fail. Trying to allocate multiple large buffers in
the aperture, often the case when falling back, causes thrashes and
eviction of useful buffers. So from the outset simply do not allocate a
bo if the the required size is more than half the available aperture
space.
Fixes allocation failure in ocitymap.trace for instance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The cost of performing relocations outweigh the advantages of using the
blitter for solids with lots of rectangles.
References:
Bug 22127 - [UXA] 50% performance regression for XRenderFillRectangles
https://bugs.freedesktop.org/show_bug.cgi?id=22127
By using the 3D pipeline we improve our performance by around 4x on
i945, measured by the jxbench microbenchmark, and a factor of 10x by
short-cutting to the 3D pipeline for blended rectangles.
Before, on a i945GME:
19982.412060 Ops/s; rects (!); 15x15
9599.131693 Ops/s; rects (!); 75x75
3803.654743 Ops/s; rects (!); 250x250
6836.743772 Ops/s; rects blended; 15x15
1443.750000 Ops/s; rects blended; 75x75
495.335821 Ops/s; rects blended; 250x250
23247.933884 Ops/s; rects composition (!); 15x15
10993.073048 Ops/s; rects composition (!); 75x75
3595.905172 Ops/s; rects composition (!); 250x250
After:
87271.145975 Ops/s; rects (!); 15x15
32347.744361 Ops/s; rects (!); 75x75
5884.177215 Ops/s; rects (!); 250x250
73500.000000 Ops/s; rects blended; 15x15
33580.882353 Ops/s; rects blended; 75x75
5858.811749 Ops/s; rects blended; 250x250
25582.317073 Ops/s; rects composition (!); 15x15
6664.728682 Ops/s; rects composition (!); 75x75
14965.909091 Ops/s; rects composition (!); 250x250 [suspicious]
This has no impact on Cairo, but I have a suspicion from watching xtrace
that Qt likes to blit thousands of 1x1 rectangles with the same colour.
However, we are still around 2-3x slower than the reported figures for
EXA!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pitch needs to be set on the pixmap prior to the private
intel_pixmap structure being created so that it can record the correct
value from the pixmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we can not accelerate these either as a destination or a source,
don't bother allocating a buffer object for them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the first step to handling unsupported texture formats, double check
that the converted pattern can be used as a texture by the card.
Fixes: rendercheck -t repeat
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf has a regression
https://bugs.freedesktop.org/show_bug.cgi?id=25068
caused by
commit e581ceb738
i915: Use the color channels to pass along solid sources and masks.
Do not convert 1x1R pixmaps into a solid color as the readback from the
bo negates all the performances advantages of using a smaller vertex
buffer and fewer samplers.
Before (PineView):
aa=66800 glyph/s, rgb=28800 glyphs/s
Now:
aa=96800 glyphs/s, rgb=48500 glyphs/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513f
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyph/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More warning is the extremely low performance,
however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>