The pitch needs to be set on the pixmap prior to the private
intel_pixmap structure being created so that it can record the correct
value from the pixmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we can not accelerate these either as a destination or a source,
don't bother allocating a buffer object for them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the first step to handling unsupported texture formats, double check
that the converted pattern can be used as a texture by the card.
Fixes: rendercheck -t repeat
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf has a regression
https://bugs.freedesktop.org/show_bug.cgi?id=25068
caused by
commit e581ceb738
i915: Use the color channels to pass along solid sources and masks.
Do not convert 1x1R pixmaps into a solid color as the readback from the
bo negates all the performances advantages of using a smaller vertex
buffer and fewer samplers.
Before (PineView):
aa=66800 glyph/s, rgb=28800 glyphs/s
Now:
aa=96800 glyphs/s, rgb=48500 glyphs/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513f
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyph/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More warning is the extremely low performance,
however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We can also convert a composite with an integer translation into a
blit, so long as the sample extents remains within the source.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the buffer is too large or not suitable for a GPU operation, we
currently fallback and perform the composite on the CPU. An alternative
is too extract the small region out of the source (as usually the
sample extents are much smaller than the actual surface size) and try
the composite with the new surface.
The effect is particularly noticeable on pathological websites that use
very large background images. For example, http://www.woodtv.com/ uses a
1299x15000 pattern that is obscured by another opaque pattern.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Complete the prepare access for the PutImage fallback via fbCopyArea(),
by remembering to set the private pointer to the GTT mapping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On older versions of pixman, pixman_blt() can return false if the images
are <= 8bpp. If we are being called from CopyArea, then we cannot return
FALSE here as that will trigger an infinite recursion. Instead we must
manually perform the fallback using fbCopyArea().
Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is some fallout from my xvmc cleanup.
Original-Patch-by: Rico Tzschichholz <ricotz@t-online.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we need to allocate a new bo for use as a gpu target, first check
if we can reuse a pixmap that has already been relocated into the
aperture as a temporary target, for instance a glyph mask or a clip mask.
Before:
backend test min(s) median(s) stddev.
xlib firefox-planet-gnome 50.568 50.873 0.30%
xcb firefox-planet-gnome 49.686 53.003 3.92%
xlib evolution 40.115 40.131 0.86%
xcb evolution 28.241 28.285 0.18%
After:
backend test min(s) median(s) stddev.
xlib firefox-planet-gnome 47.759 48.233 0.80%
xcb firefox-planet-gnome 48.611 48.657 0.87%
xlib evolution 38.954 38.991 0.05%
xcb evolution 26.561 26.654 0.19%
And even more dramatic improvements when using a font size larger than
the maximum size of the glyph cache:
xcb firefox-36-20090611: 1.79x speedup
xlib firefox-36-20090611: 1.74x speedup
xcb firefox-36-20090609: 1.62x speedup
xlib firefox-36-20090609: 1.59x speedup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we only use the glyph cache for small glyphs, those large than 32x32
will first be copied to a bo and used as a mask in a composite
operation. We can avoid the allocation and upload per use by allocating
a bo for the over-sized glyph from the start. As the glyph is large
anyway, the excess memory allocation is less significant.
Using normal font sizes, firefox shows no change - as expected. However,
using the 36 font size traces, we see around a 10% improvement on g45.
Before:
xcb firefox-36-20090609 127.333 127.897 0.22%
xcb firefox-36-20090611 87.456 88.624 0.66%
xcb firefox-20090601 19.522 20.194 1.69%
xlib firefox-36-20090609 201.054 201.780 0.18%
xlib firefox-36-20090611 133.468 133.717 0.09%
xlib firefox-20090601 23.740 23.975 0.49%
With large glyphs in bo:
xcb firefox-36-20090609 117.256 118.254 0.42%
xcb firefox-36-20090611 79.462 79.962 0.31%
xcb firefox-20090601 19.658 20.024 0.92%
xlib firefox-36-20090609 185.645 188.202 0.68%
xlib firefox-36-20090611 123.592 124.940 0.54%
xlib firefox-20090601 23.917 24.098 0.38%
Thanks to Owain G. Ainsworth for the suggestion!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 6d50553e8f.
Now we have taught the fallback path not to infinitely recurse,
re-enable the accelerated path for ShmPutImage and friends.
In order to avoid an infinite recursion after enabling CopyArea to use
the put_image acceleration to either stream a blit or to copy in-place,
we cannot call CopyArea from put_image for the fallback path. Instead,
we can simply call pixman_blt directly, which coincidentally is a tiny
bit faster.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This slighlty improves xrender performance on fence reg starved
i8xx hw.
I've also changed a few function calls to the new names from the
compat ones while looking at the code.
The i915 textured video path is not converted because atm the xv
code does not use tiled surfaces.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Often, for example in the fallback for ShmPutImage, we will attempt to
use uxa_copy_area() copying to a normal pixmap from a memory buffer.
This triggers a fallback, and maps the destination pixmap back into the
GTT. The accelerated put_image path will attempt to stream a blit to the
destination pixmap if it is currently active, avoiding the stall.
We appear to have a confusion of stride in terms of pixels, pitch in
terms of bytes and the actual width of the surface.
i830_pad_drawable_width() appears to be operating aligning *pixels* to a
64 pixel boundary and has never used the chars-per-pixel causing
considerable confusion in its callers. Remove the parameter and ensure
that the callers are expecting a value in pixels returned, multiplying
by cpp where necessary to get the pitch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Caught by a malloc library assert.
Note to self: Don't just copy&paste codelines around :(
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Buzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27540
Tested-by: Nick Bowler <nbowler@draconx.ca>
Tested-by: Calvin Walton <calvin.walton@gmail.com>
For some reason I've made a mess out of the overlay stride constrains.
Fix it up.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Calvin Walton <calvin.walton@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27453
In my recent fix for the chroma pitch for i915 xvmc I've forgotten about
i965 class hw. For videos with a non-even sized stride (measured in dwords)
the chroma pitch was internally incosistent and one dword off.
Fix this by using pitch2 for the chroma pitch in i965 textured video like
everywhere else.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27417
Tested-by: Nick Bowler <nbowler@draconx.ca>
Tested-by: Sven Arvidsson <sa@whiz.se>
Simply store the desired bo size in intel_xvmc_context and initialize
it in the driver's create_context function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... by putting struct intel_xvmc_surface at the beginning. Also kill
the common context handling code and simply keep a pointer in the
surface private to the context.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's unused. Also drop all related generic code that tries to do
clever stuff with this callback. These are all remnants from a
pre-gem world.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All of these are also stored in the context. Also kill the context
reference counting. Doesn't serve a purpose besides occupying a
pointer to the context in the private surface struct.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... by putting struct intel_xvmc_surface at the beginning. This
will allow to consolidate surface and bo handling.
Also kill some now dead code used to handle the common surface
structure.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>