As we only use the glyph cache for small glyphs, those larger than 32x32
will first be copied to a bo and used as a mask in a composite
operation. We can avoid the allocation and upload per use by allocating
a bo for the over-sized glyph from the start. As the glyph is large
anyway, the excess memory allocation is less significant.
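
A minimal sketch of the size check described above. The helper name, the
enum and the macro are hypothetical and are not taken from the driver; only
the 32x32 threshold comes from the text.

```c
#include <assert.h>

/* Assumed threshold, taken from the "32x32" figure in the message above. */
#define GLYPH_CACHE_MAX_DIM 32

enum glyph_storage {
    GLYPH_IN_CACHE, /* packed into the shared glyph cache */
    GLYPH_OWN_BO    /* backed by its own bo, allocated once up front */
};

/* Hypothetical helper: decide where the pixels of a glyph should live. */
static enum glyph_storage
glyph_choose_storage(int width, int height)
{
    assert(width > 0 && height > 0);

    if (width <= GLYPH_CACHE_MAX_DIM && height <= GLYPH_CACHE_MAX_DIM)
        return GLYPH_IN_CACHE;

    /* Over-sized in either dimension: give the glyph a bo of its own. */
    return GLYPH_OWN_BO;
}
```

With this split a 32x32 glyph still goes through the cache, while a 33x8
glyph gets a dedicated bo and is never re-uploaded per use.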
Using normal font sizes, firefox shows no change - as expected. However,
using the 36 font size traces, we see around a 10% improvement on g45.
Before:
xcb firefox-36-20090609 127.333 127.897 0.22%
xcb firefox-36-20090611 87.456 88.624 0.66%
xcb firefox-20090601 19.522 20.194 1.69%
xlib firefox-36-20090609 201.054 201.780 0.18%
xlib firefox-36-20090611 133.468 133.717 0.09%
xlib firefox-20090601 23.740 23.975 0.49%
With large glyphs in bo:
xcb firefox-36-20090609 117.256 118.254 0.42%
xcb firefox-36-20090611 79.462 79.962 0.31%
xcb firefox-20090601 19.658 20.024 0.92%
xlib firefox-36-20090609 185.645 188.202 0.68%
xlib firefox-36-20090611 123.592 124.940 0.54%
xlib firefox-20090601 23.917 24.098 0.38%
Thanks to Owain G. Ainsworth for the suggestion!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 6d50553e8f.
Now we have taught the fallback path not to infinitely recurse,
re-enable the accelerated path for ShmPutImage and friends.
In order to avoid an infinite recursion after enabling CopyArea to use
the put_image acceleration to either stream a blit or to copy in-place,
we cannot call CopyArea from put_image for the fallback path. Instead,
we can simply call pixman_blt directly, which coincidentally is a tiny
bit faster.
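
A rough sketch of the call structure, to show why the fallback must not go
back through the copy path. All names here are hypothetical stand-ins:
software_blt() plays the role of pixman_blt, and try_stream_blit() is a stub
that always declines so that the fallback is exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct image {
    unsigned char *data;
    size_t stride;      /* bytes per row */
    size_t width_bytes; /* bytes actually used in each row */
    size_t height;      /* number of rows */
};

static int fallback_copies; /* counts how often the fallback ran */

/* Stub for the accelerated upload; a real driver would queue a GPU blit. */
static bool
try_stream_blit(const struct image *src, struct image *dst)
{
    (void)src;
    (void)dst;
    return false;
}

/* Row-by-row software copy, standing in for pixman_blt. */
static void
software_blt(const struct image *src, struct image *dst)
{
    assert(src->width_bytes <= dst->stride && src->height <= dst->height);
    for (size_t y = 0; y < src->height; y++)
        memcpy(dst->data + y * dst->stride,
               src->data + y * src->stride,
               src->width_bytes);
}

static void
put_image(const struct image *src, struct image *dst)
{
    if (try_stream_blit(src, dst))
        return;

    /* Fallback: copy directly. Calling copy_area() from here would
     * re-enter put_image() and never terminate. */
    fallback_copies++;
    software_blt(src, dst);
}

/* The copy path is allowed to use put_image(), but not the reverse. */
static void
copy_area(const struct image *src, struct image *dst)
{
    put_image(src, dst);
}
```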
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This slightly improves xrender performance on fence reg starved
i8xx hw.
I've also changed a few function calls to the new names from the
compat ones while looking at the code.
The i915 textured video path is not converted because atm the xv
code does not use tiled surfaces.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Often, for example in the fallback for ShmPutImage, we will attempt to
use uxa_copy_area() copying to a normal pixmap from a memory buffer.
This triggers a fallback, and maps the destination pixmap back into the
GTT. The accelerated put_image path will attempt to stream a blit to the
destination pixmap if it is currently active, avoiding the stall.
We appear to have a confusion of stride in terms of pixels, pitch in
terms of bytes and the actual width of the surface.
i830_pad_drawable_width() appears to be aligning *pixels* to a
64 pixel boundary and has never used the chars-per-pixel causing
considerable confusion in its callers. Remove the parameter and ensure
that the callers are expecting a value in pixels returned, multiplying
by cpp where necessary to get the pitch.
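
The intended split between pixels and bytes can be sketched as follows. The
function names and the ALIGN_UP macro are hypothetical; only the 64 pixel
alignment and the multiplication by cpp come from the text.

```c
#include <assert.h>

/* Round v up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(v, a) (((v) + (a) - 1) & ~((a) - 1))

/* Padded width of a drawable, in *pixels*. No cpp involved. */
static int
pad_drawable_width(int width)
{
    assert(width >= 0);
    return ALIGN_UP(width, 64);
}

/* Pitch in *bytes*: padded width in pixels times bytes per pixel. */
static int
pitch_bytes(int width, int cpp)
{
    assert(cpp > 0);
    return pad_drawable_width(width) * cpp;
}
```

For a 100 pixel wide drawable at 4 bytes per pixel this gives a padded width
of 128 pixels and a pitch of 512 bytes.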
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Caught by a malloc library assert.
Note to self: Don't just copy&paste codelines around :(
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27540
Tested-by: Nick Bowler <nbowler@draconx.ca>
Tested-by: Calvin Walton <calvin.walton@gmail.com>
For some reason I've made a mess out of the overlay stride constraints.
Fix it up.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Calvin Walton <calvin.walton@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27453
In my recent fix for the chroma pitch for i915 xvmc I've forgotten about
i965 class hw. For videos with a non-even sized stride (measured in dwords)
the chroma pitch was internally inconsistent and one dword off.
Fix this by using pitch2 for the chroma pitch in i965 textured video like
everywhere else.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27417
Tested-by: Nick Bowler <nbowler@draconx.ca>
Tested-by: Sven Arvidsson <sa@whiz.se>
Simply store the desired bo size in intel_xvmc_context and initialize
it in the driver's create_context function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... by putting struct intel_xvmc_surface at the beginning. Also kill
the common context handling code and simply keep a pointer in the
surface private to the context.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's unused. Also drop all related generic code that tries to do
clever stuff with this callback. These are all remnants from a
pre-gem world.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All of these are also stored in the context. Also kill the context
reference counting. Doesn't serve a purpose besides occupying a
pointer to the context in the private surface struct.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... by putting struct intel_xvmc_surface at the beginning. This
will allow us to consolidate surface and bo handling.
Also kill some now dead code used to handle the common surface
structure.
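
A small illustration of the "common struct first" layout. The struct and
field names below are hypothetical and do not reflect the real
intel_xvmc_surface; the point is only that a pointer to the outer struct and
a pointer to its first member are interchangeable in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common part shared by every surface type. */
struct common_surface {
    unsigned int handle;
    int width, height;
};

/* Hypothetical hardware-specific surface: the common part comes first. */
struct hw_surface {
    struct common_surface base; /* must stay the first member */
    int hw_private;
};

/* Shared code only ever sees the common part. */
static int
surface_area(const struct common_surface *c)
{
    return c->width * c->height;
}

static struct common_surface *
to_common(struct hw_surface *s)
{
    return &s->base;
}

/* Valid only because base sits at offset 0 of struct hw_surface. */
static struct hw_surface *
to_hw(struct common_surface *c)
{
    return (struct hw_surface *)c;
}
```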
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We only passed around and actually used the gem handle. Don't
need a struct for one field alone ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And kill all the static structures. This way it's clearer what's
common and what's specific. And the code is shorter too.
Also clean up src/i830_hwmc.c - kill the nonstandard surface types
for i915 and the associated code.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Doing the same with the i965 code will allow us to share the
create_context function.
src/i915_hwmc.h is now almost empty. Move the last #defines to
src/xvmc/i915_xvmc.c where they are actually used and delete the
file.
Also rename the ddx context struct to something sane.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Like for the subpicture stuff, share the "do-nothing" functions ...
And fix function name spelling, too.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Like for i915. Also drop that now totally superfluous limit on the
available surfaces.
Move the surface struct into the userspace library header now that
the ddx doesn't use it anymore.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The XvMC driver api in the server is insane. Even for optional stuff
like subpicture support it doesn't check for NULL-pointers. So we
have to retain some dummy functions.
Wonder how many copies of these things exist on fdo ...
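
Roughly what such dummy functions look like. The ops table, the status code
and the caller are hypothetical and are not the server's real XvMC adaptor
interface; the caller only models the missing NULL check described above.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_OK 0 /* hypothetical success code */

/* Hypothetical table of optional subpicture hooks. */
struct subpicture_ops {
    int (*create)(void *ctx);
    void (*destroy)(void *ctx);
};

/* Do-nothing implementations, present only so the table has no NULLs. */
static int
create_nothing(void *ctx)
{
    (void)ctx;
    return STATUS_OK;
}

static void
destroy_nothing(void *ctx)
{
    (void)ctx;
}

static const struct subpicture_ops dummy_subpicture_ops = {
    create_nothing,
    destroy_nothing,
};

/* Models a caller that dereferences the hook without checking it. */
static int
server_create_subpicture(const struct subpicture_ops *ops, void *ctx)
{
    return ops->create(ctx);
}
```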
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Both xvmc drivers hand in the bo in exactly the same way. So move the code
to src/i830_video.c and kill this great oeuvre of spaghetti-code.
The xvmc driver init and fini also lost their last use, kill them, too.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After unifying i915 and i965, not much will be left of these files.
Therefore merge them to make the following changes easier.
This creates some warnings about some redefined macros, but when this
is all cleaned up they'll all be gone.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pauli pointed out that we take a ref on the front buffer when exchanging
but forget to release it. The ref is necessary since the set functions
will drop refs as necessary, but once we set the front buffer to point
at the back pixmap, we need to release our private ref again, or we'll
leak buffers.
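
The reference flow can be sketched like this. The types and function names
are hypothetical, not the driver's; the setters are assumed to take a
reference on the new buffer and drop the one on the old buffer.

```c
#include <assert.h>
#include <stddef.h>

struct buffer {
    int refcount;
};

static void
buffer_ref(struct buffer *b)
{
    b->refcount++;
}

static void
buffer_unref(struct buffer *b)
{
    assert(b->refcount > 0);
    b->refcount--;
}

struct screen {
    struct buffer *front;
    struct buffer *back;
};

/* Setters hold a ref on what they point at and drop the old one. */
static void
set_front(struct screen *s, struct buffer *b)
{
    buffer_ref(b);
    if (s->front != NULL)
        buffer_unref(s->front);
    s->front = b;
}

static void
set_back(struct screen *s, struct buffer *b)
{
    buffer_ref(b);
    if (s->back != NULL)
        buffer_unref(s->back);
    s->back = b;
}

static void
exchange_buffers(struct screen *s)
{
    struct buffer *old_front = s->front;

    /* Private ref: keeps the old front alive while the setters drop theirs. */
    buffer_ref(old_front);
    set_front(s, s->back);
    set_back(s, old_front);

    /* Release the private ref again, otherwise the buffer leaks. */
    buffer_unref(old_front);
}
```

After an exchange both buffers are back at one reference each; leaving out
the final buffer_unref() would leave the old front buffer at two.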
Reported-by: Pauli Nieminen <suokkos@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After reports of segmentation faults caused by
d6b7f96fde and vmware, the most obvious
cause would be illegally writing to the src data when performing the alpha
fill inline. So force the image upload to go via a fresh buffer whenever
we need to modify the incoming data.
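
In sketch form, assuming 32 bit pixels with alpha in the top byte (an
assumption, as is the helper name): the alpha fill writes into a freshly
allocated copy and leaves the caller's data untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Return a newly allocated copy of src with the alpha byte forced to
 * opaque. The source is only read, never written. The caller frees the
 * result. Returns NULL if the allocation fails.
 */
static uint32_t *
copy_with_opaque_alpha(const uint32_t *src, size_t count)
{
    uint32_t *dst;

    assert(src != NULL && count > 0);

    dst = malloc(count * sizeof(*dst));
    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        dst[i] = src[i] | 0xff000000u;

    return dst;
}
```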
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
On memory constrained hardware, tiling is vital for good performance as
it minimizes cache misses. The downside is that older hardware
(which often suffers from a lack of bandwidth) requires the use of
fences for many operations, which are in short supply and so may cause
shorter batchbuffers. However, our batch buffers are typically short, so
this is unlikely to be a concern or to erode the performance wins.
A quick bit of testing suggests the effect is inconclusive on
firefox/i945:
                 linear    tiled
xcb              205.470   206.219
xcb-render-0.0   404.704   388.413
xlib             166.410   170.805
A secondary effect of the patch is to workaround a G31 specific hang
when attempting to use linear 2048x2048 surfaces. Bonus!
Fixes:
Bug 25375 - Performance issue using texture from pixmap (tfp) glx extension on 945
http://bugs.freedesktop.org/show_bug.cgi?id=25375
Bug 27100 - GPU Hung copying a 2048x1152 pixmap
http://bugs.freedesktop.org/show_bug.cgi?id=27100
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: John <jvinla@gmail.com>