uxa_acquire_solid returns NULL under OOM. Thus the value of solid
must be checked before dereferencing it in the uxa_get_offscreen()
call.
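A minimal sketch of the check, using hypothetical stand-ins rather than the
real UXA functions (struct pixmap, acquire_solid(), is_offscreen() and
try_accelerated_fill() are all invented for illustration):

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's pixmap type. */
struct pixmap { int width, height; };

/* Stand-in for an acquire call that yields NULL when allocation fails. */
static struct pixmap *acquire_solid(struct pixmap *cache, int oom)
{
    return oom ? NULL : cache;
}

/* Stand-in for a query that dereferences its argument. */
static int is_offscreen(const struct pixmap *p)
{
    return p->width > 0 && p->height > 0;
}

/* Returns 1 if the accelerated path may proceed, 0 to request a fallback. */
static int try_accelerated_fill(struct pixmap *cache, int oom)
{
    struct pixmap *solid = acquire_solid(cache, oom);

    if (solid == NULL)      /* check before any dereference */
        return 0;

    return is_offscreen(solid);
}
```

With the check in place, an allocation failure turns into a request for the
fallback path instead of a NULL dereference.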
Signed-off-by: Bryce Harrington <bryce@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
planemask is an unsigned long initialised to ~0; on 64-bit this is not equal
to (unsigned int)-1.
Use the macro provided to do this.
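To illustrate the mismatch, here is a sketch with hypothetical helpers (the
names full_mask(), pm_is_solid() and pm_equals_uint_all_ones() are
assumptions, and the macro actually provided by the driver may be defined
differently). It compares only the bits that matter for the drawable depth:

```c
#include <limits.h>

/* All-ones mask covering `depth` bits of an unsigned long (hypothetical). */
static unsigned long full_mask(unsigned int depth)
{
    if (depth >= sizeof(unsigned long) * CHAR_BIT)
        return ~0UL;
    return (1UL << depth) - 1;
}

/* True when every plane of a drawable of the given depth is enabled. */
static int pm_is_solid(unsigned int depth, unsigned long planemask)
{
    return (planemask & full_mask(depth)) == full_mask(depth);
}

/* The fragile comparison: true only where unsigned long is 32 bits wide. */
static int pm_equals_uint_all_ones(unsigned long planemask)
{
    return planemask == (unsigned int)-1;
}
```

On an LP64 system ~0UL is 0xffffffffffffffff while (unsigned int)-1 widens to
0x00000000ffffffff, so the direct comparison fails even though all planes
are set.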
Signed-off-by: Dave Airlie <airlied@redhat.com>
A slight confusion in computing the correction image location resulted
in the application of the source offsets to the pixel location in the
target and not in the source as intended.
Fixes the visual corruption of the scrollbar in Chromium, and hopefully
the crash reported by Robert Hooker when starting gdm after plymouth.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Now with streaming uploads and downloads for composite operations in
place, shared memory pixmaps are no longer that dire performance-wise.
With careful use these can in fact be the most efficient means of
transfer between a wholly software renderer in the client and a backing
store. For instance, Chromium renders internally to an ARGB32 image
buffer and uses a shared pixmap to composite dirty regions into the
backing store, thereby using the GPU to perform either the blit or the
format conversion. Enabling shared pixmaps reduces our CPU overhead
whilst scrolling by a factor of 5 or so.
And this is achieved simply by deleting obsolete code!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All but uxa_copy_window() perform the preliminary checks for whether
acceleration is available. The simplest method for adding the fallback
for uxa_copy_window() seems to be to add it in the core copy function,
so be it.
This allows X to survive a little longer once we encounter a GPU hang.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is wildly optimistic, but it should work in a surprising number of
error situations and some output in those cases will hopefully be
better than none...
If we submit a batchbuffer and the kernel reports the GPU is hung (which
will be caused by an earlier execbuffer, and so the kernel should have
had enough time to determine whether or not it could reset the GPU) then
disable any further attempt to accelerate gfx and force fallbacks to map
the buffers and use the CPU. We cannot normally map any more buffers if
the GPU is hung, so only those already mapped prior to the hang can be
written to, or those allocated in system memory. However, we can expect
that the framebuffer is already mapped, and so have a reasonable
expectation to continue to see the display update.
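The idea can be sketched roughly as a sticky flag. All names below are
hypothetical, and treating -EIO as the kernel's "GPU is hung" report is an
assumption of this sketch:

```c
#include <errno.h>

/* Hypothetical driver state; the field name is illustrative only. */
struct screen_state {
    int force_fallback;   /* set once the kernel reports a hung GPU */
};

/* Record the result of a batch submission (0 or a negative errno). */
static void note_submit_result(struct screen_state *s, int ret)
{
    if (ret == -EIO)      /* assumption: -EIO signals a hung GPU */
        s->force_fallback = 1;
}

/* Checked by each operation before it attempts the accelerated path. */
static int can_accelerate(const struct screen_state *s)
{
    return !s->force_fallback;
}
```

The flag is never cleared in this sketch, so once a hang is seen every later
operation takes the CPU fallback.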
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Due to the relocation overhead, using a single composite with many
rectangles outperforms many solid blits.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the destination cannot fit into the 3D pipeline when we need to
composite, we fall back to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.
An alternative is to extract the extents of the operation from the
destination using the blitter which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.
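A simplified CPU-side sketch of the copy-out, composite, copy-back sequence.
Everything here is a stand-in: MAX_3D_DIM, blit(), composite_add() and
composite_via_temp() are hypothetical, and plain 8bpp byte buffers take the
place of GPU surfaces:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit standing in for the 3D pipeline's maximum size. */
#define MAX_3D_DIM 4

struct rect { int x, y, w, h; };

/* Stand-in for a blitter copy of a rectangle between two 8bpp buffers. */
static void blit(unsigned char *dst, int dst_stride, int dx, int dy,
                 const unsigned char *src, int src_stride, int sx, int sy,
                 int w, int h)
{
    for (int row = 0; row < h; row++)
        memcpy(dst + (size_t)(dy + row) * dst_stride + dx,
               src + (size_t)(sy + row) * src_stride + sx, (size_t)w);
}

/* Stand-in for a 3D composite: adds `value` to each pixel of the target. */
static void composite_add(unsigned char *buf, int stride, int w, int h,
                          unsigned char value)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            buf[y * stride + x] = (unsigned char)(buf[y * stride + x] + value);
}

/* Returns 0 on success, -1 if the caller must still fall back. */
static int composite_via_temp(unsigned char *dst, int dst_stride,
                              struct rect ext, unsigned char value)
{
    unsigned char *tmp;

    if (ext.w <= 0 || ext.h <= 0 || ext.w > MAX_3D_DIM || ext.h > MAX_3D_DIM)
        return -1;

    tmp = malloc((size_t)ext.w * (size_t)ext.h);
    if (tmp == NULL)
        return -1;

    /* 1. Extract the extents of the operation from the large destination. */
    blit(tmp, ext.w, 0, 0, dst, dst_stride, ext.x, ext.y, ext.w, ext.h);
    /* 2. Composite into the small temporary, which fits the pipeline. */
    composite_add(tmp, ext.w, ext.w, ext.h, value);
    /* 3. Copy the result back into the real destination. */
    blit(dst, dst_stride, ext.x, ext.y, tmp, ext.w, 0, 0, ext.w, ext.h);

    free(tmp);
    return 0;
}
```

Only the extents of the operation need to fit the 3D limit; the destination
itself can be arbitrarily larger.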
For x11perf this boosts glyph rendering on PineView, from 38kglyphs/s to
480kglyphs/s, just a little shy of the native performance of 601kglyphs/s.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Use composite rather than solid blits in order to bring performance on
a par with the CPU when using GEM and relocations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 6d50553e8f.
Now that we have taught the fallback path not to infinitely recurse,
re-enable the accelerated path for ShmPutImage and friends.
Often, for example in the fallback for ShmPutImage, we will attempt to
use uxa_copy_area() copying to a normal pixmap from a memory buffer.
This triggers a fallback, and maps the destination pixmap back into the
GTT. The accelerated put_image path will attempt to stream a blit to the
destination pixmap if it is currently active, avoiding the stall.
Around a call to uxa_put_image() it is possible to mix both accelerated
and fallback paths, with the fallback code making the presumed
optimisation of only trying to call uxa_prepare_access() once. This
fails if the accelerated path also uses prepare/finish access on the
same drawable and then later falls back to the fallback path. This can
happen currently if an error is reported whilst attempting to accelerate
PutImage.
#0 memcpy () at ../sysdeps/x86_64/memcpy.S:162
#1 0x00007ffff43ce4bd in fbBlt (srcLine=<value optimized out>, srcStride=40, srcX=<value optimized out>, dstLine=0xffffffffffffffff, dstStride=64, dstX=0, width=<value optimized out>, height=8, alu=3, pm=4294967295, bpp=8, reverse=0, upsidedown=0) at fbblt.c:93
#2 0x00007ffff43ce740 in fbBltStip (src=0xffffffffffffffff, srcStride=156555204, srcX=34, dst=0xfffffffc, dstStride=64, dstX=40, width=304, height=8, alu=3, pm=4294967295, bpp=8) at fbblt.c:944
#3 0x00007ffff4c32c53 in uxa_do_put_image (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, bits=0x954d7c4 "") at uxa-accel.c:196
#4 uxa_do_shm_put_image (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, bits=0x954d7c4 "") at uxa-accel.c:223
#5 uxa_put_image (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, bits=0x954d7c4 "") at uxa-accel.c:289
#6 0x00000000004d574f in damagePutImage (pDrawable=0x246aa410, pGC=0x2c0a4f0, depth=8, x=0, y=0, w=38, h=8, leftPad=0, format=2, pImage=0x954d7c4 "") at damage.c:905
#7 0x00000000004287db in ProcPutImage (client=0x47ca72d0) at dispatch.c:2073
#8 0x000000000042bd94 in Dispatch () at dispatch.c:445
#9 0x000000000042513a in main (argc=4, argv=0x7fffffffe2a8, envp=<value optimized out>) at main.c:285
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We've talked about doing this since the start of the project, putting it off
until "some convenient time". Just after removing a third of the driver seems
like a convenient time, when backporting's probably not happening much anyway.
This eliminates the cost of EXA migration management while providing full
pixmap allocation control to the driver. The goal is to make something
useful for UMA drivers.