Commit Graph

4927 Commits

Author SHA1 Message Date
Chris Wilson f306cd557e sna/dri: Hold a reference to the cached DRI2 buffer on the front buffer
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 16:03:43 +01:00
Chris Wilson a87f2b9325 sna/gen4: Check for peculiar initial values for the surface offset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 16:03:43 +01:00
Chris Wilson a505015a25 sna: Force DPMS to be on following a modeset
Similarly to UXA, this papers over inconsistent behaviour in the kernel
in handling the DPMS upon a modeswitch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-25 22:50:47 +01:00
Chris Wilson b7a8c94cdb sna: remove the assert(0)s along error paths
These were there as a debugging aid to see if we ever hit unreachable
code paths - mainly along corruption inducing GPU wedged recovery paths.
They are superfluous and just scare the reader.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-25 20:52:51 +01:00
Chris Wilson 15c0ee445f sna/gen5: Tweak thread allocations
Bump the allotted number of threads to their max. Using more threads than
cores helps hide the stalls due to sampler fetch, math functions and urb
write. Specifying too many threads seems to not incur a performance
regression, suggesting that the hardware scheduler is sane enough not to
overpopulate the EU.

A small but significant boost, peak x11perf -aa10text on an i3-330m is
raised from 1.93Mglyphs/s to 2.35Mglyphs/s.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-25 12:27:57 +01:00
Chris Wilson fa10005ce3 sna/dri: Perform an exchange for a composited windowed SwapBuffers
If the front buffer is not attached to the scanout and has not been
reparented, we can simply exchange the underlying bo between the
front/back attachments and inform the compositor of the damage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 17:42:27 +01:00
Chris Wilson 53d735ddb1 sna/dri: Queue windowed swaps
Implement "triple-buffering" for windowed SwapBuffers by allowing the
client to submit one extra frame before throttling. That is, we emit the
vsync'ed blit and immediately unblock the client so that it renders to
the GPU (which is guaranteed to be executed after the blit so that its
Front/Back buffers are still correct) and requests another SwapBuffers.
The subsequent swapbuffers are appended to the vsync chain with the
blit/unblock then executed on the vblank following the original blit.
That is, both the client and xserver render concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 14:08:26 +01:00
Chris Wilson 1e9319d5f5 sna: extend RandR to support super sized monitor configurations
With the introduction of the third pipe on IvyBridge it is possible to
encounter situations where the combination of the three monitors exceeds
the limits of the scanout engine and so prevents them being used at their
native resolutions. (It is conceivable to hit similar issues on earlier
generations, especially gen2/3.) One workaround, this patch, is to extend
the RandR shadow support to break the extended framebuffer into per-crtc
pixmaps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 14:16:50 +01:00
Chris Wilson e8b090902e sna/gen3+: Remove stale assertions for cached vbo
Following the previous commit, we reset the vbo when it becomes idle
rather than discard it. As such, the assertions to check that we are
discarding the vbo are now bogus.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-22 22:01:37 +01:00
Chris Wilson 565297e6bd sna/gen3+: Keep vbo cached
Once we switch to using a vbo, keep it cached (resetting every time it is
idle) until we expire our caches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson d806973e21 sna: Micro-optimise search_inactive_cache
Discard the unneeded next parameter to drop a memory reference in a hot
path, and don't wait for a retirement if we are looking in a larger
bucket than suits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson d39fef0a7f sna: Tiles are only 128 bytes wide on gen2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson 4f2dde1fa3 sna/gen7: Eliminate the pipeline stall after a non-pipelined operation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson 3ef05a8d08 sna/gen7: Do not emit a pipeline stall after a non-pipelined command
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson 4501e131e6 sna/gen7: prefer using RENDER copy
Further testing and the balance of doubt swings in favour of using the
3D pipeline for copies.

For small copies the BLT unit is faster,
2.14M/sec vs 1.71M/sec for comppixwin10

And for large copies the RENDER pipeline is faster,
13000/sec vs 8000/sec for comppixwin500

I think the implication is that we are not efficiently utilising the EU
for small primitives - i.e. something that we might be able to improve.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:31:30 +01:00
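The trade-off quoted in the message above can be made concrete with a little arithmetic. This is only a sketch: it assumes the first figure in each pair belongs to the engine named as faster, and the variable and function names are illustrative, not taken from the driver.

```python
# Figures quoted in the commit message, in operations per second.
# Assumption: the first number of each pair belongs to the "faster" engine.
blt_small, render_small = 2.14e6, 1.71e6   # comppixwin10 (small copies)
render_large, blt_large = 13000.0, 8000.0  # comppixwin500 (large copies)


def speedup(fast, slow):
    """Return how many times faster `fast` is than `slow`."""
    return fast / slow


small_ratio = speedup(blt_small, render_small)   # BLT advantage on small copies
large_ratio = speedup(render_large, blt_large)   # RENDER advantage on large copies

print(f"comppixwin10:  BLT / RENDER = {small_ratio:.3f}")
print(f"comppixwin500: RENDER / BLT = {large_ratio:.3f}")
```

On these numbers the RENDER pipeline gains more on large copies than it loses on small ones, which is consistent with the "balance of doubt" wording in the message.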
Chris Wilson 3da56c48b7 sna/gen7: Prefer using BLT rather than redirect for copies
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:26:25 +01:00
Chris Wilson b1f8386db6 sna/gen7: Emit a pipeline flush after every render operation
For whatever reason, this produces a 30% improvement with the fish-demo
(500 -> 660 fps on i7-3730qm at 1024x768). However, it does cause about
a 5% regression in aa10text. We can appear to alleviate that by only
doing the flush when the composite op != PictOpSrc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:25:32 +01:00
Chris Wilson d02e6d8142 Encode the third pipe using the HIGH_CRTC shift for vblanks
The original vblank interface only understood 2 pipes (primary and
secondary) and so selecting the third pipe (introduced with IvyBridge)
requires use of the HIGH_CRTC. Using the second pipe where we meant the
third pipe could result in some spurious timings when waiting on the
vblank.

Reported-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 16:54:35 +01:00
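A minimal sketch of the pipe encoding described above, written in Python for brevity although the driver itself is C. The constant values are assumed to match those in libdrm's `drm.h`, and `pipe_select` is an illustrative helper name rather than a confirmed driver function.

```python
# Assumed to mirror the vblank request flags in libdrm's drm.h.
DRM_VBLANK_SECONDARY = 0x20000000
DRM_VBLANK_HIGH_CRTC_SHIFT = 1
DRM_VBLANK_HIGH_CRTC_MASK = 0x0000003E


def pipe_select(pipe: int) -> int:
    """Return the vblank request bits that select the given pipe (0-based)."""
    if pipe > 1:
        # Third pipe and beyond: encode the pipe index in the HIGH_CRTC field.
        return (pipe << DRM_VBLANK_HIGH_CRTC_SHIFT) & DRM_VBLANK_HIGH_CRTC_MASK
    if pipe == 1:
        # Second pipe: the legacy "secondary" flag.
        return DRM_VBLANK_SECONDARY
    # First pipe: no extra bits.
    return 0
```

The point of the fix is the first branch: sending pipe 2 through the legacy secondary flag would make the wait target the second pipe instead of the third.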
Chris Wilson f8b67be8d3 sna: Don't clear the needs_flush flag after emitting a flush on the busy bo
We use that flag to decide whether we need to check if the bo is
still busy upon destruction, so only clear it if the bo is marked as
idle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 12:39:19 +01:00
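The rule in the message above can be sketched as follows. This is a hypothetical model, not the driver's code: `Bo`, `bo_flush` and `must_check_busy_on_destroy` are invented names, and the `busy` field stands in for whatever the kernel would report.

```python
from dataclasses import dataclass


@dataclass
class Bo:
    busy: bool                 # stand-in for the kernel's busy status
    needs_flush: bool = True   # set while rendering may still be outstanding


def bo_flush(bo: Bo) -> None:
    """Emit a flush (elided) and clear needs_flush only if the bo is idle."""
    # ... the flush itself would be submitted here ...
    if not bo.busy:
        bo.needs_flush = False


def must_check_busy_on_destroy(bo: Bo) -> bool:
    """A bo that still needs a flush must be re-checked before it is reused."""
    return bo.needs_flush
```

Clearing the flag unconditionally would make a still-busy bo look safe to recycle at destruction time.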
Chris Wilson 5419bbb483 sna/gen7: Prefer BLT for copies
It's faster where the cost of the extra batches and ring switching
does not dominate...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 11:45:47 +01:00
Chris Wilson 1c0bb8c4c9 sna/gen7: Keep using RENDER paths for large pixmaps
The 3D pipeline is quite versatile, so we only need to force BLT if
we cannot extract the subregion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 10:57:40 +01:00
Chris Wilson b238f64e8a sna/gen[67]: Prefer to not force BLT paths for large pixmaps
The sampler can in fact handle subregions of large pixmaps quite well,
and so we prefer to keep using the 3D pipeline so long as the operation
fits in. If not, then switch to the BLT in order to avoid the temporary
surface dance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 10:46:59 +01:00
Chris Wilson 8141e290b1 sna: Explain why we ignore the busy status result during kgem_bo_flush()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 20:55:18 +01:00
Chris Wilson eb1d07624e sna: Ensure extents is initialised if short-circuit use-cpu-bo
As we may end up attempting to use the GPU bo if the CPU bo is busy, we
need to make sure we have initialised the damage extents first.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 16:00:13 +01:00
Chris Wilson 9f216e159b sna: Assert expected return values
Keep the semantic analyser happy by consuming the expected return value
with an assert.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 15:57:31 +01:00
Chris Wilson 2dc93b2a6c sna: Check results from syscalls
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 15:34:09 +01:00
Chris Wilson 06634604ab Initialise adaptors to 0 in case xf86XVListGenericAdaptors does not
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 15:28:43 +01:00
Chris Wilson 8bfea58dbc sna: Minor cleanups from semantic analyser in DBG
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 15:26:18 +01:00
Chris Wilson 99845dcb3b Post Damage on the Screen Pixmap after a pageflip
This issue was raised by Dave Airlie as he is trying to integrate
multiple GPUs into the xserver, and a particular setup has a slave
rendering device that copies the contents from the GPU over a
DisplayLink USB adaptor. As such the slave device is listening for
Damage on the Screen Pixmap and needs the update following pageflips.
Since we already are posting damage for all the SwapBuffers paths other
than pageflip, for consistency we should post damage along the pageflip
path as well.

Reported-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 10:43:09 +01:00
Chris Wilson 4acf727941 sna: Initialize the color value for fallback unaligned boxes
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=5047
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 10:24:24 +01:00
Chris Wilson b0b2d3c966 sna: Avoid copying uninitialised data during source picture upload
If we have never written to a pixmap, then there will be neither a GPU
nor shadow pointer and we would attempt to copy a NULL pointer. In this
case as the user is expecting to copy uninitialised data we are at
liberty to replace those undefined values with the clear color.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 00:41:35 +01:00
Chris Wilson 38472fcc53 sna: Double check that the source is busy before performing indirect reads
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 00:40:04 +01:00
Chris Wilson 8cdfb8c24c sna: Fix up the shadow pointer on the source when copying
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 00:40:04 +01:00
Chris Wilson 17f3a83fdc sna: Review sna_copy_boxes
A couple of ordering issues and more assertions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 23:50:04 +01:00
Chris Wilson a9045699b9 sna: Reset region after transferring to cpu
If we adjust the region for the pixmap offset, be sure that we reset it
before returning it back to the caller.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 23:50:03 +01:00
Chris Wilson 9f51311a7d sna: Check if the busy is truly busy before committing to an indirect upload
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 23:50:03 +01:00
Chris Wilson 291b3c4367 sna: Align upload buffers to 128
This seems to be a restriction (observed on 965gm at least) that we
have an incoherent sampler cache if we write within 128 bytes of a busy
buffer. This is either due to a restriction on neighbouring cachelines
(like the earlier BLT limitations) or an effect of sampler prefetch.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=50477
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 23:50:03 +01:00
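The alignment rule above amounts to rounding every upload offset up to the next multiple of 128 bytes. A minimal sketch, with `align_up` and `place_uploads` as hypothetical helpers rather than the driver's own functions:

```python
UPLOAD_ALIGN = 128  # bytes, the figure quoted in the commit message


def align_up(offset: int, alignment: int = UPLOAD_ALIGN) -> int:
    """Round offset up to the next multiple of a power-of-two alignment."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (offset + alignment - 1) & ~(alignment - 1)


def place_uploads(sizes):
    """Pack uploads into one buffer so that each starts on an aligned offset."""
    offsets = []
    cursor = 0
    for size in sizes:
        offsets.append(cursor)
        # The next upload starts at least `size` bytes on, rounded up.
        cursor = align_up(cursor + size)
    return offsets
```

For example, `place_uploads([100, 128, 1])` places the three uploads at offsets 0, 128 and 256, so no two uploads share a 128-byte span.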
Chris Wilson 39e5c74915 sna: Assert damage is valid after every addition
Even more paranoia than just checking upon migration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 22:20:01 +01:00
Chris Wilson 92e1693e5f sna: Validate cpu/gpu damage never overlaps
References: https://bugs.freedesktop.org/show_bug.cgi?id=50477
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 21:29:51 +01:00
Chris Wilson d2312c8f95 sna: Fixup tracking of vmap upload buffers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 21:30:58 +01:00
Chris Wilson 75e9eeca7e sna: Remove overlapping CPU damage when operating inplace on the GPU
Otherwise we gradually introduce garbage into the picture.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50477
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 16:39:20 +01:00
Chris Wilson a936466dd4 sna: Prefer to attempt a Composite operation rather than use pixman composite
As pixman composite performance is atrocious for anything other than
solids, prefer to upload the mask and attempt a composite operation on
the GPU unless we are forcing the fallback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 11:36:53 +01:00
Chris Wilson 4b325d6e2b sna: Fix rendering of unaligned boxes through pixman
Not only do we need to make sure the source is available to the CPU, we
need to actually check the right conditions for clipping the box.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-18 11:29:56 +01:00
Chris Wilson caef27492b sna: convert another instance of applying the clear to the CPU pixmap
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 21:00:34 +01:00
Chris Wilson 8695c4c776 sna: Fix the blt composite op with no-ops
When returning early because the operation is a no-op, we still need to
fill in the function pointers to prevent a later NULL dereference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 17:14:06 +01:00
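The failure mode described above, an early return that leaves callbacks unset, can be illustrated with a small model. Everything here is hypothetical: `CompositeOp`, `prepare_composite` and the callback names are invented for the sketch and do not reproduce the driver's structures.

```python
class CompositeOp:
    """Holds the callbacks a caller invokes after a successful prepare."""

    def __init__(self):
        self.blt = None
        self.done = None


def _noop_blt(*args):
    """Draw nothing; used when the operation has no visible effect."""
    return None


def _noop_done():
    """Nothing to clean up for a no-op."""
    return None


def prepare_composite(op: CompositeOp, width: int, height: int) -> bool:
    """Fill in op's callbacks; a degenerate size is treated as a no-op."""
    if width <= 0 or height <= 0:
        # Early return, but the callbacks are still populated so that the
        # caller can invoke them unconditionally.
        op.blt = _noop_blt
        op.done = _noop_done
        return True
    op.blt = lambda *args: ("blt", args)
    op.done = lambda: "done"
    return True
```

A caller can then run `op.blt(...)` followed by `op.done()` without testing for the no-op case first.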
Chris Wilson 7905ddae1d sna: Further refine choice of placement when uploading source data.
The goal is to cheaply spot a simple copy operation that can be performed
on the CPU without having to load both parties onto the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 17:14:06 +01:00
Chris Wilson 5a675b61f2 sna: Correct typo forcing everything to be clear to 0!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 15:42:17 +01:00
Chris Wilson b55bf1abbe sna: Fix cut'n'paste errors in tiling debug
Rename for different variables

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 15:05:33 +01:00
Chris Wilson 9756c60b4a sna/gen7: Enable non-rectilinear spans
Seems we have enough GPU power to overcome the clumsy shaders. Just
imagine the possibilities when we have a true shader for spans...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 11:39:33 +01:00
Chris Wilson 41aff56a1f sna: Add tiling for spans
Seemingly only advisable when already committed to using the GPU. This
first pass is still a little naive as it makes no attempt to avoid empty
tiles, nor aims to be efficient.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-17 10:59:55 +01:00