5393 Commits

Author SHA1 Message Date
Chris Wilson e625c02e62 sna/damage: Early check for contains-box? if subtract and box outside region
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-30 09:34:21 +01:00
Chris Wilson abd7be1cee sna/dri: Prefer GPU rendering if no more CPU damage on a DRI bo
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 16:12:19 +01:00
Chris Wilson 67b87e4f7c sna/dri: Optimise clip reduction with copy-to-front to an unclipped Window
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 15:15:21 +01:00
Chris Wilson eae5e1275c sna: Install the ModeSet handler as the base handler
This way we can safely ignore it across server regen.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 13:12:52 +01:00
Chris Wilson 15a0761cad sna: Only consider the request list when deciding whether the GPU is busy
Micro-optimisation to avoid the overhead of extra checks and to make sure
an unflushed bo doesn't prevent us from submitting more work before
sleeping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 12:22:55 +01:00
Chris Wilson 4061f05dd6 sna/trapezoids: Write unaligned fallback boxes inplace
As this is a pure write operation (though we will write the edge pixels
twice) we can perform this operation inplace and incur a slightly slower
trap creation at the benefit of avoiding the later copy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 11:40:18 +01:00
Chris Wilson 44e41536b7 sna/trapezoids: Render the partial left-edge of fallback unaligned boxes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 11:39:47 +01:00
Chris Wilson e6f9bfe1e2 sna: Use currentTime rather than GetTimeInMillis()
The overhead of reading the hpet() on every block handler (more or less)
is exorbitant, so trust that we update currentTime frequently enough to
be a good approximation - the side effect is that we will wake up
slightly too early from using an old value for the current time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 11:06:33 +01:00
Chris Wilson c6c4f30e19 sna: Add assertions to check that we do install the timers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 10:14:52 +01:00
Chris Wilson 87c8f5a47e sna: Make the post-flip delay explicit
As the kernel is inconsistent in enforcing this across generations,
handle the synchronisation of the pageflip explicitly. Ultimately this
should be replaced with a triple buffer mechanism.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-29 09:33:09 +01:00
Chris Wilson 31caa43a21 sna/gen5: Check harder for need_upload() fallbacks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 17:05:46 +01:00
Chris Wilson 7c3eb1fda9 sna: Correct inverted logic for checking xrgb drawables
Reported-by: Christoph Reiter <reiter.christoph@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51472
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 17:04:26 +01:00
Chris Wilson c3e2c1332d sna: Fix the application of the crtc offset for posting damage
The damage boxes are in framebuffer (source) space, so we need to apply
the offset for the boxes in crtc (destination) space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 12:40:59 +01:00
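
The coordinate-space fix described in c3e2c1332d can be sketched as below. This is only an illustration, not the driver's code: `struct box` and `box_to_crtc_space()` are hypothetical names, and the sketch assumes the CRTC's origin within the framebuffer is subtracted to move a box from framebuffer (source) space into crtc (destination) space.

```c
#include <assert.h>

/* Hypothetical box type; x2/y2 are exclusive bounds. */
struct box {
	int x1, y1, x2, y2;
};

/* Translate a damage box from framebuffer space into the space of a
 * CRTC whose top-left corner sits at (crtc_x, crtc_y) in the
 * framebuffer. Assumption: the translation is a plain subtraction. */
static struct box box_to_crtc_space(struct box fb, int crtc_x, int crtc_y)
{
	struct box out;

	assert(fb.x1 <= fb.x2 && fb.y1 <= fb.y2);

	out.x1 = fb.x1 - crtc_x;
	out.y1 = fb.y1 - crtc_y;
	out.x2 = fb.x2 - crtc_x;
	out.y2 = fb.y2 - crtc_y;
	return out;
}
```

With a CRTC placed at x = 1920 in an extended framebuffer, a box starting at x = 1930 in framebuffer space starts at x = 10 in that CRTC's space.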
Chris Wilson 47e6bfa4f4 sna: Force use of per-crtc scanout if the offset is too large
On gen4+, the scanout offset into a tiled surface is specified through
the DSPTILEOFF register and limited to 12bits of precision. So if we
have a CRTC positioned in that nether-region, we need to allocate a
separate per-crtc pixmap for it and perform shadowing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 12:37:35 +01:00
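
The limit described in 47e6bfa4f4 amounts to a range check on the CRTC position. The following is a minimal sketch under stated assumptions, not the driver's implementation: `crtc_needs_private_scanout()` and the `TILEOFF_*` macros are hypothetical names, and the sketch assumes the 12 bits of precision apply independently to the x and y offsets.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: each axis of the tiled scanout offset has 12 bits. */
#define TILEOFF_BITS 12
#define TILEOFF_MAX  ((1 << TILEOFF_BITS) - 1)

/* Return true when a CRTC positioned at (x, y) inside the framebuffer
 * cannot be expressed as a tiled scanout offset and so would need its
 * own per-crtc pixmap. */
static bool crtc_needs_private_scanout(int x, int y)
{
	assert(x >= 0 && y >= 0);
	return x > TILEOFF_MAX || y > TILEOFF_MAX;
}
```

Under these assumptions a CRTC at x = 4095 can still scan out of the shared framebuffer, while one at x = 4096 cannot.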
Chris Wilson 93e77ee019 sna: Quieten kernel debug complaints when disabling crtc
Even if we are obviously turning the crtc off, it still complains if the
number of connectors is non-zero. So make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 12:02:32 +01:00
Chris Wilson 85e4f48a87 sna: Add a DBG to the periodic flush mechanism
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 11:57:53 +01:00
Chris Wilson 87dd6408a5 sna: Correct the reversal of the periodic flushing semantics
Regression from 1e9319d (sna: extend RandR to support super sized
monitor configurations) which tried to take into account the need to
flush the shadow CRTC bo in addition to the normal scanout bo. In the
refactoring of the need_flush(), the double negative was missed.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 10:58:55 +01:00
Chris Wilson 05f486f64b sna: Flush the per-crtc render caches for rotated scanouts
We need to manually flush the render cache in order for results to be
visible on the scanout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-28 10:42:21 +01:00
Chris Wilson db79799810 sna: s/width/height/ cut'n'paste typo
Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-27 09:36:58 +01:00
Chris Wilson fcbbe1664a sna: Remove a trailing ';'
The unwanted ';' caused is_cpu() to always return false if a GPU bo was
attached. Not necessarily a bad thing, just misses the potential
optimisation where having chosen to prefer to use the CPU path we then
have to migrate to the GPU even though the bo is undamaged or idle.

Spotted-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-27 09:30:44 +01:00
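
The class of bug fixed in fcbbe1664a can be illustrated with a small sketch. The structure and both functions below are hypothetical and do not reproduce the real is_cpu(); they only show how a stray ';' after an `if` turns the following `return` into an unconditional one. The first function spells out what the compiler sees when the ';' is present, written with an explicit empty body so that it compiles without warnings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-pixmap state. */
struct pixmap_state {
	void *gpu_bo;     /* non-NULL when a GPU bo is attached */
	bool gpu_damaged; /* true when the GPU copy holds newer data */
};

/* Equivalent of writing "if (p->gpu_damaged);" followed by
 * "return false;": the if guards nothing, the return always runs. */
static bool is_cpu_with_stray_semicolon(const struct pixmap_state *p)
{
	assert(p != NULL);
	if (p->gpu_bo == NULL)
		return true;
	if (p->gpu_damaged) {
		/* empty body: this is all the stray ';' left behind */
	}
	return false;
}

/* Intended logic: only report "not CPU" when the GPU copy matters. */
static bool is_cpu_fixed(const struct pixmap_state *p)
{
	assert(p != NULL);
	if (p->gpu_bo == NULL)
		return true;
	if (p->gpu_damaged)
		return false;
	return true;
}
```

The two versions differ only for a pixmap that has a GPU bo attached but no GPU damage, which is the missed optimisation the commit message describes.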
Chris Wilson a072ab5065 test: Add client side copy to FakeFront for emulating CopyBuffer correctly
The server manages FakeFront following a flip, but if the client
optimises a swap by replacing it with a CopyRegion, it is expected to
also update the FakeFront itself. Replicate that behaviour so that the
timings for the test case are consistent with mesa.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 17:24:37 +01:00
Chris Wilson 96804c74f8 test: FakeFront rules
Oh my, I just once again rediscovered the copy on every flip due to the
requirement for keeping FakeFront up to date for reads after a SwapBuffers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 17:19:06 +01:00
Chris Wilson f306cd557e sna/dri: Hold a reference to the cached DRI2 buffer on the front buffer
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 16:03:43 +01:00
Chris Wilson a87f2b9325 sna/gen4: Check for peculiar initial values for the surface offset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 16:03:43 +01:00
Chris Wilson 8f4221a252 test: Add a simple exercise for DRI2 swap paths
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-26 16:03:16 +01:00
Chris Wilson a505015a25 sna: Force DPMS to be on following a modeset
Similarly to UXA, this papers over inconsistent behaviour in the kernel
in handling the DPMS upon a modeswitch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-25 22:50:47 +01:00
Chris Wilson b7a8c94cdb sna: remove the assert(0)s along error paths
These were there as a debugging aid to see if we ever hit unreachable
code paths - mainly along corruption inducing GPU wedged recovery paths.
They are superfluous and just scare the reader.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-25 20:52:51 +01:00
Chris Wilson 15c0ee445f sna/gen5: Tweak thread allocations
Bump the allotted number of threads to their max. Using more threads than
cores helps hide the stalls due to sampler fetch, math functions and urb
write. Specifying too many threads seems to not incur a performance
regression, suggesting that the hardware scheduler is sane enough not to
overpopulate the EU.

A small but significant boost, peak x11perf -aa10text on an i3-330m is
raised from 1.93Mglyphs/s to 2.35Mglyphs/s.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-25 12:27:57 +01:00
Chris Wilson fa10005ce3 sna/dri: Perform an exchange for a composited windowed SwapBuffers
If the front buffer is not attached to the scanout and has not been
reparented, we can simply exchange the underlying bo between the
front/back attachments and inform the compositor of the damage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 17:42:27 +01:00
Chris Wilson 53d735ddb1 sna/dri: Queue windowed swaps
Implement "triple-buffering" for windowed SwapBuffers by allowing the
client to submit one extra frame before throttling. That is, we emit the
vsync'ed blit and immediately unblock the client so that it renders to
the GPU (which is guaranteed to be executed after the blit so that its
Front/Back buffers are still correct) and requests another SwapBuffers.
The subsequent swapbuffers are appended to the vsync chain with the
blit/unblock then executed on the vblank following the original blit.
That is both the client and xserver render concurrently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 14:08:26 +01:00
Chris Wilson 1e9319d5f5 sna: extend RandR to support super sized monitor configurations
With the introduction of the third pipe on IvyBridge it is possible to
encounter situations where the combination of the three monitors exceeds
the limits of the scanout engine and so prevents them being used at their
native resolutions. (It is conceivable to hit similar issues on earlier
generations, especially gen2/3.) One workaround, this patch, is to extend
the RandR shadow support to break the extended framebuffer into per-crtc
pixmaps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-23 14:16:50 +01:00
Chris Wilson e8b090902e sna/gen3+: Remove stale assertions for cached vbo
Following the previous commit, we reset the vbo when it becomes idle
rather than discard it. As such, the assertions to check that we are
discarding the vbo are now bogus.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-22 22:01:37 +01:00
Chris Wilson 565297e6bd sna/gen3+: Keep vbo cached
Once we switch to using a vbo, keep it cached (resetting every time it is
idle) until we expire our caches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson d806973e21 sna: Micro-optimise search_inactive_cache
Discard the unneeded next parameter to drop a memory reference in a hot
path, and don't wait for a retirement if we are looking in a larger
bucket than suits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson d39fef0a7f sna: Tiles are only 128 bytes wide on gen2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson 4f2dde1fa3 sna/gen7: Eliminate the pipeline stall after a non-pipelined operation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson 3ef05a8d08 sna/gen7: Do not emit a pipeline stall after a non-pipelined command
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:32:39 +01:00
Chris Wilson 4501e131e6 sna/gen7: prefer using RENDER copy
Further testing and the balance of doubt swings in favour of using the
3D pipeline for copies.

For small copies the BLT unit is faster,
2.14M/sec vs 1.71M/sec for comppixwin10

And for large copies the RENDER pipeline is faster,
13000/sec vs 8000/sec for comppixwin500

I think the implication is that we are not efficiently utilising the EU
for small primitives - i.e. something that we might be able to improve.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:31:30 +01:00
Chris Wilson 3da56c48b7 sna/gen7: Prefer using BLT rather than redirect for copies
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:26:25 +01:00
Chris Wilson b1f8386db6 sna/gen7: Emit a pipeline flush after every render operation
For whatever reason, this produces a 30% improvement with the fish-demo
(500 -> 660 fps on i7-3730qm at 1024x768). However, it does cause about
a 5% regression in aa10text. We can appear to alleviate that by only
doing the flush when the composite op != PictOpSrc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 20:25:32 +01:00
Chris Wilson d02e6d8142 Encode the third pipe using the HIGH_CRTC shift for vblanks
The original vblank interface only understood 2 pipes (primary and
secondary) and so selecting the third pipe (introduced with IvyBridge)
requires use of the HIGH_CRTC. Using the second pipe where we meant the
third pipe could result in some spurious timings when waiting on the
vblank.

Reported-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-21 16:54:35 +01:00
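
The pipe encoding described in d02e6d8142 can be sketched roughly as follows. This is not the patch itself: `vblank_pipe_flags()` is a hypothetical helper, and the three constants are local copies of the values as recalled from libdrm's drm.h (the secondary flag, the HIGH_CRTC shift and its mask), so they should be checked against the real headers before being relied on.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring libdrm's vblank request flags. */
#define VBLANK_SECONDARY       0x20000000u
#define VBLANK_HIGH_CRTC_SHIFT 1
#define VBLANK_HIGH_CRTC_MASK  0x0000003eu

/* Build the pipe-selection bits of a vblank request type.
 * Pipe 0 needs no flag, pipe 1 uses the legacy secondary flag, and
 * any later pipe is encoded in the HIGH_CRTC bit-field. */
static uint32_t vblank_pipe_flags(int pipe)
{
	assert(pipe >= 0 && pipe < 32);

	if (pipe > 1)
		return ((uint32_t)pipe << VBLANK_HIGH_CRTC_SHIFT) &
		       VBLANK_HIGH_CRTC_MASK;
	if (pipe == 1)
		return VBLANK_SECONDARY;
	return 0;
}
```

The returned bits would be OR-ed into the request type of a vblank wait; passing the secondary flag for the third pipe is the mistake the commit message describes.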
Chris Wilson f8b67be8d3 sna: Don't clear the needs_flush flag after emitting a flush on the busy bo
We use that flag to check whether we need to check whether the bo is
still busy upon destruction, so only clear it if the bo is marked as
idle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 12:39:19 +01:00
Chris Wilson 5419bbb483 sna/gen7: Prefer BLT for copies
It's faster where the cost of the extra batches and ring switching
does not dominate...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 11:45:47 +01:00
Chris Wilson 1c0bb8c4c9 sna/gen7: Keep using RENDER paths for large pixmaps
The 3D pipeline is quite versatile, and we only need to force BLT if
we cannot extract the subregion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 10:57:40 +01:00
Chris Wilson b238f64e8a sna/gen[67]: Prefer to not force BLT paths for large pixmaps
The sampler can in fact handle subregions of large pixmaps quite well,
and so we prefer to keep using the 3D pipeline so long as the operation
fits in. If not, then switch to the BLT in order to avoid the temporary
surface dance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-20 10:46:59 +01:00
Chris Wilson 38f06a351f uxa: Fix second regression in glyph fallback from 64a4bc
To complete my show of incompetence for the evening, not only do we have
to restore the original source when compositing the mask onto the
destination, we also need to restore the original dst (rather than
composite the mask onto the mask!).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 22:22:12 +01:00
Chris Wilson fda9faee75 uxa: Use the original src for fallback glyph compositing
In 64a4bcb8ce, I introduced a WHITE source for the purposes of
accumulating the glyph mask correctly. Unfortunately I neglected to
restore the original source picture for compositing the glyph mask on
the destination, resulting in a use-after-free and then corruption.

Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 21:01:47 +01:00
Chris Wilson 8141e290b1 sna: Explain why we ignore the busy status result during kgem_bo_flush()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 20:55:18 +01:00
Chris Wilson eb1d07624e sna: Ensure extents is initialised if short-circuit use-cpu-bo
As we may end up using the GPU bo if the CPU bo is busy, we
need to make sure we have initialised the damage extents first.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 16:00:13 +01:00
Chris Wilson 9f216e159b sna: Assert expected return values
Keep the semantic analyser happy by consuming the expected return value
with an assert.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-06-19 15:57:31 +01:00