Filling the rings is a very unpleasant user experience, so cap the
number of batches we allow to be in flight at any one time.
Interestingly, as also found with SNA, throttling can improve
performance by reducing RSS. However, throughput is typically improved
(at the expense of latency) by oversubscribing work to the GPU, so a
10-20% slowdown is commonplace for cairo-traces. x11perf is less
affected, and application-level benchmarks in particular show no
change.
Note that this exposes another bug in libdrm-intel 2.4.40 on gen2/3.
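A minimal sketch of such a cap, assuming a simple counter of submitted but not yet retired batches. MAX_BATCHES_IN_FLIGHT, struct throttle and the helper names are hypothetical, and the limit the driver actually uses may differ; the wait is only simulated here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap; not the driver's actual value. */
#define MAX_BATCHES_IN_FLIGHT 4

struct throttle {
    unsigned int in_flight;   /* batches submitted but not yet retired */
};

/* Hypothetical stand-in for blocking until the oldest batch retires.
 * Here it only simulates the retirement by decrementing the counter. */
static void wait_for_oldest_batch(struct throttle *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
}

/* Call before submitting a batch: at the cap, wait for the GPU to retire
 * work instead of queuing more onto the ring. Returns true if we waited. */
static bool throttle_before_submit(struct throttle *t)
{
    bool waited = false;

    while (t->in_flight >= MAX_BATCHES_IN_FLIGHT) {
        wait_for_oldest_batch(t);
        waited = true;
    }
    t->in_flight++;
    return waited;
}

/* Call when the GPU reports that a batch has completed. */
static void batch_retired(struct throttle *t)
{
    assert(t->in_flight > 0);
    t->in_flight--;
}
```

The trade-off described above shows up directly: a larger cap keeps the GPU busier (throughput), a smaller one bounds how much queued work a new request must wait behind (latency).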
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The normal source upload into a GPU bo knows a few more tricks that we
may want to try first, before copying into the shadow of the GPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we have to fall back and the configuration is wonky, make sure that
all known outputs are disabled as we take over the console.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
config->compat_output needs to be sanitized during device initialization
or we may dereference an invalid xf86OutputPtr.
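A minimal sketch of that sanitisation, using hypothetical cut-down stand-ins for the X server's xf86CrtcConfigRec (num_output, output[], compat_output) and xf86OutputPtr. Falling back to the first output, and to -1 when there are none, is an assumption made for illustration, not necessarily what the driver does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed here are modelled. */
struct output {
    const char *name;
};

struct crtc_config {
    int num_output;
    struct output **output;
    int compat_output;   /* index into output[]; may be stale */
};

/* Clamp compat_output to a valid index (or -1 if there are no outputs)
 * so that later lookups never index past the output array. */
static void sanitize_compat_output(struct crtc_config *config)
{
    assert(config != NULL);
    if (config->num_output <= 0)
        config->compat_output = -1;
    else if (config->compat_output < 0 ||
             config->compat_output >= config->num_output)
        config->compat_output = 0;
}

/* Safe accessor: returns NULL rather than an invalid pointer. */
static struct output *get_compat_output(const struct crtc_config *config)
{
    if (config->compat_output < 0 ||
        config->compat_output >= config->num_output)
        return NULL;
    return config->output[config->compat_output];
}
```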
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Copied from commit c789d06cf8
Author: Dave Airlie <airlied@redhat.com>
Date: Mon Jan 7 13:57:21 2013 +1000
This fixes the damage posting to happen in the correct order. Not sure
whether this fixes anything, but it should make things more consistent.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we fail before updating any pipe for a pageflip, then the state
remains consistent and we can fall back to a blit without disabling any
pipes. If we fail after flipping a pipe, then unless we disable an
output the state becomes inconsistent (the pipes disagree on what the
attached fb is).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The general texcoord emitter does handle solids (for the case of a
transformed mask), and so we need to be careful to set up the
VERTEX_ELEMENTS accordingly.
Fixes regression from
commit 2559cfcc4c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 2 10:22:14 2013 +0000
sna/gen4+: Specialise linear vertex emission
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the case where we are driving multiple independent screens, each
with its own device, we must not share the global reserved request in
the event of an allocation failure.
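A minimal sketch of moving the reserve into the device, assuming a hypothetical struct request and a simulated allocation failure; request_alloc(), request_free() and the simulate_oom flag are illustrative only. With one reserve per device, an allocation failure on one screen cannot consume another screen's fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical request object; in the driver this would track a batch. */
struct request {
    int id;
};

/* Each device owns its own emergency request instead of a global one. */
struct device {
    struct request reserved;
    bool reserved_in_use;
};

/* Try the allocator first; on failure fall back to this device's reserve.
 * simulate_oom forces the fallback path for illustration. Returns NULL
 * only if the reserve is already taken. */
static struct request *request_alloc(struct device *dev, bool simulate_oom)
{
    if (!simulate_oom) {
        struct request *rq = malloc(sizeof(*rq));
        if (rq)
            return rq;
    }
    if (dev->reserved_in_use)
        return NULL;
    dev->reserved_in_use = true;
    return &dev->reserved;
}

static void request_free(struct device *dev, struct request *rq)
{
    assert(rq != NULL);
    if (rq == &dev->reserved)
        dev->reserved_in_use = false;
    else
        free(rq);
}
```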
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the case where the kernel is inserting semaphores to serialise work
between rings, we want to delay only the surface that is coming from the
other ring and not interfere with work already queued.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid saving the overhead of the render copy only to be penalised by
the overhead of the semaphore instead. So compromise.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once again balancing the trade-off of faster small copies with the BLT
versus faster large copies on the RENDER ring.
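A minimal sketch of a size-based choice between the two rings. The BLT_MAX_PIXELS crossover is a hypothetical value chosen for illustration; the real break-even point depends on the hardware generation and is not taken from the driver.

```c
#include <assert.h>

enum engine { ENGINE_BLT, ENGINE_RENDER };

/* Hypothetical crossover: copies up to this many pixels go to the BLT. */
#define BLT_MAX_PIXELS (256 * 256)

/* Small copies: the BLT is cheaper to set up and emit.
 * Large copies: the RENDER ring moves the pixels faster. */
static enum engine choose_copy_engine(int width, int height)
{
    assert(width >= 0 && height >= 0);
    long pixels = (long)width * height;
    return pixels <= BLT_MAX_PIXELS ? ENGINE_BLT : ENGINE_RENDER;
}
```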
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Notably, if everything is idle, using the BLT is a win, as we can emit
BLT copies so much faster than a render copy, and as the target is
uncached we do not benefit as much from the render cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Align surface sizes to an even number of tile rows to cater for sampler
prefetch. If we read beyond the last page we may catch the PTE in a
state of flux and trigger a GPU hang. This was also detected by enabling
invalid PTE access checking.
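The alignment itself is simple arithmetic: round the height up to a multiple of two tile heights, then multiply by the pitch. A minimal sketch follows; the helper names are hypothetical, and the rows-per-tile figures in the comment (8 for X-tiling, 32 for Y-tiling) hold on many generations but should be treated as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a > 0). */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) / a * a;
}

/* Height in rows padded to an even number of tile rows, so that a sampler
 * prefetch past the last row stays inside the allocation. tile_height is
 * rows per tile (e.g. 8 for X-tiling, 32 for Y-tiling; assumed values). */
static uint32_t padded_height(uint32_t height, uint32_t tile_height)
{
    assert(tile_height > 0);
    return align_up(height, 2 * tile_height);
}

/* Allocation size in bytes for a surface with the given pitch. */
static uint32_t surface_size(uint32_t pitch, uint32_t height,
                             uint32_t tile_height)
{
    return pitch * padded_height(height, tile_height);
}
```

For example, a 17-row X-tiled surface (8 rows per tile) spans three tile rows, which this rounds up to four, i.e. 32 rows.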
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>