Once again balancing the trade-off of faster smaller copies with the BLT
versus the faster larger copies with the RENDER ring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Notably, if everything is idle, using the BLT is a win as we can emit
BLT copies so much faster than a rendercopy, and as the target is
uncached we do not benefit as much from the rendercache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Align surface sizes to an even number of tile rows to cater for sampler
prefetch. If we read beyond the last page we may catch the PTE in a
state of flux and trigger a GPU hang. Also detected by enabling invalid
PTE access checking.
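
A minimal sketch of the rounding described above, assuming the tile
height is a power of two (for example 8 rows for an X-tiled surface).
The helper names are hypothetical and not taken from the driver:

```c
#include <stdint.h>

/* Round v up to the next multiple of a; a must be a power of two. */
static inline uint32_t align_up(uint32_t v, uint32_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Hypothetical helper: pad a surface height to an even number of tile
 * rows, so that sampler prefetch past the last visible row still lands
 * inside the allocation. */
static inline uint32_t aligned_surface_height(uint32_t height,
					      uint32_t tile_height)
{
	return align_up(height, 2 * tile_height);
}
```

With a tile height of 8, a 17-row surface would be padded to 32 rows
rather than 24.
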
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are already the scanout, then there is little point copying to
ourselves... Should be paranoia.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Otherwise we decide to use BLT when hitting the render/sampler cache
is preferable for a source bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid having to walk the full relocation array for the few entries that
need to be updated for the batch buffer offset.
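
One way to picture this, as a sketch only: remember the indices of the
batch-relative entries as they are added, then patch just those. The
structures, bounds and function names below are simplified stand-ins
and assumptions, not the driver's or the kernel's definitions:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RELOCS 256
#define MAX_SELF_RELOCS 16 /* hypothetical bound on batch-relative entries */

/* Simplified stand-in for a relocation entry; not the kernel's struct. */
struct reloc {
	uint32_t offset; /* where in the batch the address is written */
	uint32_t delta;  /* value added to the target's address */
};

struct batch {
	struct reloc reloc[MAX_RELOCS];
	size_t nreloc;
	uint16_t self[MAX_SELF_RELOCS]; /* indices of batch-relative relocs */
	size_t nself;
};

/* Append a relocation, remembering its index if it targets the batch
 * itself. Returns the index, or -1 if either table is full. */
static int batch_add_reloc(struct batch *b, uint32_t offset, uint32_t delta,
			   int targets_batch)
{
	if (b->nreloc == MAX_RELOCS)
		return -1;
	if (targets_batch && b->nself == MAX_SELF_RELOCS)
		return -1;

	b->reloc[b->nreloc].offset = offset;
	b->reloc[b->nreloc].delta = delta;
	if (targets_batch)
		b->self[b->nself++] = (uint16_t)b->nreloc;
	return (int)b->nreloc++;
}

/* Apply the batch buffer offset to the recorded entries only, instead
 * of scanning every relocation. */
static void batch_fixup_self_relocs(struct batch *b, uint32_t batch_offset)
{
	for (size_t i = 0; i < b->nself; i++)
		b->reloc[b->self[i]].delta += batch_offset;
}
```

The fixup cost then scales with the number of batch-relative entries
rather than with the size of the whole relocation array.
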
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We repeatedly set the alignment value on the first port, rather than
once for each.
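
The shape of the fix, sketched with a hypothetical port structure (the
real port type and field names are not shown in this message):

```c
/* Hypothetical stand-in for a per-port record. */
struct port {
	int alignment;
};

/* Set the alignment once on every port. The bug described above is
 * equivalent to writing ports[0].alignment on each iteration. */
static void ports_set_alignment(struct port *ports, int nports, int alignment)
{
	for (int i = 0; i < nports; i++)
		ports[i].alignment = alignment;
}
```
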
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47597
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the user may only write to a portion of a pixmap (thus only creating
a small amount of damage) and then attempt to use the whole as a source,
we run the risk of triggering an assertion that the whole was defined.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For gen2-5, it does not matter what mode the batch is in when we
insert the scanline wait. With the more aggressive batch flushing, and
relaxed assignment of mode for those generations, we are likely to see
that the batch is idle when we go to insert the waits.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are asked to render immediately, then in order to pass the tests
when comparing it to target, we need to set the current_msc to the
ultimate future value, -1.
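
As an illustration, assuming the counters are unsigned 64-bit values
and the test has the form "current has reached target" (both are
assumptions here, and the names are hypothetical), storing -1 yields
the largest representable value, so any target compares as reached:

```c
#include <stdbool.h>
#include <stdint.h>

/* -1 converted to an unsigned 64-bit counter is its maximum value. */
#define MSC_IMMEDIATE ((uint64_t)-1)

/* Hypothetical form of the comparison against the target. */
static bool msc_reached(uint64_t current_msc, uint64_t target_msc)
{
	return current_msc >= target_msc;
}
```
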
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having reduced all the vb code for these generations to the same set of
routines, we can refactor them into a single set of functions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The idea being that when creating a surface to perform inplace
rasterisation, we won't be using the GPU for a while and so give it time
to naturally throttle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Simply ignore the cropping and copy the whole plane rather than
complicate the computation of the packed destination pixels.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>