The spec says that they must be wholly contained within the valid BorderClip
for a Window or within the Pixmap or else a BadMatch is thrown. Rely on
this behaviour and do not perform the clipping ourselves.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the source is not attached to a buffer (be it a GPU bo or a CPU bo),
a temporary upload buffer would be required and so it is not worth
forcing the target to the destination in that case (should the target
not be on the GPU already).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the blitter on gen4+ does not require fence registers, it is not
restricted to operating on large objects within the mappable aperture.
As we do not need to operate on such large GPU bo in place, we can relax
the restriction on the maximum bo size for gen4+ to allocate for use
with the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the bo is larger than a quarter of the aperture, it is unlikely that
we will be able to evict enough contiguous space in the GATT to
accommodate that buffer. So don't attempt to map such buffers and use
indirect access instead.
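
A minimal sketch of the size check described above. The helper name and
its parameters are hypothetical and are not taken from the driver; only
the quarter-of-the-aperture threshold comes from the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the driver's actual function. */
static bool bo_worth_mapping(size_t bo_size, size_t aperture_size)
{
	/*
	 * A bo larger than a quarter of the aperture is unlikely to find
	 * enough contiguous space, so report that it should not be mapped.
	 */
	return bo_size <= aperture_size / 4;
}
```

A caller would fall back to the indirect access path whenever this
returns false.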
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It is preferable to reuse a slightly larger bo than it is to create a
fresh one and map it into the aperture. So search the bucket above us as
well.
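
The search could look roughly like the sketch below. The cache layout
(an array of singly linked lists, one per size bucket), NUM_BUCKETS and
all names are assumptions made for illustration and do not reflect the
driver's real data structures.

```c
#include <stddef.h>

#define NUM_BUCKETS 8 /* assumed bucket count, for illustration only */

struct cached_bo {
	size_t size;
	struct cached_bo *next;
};

/*
 * Look for an idle bo of at least 'size' bytes, first in the requested
 * bucket and then in the bucket directly above it. The bo is unlinked
 * from the cache before it is returned. Assumes 0 <= bucket < NUM_BUCKETS.
 */
static struct cached_bo *search_cache(struct cached_bo *buckets[NUM_BUCKETS],
				      int bucket, size_t size)
{
	int last = bucket + 1 < NUM_BUCKETS ? bucket + 1 : bucket;

	for (int n = bucket; n <= last; n++) {
		struct cached_bo **prev = &buckets[n];

		for (struct cached_bo *bo = *prev; bo;
		     prev = &bo->next, bo = bo->next) {
			if (bo->size >= size) {
				*prev = bo->next;
				bo->next = NULL;
				return bo;
			}
		}
	}

	return NULL; /* nothing suitable: the caller creates a fresh bo */
}
```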
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to handle rotations and fractional offsets produced by the act
of downsampling, we need to compute the full affine transformation and
apply it to the vertices rather than attempt to fudge it with an integer
offset.
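
As an illustration of applying the full transformation to each vertex,
a minimal sketch follows. The struct layout and function name are
hypothetical; the driver's own matrix representation is not shown here.

```c
/* Hypothetical 2D affine matrix: [xx xy x0; yx yy y0; 0 0 1]. */
struct affine {
	double xx, xy, x0;
	double yx, yy, y0;
};

/*
 * Transform one vertex. Scale, rotation and fractional translation are
 * all carried by the matrix, so no separate integer offset is needed.
 */
static void affine_apply(const struct affine *m, double x, double y,
			 double *out_x, double *out_y)
{
	*out_x = m->xx * x + m->xy * y + m->x0;
	*out_y = m->yx * x + m->yy * y + m->y0;
}
```

For example, a 2x downsample with a fractional offset is expressed as
xx = yy = 0.5 with x0 and y0 holding the offset, which an integer
offset alone cannot represent.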
References: https://bugs.freedesktop.org/show_bug.cgi?id=45086
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Even on non-LLC systems if we can prevent the migration of such
objects, we can still benefit immensely from being able to map them into
the GTT as required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take advantage of the fact that we know we will have to clflush the
unbound bo before use by the GPU and populate it in place.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to adjust the clip to include the border pixels when migrating
damage from the backing pixmap. This also requires relaxing the
constraint that a read must be within the drawable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pathological case being nx1 or 1xm resulting in an illegal allocation
request of 0 bytes.
One such example is
wolframalpha.com: x = (200 + x) / 100
which generates an approximately 8500x1 image and so needs downscaling
to fit in the render pipeline on all but IvyBridge. Bring on Ivy!
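
To show how the zero arises, here is a small sketch. Clamping the
result to at least one pixel is an assumption about one possible guard,
not a description of the actual fix, and the function name is
hypothetical.

```c
/*
 * Scale one dimension down by an integer divisor. Plain integer
 * division turns a 1-pixel dimension into 0, which in turn makes the
 * allocation size 0 bytes, so the result is clamped to at least 1.
 */
static int downscale_dim(int size, int divisor)
{
	int scaled = size / divisor;

	return scaled > 0 ? scaled : 1;
}
```

With a divisor of 2, the 8500x1 image above becomes 4250x1 rather than
4250x0.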
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
sna_accel.c: In function 'sna_copy_plane':
sna_accel.c:5022:21: warning: 'ret' may be used uninitialized in this
function [-Wuninitialized]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Prepare the source first as this has the dual benefit of letting us
decide how best to proceed with the op (on the CPU or GPU) and preventing
modification of the damage after we have chosen our preferred path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The source window is (src->x, src->y)x(src->width, src->height) in
pixmap space. However, we then need to use this to clip against the
destination region, and so we need to translate from the source
coordinate to the destination coordinate.
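
A minimal sketch of that translation, assuming the copy maps a source
point (x, y) to the destination point (x + dx, y + dy). The struct and
function names are hypothetical.

```c
struct box {
	int x1, y1, x2, y2;
};

/*
 * Express the source window, given in pixmap space, in destination
 * coordinates so that it can be intersected with the destination
 * region. (dx, dy) is the assumed source-to-destination offset.
 */
static struct box src_window_in_dst_space(int src_x, int src_y,
					  int width, int height,
					  int dx, int dy)
{
	struct box b;

	b.x1 = src_x + dx;
	b.y1 = src_y + dy;
	b.x2 = b.x1 + width;
	b.y2 = b.y1 + height;
	return b;
}
```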
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This allows us to discard any busy GPU or CPU bo when we know we are
going to clear the shadow pixmap afterwards.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the pixmap is mapped to the GPU bo, we should continue to use the
current mapping rather than revoke it. Otherwise if we write to the GPU
bo in place, thereby discarding the CPU bo, we set the pointer we are
about to copy to, to NULL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The mi routines do not ensure that their output is suitably constrained
to the clip extents, so we must run it through the clipper.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to avoid using the wrong function for a scratch GC created
during the course of a MI function whilst we have a specialised GC in
use, we need to avoid modifying the original function table.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As drawable_gc_flags() operates on lower level information than the
hint, it is able to spot more opportunities to reduce the READ flags and
so the assertion was overly optimistic.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are copying a region that does not fill its extents (i.e. is not
singular) then we must be careful not to discard the CPU damage that is not
overwritten by the copy.
Fixes regression from 77ee922485
(sna: Use full usage flags for moving the dst pixmap for a copy).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The goal of the heuristic is to reduce readbacks and damage tracking on
active GPU bo whilst simultaneously offering the best performance for
small operations which would prefer to be performed on the shadow rather
than in place.
This restores ShmPutImage performance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As proxies are short-lived and are not used outside of the operation for
which they are created, dirtied or flushed, we can safely copy the dirty
status onto the proxy object itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>