As on gen4+, tiling increases the maximum usable pitch, so we can
accommodate wider pixmaps on the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having noticed that eog was failing to perform an 8k x 8k copy with
compiz running on a 965gm, it was time the checks for batch overflow
were implemented.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are attempting to copy between two large bo, larger than we can
fit into the aperture, break the copy into smaller steps and use an
intermediary.
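A minimal sketch of the chunking idea, with hypothetical names (the
driver's actual implementation works on bo and batches, not plain
memory):

```c
#include <string.h>

/* Hypothetical sketch: copy 'total' bytes from src to dst through a
 * small intermediate buffer so that no single step exceeds 'step'
 * bytes, i.e. a size known to fit within the aperture. */
static void copy_via_intermediate(char *dst, const char *src, size_t total,
				  char *bounce, size_t step)
{
	while (total) {
		size_t len = total < step ? total : step;

		memcpy(bounce, src, len); /* source -> intermediary */
		memcpy(dst, bounce, len); /* intermediary -> destination */

		src += len;
		dst += len;
		total -= len;
	}
}
```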
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we may have a constrained aperture, we need to be careful not to
exceed our resource limits when uploading the pixel data. (For example,
fitting two of the maximum bo into a single batch may fail due to
fragmentation of the GATT.) So be cautious and use more tiles to reduce
the size of each individual batch.
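As a rough sketch of the heuristic (names hypothetical), the tile size
can be halved until two tiles of that size fit within the batch's
aperture budget:

```c
#include <stddef.h>

/* Hypothetical sketch: halve the tile until two such tiled bo (source
 * and destination in one batch) fit within the aperture budget,
 * leaving headroom against fragmentation of the GATT. */
static int choose_tile_size(int max_tile, int cpp, size_t budget)
{
	int tile = max_tile;

	while (tile > 1 && 2 * (size_t)tile * tile * cpp > budget)
		tile /= 2;
	return tile;
}
```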
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Make sure we prevent the readback of an active source GPU bo by always
preferring to do the copy on the GPU if the data is already resident.
This fixes the second regression from e583af9cc, (sna: Experiment with
creating large objects as CPU bo).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This fixes the performance regression introduced with e583af9cca,
(sna: Experiment with creating large objects as CPU bo), as we ended up
creating fresh bo and incurring setup and thrashing overhead, when we
already had plenty cached.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On gen4+ devices the maximum render pitch is much larger than is simply
required for the maximum coordinates. This makes it possible to use
proxy textures as a subimage into the oversized texture without having
to blit into a temporary copy for virtually every single bo we use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Such large bo place extreme stress on the system; for example, trying to
mmap a 1GiB bo into the CPU domain currently fails due to a kernel bug. :(
So if you can avoid the swap thrashing during the upload, the ddx can now
handle 16k x 16k images on gen4+ on the GPU. That is fine until you want
two such images...
The real complication comes in uploading (and downloading) from such
large textures as they are too large for a single operation with
automatic detiling via either the BLT or the RENDER ring. We could do
manual tiling/switching or, as this patch does, tile the transfer in
chunks small enough to fit into either pipeline.
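The chunked transfer can be sketched as a simple walk over the image in
tile-sized pieces (hypothetical names; the real code issues a BLT or
RENDER operation per chunk):

```c
#include <stddef.h>

/* Hypothetical sketch: visit a large image in tile-sized chunks so
 * each transfer stays small enough for a single operation with
 * automatic detiling; 'transfer' stands in for the per-chunk upload
 * or download (may be NULL when only counting chunks). */
static int for_each_tile(int width, int height, int tile,
			 void (*transfer)(int x, int y, int w, int h))
{
	int x, y, count = 0;

	for (y = 0; y < height; y += tile) {
		int h = height - y < tile ? height - y : tile;

		for (x = 0; x < width; x += tile) {
			int w = width - x < tile ? width - x : tile;

			if (transfer)
				transfer(x, y, w, h);
			count++;
		}
	}
	return count;
}
```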
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The presumption that the pixmap is the scanout and so will always be
pinned is false if there is a shadow or under a compositor. In those
cases, the pixmap may be idle and so the GPU bo reaped. This was
compounded by the fact that the video path did not mark the pixmap as
busy. So
whilst watching a video under xfce4 with compositing enabled (has to be
a non-GL compositor) the video would suddenly stall.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45279
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the render target is thin enough to fit within the 3D pipeline, but is
too tall, we can fudge the address of the origin and coordinates to fit
within the constraints of the pipeline.
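The fudging amounts to trading a base-address offset for a smaller y
coordinate. A simplified sketch with hypothetical names, ignoring the
hardware's base-address alignment constraints:

```c
#include <stddef.h>

struct origin { size_t base_offset; int y; };

/* Hypothetical sketch: if y lies beyond the pipeline's height limit,
 * advance the base address by whole rows and reduce y to compensate,
 * so the render lands within the pipeline's coordinate range. */
static struct origin fudge_origin(int y, size_t pitch, int max_height)
{
	struct origin o = { 0, y };

	if (y >= max_height) {
		o.base_offset = (size_t)y * pitch; /* skip whole rows */
		o.y = 0;
	}
	return o;
}
```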
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the source is thin enough such that the pitch is within the sampler's
constraints and the sample size is small enough, just fudge the origin
of the bo such that it can be sampled.
This avoids having to create a temporary bo and use the BLT to extract
it and helps, for example, firefox-asteroids, which uses a 64x11200
texture atlas.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Combine the two very similar routines that decide whether we should render
into the GPU bo, CPU bo or shadow pixmap into a single function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the hw is wedged, then the pixmap creation routines will return an
ordinary unattached pixmap. The code presumed that it would only return
a pixmap with an attached bo, and so would segfault as it chased the
invalid pointer after a GPU hang and the server was restarted.
Considering that we already checked that the GPU wasn't wedged before we
started, this is just mild paranoia, but it is in a run-once piece of code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says that they must be wholly contained within the valid
BorderClip for a Window, or within the Pixmap, or else a BadMatch is
thrown. Rely on this behaviour and do not perform the clipping ourselves.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the source is not attached to a buffer (be it a GPU bo or a CPU bo),
a temporary upload buffer would be required and so it is not worth
forcing the target to the destination in that case (should the target
not be on the GPU already).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the blitter on gen4+ does not require fence registers, it is not
restricted to operating on large objects within the mappable aperture.
As we do not need to operate on such large GPU bo in place, we can relax
the restriction on the maximum bo size for gen4+ to allocate for use
with the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the bo is larger than a quarter of the aperture, it is unlikely that
we will be able to evict enough contiguous space in the GATT to
accommodate that buffer. So don't attempt to map them and use the
indirect access instead.
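The heuristic reduces to a single comparison (sketch with hypothetical
names):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: only attempt to map a bo through the GTT if
 * evicting enough contiguous space in the GATT is plausible, i.e. the
 * bo is no larger than a quarter of the aperture; otherwise fall back
 * to indirect access. */
static bool can_map(size_t bo_size, size_t aperture_size)
{
	return bo_size <= aperture_size / 4;
}
```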
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It is preferable to reuse a slightly larger bo than it is to create a
fresh one and map it into the aperture. So search the bucket above us as
well.
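Sketch of the search, assuming a simple power-of-two bucketed cache
(hypothetical names and layout):

```c
#define NUM_BUCKETS 16

/* Hypothetical sketch: buckets hold cached bo grouped by size class.
 * Look in the exact bucket first, then the bucket above, preferring a
 * slightly larger cached bo over a fresh allocation that would have to
 * be mapped into the aperture. */
static int search_buckets(const int *bucket_counts, int bucket)
{
	if (bucket_counts[bucket])
		return bucket;
	if (bucket + 1 < NUM_BUCKETS && bucket_counts[bucket + 1])
		return bucket + 1;
	return -1; /* nothing cached; caller allocates a fresh bo */
}
```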
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to handle rotations and fractional offsets produced by the act
of downsampling, we need to compute the full affine transformation and
apply it to the vertices rather than attempt to fudge it with an integer
offset.
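For illustration, applying a full 2x3 affine transform to a vertex
rather than fudging it with an integer offset (hypothetical names):

```c
struct vec2 { float x, y; };

/* Hypothetical sketch: a 2x3 affine matrix m captures rotation, scale
 * and the fractional translation produced by downsampling, none of
 * which an integer offset can represent. */
static struct vec2 apply_affine(const float m[6], struct vec2 v)
{
	struct vec2 out;

	out.x = m[0] * v.x + m[1] * v.y + m[2];
	out.y = m[3] * v.x + m[4] * v.y + m[5];
	return out;
}
```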
References: https://bugs.freedesktop.org/show_bug.cgi?id=45086
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Even on non-LLC systems, if we can prevent the migration of such
objects, we can still benefit immensely from being able to map them into
the GTT as required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take advantage of the fact that we will have to clflush the unbound bo
before use by the GPU, and populate it in place.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to adjust the clip to include the border pixels when migrating
damage from the backing pixmap. This also requires relaxing the
constraint that a read must be within the drawable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pathological case is an nx1 or 1xm pixmap, resulting in an illegal
allocation request of 0 bytes.
One such example is
wolframalpha.com: x = (200 + x) / 100
which generates an approximately 8500x1 image and so needs downscaling
to fit in the render pipeline on all but IvyBridge. Bring on Ivy!
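The guard is simply to clamp each downsampled dimension (hypothetical
sketch):

```c
/* Hypothetical sketch: naive integer scaling of an extreme aspect
 * ratio such as 8500x1 can round a dimension down to 0 and trigger a
 * 0-byte allocation; clamp each dimension to at least 1. */
static int scale_dimension(int dim, int factor)
{
	int scaled = dim / factor;

	return scaled ? scaled : 1;
}
```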
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
sna_accel.c: In function 'sna_copy_plane':
sna_accel.c:5022:21: warning: 'ret' may be used uninitialized in this
function [-Wuninitialized]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Prepare the source first as this has the dual benefit of letting us
decide how best to proceed with the op (on the CPU or GPU) and prevents
modification of the damage after we have chosen our preferred path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The source window is (src->x, src->y)x(src->width, src->height) in
pixmap space. However, we then need to use this to clip against the
destination region, and so we need to translate from the source
coordinate to the destination coordinate.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This allows us to discard any busy GPU or CPU bo when we know we are
going to clear the shadow pixmap afterwards.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>