If we need to enlarge the sampled tile due to tiling alignments, the
resulting sample can become larger than we can accommodate through the 3D
pipeline, causing the operation to fail.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the pixmap is larger than the pipeline, but the operation extents fit
within the pipeline, we may be able to create a proxy target to
transform the operation into one that fits within the constraints of the
render pipeline.
This fixes the infinite recursion hit with partially displayed extremely
large images.
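The shape of the redirection, as a hedged sketch; struct bo, create_bo,
render_to and copy_to_dst are hypothetical names, not the driver's API:

    /* If the destination exceeds the pipeline limit but the operation
     * extents do not, render into a proxy sized to the extents and
     * copy the result back afterwards. */
    struct bo; /* opaque buffer object */
    struct rect { int x, y, width, height; };

    extern struct bo *create_bo(int width, int height);
    extern void destroy_bo(struct bo *bo);
    extern void render_to(struct bo *target, int dx, int dy);
    extern void copy_to_dst(struct bo *proxy, struct bo *dst,
                            const struct rect *r);

    static void render_via_proxy(struct bo *dst, const struct rect *extents)
    {
        struct bo *proxy = create_bo(extents->width, extents->height);

        /* Translate coordinates so the extents land at the proxy origin. */
        render_to(proxy, -extents->x, -extents->y);

        /* Write the rendered extents back into the oversized target. */
        copy_to_dst(proxy, dst, extents);
        destroy_bo(proxy);
    }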
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
At the moment, the jury is still out on whether freely switching rings
for fills is a Good Idea. So make it easier to turn it on and off for
testing.
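Something along these lines, sketched with an invented PREFER_BLT_FILL
switch rather than the driver's actual option:

    #include <stdbool.h>

    #define PREFER_BLT_FILL 1 /* flip to 0 to keep fills on the render ring */

    static bool use_blt_for_fill(bool blt_available)
    {
    #if PREFER_BLT_FILL
        /* Switch rings freely: take the BLT whenever we can. */
        return blt_available;
    #else
        /* Conservative: never switch rings just for a fill. */
        return false;
    #endif
    }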
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After using the CPU, upload the damage and read back the pixels from the
GPU bo and verify that the two are equivalent.
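Illustratively, the check amounts to a row-by-row comparison of the two
copies (names and strides here are hypothetical):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    static void verify_readback(const uint8_t *cpu, const uint8_t *gpu,
                                int width_bytes, int height,
                                int cpu_stride, int gpu_stride)
    {
        int y;

        /* Every row written via the CPU must read back identically
         * from the GPU bo after the upload. */
        for (y = 0; y < height; y++)
            assert(memcmp(cpu + y * cpu_stride,
                          gpu + y * gpu_stride,
                          width_bytes) == 0);
    }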
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says to fill the character boxes, which is what the hardware
does. The implementation fills the extents instead. rxvt expects the
former, emacs the latter. Overdraw is a nuisance, but less of one than
leaving glyphs behind...
Reported-by: walch.martin@web.de
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45438
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The hw routines only directly support solid fill, so fall back for the
interesting cases. An alternative would be to investigate using the
miPolyGlyph routine to convert the weird fills into spans in order to
fall back. Falling back sounds cheaper, so wait for an actual use case.
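The dispatch is no more than this sketch suggests; hw_solid_fill and
sw_fallback are placeholders for the accelerated path and the software
fallback:

    #include <stdbool.h>

    enum fill_style { FILL_SOLID, FILL_TILED, FILL_STIPPLED };

    extern bool hw_solid_fill(void); /* accelerated path, may decline */
    extern void sw_fallback(void);   /* software rendering */

    static void dispatch_fill(enum fill_style style)
    {
        /* Only solid fills are handled directly by the hw routines;
         * everything else takes the fallback. */
        if (style != FILL_SOLID || !hw_solid_fill())
            sw_fallback();
    }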
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using the BLT is substantially faster than the current shaders for solid
fill. The downside is that it invokes more ring switching.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the BLT is far, far faster than using a shader.
Improves cairo-demos/chart from 6 to 13 fps.
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The kernel has a bug that prevents pwriting buffers larger than the
aperture. Whilst waiting for the fix, limit the upload where possible to
fit within that constraint.
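A sketch of the workaround; the aperture limit and the pwrite callback
are assumptions standing in for the real ioctl path:

    #include <stddef.h>
    #include <stdint.h>

    typedef void (*pwrite_fn)(size_t offset, const void *data, size_t len);

    static void upload_in_chunks(pwrite_fn pwrite_bo,
                                 const uint8_t *data, size_t total,
                                 size_t aperture_limit)
    {
        size_t offset = 0;

        while (offset < total) {
            size_t len = total - offset;

            /* Keep each pwrite below the size the buggy kernel can
             * handle in one go. */
            if (len > aperture_limit)
                len = aperture_limit;

            pwrite_bo(offset, data + offset, len);
            offset += len;
        }
    }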
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The only strong requirement is that to utilize large pitches, the object
must be tiled. Having it as X tiling is a pure convenience to facilitate
use of the blitter. A DRI client may want to keep using Y tiling
instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Only apply the architectural limits to enable bo creation for DRI buffers.
Reported-by: Alban Browaeys <prahal@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45414
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the GATT size is independent of the actual RAM size, we need to be
careful not to be too generous when allocating GPU bo and their shadows.
So first of all we limit default render targets to those small enough to
fit comfortably in RAM alongside others, and secondly we try to keep
only a single copy of large objects in memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Whilst the old mapping is guaranteed to be larger than the requested
allocation size, keeping track of the actual size allows for better
packing of future buffers. The code also performs a sanity check that
the buffer is the size we claim it to be...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Given that we now handle uploads to and from bo that are larger than the
aperture, and that usage of such large bo is rare and so unlikely to
benefit from caching, allow them to be created as render targets and
destroy them as soon as they become inactive.
In principle, this finally enables GPU acceleration of ocitysmap on gen4+,
but due to the large cost of creating and destroying large bo it is
disabled on systems that require clflushing. It is, however, a
prerequisite for exploiting the enhanced capabilities of IvyBridge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We can juggle rendering into large bo on gen4 by redirecting the
rendering through a proxy that is tile aligned, and so the render target
may be slightly larger than the tiling step size. As that is then larger
than the maximum the 3D pipeline can address, the trick fails and we
need to resort to a temporary render target with copies in and out. In
this case, check that the tile is aligned to the most pessimistic tiling
width and reduce the step size to accommodate the enlargement.
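The arithmetic, sketched with assumed constants for the pipeline limit
and the worst-case tile width:

    #define MAX_3D_SIZE    8192 /* assumed 3D pipeline limit */
    #define TILE_WIDTH_MAX  512 /* assumed most pessimistic tile width */

    static int clamp_step(int step)
    {
        /* Alignment can enlarge a tile by up to one tile width, so
         * leave headroom for the enlargement... */
        if (step > MAX_3D_SIZE - TILE_WIDTH_MAX)
            step = MAX_3D_SIZE - TILE_WIDTH_MAX;

        /* ...and keep the step itself aligned to the pessimistic
         * tiling width. */
        return step & ~(TILE_WIDTH_MAX - 1);
    }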
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As tiling increases the maximum usable pitch on gen4+, we can
accommodate wider pixmaps on the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Having noticed that eog was failing to perform an 8k x 8k copy with
compiz running on a 965gm, it was time the checks for batch overflow
were implemented.
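The guard takes roughly this form (the batch layout and submit_batch
are illustrative, not the actual kgem interface):

    #include <stdint.h>

    #define BATCH_SIZE 1024 /* assumed batch length in dwords */

    struct batch {
        uint32_t buf[BATCH_SIZE];
        int used;
    };

    extern void submit_batch(struct batch *b); /* flush to the kernel */

    static void batch_require(struct batch *b, int ndwords)
    {
        /* If the next command would overflow the batch, submit what
         * we have and start afresh instead of writing past the end. */
        if (b->used + ndwords > BATCH_SIZE) {
            submit_batch(b);
            b->used = 0;
        }
    }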
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are attempting to copy between two large bo, larger than we can
fit into the aperture, break the copy into smaller steps and use an
intermediary.
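In outline, with hypothetical helpers for allocation and blitting:

    struct bo; /* opaque buffer object */

    extern struct bo *create_bo(int width, int height);
    extern void destroy_bo(struct bo *bo);
    extern void blit(struct bo *src, int src_y,
                     struct bo *dst, int dst_y,
                     int width, int height);

    static void copy_via_intermediary(struct bo *src, struct bo *dst,
                                      int width, int height, int band)
    {
        /* Neither bo fits in the aperture alongside the other, so
         * stage the copy through a smaller buffer, one band at a time. */
        struct bo *tmp = create_bo(width, band);
        int y;

        for (y = 0; y < height; y += band) {
            int h = height - y < band ? height - y : band;

            blit(src, y, tmp, 0, width, h); /* source -> staging */
            blit(tmp, 0, dst, y, width, h); /* staging -> destination */
        }

        destroy_bo(tmp);
    }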
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we may have a constrained aperture, we need to be careful not to
exceed our resource limits when uploading the pixel data. (For example,
fitting two of the maximum-sized bo into a single batch may fail due to
fragmentation of the GATT.) So be cautious and use more tiles to reduce
the size of each individual batch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Make sure we prevent the readback of an active source GPU bo by always
preferring to do the copy on the GPU if the data is already resident.
This fixes the second regression from e583af9cc, (sna: Experiment with
creating large objects as CPU bo).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This fixes the performance regression introduced with e583af9cca,
(sna: Experiment with creating large objects as CPU bo), as we ended up
creating fresh bo and incurring setup and thrashing overhead, when we
already had plenty cached.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On gen4+ devices the maximum render pitch is much larger than is
strictly required for the maximum coordinates. This makes it possible to
use a proxy texture as a subimage into the oversized texture without
having to blit into a temporary copy for virtually every single bo we
use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Such large bo place extreme stress on the system; for example, trying to
mmap a 1GiB bo into the CPU domain currently fails due to a kernel bug. :(
So provided you can avoid the swap thrashing during the upload, the ddx
can now handle 16k x 16k images on gen4+ on the GPU. That is fine until
you want two such images...
The real complication comes in uploading to (and downloading from) such
large textures, as they are too large for a single operation with
automatic detiling via either the BLT or the RENDER ring. We could do
manual tiling/switching or, as this patch does, tile the transfer in
chunks small enough to fit into either pipeline.
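The chunking itself is simple; CHUNK and the per-chunk transfer hook
below are assumptions for the sketch:

    #define CHUNK 2048 /* assumed largest extent either pipeline detiles */

    typedef void (*transfer_fn)(int x, int y, int w, int h);

    static void transfer_tiled(transfer_fn transfer, int width, int height)
    {
        int x, y;

        /* Walk the oversized image in chunks small enough for the
         * automatic detiling of both the BLT and the RENDER ring. */
        for (y = 0; y < height; y += CHUNK)
            for (x = 0; x < width; x += CHUNK) {
                int w = width - x < CHUNK ? width - x : CHUNK;
                int h = height - y < CHUNK ? height - y : CHUNK;

                transfer(x, y, w, h);
            }
    }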
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The presumption that the pixmap is the scanout, and so will always be
pinned, is false if there is a shadow or we are under a compositor. In
those cases, the pixmap may be idle and so the GPU bo reaped. This was
compounded by the fact that the video path did not mark the pixmap as
busy. So whilst watching a video under xfce4 with compositing enabled
(it has to be a non-GL compositor) the video would suddenly stall.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45279
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the render target is thin enough to fit within the 3D pipeline, but is
too tall, we can fudge the address of the origin and the coordinates to
fit within the constraints of the pipeline.
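For a linear surface the fudge reduces to offsetting the base address by
whole rows and translating the coordinates to match; this sketch ignores
the extra alignment a tiled surface would impose:

    #include <stdint.h>

    struct view {
        uint32_t offset; /* byte offset of the fudged origin */
        int dy;          /* translation to apply to y coordinates */
    };

    static struct view fudge_origin(int y0, int pitch)
    {
        struct view v;

        /* Start the render target at the first row of interest... */
        v.offset = (uint32_t)y0 * pitch;

        /* ...and shift all rendering coordinates up by the same
         * amount so they fall within the pipeline's height limit. */
        v.dy = -y0;

        return v;
    }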
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the source is thin enough such that the pitch is within the sampler's
constraints and the sample size is small enough, just fudge the origin
of the bo such that it can be sampled.
This avoids having to create a temporary bo and use the BLT to extract
it, and helps, for example, firefox-asteroids which uses a 64x11200
texture atlas.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Combine the two very similar routines that decided if we should render
into the GPU bo, CPU bo or shadow pixmap into a single function.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the hw is wedged, then the pixmap creation routines will return an
ordinary unattached pixmap. The code presumed that it would only ever
return a pixmap with an attached bo, and so it would segfault as it
chased the invalid pointer after a GPU hang and a server restart.
Considering that we already checked that the GPU wasn't wedged before we
started, this is just mild paranoia, but it is cheap paranoia on a
run-once piece of code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says that they must be wholly contained within the valid
BorderClip for a Window, or within the Pixmap, or else a BadMatch is
thrown. Rely on this behaviour and do not perform the clipping ourselves.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the source is not attached to a buffer (be it a GPU bo or a CPU bo),
a temporary upload buffer would be required, and so it is not worth
forcing the target onto the GPU in that case (should the target not be
on the GPU already).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the blitter on gen4+ does not require fence registers, it is not
restricted to operating on large objects within the mappable aperture.
As we do not need to operate on such large GPU bo in place, we can relax
the restriction on the maximum bo size for gen4+ to allocate for use
with the GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the bo is larger than a quarter of the aperture, it is unlikely that
we will be able to evict enough contiguous space in the GATT to
accommodate that buffer. So don't attempt to map them and use the
indirect access instead.
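As a predicate, the heuristic amounts to the following sketch (the
aperture accounting is an illustrative stand-in for the driver's
bookkeeping):

    #include <stdbool.h>
    #include <stddef.h>

    static bool can_map_bo(size_t bo_size, size_t aperture_size)
    {
        /* Evicting a contiguous quarter of the GATT is already a long
         * shot; anything larger goes through indirect access. */
        return bo_size <= aperture_size / 4;
    }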
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>