Commit Graph

4732 Commits

Author SHA1 Message Date
Zhigang Gong bf3518ea91 uxa/glamor/dri: Fix a typo bug when fixup glamor pixmap.
Should modify the old pixmap's header not the new one which
was already destroyed.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 08:43:08 +00:00
Chris Wilson 1467a4ba1a sna: Use the proper sna_picture_is_solid() test
Rather than the specialised routines that assumed pDrawable was
non-NULL, which was no longer true after f30be6f743.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 21:10:35 +00:00
Chris Wilson ef335a65a9 sna: Search all active buckets for a temporary allocation
Reduce the need for creating a new object if we only need the allocation
for a single operation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 21:10:35 +00:00
Chris Wilson b7e3aaf773 sna: Use the clipped end-point for recomputing segment length after clipping
References: https://bugs.freedesktop.org/show_bug.cgi?id=45673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 18:08:19 +00:00
Chris Wilson f30be6f743 sna/gen2+: Exclude solids from being classed as requiring an upload
We treat any pixmap that is not attached to either a CPU or GPU bo as
requiring the pixel data to be uploaded to the GPU before we can
composite. Normally this is true, except for the solid cache.

References: https://bugs.freedesktop.org/show_bug.cgi?id=45672
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 15:59:21 +00:00
Chris Wilson f009386de8 sna: If we have a CPU bo, do not assert we have shadow pixels
When transferring damage to the GPU, on SNB it is not necessarily true
that we have a shadow pixmap, we may instead have drawn onto an unmapped
CPU bo and now simply need to copy from that bo onto the GPU. Move the
assertion onto the path where it truly matters.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45672
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 09:50:03 +00:00
Chris Wilson 22e452ebe0 sna: Disable use of xvmc for SNB+
Not yet implemented, so don't bother setting it to fail.

References: https://bugs.freedesktop.org/show_bug.cgi?id=44874
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 09:19:56 +00:00
Chris Wilson a8ed1a02ad sna: Discard the redundant clear of the unbounded area if already clear
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 20:13:07 +00:00
Chris Wilson b899a4b696 sna: Always pass the clear colour for PictOpClear
Having made that optimisation for Composite, and then made the
assumption that it is always true in the backends, we failed to clear
the unbounded area outside of a trapezoid since we passed in the
original colour and the operation was optimised as a continuation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 20:07:49 +00:00
Chris Wilson c107b90a44 sna/gen6: Reduce PictOpClear to PictOpSrc (with blending disabled)
The advantage of PictOpSrc is that it writes its results directly to
memory bypassing the blend unit.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 20:07:45 +00:00
Chris Wilson 4baa2806bc sna: Check if the damage reduces to all before performing the migration
An assert exposed a situation where we had accumulated an unreduced
damage-all and so we were taking the slow path only to discover later
that it was a damage-all and that we had performed needless checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 15:19:37 +00:00
Chris Wilson 2653524dff sna: Reduce the downsample tile size to accommodate alignment
If we need to enlarge the sampled tile due to tiling alignments, the
resulting sample can become larger than we can accommodate through the 3D
pipeline, resulting in FAIL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 15:19:37 +00:00
Chris Wilson 93a0b10f16 sna: Apply redirection for the render copy into large pixmaps
If the pixmap is larger than the pipeline, but the operation extents fit
within the pipeline, we may be able to create a proxy target to
transform the operation into one that fits within the constraints of the
render pipeline.

This fixes the infinite recursion hit with partially displayed extremely
large images.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 15:19:05 +00:00
Chris Wilson 4774c6b833 sna: Add a couple of sanity checks that the CPU drawable is on the CPU
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-03 09:54:25 +00:00
Chris Wilson 418cd98db7 sna/gen6: Ring switching outweighs the benefits for cairo-traces
At the moment, the jury is still out on whether freely switching rings
for fills is a Good Idea. So make it easier to turn it on and off for
testing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-03 09:53:29 +00:00
Chris Wilson 2d0e7c7ecd sna: Search again for a just-large-enough mapping for inplace uploads
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 14:52:56 +00:00
Chris Wilson 55c7088f54 sna: Add debugging code to verify damage extents of fallback paths
After using the CPU, upload the damage and read back the pixels from the
GPU bo and verify that the two are equivalent.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:19:03 +00:00
Chris Wilson c8fc2cde53 sna: Fill extents for ImageGlyphs
The spec says to fill the character boxes, which is what the hardware
does. The implementation fills the extents instead. rxvt expects the
former, emacs the latter. Overdraw is a nuisance, but less than leaving
glyphs behind...

Reported-by: walch.martin@web.de
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45438
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:19:03 +00:00
Chris Wilson 13508ab5ea sna: PolyGlyph supports all of fill/tile/stipple rules
The hw routines only directly support solid fill, so fall back for the
interesting cases. An alternative would be to investigate using the
miPolyGlyph routine to convert the weird fills into spans in order to
avoid the fallback. Sounds cheaper to fall back, so wait for an actual use case.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:19:03 +00:00
Chris Wilson df4e1059a4 sna/gen6: Prefer to do fills using the BLT
Using the BLT is substantially faster than the current shaders for solid
fill. The downside is that it invokes more ring switching.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:07:13 +00:00
Chris Wilson 8b012de0a1 sna/gen5: Always prefer to emit solid fills using the BLT
As the BLT is far, far faster than using a shader.

Improves cairo-demos/chart from 6 to 13 fps.

Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 20:30:40 +00:00
Chris Wilson 0a748fc49d sna: Split the tiling limits between upload and copying
The kernel has a bug that prevents pwriting buffers larger than the
aperture. Whilst waiting for the fix, limit the upload where possible to
fit within that constraint.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 11:21:42 +00:00
Chris Wilson 9c1f8a768c sna: Avoid converting requested Y to X tiling for large pitches on gen4+
The only strong requirement is that to utilize large pitches, the object
must be tiled. Having it as X tiling is a pure convenience to facilitate
use of the blitter. A DRI client may want to keep using Y tiling
instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 10:29:02 +00:00
Chris Wilson e872c1011f sna/dri: We need to reduce tiling on older gen if we cannot fence
Only apply the architectural limits to enable bo creation for DRI buffers.

Reported-by: Alban Browaeys <prahal@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45414
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 10:26:42 +00:00
Chris Wilson a4caf67d8d sna: Trim tile sizes to fit into bo cache
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 00:55:12 +00:00
Chris Wilson 3f7c1646c7 sna: Check that the intermediate IO buffer can also be used for blitting
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 00:31:21 +00:00
Chris Wilson e504fab6c5 sna: Discard the cleared GPU buffer upon PutImage to the CPU buffer
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 23:49:18 +00:00
Chris Wilson ed1c1a7468 sna: Track large objects and limit prefer-gpu hint to small objects
As the GATT is irrespective of actual RAM size, we need to be careful
not to be too generous when allocating GPU bo and their shadows. So
first of all we limit default render targets to those small enough to
fit comfortably in RAM alongside others, and secondly we try to only
keep a single copy of large objects in memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 15:44:44 +00:00
Chris Wilson d53d93ffa6 sna: Update the partial buffer allocation size when reusing an old mapping
Whilst the old mapping is guaranteed to be larger than the requested
allocation size, keeping track of the actual size allows for better packing
of future buffers. And the code also performs a sanity check that the
buffer is the size we claim it to be...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 15:39:30 +00:00
Chris Wilson 6f99555b6b sna: Allow the creation of render targets larger than the maximum bo cache
Given that we now handle uploads to and from bo that are larger than the
aperture and that usage of such large bo is rare and so unlikely to
benefit from caching, allow them to be created as render targets and
destroyed as soon as they become inactive.

In principle, this finally enables GPU acceleration of ocitysmap on gen4+,
but due to the large cost of creating and destroying large bo it is
disabled on systems that require clflushing. It is, however, a
pre-requisite for exploiting the enhanced capabilities of IvyBridge.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 12:05:51 +00:00
Chris Wilson c65ec096e7 sna: Decrease tiling step size in case we need to enlarge the box later
We can juggle rendering into large bo on gen4 by redirecting the
rendering through a proxy that is tile aligned, and so the render target
may be slightly larger than the tiling step size. As that is then larger
than the maximum 3D pipeline, the trick fails and we need to resort to a
temporary render target with copies in and out. In this case, check that
the tile is aligned to the most pessimistic tiling width and reduce the
step size to accommodate the enlargement.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 12:05:51 +00:00
Chris Wilson 95f3734dd6 sna: Allow creation of proxies to proxies
Just update the offset of the new bo by the offset of the existing
proxy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 12:05:51 +00:00
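
The offset rule described in the commit above can be illustrated with a minimal sketch. The `struct bo` layout and the `make_proxy()` helper are hypothetical names invented for illustration and are not the driver's actual types or API. The sketch assumes a proxy records a byte offset relative to the root allocation, so a proxy of a proxy simply accumulates the two offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer object for illustration; not the driver's own struct. */
struct bo {
    const struct bo *proxy; /* root allocation, or NULL if this is a real bo */
    uint32_t delta;         /* byte offset from the start of the root allocation */
    uint32_t size;          /* usable length in bytes */
};

/* Create a proxy covering [offset, offset + length) of target.
 * If target is itself a proxy, the new proxy points at the same root
 * and its offset is the sum of both offsets. */
static struct bo make_proxy(const struct bo *target,
                            uint32_t offset, uint32_t length)
{
    struct bo p;

    /* The requested window must lie inside the target. */
    assert(offset <= target->size && length <= target->size - offset);

    p.proxy = target->proxy ? target->proxy : target;
    p.delta = target->delta + offset;
    p.size = length;
    return p;
}
```

With a 4096-byte root, a proxy at offset 1024 and a second proxy at offset 512 within the first, the second proxy ends up 1536 bytes into the root.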
Chris Wilson 488937edb6 sna: Base prefer-gpu hint on default tiling choice
As tiling increases the maximum usable pitch on gen4+, we can
accommodate wider pixmaps on the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 12:05:48 +00:00
Chris Wilson ca252e5b51 sna: Detect batch overflow and fallback rather than risk an ENOSPC
Having noticed that eog was failing to perform a 8k x 8k copy with
compiz running on a 965gm, it was time the checks for batch overflow
were implemented.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 20:12:39 +00:00
Chris Wilson 3aee521bf2 sna: Add a tiled fallback for large BLT copies
If we are attempting to copy between two large bo, larger than we can
fit into the aperture, break the copy into smaller steps and use an
intermediary.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 20:12:39 +00:00
Chris Wilson df148c9621 sna: Limit the tile size for uploading into large pixmaps
As we may have a constrained aperture, we need to be careful not to
exceed our resource limits when uploading the pixel data. (For example,
fitting two of the maximum bo into a single batch may fail due to
fragmentation of the GATT.) So be cautious and use more tiles to reduce
the size of each individual batch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 20:12:39 +00:00
Chris Wilson e1e67e8f39 sna: Fix the "trivial" fix to improve error handling
The logic was just backwards and we tried to upload a shadowless GPU
pixmap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 15:43:42 +00:00
Chris Wilson d3fb1e1e89 sna: Handle GPU creation failure when uploading subtexture
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 14:47:12 +00:00
Chris Wilson 518a99ea34 sna: Always create a GPU bo for copying from an existent source GPU bo
Make sure we prevent the readback of an active source GPU bo by always
preferring to do the copy on the GPU if the data is already resident.
This fixes the second regression from e583af9cc (sna: Experiment with
creating large objects as CPU bo).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 14:47:12 +00:00
Chris Wilson 624d9843ab sna: Ignore map status and pick the first inactive bo for reuse
This fixes the performance regression introduced with e583af9cca,
(sna: Experiment with creating large objects as CPU bo), as we ended up
creating fresh bo and incurring setup and thrashing overhead, when we
already had plenty cached.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 14:47:12 +00:00
Chris Wilson 5c6255ba2f sna: Determine whether to use a partial proxy based on the pitch
On gen4+ devices the maximum render pitch is much larger than is simply
required for the maximum coordinates. This makes it possible to use
proxy textures as a subimage into the oversized texture without having
to blit into a temporary copy for virtually every single bo we use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 14:47:12 +00:00
Chris Wilson 65466f8626 sna: Allow ridiculously large bo, up to half the total GATT
Such large bo place extreme stress on the system, for example trying to
mmap a 1GiB bo into the CPU domain currently fails due to a kernel bug. :(
So if you can avoid the swap thrashing during the upload, the ddx can now
handle 16k x 16k images on gen4+ on the GPU. That is fine until you want
two such images...

The real complication comes in uploading (and downloading) from such
large textures as they are too large for a single operation with
automatic detiling via either the BLT or the RENDER ring. We could do
manual tiling/switching or, as this patch does, tile the transfer in
chunks small enough to fit into either pipeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-29 14:47:12 +00:00
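
The chunked transfer described in the commit above can be sketched as a simple tile walk. This is only an illustration under stated assumptions: `struct box`, `for_each_tile()` and `sum_area()` are hypothetical names, the step size is a free parameter, and the real driver also has to deal with tiling alignment and aperture limits, which are not modelled here.

```c
#include <assert.h>

/* Hypothetical box type for illustration (x2/y2 exclusive). */
struct box { int x1, y1, x2, y2; };

typedef void (*tile_fn)(const struct box *tile, void *data);

static int min_int(int a, int b) { return a < b ? a : b; }

/* Split extents into tiles of at most step x step pixels and call fn on
 * each one, clamping the last row and column to the extents.
 * Returns the number of tiles visited. */
static int for_each_tile(const struct box *extents, int step,
                         tile_fn fn, void *data)
{
    int count = 0;

    assert(step > 0);
    for (int y = extents->y1; y < extents->y2; y += step) {
        for (int x = extents->x1; x < extents->x2; x += step) {
            struct box tile = {
                x, y,
                min_int(x + step, extents->x2),
                min_int(y + step, extents->y2),
            };
            if (fn)
                fn(&tile, data);
            count++;
        }
    }
    return count;
}

/* Example callback: accumulate the covered area into a long. */
static void sum_area(const struct box *tile, void *data)
{
    *(long *)data += (long)(tile->x2 - tile->x1) * (tile->y2 - tile->y1);
}
```

In a real upload path the callback would copy one tile through the BLT or RENDER ring; here it only sums areas, so the coverage can be checked.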
Chris Wilson 03211f4b0b sna: Guard against the upload buffer growing past the maximum bo size
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 23:20:36 +00:00
Chris Wilson 2afd49a284 sna: Limit inplace upload buffers to maximum mappable size
References: https://bugs.freedesktop.org/show_bug.cgi?id=45323
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 23:11:07 +00:00
Chris Wilson 8f4bae01e3 sna/video: Ensure the video pixmap is on the GPU
The presumption that the pixmap is the scanout and so will always be
pinned is false if there is a shadow or under a compositor. In those
cases, the pixmap may be idle and so the GPU bo reaped. This was
compounded by the fact that the video path did not mark the pixmap as busy. So
whilst watching a video under xfce4 with compositing enabled (has to be
a non-GL compositor) the video would suddenly stall.

Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45279
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 20:34:30 +00:00
Chris Wilson d02bd80b2f sna: Use a proxy rather than a temporary bo for too-tall but thin targets
If the render target is thin enough to fit within the 3D pipeline, but is
too tall, we can fudge the address of the origin and coordinates to fit
within the constraints of the pipeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 20:34:30 +00:00
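
The origin adjustment described in the commit above can be sketched as follows. The names `struct proxy_origin` and `fudge_origin()` are hypothetical, and so is the assumption that the base address must be aligned down to a whole tile row (`tile_height` lines). The sketch moves the surface base forward by whole rows and leaves a small residual y offset to apply to the coordinates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where the proxy starts, and how much of the
 * original y coordinate is still left to apply to the operation. */
struct proxy_origin {
    uint32_t byte_offset; /* offset of the proxy base within the bo */
    int dy;               /* residual y offset within the proxy */
};

/* Move the origin of a too-tall target down to row y.
 * The base is aligned down to a multiple of tile_height rows (use 1 for
 * a linear surface); pitch is the stride in bytes. */
static struct proxy_origin fudge_origin(int y, uint32_t pitch, int tile_height)
{
    struct proxy_origin o;
    int aligned;

    assert(y >= 0 && tile_height > 0);

    aligned = y - (y % tile_height);
    o.byte_offset = (uint32_t)aligned * pitch;
    o.dy = y - aligned;
    return o;
}
```

For example, with a 256-byte pitch and 8-row tiles, row 11003 maps to a base 2816000 bytes into the bo with a residual offset of 3 rows.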
Chris Wilson ea433995a3 sna: Experiment with a partial source
If the source is thin enough such that the pitch is within the sampler's
constraints and the sample size is small enough, just fudge the origin
of the bo such that it can be sampled.

This avoids having to create a temporary bo and use the BLT to extract
it and helps, for example, firefox-asteroids which uses a 64x11200
texture atlas.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 19:34:39 +00:00
Chris Wilson ad910949be sna: Mark diagonal lines as partial write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 18:37:39 +00:00
Chris Wilson b9c83e0b2c sna/video: Add some DBG messages to track the error paths
References: https://bugs.freedesktop.org/show_bug.cgi?id=45279
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 18:25:01 +00:00
Chris Wilson 45d831c8b1 sna: Consolidate routines to choose destination bo
Combine the two very similar routines that decided if we should render
into the GPU bo, CPU bo or shadow pixmap into a single function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-27 18:25:01 +00:00