Commit Graph

4754 Commits

Author SHA1 Message Date
Chris Wilson 52b11f63d7 sna: Upconvert fallback trapezoids to a8
Since the hardware only handles a8 without tricky emulation and pixman
insists on using a1 for sharp trapezoids we need to ensure that we
convert the a1 to a8 for our trapezoidal mask.

More worryingly, this path should never be hit...

References: https://bugs.freedesktop.org/show_bug.cgi?id=46156
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-16 11:24:21 +00:00
Chris Wilson 8050ced620 sna/dri: Mark bo as reusable after completion of a flip-event
After the flip chain is completed, any residual buffers are no longer in
use and so available for reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-15 23:40:02 +00:00
Chris Wilson fc046aabde sna/dri: Don't attempt to change tiling if it is a no-op
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-15 16:08:23 +00:00
Chris Wilson 66cc9c6965 Be paranoid about the definition of container_of
Replace any existing definition with a correct version, since there are
broken container_of macros floating around the xorg includes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-15 11:58:42 +00:00
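As an illustration of the "replace any existing definition" approach described in this commit, here is a minimal sketch. It is not the driver's actual code: the `struct link` and `struct item` types and the `item_from_node()` helper are hypothetical, and the macro body is the common `offsetof`-based form, assumed rather than taken from the commit.

```c
#include <stddef.h>  /* offsetof, NULL */

/* Drop whatever definition earlier headers may have supplied,
 * then provide a known-good one (assumed form, not the commit's text). */
#undef container_of
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types, for illustration only. */
struct link { struct link *next; };
struct item { int id; struct link node; };

/* Recover the enclosing item from a pointer to its embedded node. */
static inline struct item *item_from_node(struct link *n)
{
    return container_of(n, struct item, node);
}
```

The `#undef` before the `#define` is what makes the local version win regardless of include order.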
Chris Wilson c0376b7f7b Add a missing macro for old xorg/list.h
list_last_entry() needs to be defined if we are including the xorg
list.h as opposed to our standalone variant.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-13 00:48:15 +00:00
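A sketch of how a missing `list_last_entry()` might be supplied when an older list header lacks it. The `struct list` layout, the helper functions, and `struct request` are hypothetical stand-ins, and the macro follows the conventional "entry containing `head->prev`" form rather than the commit's exact text.

```c
#include <stddef.h>  /* offsetof */

/* Hypothetical minimal circular doubly-linked list. */
struct list { struct list *next, *prev; };

#ifndef container_of
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#endif

/* Only define the macro if the included header did not. */
#ifndef list_last_entry
#define list_last_entry(head, type, member) \
    container_of((head)->prev, type, member)
#endif

static inline void list_init(struct list *head)
{
    head->next = head->prev = head;
}

static inline void list_add_tail(struct list *entry, struct list *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical user of the list. */
struct request { int seqno; struct list link; };
```

In a circular list the last entry is the one just before the head, so the macro needs no traversal.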
Chris Wilson 87bed52180 Include a local copy of list.h
In 1.11.903, the list.h was renamed to xorg-list.h with a corresponding
change to all structures. As we carried local fixes to list.h and
extended functionality, just create our own list.h with a bit of
handwaving to protect us for the brief existence of xorg/include/list.h.

Reported-by: Armin K <krejzi@email.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45938
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-11 21:02:22 +00:00
Chris Wilson c64ebee5fd sna/gen6: Prefer the render ring for copies
Slower for fills, but on the current stack faster for copies, both large
and small. Hopefully, when we write some good shaders for SNB, we will
not only improve performance for copies but also make fills faster on
the render ring than the blt?

As the BLT copy routine is GPU bound for copywinpix10, and the RENDER
copy routine is CPU bound and faster, I believe that we have reached the
potential of the BLT ring and not yet saturated the GPU using the render
copy.

Note that we still do not casually switch rings, so the actual routine
chosen will still be selected by the preceding operations, and this is
unlikely to have any effect in practice during, for example, cairo-traces.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-11 13:34:44 +00:00
Chris Wilson 6a9b501774 sna/gen6: Suppress the CS stall for the first command in the batch
The batch emission serves as a full stall, so we do not need to incur a
second before our first rendering.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-11 11:02:53 +00:00
Chris Wilson cbe8bed83f sna/gen7: Mention the depth-stall required before changing VS state
Because one day we may actually start using VS! Copied from the addition
of the w/a to Mesa by Kenneth Graunke.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-11 10:58:05 +00:00
Chris Wilson 6193f2f00f sna: Fix retire after readback
Upon reading, we encounter a serialisation point and so can retire all
requests. However, kgem_bo_retire() wasn't correctly detecting that
barrier and so we continued to use GPU detiling, thinking the target
was still busy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-09 14:16:17 +00:00
Chris Wilson 4d8369f8e6 sna/gen2+: Force upload rather than perform source transformations on the CPU
If both the source and destination are on the CPU, then the thinking was
it would be quicker to operate on those on the CPU rather than copy both
to the GPU and then perform the operation. This turns out to be a false
assumption if transformation is involved -- something to be reconsidered
if pixman should ever be improved.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-08 13:15:46 +00:00
Chris Wilson 8634d461bd sna: Limit max CPU bo size to prevent aperture thrashing on upload
Copying between two objects that consume more than the available GATT
space is a painful experience due to the forced use of an intermediary
and eviction on every batch. The tiled upload paths are in comparison
remarkably efficient, so favour their use when handling extremely large
buffers.

This reverses the previous idea in that we now prefer large GPU bo
rather than large CPU bo, as the render pipeline is far more flexible
for handling those than the blitter is for handling the CPU bo (at least
for gen4+).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-08 09:30:12 +00:00
Chris Wilson 5b16972d78 sna: Check that we successfully retired an active linear buffer
If we go to the trouble of running retire before searching, we may as
well check that we retired something before proceeding to check all the
inactive lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-08 09:16:47 +00:00
Chris Wilson 207b4d4482 sna: Relax must-be-blittable rules for gen4+
The render pipeline is actually more flexible than the blitter for
dealing with large surfaces and so the BLT is no longer the limiting
factor on gen4+.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-08 09:16:47 +00:00
Zhigang Gong 13c960db9e uxa/glamor: Use a macro to specify module name.
This depends upon glamor commit b5f8d, just after the 0.3.0 tag.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-08 09:07:42 +00:00
Zhigang Gong 70092bfbc5 uxa/glamor: Refine CloseScreen and InitScreen process.
The previous version called glamor_egl_close_screen and
glamor_egl_free_screen manually, which is not aligned with the
standard process. glamor has now changed to follow the
standard method:

The glamor layer and the glamor egl layer both have their internal
CloseScreens. The correct sequence is to register I830CloseScreen
first, then glamor_egl_close_screen, and finally glamor_close_screen.
So we move intel_glamor_init out of intel_uxa_init and into
I830ScreenInit, just after the registration of I830CloseScreen.

As the glamor interfaces changed, we need to check the
glamor version when loading the glamor egl module to make
sure we are loading the right glamor module. If the check
fails, the driver switches back to the UXA path.

This depends upon glamor commit 1bc8bf tagged with version 0.3.0.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-08 09:07:42 +00:00
Chris Wilson 798aad6c95 sna/gen[4-7]: Fix erroneous scale factor for partial large bo render copies
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 20:16:48 +00:00
Chris Wilson ea65887261 sna: Apply offsets correctly for partial src/dst in large copy boxes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 15:32:31 +00:00
Chris Wilson 14c91e1084 sna/tiling: Request Y-tiles if we know we cannot BLT to either the src or dst
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 15:32:31 +00:00
Chris Wilson 3131217e3e sna: Mark up the temporary allocations
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 14:36:21 +00:00
Chris Wilson ec1ccb6bf6 sna: Set the damage for render->copy_boxes to NULL before use
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 13:37:52 +00:00
Chris Wilson 58f634b792 sna: Handle tile alignment for untiled large bo more carefully
We ended up trying to align the upper bound to zero as the integer
division of the tile width by pixel was zero.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 13:32:20 +00:00
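The failure mode described in this commit can be sketched as follows. This is an assumed reconstruction, not the driver's code: the helper names and the byte-based tile width are hypothetical. When the tile width in bytes is smaller than one pixel, the integer division yields zero, and aligning to a step of zero is meaningless, so the sketch clamps the step to at least one pixel.

```c
/* Hypothetical helper: how many whole pixels fit in one tile row.
 * For an untiled buffer the "tile" may be narrower than a pixel,
 * in which case the integer division truncates to 0; clamp to 1. */
static inline int tile_width_in_pixels(int tile_width_bytes, int bits_per_pixel)
{
    int pixels = tile_width_bytes * 8 / bits_per_pixel;
    return pixels > 0 ? pixels : 1;
}

/* Round value up to the next multiple of step (step must be > 0). */
static inline int align_up(int value, int step)
{
    return (value + step - 1) / step * step;
}
```

With the clamp in place, an upper bound aligned against an untiled buffer is left unchanged instead of collapsing to zero.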
Zhigang Gong bf3518ea91 uxa/glamor/dri: Fix a typo bug when fixup glamor pixmap.
Should modify the old pixmap's header not the new one which
was already destroyed.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-07 08:43:08 +00:00
Chris Wilson 1467a4ba1a sna: Use the proper sna_picture_is_solid() test
Rather than the specialised routines that assumed pDrawable was
non-NULL, which was no longer true after f30be6f743.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 21:10:35 +00:00
Chris Wilson ef335a65a9 sna: Search all active buckets for a temporary allocation
Reduce the need for creating a new object if we only need the allocation
for a single operation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 21:10:35 +00:00
Chris Wilson b7e3aaf773 sna: Use the clipped end-point for recomputing segment length after clipping
References: https://bugs.freedesktop.org/show_bug.cgi?id=45673
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 18:08:19 +00:00
Chris Wilson f30be6f743 sna/gen2+: Exclude solids from being classed as requiring an upload
We treat any pixmap that is not attached to either a CPU or GPU bo as
requiring the pixel data to be uploaded to the GPU before we can
composite. Normally this is true, except for the solid cache.

References: https://bugs.freedesktop.org/show_bug.cgi?id=45672
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 15:59:21 +00:00
Chris Wilson f009386de8 sna: If we have a CPU bo, do not assert we have shadow pixels
When transferring damage to the GPU, on SNB it is not necessarily true
that we have a shadow pixmap, we may instead have drawn onto an unmapped
CPU bo and now simply need to copy from that bo onto the GPU. Move the
assertion onto the path where it truly matters.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45672
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 09:50:03 +00:00
Chris Wilson 22e452ebe0 sna: Disable use of xvmc for SNB+
Not yet implemented, so don't bother setting it to fail.

References: https://bugs.freedesktop.org/show_bug.cgi?id=44874
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-06 09:19:56 +00:00
Chris Wilson a8ed1a02ad sna: Discard the redundant clear of the unbounded area if already clear
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 20:13:07 +00:00
Chris Wilson b899a4b696 sna: Always pass the clear colour for PictOpClear
Having made that optimisation for Composite, and then made the
assumption that it is always true in the backends, we failed to clear
the unbounded area outside of a trapezoid since we passed in the
original colour and the operation was optimised as a continuation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 20:07:49 +00:00
Chris Wilson c107b90a44 sna/gen6: Reduce PictOpClear to PictOpSrc (with blending disabled)
The advantage of PictOpSrc is that it writes its results directly to
memory bypassing the blend unit.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 20:07:45 +00:00
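The reduction named in this commit can be sketched like this. The constants and the `reduce_clear()` helper are hypothetical local stand-ins, not the driver's identifiers; the sketch only assumes that a clear is equivalent to a source copy of an all-zero colour, which lets the backend skip the blend unit.

```c
#include <stdint.h>

/* Local stand-ins for the Render operators (names are hypothetical). */
enum { OP_CLEAR = 0, OP_SRC = 1 };

/* Rewrite a clear as a source copy of transparent black.
 * Returns 1 if the operation was reduced, 0 if left untouched. */
static inline int reduce_clear(uint8_t *op, uint32_t *color)
{
    if (*op != OP_CLEAR)
        return 0;

    *color = 0;    /* always pass the clear colour, never the original */
    *op = OP_SRC;  /* a plain write, no blending required */
    return 1;
}
```

Forcing the colour to zero at the same time as switching the operator keeps the two commits above consistent: later stages never see the caller's original colour paired with a clear.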
Chris Wilson 4baa2806bc sna: Check if the damage reduces to all before performing the migration
An assert exposed a situation where we had accumulated an unreduced
damage-all and so we were taking the slow path only to discover later
that it was a damage-all and that we had performed needless checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 15:19:37 +00:00
Chris Wilson 2653524dff sna: Reduce the downsample tile size to accommodate alignment
If we need to enlarge the sampled tile due to tiling alignments, the
resulting sample can become larger than we can accommodate through the 3D
pipeline, resulting in FAIL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 15:19:37 +00:00
Chris Wilson 93a0b10f16 sna: Apply redirection for the render copy into large pixmaps
If the pixmap is larger than the pipeline, but the operation extents fit
within the pipeline, we may be able to create a proxy target to
transform the operation into one that fits within the constraints of the
render pipeline.

This fixes the infinite recursion hit with partially displayed extremely
large images.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-04 15:19:05 +00:00
Chris Wilson 4774c6b833 sna: Add a couple of sanity checks that the CPU drawable is on the CPU
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-03 09:54:25 +00:00
Chris Wilson 418cd98db7 sna/gen6: Ring switching outweighs the benefits for cairo-traces
At the moment, the jury is still out on whether freely switching rings
for fills is a Good Idea. So make it easier to turn it on and off for
testing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-03 09:53:29 +00:00
Chris Wilson 2d0e7c7ecd sna: Search again for a just-large-enough mapping for inplace uploads
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 14:52:56 +00:00
Chris Wilson 55c7088f54 sna: Add debugging code to verify damage extents of fallback paths
After using the CPU, upload the damage and read back the pixels from the
GPU bo and verify that the two are equivalent.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:19:03 +00:00
Chris Wilson c8fc2cde53 sna: Fill extents for ImageGlyphs
The spec says to fill the character boxes, which is what the hardware
does. The implementation fills the extents instead. rxvt expects the
former, emacs the latter. Overdraw is a nuisance, but less than leaving
glyphs behind...

Reported-by: walch.martin@web.de
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45438
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:19:03 +00:00
Chris Wilson 13508ab5ea sna: PolyGlyph supports all of fill/tile/stipple rules
The hw routines only directly support solid fill, so fall back for the
interesting cases. An alternative would be to investigate using the
miPolyGlyph routine to convert the weird fills into spans in order to
fall back. Sounds cheaper to fall back, so wait for an actual use case.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:19:03 +00:00
Chris Wilson df4e1059a4 sna/gen6: Prefer to do fills using the BLT
Using the BLT is substantially faster than the current shaders for solid
fill. The downside is that it invokes more ring switching.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-02-01 09:07:13 +00:00
Chris Wilson 8b012de0a1 sna/gen5: Always prefer to emit solid fills using the BLT
As the BLT is far, far faster than using a shader.

Improves cairo-demos/chart from 6 to 13 fps.

Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 20:30:40 +00:00
Chris Wilson 0a748fc49d sna: Split the tiling limits between upload and copying
The kernel has a bug that prevents pwriting buffers larger than the
aperture. Whilst waiting for the fix, limit the upload where possible to
fit within that constraint.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 11:21:42 +00:00
Chris Wilson 9c1f8a768c sna: Avoid converting requested Y to X tiling for large pitches on gen4+
The only strong requirement is that to utilize large pitches, the object
must be tiled. Having it as X tiling is a pure convenience to facilitate
use of the blitter. A DRI client may want to keep using Y tiling
instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 10:29:02 +00:00
Chris Wilson e872c1011f sna/dri: We need to reduce tiling on older gen if we cannot fence
Only apply the architectural limits to enable bo creation for DRI buffers.

Reported-by: Alban Browaeys <prahal@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45414
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 10:26:42 +00:00
Chris Wilson a4caf67d8d sna: Trim tile sizes to fit into bo cache
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 00:55:12 +00:00
Chris Wilson 3f7c1646c7 sna: Check that the intermediate IO buffer can also be used for blitting
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-31 00:31:21 +00:00
Chris Wilson e504fab6c5 sna: Discard the cleared GPU buffer upon PutImage to the CPU buffer
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 23:49:18 +00:00
Chris Wilson ed1c1a7468 sna: Track large objects and limit prefer-gpu hint to small objects
As the GATT is irrespective of actual RAM size, we need to be careful
not to be too generous when allocating GPU bo and their shadows. So
first of all we limit default render targets to those small enough to
fit comfortably in RAM alongside others, and secondly we try to only
keep a single copy of large objects in memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-30 15:44:44 +00:00