4587 Commits

Author SHA1 Message Date
Chris Wilson fc9531fc2d sna: Move the flush to the backends
This allows us to implement backend specific workarounds and use the
more appropriate device specific flushing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-20 00:02:05 +00:00
Chris Wilson 2e0a534a88 sna/gen7: Forward port recent changes from gen6
Fixes for resubmitting batches after running out of space for vertex
buffers and also a couple of trivial spans functions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 19:08:36 +00:00
Chris Wilson 5caf806d42 sna: BLT use dword pitch only for tiled surfaces
The gen4+ spec is a little misleading as it states that all BLT pitches for
the XY commands are in dwords. Apparently not, as the upload/download
functions were already demonstrating. This only became apparent when
accelerating core text routines to offscreen pixmaps, such as composited
windows.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 17:35:09 +00:00
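The pitch rule described in 5caf806d42 can be sketched in C as follows. This is only an illustration: the helper name `blt_pitch_field` and its `tiled` parameter are hypothetical and not taken from the driver. It assumes the pitch field carries bytes for linear surfaces and dwords for tiled ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the driver's code: compute the value to place
 * in the pitch field of an XY blit command. Linear surfaces keep the
 * pitch in bytes; tiled surfaces express it in dwords (bytes / 4). */
static uint32_t blt_pitch_field(uint32_t pitch_bytes, bool tiled)
{
    if (!tiled)
        return pitch_bytes;

    /* A tiled pitch is expected to be a whole number of dwords. */
    assert((pitch_bytes & 3) == 0);
    return pitch_bytes >> 2;
}
```

Keeping the conversion in one helper means the upload, download and core drawing paths cannot disagree about the unit.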
Chris Wilson dbc75532d5 sna: Tweak move-to-cpu to ignore inplace hint if it is already on the CPU
If we test the area to be drawn against the existing CPU damage and find
it is already on the CPU, we may as well continue to utilize that
damaged region.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 14:38:57 +00:00
Chris Wilson 7ad4a0c942 sna: Only use the blitter to emit wide spans if we cannot stream the updates
If either the region is busy on the gpu or if we need to read the
destination then we would incur penalties for trying to perform the
operation through the GTT. However, if we are simply streaming pixels to
an unbusy bo then we can do so inplace faster than computing the
corresponding GPU commands and uploading them.

Note: currently it is universally slower to use the GPU here (the
computation of the spans is too slow). However that is only according to
micro-benchmarks, avoiding the readback is likely to be more efficient
in practice.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 12:32:59 +00:00
Chris Wilson 9db6b9fad8 sna: Also check for the inplace hint when migrating the whole pixmap
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 12:00:13 +00:00
Chris Wilson d3f7d5d614 sna: Only use the blitter to emit spans if we cannot stream the updates
If either the region is busy on the gpu or if we need to read the
destination then we would incur penalties for trying to perform the
operation through the GTT. However, if we are simply streaming pixels to
an unbusy bo then we can do so inplace faster than computing the
corresponding GPU commands and uploading them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 11:59:53 +00:00
Chris Wilson ff2eb116ef sna: Micro-optimise line extents for zero line width
Handling zero line widths is the common case, so avoid the extra work.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 09:54:44 +00:00
Chris Wilson 3c01074507 sna: filter out degenerate segments whilst drawing unclipped PolySegment
The damage layer was detecting that we were asking it to accumulate a
degenerate box emanating from PolySegment, as the unclipped paths made
the fatal assumption that it would not need to filter out degenerate
boxes. However, a degenerate line becomes a point, does the same apply
to a degenerate segment?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 09:54:06 +00:00
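The filtering described in 3c01074507 might look roughly like the sketch below. The `struct box` layout and the function name `drop_degenerate_boxes` are assumptions made for illustration, with boxes treated as half-open rectangles; the driver's actual PolySegment path is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box type: covers [x1, x2) horizontally and [y1, y2)
 * vertically, so a box with x2 <= x1 or y2 <= y1 contains no pixels. */
struct box {
    int x1, y1, x2, y2;
};

/* Compact the array in place, keeping only boxes that cover at least
 * one pixel. Returns the number of boxes kept. */
static size_t drop_degenerate_boxes(struct box *b, size_t n)
{
    size_t kept = 0;

    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (b[i].x2 <= b[i].x1 || b[i].y2 <= b[i].y1)
            continue; /* degenerate: nothing to accumulate */
        b[kept++] = b[i];
    }
    return kept;
}
```

Running the filter before handing boxes to the damage layer keeps that layer's assumption (no empty boxes) intact on the unclipped path.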
Chris Wilson 35f81005f9 sna/damage: Always mark the damage as dirty when recording new boxes
A few of the create_elts() routines missed marking the damage as dirty
so that if only part of the embedded box was used (i.e. the damage
contained fewer than 8 rectangles that needed to be included in the damage
region) then those were being ignored during migration and testing.

Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=44682
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 00:45:09 +00:00
Chris Wilson 36e691ea90 sna: Demote MOVE_READ if the GC operation covers the clip
If the write operation fills the entire clip, then we can demote and
possibly avoid having to read back the clip from the GPU provided that
we do not need the destination data due to arithmetic operation or mask.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 00:45:09 +00:00
Chris Wilson 17efdbc48c sna: Clip damage area with source extents for fallback
The damage tracking code asserts that it only handles clip regions.
However, sna_copy_area() was failing to ensure that its damage region
was being clipped by the source drawable, leading to out of bounds reads
during forced fallback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 00:45:08 +00:00
Chris Wilson fb07243c9a sna: Fine grained fallback debugging for core drawing routines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 00:45:08 +00:00
Chris Wilson 05f9764a88 sna/damage: Fast path singular regions
Mainly for consistency, so that we treat it like the other damage
addition functions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 00:45:08 +00:00
Chris Wilson 96529e345d sna: Make sure we create a mappable GPU bo when streaming writes
If we decide to do the CPU fallback inplace on the GPU bo through a WC
mapping (because it is a large write-only operation), make sure that
the new GPU bo we create is not active and so will not^W^W is less likely
to cause a stall when mapped.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-19 00:45:08 +00:00
Chris Wilson efce896e1d sna: Check number of boxes to migrate during move-to-cpu
When reducing the damage we may find that it is actually empty and so
sna_damage_get_boxes() returns 0, be prepared.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 20:53:55 +00:00
Chris Wilson 334f3f70a8 sna/gen3: Set the batch mode for emitting video state
The lack of kgem_set_mode() here is causing some recently added
assertions to fail.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 20:09:26 +00:00
Chris Wilson 76203b7070 sna: Amalgamate writes based on the total number of bytes written
Cachelines will only be dirtied for the bytes accessed so a better
metric would be based on the total number of pages brought into the TLB
and the total number of cachelines used. Base the decision on whether
to try and amalgamate the upload with others on the number of bytes
copied rather than the overall extents.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 18:49:42 +00:00
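The metric from 76203b7070 can be illustrated with the sketch below. It counts the bytes actually copied across all boxes rather than the area of their overall extents. The names, the `struct box` layout, and the threshold parameter are hypothetical, and the direction of the decision (small writes are the ones amalgamated) is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct box {
    int x1, y1, x2, y2; /* half-open: [x1, x2) x [y1, y2) */
};

/* Total bytes copied for a set of boxes, ignoring the gaps between them. */
static size_t bytes_written(const struct box *b, size_t n,
                            size_t bytes_per_pixel)
{
    size_t total = 0;

    assert(b != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t w = (size_t)(b[i].x2 - b[i].x1);
        size_t h = (size_t)(b[i].y2 - b[i].y1);
        total += w * h * bytes_per_pixel;
    }
    return total;
}

/* Assumed policy: amalgamate only when little data is copied overall. */
static bool should_amalgamate(const struct box *b, size_t n,
                              size_t bytes_per_pixel, size_t threshold)
{
    return bytes_written(b, n, bytes_per_pixel) <= threshold;
}
```

Two single-pixel boxes at opposite corners of a large pixmap have huge extents but copy only a few bytes, which is the case the byte-count metric is meant to capture.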
Chris Wilson 470741e84c sna: Debug uploads
All of the asserts and debug options that led me to believe that the
tiling was completely screwy for some writes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 18:49:42 +00:00
Chris Wilson ab387a89cf sna: Update bo->tiling during search_linear_cache
search_linear_cache() was updated to track the first good match whilst it
continued to search for a better match. This resulted in the first good
bo being modified and a record of those modifications lost, in
particular the change in tiling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 18:49:42 +00:00
Chris Wilson 4b893ab081 sna: Remove defunct debugging option
FORCE_GPU_ONLY now has no effect except for marking the initial pixmap
as all-damaged on the GPU, and so not testing the paths for which it was
originally introduced.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 18:49:42 +00:00
Chris Wilson 965586544a sna/gen6: Don't assume that a batch mode implies a non-empty batch
Just in case we set a mode then fail to emit any dwords. Sounds
inefficient and woe betide the culprit when I find it...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 18:49:28 +00:00
Chris Wilson d2e0575036 sna: Fix some tracking of to-be-flushed dri pixmaps
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 18:39:29 +00:00
Chris Wilson 1ad5320fd4 sna: Add valgrind markup for tracking CPU mmaps
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 15:39:35 +00:00
Chris Wilson f3da610ead sna: Prevent switching rings with render disabled
We fudge forced use of the BLT ring unless we install a render backend
and so we must also prevent the ring from being reset when the GPU is
idle. Therefore we make handling the ring status a backend function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 15:27:40 +00:00
Chris Wilson 6d31cb2d94 sna: Restore use of shadow pixmaps by default without RENDER support
If we do not have access to an accelerated render backend, only create
GPU buffers for the scanout and use an accelerated blitter for
upload/download and operating inplace on the scanout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 13:43:20 +00:00
Chris Wilson 15a150579c intel: Trivially remove a piece of XAA dependency for shadow
The wolves are gathering at the door baying for the removal of XAA from
Xorg.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-18 10:27:17 +00:00
Chris Wilson 850495f956 sna: Fix increment of damage boxes after updating for rectangles
Found by valgrind:
==13639== Conditional jump or move depends on uninitialised value(s)
==13639==    at 0x5520B1E: pixman_region_init_rects (in
/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.24.0)
==13639==    by 0x89E6ED7: __sna_damage_reduce (sna_damage.c:489)
==13639==    by 0x89E7FEC: _sna_damage_contains_box (sna_damage.c:1161)
==13639==    by 0x89CFCD9: sna_drawable_use_gpu_bo (sna_damage.h:175)
==13639==    by 0x89D52DA: sna_poly_segment (sna_accel.c:6130)
==13639==    by 0x21F87E: damagePolySegment (damage.c:1096)
==13639==    by 0x1565A2: ProcPolySegment (dispatch.c:1771)
==13639==    by 0x159FB0: Dispatch (dispatch.c:437)
==13639==    by 0x1491D9: main (main.c:287)
==13639==  Uninitialised value was created by a heap allocation
==13639==    at 0x4028693: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13639==    by 0x89E6BFB: _sna_damage_create_boxes (sna_damage.c:205)
==13639==    by 0x89E78F0: _sna_damage_add_rectangles (sna_damage.c:327)
==13639==    by 0x89CD32D: sna_poly_fill_rect_blt.isra.65
(sna_damage.h:68)
==13639==    by 0x89DE23F: sna_poly_fill_rect (sna_accel.c:8366)
==13639==    by 0x21E9C8: damagePolyFillRect (damage.c:1309)
==13639==    by 0x26DD3F: miPaintWindow (miexpose.c:674)
==13639==    by 0x18370A: ChangeWindowAttributes (window.c:1553)
==13639==    by 0x154500: ProcChangeWindowAttributes (dispatch.c:696)
==13639==    by 0x159FB0: Dispatch (dispatch.c:437)
==13639==    by 0x1491D9: main (main.c:287)
==13639==

Use 'count' everywhere for consistency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 23:12:49 +00:00
Chris Wilson 4b5c9affd4 sna: Restore original shadow pointer before uploading CPU damage
Detected by valgrind:
==22012== Source and destination overlap in memcpy(0xd101000, 0xd101000,
783360)
==22012==    at 0x402A180: memcpy (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22012==    by 0x89BD4ED: memcpy_blt (blt.c:209)
==22012==    by 0x89F2921: sna_write_boxes (sna_io.c:364)
==22012==    by 0x89CFABF: sna_pixmap_move_to_gpu (sna_accel.c:1900)
==22012==    by 0x89F49B0: sna_render_pixmap_bo (sna_render.c:571)
==22012==    by 0x8A268CE: gen5_composite_picture (gen5_render.c:1908)
==22012==    by 0x8A29B8A: gen5_render_composite (gen5_render.c:2252)
==22012==    by 0x89E6762: sna_composite (sna_composite.c:485)
==22012==    by 0x21D3C3: damageComposite (damage.c:569)
==22012==    by 0x215963: ProcRenderComposite (render.c:728)
==22012==    by 0x159FB0: Dispatch (dispatch.c:437)
==22012==    by 0x1491D9: main (main.c:287)
==22012==

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 23:12:49 +00:00
Eugeni Dodonov bbd6c81236 sna: check for LLC support
Instead of checking for CPU generation, use the libdrm-provided
I915_PARAM_HAS_LLC instead.

v2: use a define check to verify if we have I915_PARAM_HAS_LLC.

Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 18:40:04 +00:00
Chris Wilson e4efde920b sna: Track whether damage is a complete representation of the dirt
The previous commit undoes a premature optimisation that assumed that
the current damage captured all pixels written. However, it happens to
be a useful optimisation along that path (tracking upload of partial
images), so add the necessary bookkeeping that watches for when the union
of cpu and gpu damage is no longer the complete set of all pixels
written, that is if we either migrate from one pixmap to the other, the
undamaged region goes untracked. We also take advantage of whenever we
damage the whole pixmap to restore knowledge that our tracking of all
pixels written is complete.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 18:23:43 +00:00
Chris Wilson a9b705f9a7 sna: Mark GPU as all-damaged discarding the CPU bo to prevent stalls
If we discard the CPU bo, we lose knowledge of whatever regions had been
initialised but no longer dirty on the GPU, but instead must assume that
the entirety of the GPU bo is dirty.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 12:58:43 +00:00
Chris Wilson 9d631e26d7 sna: Mark the freshly allocated CPU bo as in the CPU domain
As we immediately use it after creation, we need to inform GEM of the
domain transfer.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 12:58:03 +00:00
Chris Wilson dfbf02b877 sna: Add some DBG breadcrumbs to put_image upload paths
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 12:57:55 +00:00
Chris Wilson da90afc32f sna: Add DBG breadcrumbs to gradient initialisation
Put some markers into the debug log as those functions create many
proxies causing a lot of debug noise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 11:51:25 +00:00
Chris Wilson d14341cb22 sna: Add a render ring detiling read path
For SNB, in case you really, really want to use GPU detiling and not
incur the ring switch. Tweaking when to just mmap the target seems to
gain most anyway...

The ulterior motive is that this provides fallback paths for avoiding
the use of TILING_Y with GTT mmaps which is broken on 855gm.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 08:22:22 +00:00
Chris Wilson 3620f9ca45 sna: Cap pwrite buffer alignment to 64k
We only want to create huge pwrite buffers when populating the inactive
cache for mmapped uploads. In the absence of using mmap for upload, be
more conservative with the alignment value so as not to simply waste
valuable aperture and memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 00:24:16 +00:00
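The cap described in 3620f9ca45 amounts to clamping the requested alignment before rounding the buffer size up. The sketch below is illustrative only: `MAX_PWRITE_ALIGN` and `align_pwrite_size` are hypothetical names, and power-of-two alignments are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant: upper bound on pwrite buffer alignment (64 KiB). */
#define MAX_PWRITE_ALIGN ((size_t)64 * 1024)

/* Round 'size' up to the wanted alignment, but never use an alignment
 * larger than MAX_PWRITE_ALIGN. 'wanted_align' must be a power of two. */
static size_t align_pwrite_size(size_t size, size_t wanted_align)
{
    size_t align = wanted_align < MAX_PWRITE_ALIGN ? wanted_align
                                                   : MAX_PWRITE_ALIGN;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With the cap in place, a small upload that asks for a very large alignment is padded to at most 64 KiB rather than to the full requested size.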
Chris Wilson b9f59b1099 sna: Correct adjustment of a stolen 2d read buffer
If we steal a write buffer for creating a pixmap for read back, then we
need to be careful as we will have set the used amount to 0 and then try
to incorrectly decrease by the last row. Fortunately, we do not yet have
any code that attempts to create a 2d buffer for reading.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 00:22:25 +00:00
Chris Wilson 6fc4cdafeb sna: Correct assertion for a partial read buffer
The batch may legitimately be submitted prior to the attachment of the
read buffer, if, for example, we need to switch rings. Therefore update
the assertion to only check that the bo remains in existence via either
a reference from the exec or from the user.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 21:36:04 +00:00
Chris Wilson 377f5e16cd sna/gen[45]: clear the state tracker before setting the formats
When backporting the patches from gen6, I didn't notice the memset that
came later, and this wasn't along the paths checked by rendercheck.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 16:09:57 +00:00
Chris Wilson 6387f2fb8a sna/gen[4567]: x1r5g5b5 is only a render target, not sampler
Whilst we can render to and blend with a depth 15 target, we cannot use
it as a texture with the sampling engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 15:39:42 +00:00
Chris Wilson 8b2bb66666 sna/gen6: Restore the non-pipelined op after every WM binding table update
The hw wants it as demonstrated by the '>' in KDE's menus. Why is it
always KDE that demonstrates coherency problems...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 13:37:45 +00:00
Chris Wilson a11b22d172 sna/gen[23]: Remark the destination bo as dirty after flushing
One of the side-effects of emitting the composite state is that it
tags the destination surface as dirty as a result of the *forthcoming*
operation. So emitting the flush after emitting the composite state
clears that tag, so we need to restore it for future coherency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 13:37:45 +00:00
Zhigang Gong 2f09363a6e uxa/glamor: Create glamor pixmap by default.
Creating native glamor pixmaps gives much better performance than
using the textured-drm pixmap, so this commit makes that the
default behaviour when configured to use glamor. Another advantage
of this commit is that we reduce the risk of encountering the
"incompatible region exists for this name" and the associated
render corruption. And since we now never intentionally allocate
a reusable pixmap we could just make all (intel_glamor) allocations
non-reusable without incurring too great an overhead.

A side effect is that those glamor pixmaps do not have a
valid BO attached to them and thus it fails to get a DRI drawable. This
commit also fixes that problem by adjusting the fixup_shadow mechanism
to recreate a textured-drm pixmap from the native glamor pixmap. I tested
this with mutter, and it works fine.

The performance gain from applying this patch is about 10% to 20%,
depending on the workload.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 10:49:21 +00:00
Chris Wilson fd4c139a39 sna: On LLC systems quietly replace all linear mmappings using the CPU
If the GPU and CPU caches are shared and coherent, we can use a cached
mapping for linear bo in the CPU domain with no penalty and so avoid the
penalty of using WC/UC mappings through the GTT (and any aperture
pressure). We presume that the bo for such mappings are indeed LLC
cached...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson c20a729d0a sna/gen6: Force a batch submission after allocation failure during composite
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson 380a2fca3c sna: Optimise call to composite with single box
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson 9f89250de1 sna: Use the prefer-GPU hint for forcing allocation for core drawing
Similar to the render paths and simpler than the current look up tiling
method.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson 8652bf7a19 sna: Don't track an unmatching tiled bo when searching the linear cache
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 19:56:35 +00:00
Chris Wilson cc4b616990 sna/video: Increase the level of paranoia
In how many different ways can we check that the scanout is allocated
before we start decoding video?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 19:55:50 +00:00