4553 Commits

Author SHA1 Message Date
Chris Wilson da90afc32f sna: Add DBG breadcrumbs to gradient initialisation
Put some markers into the debug log as those functions create many
proxies causing a lot of debug noise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 11:51:25 +00:00
Chris Wilson d14341cb22 sna: Add a render ring detiling read path
For SNB, in case you really, really want to use GPU detiling and not
incur the ring switch. Tweaking when to just mmap the target seems to
gain most anyway...

The ulterior motive is that this provides fallback paths for avoiding
the use of TILING_Y with GTT mmaps which is broken on 855gm.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 08:22:22 +00:00
Chris Wilson 3620f9ca45 sna: Cap pwrite buffer alignment to 64k
We only want to create huge pwrite buffers when populating the inactive
cache for mmapped uploads. In the absence of using mmap for upload, be
more conservative with the alignment value so as not to simply waste
valuable aperture and memory.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 00:24:16 +00:00
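
As a rough illustration of the commit above (not the driver's actual code), the idea of capping the alignment when mmap is not used for the upload could be sketched as follows. The function name `upload_buffer_alignment`, the `uses_mmap` flag and the macro `PWRITE_ALIGN_CAP` are hypothetical names invented for this sketch; only the 64k cap comes from the commit title.

```c
#include <assert.h>
#include <stddef.h>

/* The 64k cap named in the commit title; the macro name is hypothetical. */
#define PWRITE_ALIGN_CAP ((size_t)64 * 1024)

/*
 * Hypothetical helper: choose the alignment for an upload buffer.
 * With mmapped uploads the large requested alignment is kept, since the
 * buffer is meant to populate the inactive cache. Without mmap the
 * alignment is capped so that aperture and memory are not wasted.
 */
static size_t upload_buffer_alignment(size_t requested, int uses_mmap)
{
    /* This sketch assumes alignments are non-zero powers of two. */
    assert(requested != 0 && (requested & (requested - 1)) == 0);

    if (uses_mmap)
        return requested;

    return requested < PWRITE_ALIGN_CAP ? requested : PWRITE_ALIGN_CAP;
}
```

With these assumptions, a 1 MiB request on the pwrite path is reduced to 64 KiB, while the same request on the mmap path is left unchanged.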
Chris Wilson b9f59b1099 sna: correct adjust of a stolen 2d read buffer
If we steal a write buffer for creating a pixmap for read back, then we
need to be careful as we will have set the used amount to 0 and then try
to incorrectly decrease by the last row. Fortunately, we do not yet have
any code that attempts to create a 2d buffer for reading.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-17 00:22:25 +00:00
Chris Wilson 6fc4cdafeb sna: Correct assertion for a partial read buffer
The batch may legitimately be submitted prior to the attachment of the
read buffer, if, for example, we need to switch rings. Therefore update
the assertion to only check that the bo remains in existence via either
a reference from the exec or from the user.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 21:36:04 +00:00
Chris Wilson 377f5e16cd sna/gen[45]: clear the state tracker before setting the formats
When backporting the patches from gen6, I didn't notice the memset that
came later, and this wasn't along the paths checked by rendercheck.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 16:09:57 +00:00
Chris Wilson 6387f2fb8a sna/gen[4567]: x1r5g5b5 is only a render target, not sampler
Whilst we can render to and blend with a depth 15 target, we cannot use
it as a texture with the sampling engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 15:39:42 +00:00
Chris Wilson 8b2bb66666 sna/gen6: Restore the non-pipelined op after every WM binding table update
The hw wants it as demonstrated by the '>' in KDE's menus. Why is it
always KDE that demonstrates coherency problems...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 13:37:45 +00:00
Chris Wilson a11b22d172 sna/gen[23]: Re-mark the destination bo as dirty after flushing
One of the side-effects of emitting the composite state is that it
tags the destination surface as dirty as a result of the *forthcoming*
operation. So emitting the flush after emitting the composite state
clears that tag, so we need to restore it for future coherency.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 13:37:45 +00:00
Zhigang Gong 2f09363a6e uxa/glamor: Create glamor pixmap by default.
Creating native glamor pixmaps gives much better performance than
using the textured-drm pixmap, so this commit makes that the default
behaviour when configured to use glamor. Another advantage of this
commit is that we reduce the risk of encountering the
"incompatible region exists for this name" and the associated
render corruption. And since we now never intentionally allocate
a reusable pixmap we could just make all (intel_glamor) allocations
non-reusable without incurring too great an overhead.

A side effect is that those glamor pixmaps do not have a
valid BO attached to them and thus it fails to get a DRI drawable. This
commit also fixes that problem by adjusting the fixup_shadow mechanism
to recreate a textured-drm pixmap from the native glamor pixmap. I tested
this with mutter, and it works fine.

The performance gain from applying this patch is about 10% to 20%,
depending on the workload.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 10:49:21 +00:00
Chris Wilson fd4c139a39 sna: On LLC systems quietly replace all linear mmappings using the CPU
If the GPU and CPU caches are shared and coherent, we can use a cached
mapping for linear bo in the CPU domain with no penalty and so avoid the
penalty of using WC/UC mappings through the GTT (and any aperture
pressure). We presume that the bo for such mappings are indeed LLC
cached...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson c20a729d0a sna/gen6: Force a batch submission after allocation failure during composite
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson 380a2fca3c sna: Optimise call to composite with single box
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson 9f89250de1 sna: Use the prefer-GPU hint for forcing allocation for core drawing
Similar to the render paths and simpler than the current look up tiling
method.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-16 01:30:13 +00:00
Chris Wilson 8652bf7a19 sna: Don't track an unmatching tiled bo when searching the linear cache
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 19:56:35 +00:00
Chris Wilson cc4b616990 sna/video: Increase the level of paranoia
In how many different ways can we check that the scanout is allocated
before we start decoding video?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 19:55:50 +00:00
Chris Wilson 7f480ba02c sna: Tidy search through active bo cache
Perform the assertions upon cache consistency upfront, and tidy the
indentation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 19:53:39 +00:00
Chris Wilson 6f7bc35d7f sna: Use indirect uploads rather than teardown existing CPU maps
Allow the snoopable CPU mapping to be used in place of the GTT map for
untiled bo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 18:14:24 +00:00
Chris Wilson 475fa67ed3 sna: Fast path move-area-to-cpu when the pixmap is already on the cpu
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 17:30:00 +00:00
Chris Wilson 37ced44a53 sna: Be a little more lenient wrt damage migration if we have CPU bo
The idea being that they facilitate copying to and from the CPU, but
also we want to avoid stalling on any pixels held by the CPU bo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 15:35:57 +00:00
Chris Wilson e3732a6f7f sna: Defer ring switching until after a period of idleness
Similar to the desire to flush the next batch after an overflow, we do
not want to incur any lag in the midst of drawing, even if that lag is
mitigated by GPU semaphores.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 11:06:59 +00:00
Chris Wilson 5df7147b09 sna: Restore the kgem_create_map() symbol
As the stub is exported to the driver even in the absence of vmapping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 10:28:00 +00:00
Chris Wilson be53740c6f sna: Various DBG typos
Fix some misspellings inside the DBG messages.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 10:16:13 +00:00
Chris Wilson 349e9a7b94 sna: Prefer read-boxes inplace again
Using the gpu to do the detiling just incurs extra latency and an extra
copy, so go back to using a fence and GTT mapping for the common path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 10:06:01 +00:00
Chris Wilson 09dc8b1b35 sna/gen7: Check reused source for validity
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source. And also be wary if the source was
converted to a solid, but the mask is not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:53 +00:00
Chris Wilson d9871f01d8 sna/gen6: Check reused source for validity
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source. And also be wary if the source was
converted to a solid, but the mask is not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson 1d6030322e sna/gen5: Check reused source for validity
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source. And also be wary if the source was
converted to a solid, but the mask is not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson 0e4a24ef6c sna/gen4: Check reused source for validity
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source. And also be wary if the source was
converted to a solid, but the mask is not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson ea299f2523 sna/gen3: Check reused source for validity
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source. And also be wary if the source was
converted to a solid, but the mask is not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson 007da2f978 sna/gen2: Check reused source for validity
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source. And also be wary if the source was
converted to a solid, but the mask is not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson 1d55b42fbd sna: Fix read back of partial mmapped buffers
Do not move a GTT mapped buffer into the CPU domain, it causes untold
pain for no benefit!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson 046e945173 sna: Discard read buffers after use
Rather than pollute the cache with bo that are known not to be in the
GTT and are no longer useful, drop the bo after we read from it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
Chris Wilson 421ee0bb53 sna: Do not assume that the mappable aperture size is a power of two
And instead derive a power-of-two alignment value for partial buffer
sizes from the mappable aperture size and use that during
kgem_create_buffer()

Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=44682
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-15 09:48:52 +00:00
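
As a rough illustration of the commit above (not the driver's actual code), a power-of-two alignment can be derived from an aperture size that is not itself a power of two by rounding down to the largest power of two that fits. The function names below and the choice of taking a quarter of the aperture are assumptions made purely for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of two that is <= n (n must be non-zero). */
static uint64_t round_down_pow2(uint64_t n)
{
    assert(n != 0);

    uint64_t p = 1;
    while (p <= n / 2)   /* p * 2 <= n, so the doubling cannot overflow */
        p *= 2;
    return p;
}

/*
 * Hypothetical: derive the alignment used for partial buffer sizes from the
 * mappable aperture size. The 1/4 fraction is an assumption for this sketch,
 * and the aperture must be at least 4 bytes for the result to be non-zero.
 */
static uint64_t partial_buffer_alignment(uint64_t mappable_aperture_bytes)
{
    return round_down_pow2(mappable_aperture_bytes / 4);
}

/* Round size up to a multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t size, uint64_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}
```

The mask trick in `align_up` is only valid when the alignment is a power of two, which is why the alignment is rounded first rather than using a fraction of the aperture size directly.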
Chris Wilson 5d5da35c9f sna/gen[23]: Check for room in the batch before emitting pipeline flushes
Use a single idiom and reuse the check built into the state emission,
for both spans/boxes paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 22:07:57 +00:00
Chris Wilson f7e4799687 sna/gen6: Allow greater use of BLT
Typically we will be bound to the RENDER ring as once engaged we try not
to switch. However, with semaphores enabled we may switch more freely
and there it is advantageous to use as much of the faster BLT as is
feasible.

The most contentious point here is the choice of whether to use BLT for
copies by default. Microbenchmarks (compwinwin) benefit from the
coalescing performed in the render batch, but the more complex traces
seem to prefer utilizing the blitter. The debate will continue...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 22:06:33 +00:00
Chris Wilson c1ce34d450 sna/gen6: Tidy markup for when using the BLT is truly preferable
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson b64751dbdb sna: Be more lenient wrt switching rings if the kernel supports semaphores
If the kernel uses GPU semaphores for its coherency mechanism between
rings rather than CPU waits, allow the ring to be chosen on the basis
of the subsequent operation following a submission of batch. (However,
since batches are likely to be submitted in the middle of a draw, then
the likelihood is for ddx to remain on one ring until forced to switch
for an operation or idle, which is the same situation as before and so
the difference is minuscule.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson 295a22d270 sna: Ensure that the batch mode is always declared before emitting dwords
Initially, the batch->mode was only set upon an actual mode switch,
batch submission would not reset the mode. However, to facilitate fast
ring switching with semaphores, resetting the mode upon batch submission
is desired which means that if we submit the batch in the middle of an
operation we must redeclare its mode before continuing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson 0d2a507722 sna/glyphs: Cache the glyph image on the fallback path as well
The glyph cache grew to accommodate the fallback pixman image for mask
generation, and is equally applicable along the full fallback path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson f3e0ba4f65 sna/gen5: Disable render glyphs_to_dst
Processing more than a single rectangle using the CA path on ILK is
extremely hit-or-miss, often resulting in the absence of the second
primitive (ie. the glyphs are cleared but not added.) This is
reminiscent of the complete breakage of the BRW shaders, none of which
can handle more than a single rectangle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson fb92818ba4 sna: Pass render operation to flush and avoid the implicit flush-on-batch-end
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson a62429a1f7 sna: Upload continuation vertices into mmapped buffers
In the common case, we expect a very small number of vertices which will
fit into the batch along with the commands. However, in full flow we
overflow the on-stack buffer and likely several continuation buffers.
Streaming those straight into the GTT seems like a good idea, with the
usual caveats over aperture pressure. (Since these are linear we could
use snoopable bo for the architectures that support such for vertex
buffers and if we had kernel support.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 24df8ab974 sna: Reverse the chronological sort order of inactive vma entries
The goal is to reuse the most recently bound GTT mapping in the hope
that it is still mappable at the time of reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 2f26bbe3dd sna: Remove the short-circuiting of all-damage in move_to_cpu
To allow a replacement of the complete pixmap to be performed in place.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson c81dba18e6 sna: Hint whether we prefer to use the GPU for a pixmap
This includes the condition where the pixmap is too large, as well as
being too small, to be allocatable on the GPU. It is only a hint set
during creation, and may be overridden if required.

This fixes the regression in ocitysmap which decided to render glyphs
into a GPU mask for a destination that does not fit into the aperture.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 2bd942d553 sna/trapezoids: Quieten the debugging of the gory details of the rasteriser
Hide the noise under another level of debugging so that hopefully the
reason why it chose a particular path becomes clear.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 5dbcfc2ee3 sna: Be more lenient in not forcing to the GPU if the sources have CPU damage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 20ff4a1d73 sna: Use top_srcdir to detect .git rather than top_builddir
For srcdir != builddir builds, we need to be searching the source tree
for the git id.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson a4d5d72599 sna: Experiment with GTT mmapped upload buffers
In a few places, we can stream the source into the GTT and so upload in
place through the WC mapping. Notably, in many other places we want to
rasterise on a partial in cacheable memory. So we need to notify the
backend of the intended usage for the buffer and when we think it is
appropriate we can allocate a GTT mapped pointer for zero-copy upload.

The biggest improvement tends to be in the PutComposite style of
microbenchmark, yet throughput for trapezoid masks seems to suffer (e.g.
swfdec-giant-steps on i3 and gen2 in general). As expected, the culprit
of the regression is the aperture pressure causing eviction stalls, which
the pwrite path sidesteps by doing a cached copy when there is no GTT
space. This could be alleviated with an is-mappable ioctl predicting when
use of the buffer would block and so falling back in those cases to
pwrite. However, I suspect that this will improve dispatch latency in
the common idle case for which I have no good metric.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 252f381825 sna: Relinquish the GTT mmap on inactive buffers if moved out of the aperture
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00