Commit Graph

4520 Commits

Author SHA1 Message Date
Chris Wilson 5d5da35c9f sna/gen[23]: Check for room in the batch before emitting pipeline flushes
Use a single idiom and reuse the check built into the state emission,
for both spans/boxes paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 22:07:57 +00:00
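
The "check for room, then emit" idiom named in 5d5da35c9f can be sketched roughly as below. This is a minimal illustration, not the driver's code: `struct batch`, `batch_has_room()`, `batch_submit()`, `emit_pipeline_flush()`, the batch size and the opcode value are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BATCH_DWORDS 16 /* deliberately tiny, for illustration only */

struct batch {
    uint32_t data[BATCH_DWORDS];
    int used;      /* dwords already written */
    int submitted; /* how many times the batch has been handed off */
};

/* True if n more dwords fit into the current batch. */
static bool batch_has_room(const struct batch *b, int n)
{
    return b->used + n <= BATCH_DWORDS;
}

/* Stand-in for submitting the batch to the kernel: just start afresh. */
static void batch_submit(struct batch *b)
{
    b->used = 0;
    b->submitted++;
}

/* Reserve the space first, then emit, so that the flush command can
 * never be split across two batches. */
static void emit_pipeline_flush(struct batch *b)
{
    const int flush_len = 2; /* hypothetical length of the flush command */

    if (!batch_has_room(b, flush_len))
        batch_submit(b);
    assert(batch_has_room(b, flush_len));

    b->data[b->used++] = 0xdead0001u; /* placeholder opcode */
    b->data[b->used++] = 0;
}
```

Calling `emit_pipeline_flush()` on a nearly full batch submits it once and then writes the two dwords at the start of the fresh batch.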
Chris Wilson f7e4799687 sna/gen6: Allow greater use of BLT
Typically we will be bound to the RENDER ring as once engaged we try not
to switch. However, with semaphores enabled we may switch more freely
and there it is advantageous to use as much of the faster BLT as is
feasible.

The most contentious point here is the choice of whether to use BLT for
copies by default. Microbenchmarks (compwinwin) benefit from the
coalescing performed in the render batch, but the more complex traces
seem to prefer utilizing the blitter. The debate will continue...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 22:06:33 +00:00
Chris Wilson c1ce34d450 sna/gen6: Tidy markup for when using the BLT is truly preferable
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson b64751dbdb sna: Be more lenient wrt switching rings if the kernel supports semaphores
If the kernel uses GPU semaphores for its coherency mechanism between
rings rather than CPU waits, allow the ring to be chosen on the basis
of the subsequent operation following the submission of a batch. (However,
since batches are likely to be submitted in the middle of a draw, the
likelihood is for the ddx to remain on one ring until forced to switch
for an operation or idle, which is the same situation as before and so
the difference is minuscule.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson 295a22d270 sna: Ensure that the batch mode is always declared before emitting dwords
Initially, the batch->mode was only set upon an actual mode switch;
batch submission would not reset the mode. However, to facilitate fast
ring switching with semaphores, resetting the mode upon batch submission
is desired, which means that if we submit the batch in the middle of an
operation we must redeclare its mode before continuing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson 0d2a507722 sna/glyphs: Cache the glyph image on the fallback path as well
The glyph cache grew to accommodate the fallback pixman image for mask
generation, and is equally applicable along the full fallback path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:48 +00:00
Chris Wilson f3e0ba4f65 sna/gen5: Disable render glyphs_to_dst
Processing more than a single rectangle using the CA path on ILK is
extremely hit-or-miss, often resulting in the absence of the second
primitive (i.e. the glyphs are cleared but not added). This is
reminiscent of the complete breakage of the BRW shaders, none of which
can handle more than a single rectangle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson fb92818ba4 sna: Pass render operation to flush and avoid the implicit flush-on-batch-end
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson a62429a1f7 sna: Upload continuation vertices into mmapped buffers
In the common case, we expect a very small number of vertices which will
fit into the batch along with the commands. However, in full flow we
overflow the on-stack buffer and likely several continuation buffers.
Streaming those straight into the GTT seems like a good idea, with the
usual caveats over aperture pressure. (Since these are linear we could
use snoopable bo for the architectures that support such for vertex
buffers and if we had kernel support.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 24df8ab974 sna: Reverse the chronological sort order of inactive vma entries
The goal is to reuse the most recently bound GTT mapping in the hope
that it is still mappable at the time of reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 2f26bbe3dd sna: Remove the short-circuiting of all-damage in move_to_cpu
To allow a replacement of the complete pixmap to be performed in place.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson c81dba18e6 sna: Hint whether we prefer to use the GPU for a pixmap
This includes the condition where the pixmap is too large, as well as
being too small, to be allocatable on the GPU. It is only a hint set
during creation, and may be overridden if required.

This fixes the regression in ocitysmap which decided to render glyphs
into a GPU mask for a destination that does not fit into the aperture.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 2bd942d553 sna/trapezoids: Quieten the debugging of the gory details of the rasteriser
Hide the noise under another level of debugging so that hopefully the
reason why it chose a particular path becomes clear.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 5dbcfc2ee3 sna: Be more lenient in not forcing to the GPU if the sources have CPU damage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 20ff4a1d73 sna: Use top_srcdir to detect .git rather than top_builddir
For srcdir != builddir builds, we need to be searching the source tree
for the git id.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson a4d5d72599 sna: Experiment with GTT mmapped upload buffers
In a few places, we can stream the source into the GTT and so upload in
place through the WC mapping. Notably, in many other places we want to
rasterise on a partial in cacheable memory. So we need to notify the
backend of the intended usage for the buffer and when we think it is
appropriate we can allocate a GTT mapped pointer for zero-copy upload.

The biggest improvement tends to be in the PutComposite style of
microbenchmark, yet throughput for trapezoid masks seems to suffer (e.g.
swfdec-giant-steps on i3 and gen2 in general). As expected, the culprit
of the regression is the aperture pressure causing eviction stalls, which
the pwrite path sidesteps by doing a cached copy when there is no GTT
space. This could be alleviated with an is-mappable ioctl predicting when
use of the buffer would block and so falling back in those cases to
pwrite. However, I suspect that this will improve dispatch latency in
the common idle case for which I have no good metric.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 252f381825 sna: Relinquish the GTT mmap on inactive buffers if moved out of the aperture
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-14 18:13:47 +00:00
Chris Wilson 9c73dd91e9 Include <xorgVersion.h> to repair build
intel_module.c:41:48: error: missing binary operator before token "("
2012-01-14 17:00:41 +00:00
Stefan Dirsch b213f6e876 Make driver backwards compatible for server 1.6.x.
Signed-off-by: Stefan Dirsch <sndirsch@suse.de>
2012-01-14 05:43:33 +01:00
Chris Wilson 94217a4dd9 sna: Decouple dirty pixmaps from list if we fail to upload them
Rather than iterate endlessly trying to upload the same pixmap when
failing to flush dirty CPU damage, try again on the next flush.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-13 10:15:56 +00:00
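
The failure mode described in 94217a4dd9 can be illustrated with a small sketch: if an entry only leaves the dirty list on a successful upload, a flush loop that runs until the list is empty never terminates once an upload fails. Unlinking the entry before the attempt avoids that. All names here (`struct pixmap`, `try_upload()`, `flush_dirty()`) are hypothetical, and how a failed pixmap gets back onto the list for the next flush is left outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pixmap {
    struct pixmap *next; /* link in the dirty list */
    bool on_dirty_list;
    bool upload_ok;      /* test knob: does the upload succeed? */
    int attempts;        /* number of upload attempts so far */
};

/* Stand-in for the real upload; it merely reports the test knob. */
static bool try_upload(struct pixmap *p)
{
    p->attempts++;
    return p->upload_ok;
}

/* Flush the dirty list. Each entry is unlinked *before* the upload is
 * attempted, so a failing pixmap is visited exactly once per flush.
 * Returns the number of uploads that failed. */
static int flush_dirty(struct pixmap **head)
{
    int failed = 0;

    while (*head != NULL) {
        struct pixmap *p = *head;

        *head = p->next; /* decouple first */
        p->next = NULL;
        p->on_dirty_list = false;

        if (!try_upload(p))
            failed++;
    }
    return failed;
}
```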
Chris Wilson 65ef369c73 sna: Decouple from CPU dirty list after removing all CPU damage
In the paths where we discard CPU damage, we also need to remove it
from the dirty list so that we do not iterate over it during flush.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-13 10:12:52 +00:00
Chris Wilson 0845fcef9e sna: Correct iteration counter for stippled blits
==7215== Invalid read of size 2
==7215==    at 0x51A72F3: sna_poly_fill_rect_stippled_8x8_blt (sna_accel.c:7340)
==7215==    by 0x51A9CDF: sna_poly_fill_rect_stippled_blt (sna_accel.c:8163)
==7215==    by 0x51A3878: sna_poly_segment (sna_accel.c:6090)
==7215==    by 0x216C02: damagePolySegment (damage.c:1096)
==7215==    by 0x13F6E8: ProcPolySegment (dispatch.c:1771)
==7215==    by 0x1436B4: Dispatch (dispatch.c:437)
==7215==    by 0x131279: main (main.c:287)
==7215==  Address 0x6f851e8 is 0 bytes after a block of size 32 alloc'd
==7215==    at 0x4825DEC: malloc (vg_replace_malloc.c:261)
==7215==    by 0x51A3558: sna_poly_segment (sna_accel.c:6049)
==7215==    by 0x216C02: damagePolySegment (damage.c:1096)
==7215==    by 0x13F6E8: ProcPolySegment (dispatch.c:1771)
==7215==    by 0x1436B4: Dispatch (dispatch.c:437)
==7215==    by 0x131279: main (main.c:287)

For example, the stippled outline in gimp (the yellow marching ants)
would randomly walk over the entire image.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 23:45:03 +00:00
Chris Wilson 5c2c6474ef sna/dri: Hook up a compile option to switch colour buffers to Y-tiling
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 23:12:52 +00:00
Chris Wilson 59b79e5952 sna: Reorder composite-done to destroy mask bo before source bo
Just in the unlikely event that we hit the delete-partial-upload path
which prefers destroying the last bo first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 23:12:39 +00:00
Chris Wilson 983b755313 sna/damage: Fix union of extents with dirty damage but no region
By failing to account for certain paths which would create a damage elt
without fully initialising the damage region (only the damage extents),
we would later overwrite the damage extents with only the extents for
this operation (rather than the union of this operation with the current
damage). This fixes a regression from 098592ca5d,
(sna: Remove the independent tracking of elts from boxes).

Include the associated damage migration debugging code of the callers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 21:15:58 +00:00
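
The difference between overwriting and accumulating the extents, which is the essence of 983b755313, can be sketched as follows. The structure and function names are hypothetical simplifications: the real damage tracking carries a region and a list of pending elements, whereas this sketch keeps only the extents and a dirty flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct box { int16_t x1, y1, x2, y2; };

struct damage {
    struct box extents; /* bounding box of everything added so far */
    bool dirty;         /* true once any damage has been recorded */
};

/* Add a box to the damage. If damage already exists, the extents must
 * grow to the union of old and new; overwriting them with the new box
 * alone would lose the earlier damage. */
static void damage_add_box(struct damage *d, const struct box *b)
{
    if (!d->dirty) {
        d->extents = *b;
        d->dirty = true;
        return;
    }
    if (b->x1 < d->extents.x1) d->extents.x1 = b->x1;
    if (b->y1 < d->extents.y1) d->extents.y1 = b->y1;
    if (b->x2 > d->extents.x2) d->extents.x2 = b->x2;
    if (b->y2 > d->extents.y2) d->extents.y2 = b->y2;
}
```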
Chris Wilson 8d2f1eefe1 sna: Pass a hint that we may like to perform the fallback in place
If we do not read back from the destination, we may prefer to utilize a
GTT mapping and perform the fallback in place. For the rare event that we
wish to fall back and do not already have a shadow...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 21:15:58 +00:00
Chris Wilson 48ab72754d sna: Use the GPU bo if it is all damaged
By marking the scratch upload pixmap as damaged in both domains, we
confused the texture upload path and made it upload the pixmap a second
time. If either bo is all-damaged, use it!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 21:15:58 +00:00
Chris Wilson 20a4d71819 sna: Dump batch contents for debugging before modification
We need to dump the batch contents before the maps are made by the
construction of the batch itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 21:15:58 +00:00
Chris Wilson 7932a2a259 sna: Update for removal of backwards compatible miWideDash
miWideDash() no longer calls miZeroLineDash() when called with
gc->lineWidth==0, so we need to do so ourselves.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 14:06:22 +00:00
Chris Wilson b7cefddd46 sna: Re-enable min-alignment workaround on pre-SNB hw
Confirmed as still being required for both gen3 and gen4. One day I will
get single-stream mode working, just not today apparently.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 13:17:43 +00:00
Chris Wilson 978e1aecea sna: Only shrink a partial buffer if it is no longer used.
The condition for being able to shrink a buffer is stricter than just
whether we are reading from the buffer: we also cannot swap the
handles if the existing handle remains exposed via a proxy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 12:23:34 +00:00
Chris Wilson d3169154d1 sna: Improve a DBG message
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 12:23:34 +00:00
Chris Wilson 2a22990968 sna: Prevent 60Hz wakeups if the client stops in mid-render
Only continue to wake up if the scanout remains active.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 12:23:34 +00:00
Chris Wilson 1c0e9916ca sna: Align the partial buffer contents to cachelines
To enable Daniel's faster pwrite paths. Only one step removed from using
whole page alignment...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 12:23:34 +00:00
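
Cacheline alignment of sub-allocations inside a partial buffer, as in 1c0e9916ca, amounts to rounding each offset up to a multiple of the cacheline size. A minimal sketch, assuming a 64-byte cacheline and using the hypothetical helpers `align_up()` and `partial_alloc()`:

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE 64u /* assumed cacheline size in bytes */

/* Round v up to the next multiple of a, where a is a power of two. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Carve `size` bytes out of a buffer that currently has `*used` bytes
 * occupied. The returned offset always starts on a cacheline boundary. */
static uint32_t partial_alloc(uint32_t *used, uint32_t size)
{
    uint32_t offset = align_up(*used, CACHELINE);

    *used = offset + size;
    return offset;
}
```

Successive allocations of 10, 100 and 1 bytes land at offsets 0, 64 and 192. Moving to whole-page alignment would only require changing the alignment constant.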
Chris Wilson 1e4080318f sna: Replace the open-coded bubble sort of the partial list
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 10:47:19 +00:00
Chris Wilson 7290ced579 sna/video: Fix for changes in damage api
We can avoid both calls to modify the damage with one simple check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 02:59:43 +00:00
Chris Wilson 87e6dcb3b0 sna: Don't call RegionIntersect for the trivial PutImage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 02:21:29 +00:00
Chris Wilson 1bd6665093 sna: Disable the min alignment workaround
Allow all generations to use the minimum alignment of 4 bytes again as
it appears to be working for me... Or at least what remains broken seems
to be irrespective of this alignment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 02:21:29 +00:00
Chris Wilson 112b895926 sna: Prevent shrinking a partial buffer stolen for a read
If we reuse a partial buffer for a read, we cannot shrink it during
upload to the device as we do not track how many bytes we actually need
for the read operation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 02:21:26 +00:00
Chris Wilson b09ae4c203 sna: Don't drop expired partial bo immediately, wait until dispatch
As the partial bo may be coupled into the execlist, we may as well hang
onto the memory to service the next partial buffer request until it
expires in the next dispatch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 02:16:49 +00:00
Chris Wilson a3c42565a8 sna: Store damage-all in the low bit of the damage pointer
Avoid the function call overhead by inspecting the low bit to see if it
is all-damaged already.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-12 02:16:49 +00:00
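
Storing a flag in the low bit of a pointer, as a3c42565a8 does for damage-all, relies on the pointee being at least 2-byte aligned so that bit 0 of a valid pointer is always zero. The sketch below shows the general technique only. `struct damage` is a stand-in and the helper names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct damage { int n_boxes; }; /* stand-in; alignment leaves bit 0 free */

#define DAMAGE_ALL_BIT ((uintptr_t)1)

/* Return the same pointer with the "all damaged" flag set. */
static struct damage *damage_mark_all(struct damage *d)
{
    return (struct damage *)((uintptr_t)d | DAMAGE_ALL_BIT);
}

/* Test the flag without dereferencing or calling into the damage code. */
static bool damage_is_all(const struct damage *d)
{
    return ((uintptr_t)d & DAMAGE_ALL_BIT) != 0;
}

/* Strip the flag to recover a pointer that is safe to dereference. */
static struct damage *damage_ptr(struct damage *d)
{
    return (struct damage *)((uintptr_t)d & ~DAMAGE_ALL_BIT);
}
```

The cost of the trick is that every dereference must go through `damage_ptr()`. The gain is that the common "is it all damaged?" query becomes a single bit test.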
Chris Wilson c64a9d0683 sna: Choose a stride for the indirect replacement
Don't blithely assume that the incoming bytes are appropriately aligned
for the destination buffer. Indeed we may be replacing the destination
bo with the shadow bytes out of another, larger pixmap, in which case we
do need to create a stride that is appropriate for the upload and
perform the 2D copy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 19:54:12 +00:00
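
The situation in c64a9d0683 (source rows taken from inside a larger pixmap, destination needing its own aligned stride) can be sketched as a row-by-row copy with independent strides. This is an illustration under assumed names and parameters. `choose_stride()` and `copy_2d()` are hypothetical, and the alignment value is whatever the destination requires.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pick a destination stride: the row length in bytes rounded up to a
 * multiple of `align`. */
static uint32_t choose_stride(uint32_t width, uint32_t bytes_per_pixel,
                              uint32_t align)
{
    uint32_t row = width * bytes_per_pixel;

    return (row + align - 1) / align * align;
}

/* Copy `height` rows of `row_bytes` each. Source and destination may
 * use different strides, e.g. when the source is a sub-rectangle of a
 * wider pixmap. */
static void copy_2d(uint8_t *dst, uint32_t dst_stride,
                    const uint8_t *src, uint32_t src_stride,
                    uint32_t row_bytes, uint32_t height)
{
    for (uint32_t y = 0; y < height; y++)
        memcpy(dst + (size_t)y * dst_stride,
               src + (size_t)y * src_stride,
               row_bytes);
}
```

A single flat `memcpy` of the whole block would only be correct when both strides equal the row length, which is exactly the assumption the commit warns against.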
Chris Wilson b82851e74d sna: Mark upload pixmaps as being wholly GPU damaged
So that subsequent code resists performing CPU operations with them
(after they have been populated.)

Marking both sides as wholly damaged breaks the rules, but should work
out so long as we check whether we can perform the operation within the
target damage first.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 15:54:16 +00:00
Chris Wilson 2a5ab05f16 sna: Use a minimum alignment of 64
We should be able to reduce this by disabling dual-stream mode of the
GPU (which we want to achieve anyway for 2D performance). Artefacts
in small uploads demonstrate that we fail to do so.

References: https://bugs.freedesktop.org/show_bug.cgi?id=44150
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 15:29:08 +00:00
Chris Wilson e94807759e sna/gen6: Special case spans with no transform
As the no-transform case is a special case of affine, we were attempting to
dereference the NULL transform in order to determine if it was a simple
no-rotation matrix. As the operation is extremely simple, add a special
case vertex program to speed it up.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 12:13:18 +00:00
Chris Wilson 0a5313900e sna: Explicitly retire the bo following a serialisation point
This is to keep the sanity checks in order, but conceptually should be
useful as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 12:10:18 +00:00
Chris Wilson 2add5991a7 sna: Mark the bo as no longer in the GPU domain after clearing needs_flush
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 11:35:42 +00:00
Chris Wilson fec7098571 sna: Add assertions to track requests
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 11:33:14 +00:00
Chris Wilson a93c93be76 sna/gen6: Add a vertex program for a simple (affine, no rotation) spans
I long for the day when this code is obsolete... Until then, this gives
a nice boost in the fishtank.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-11 00:44:27 +00:00
Chris Wilson 3cf5da1090 sna: Amalgamate small replacements into upload buffers
Similar to the standard io paths, try to reuse an upload buffer for a
small replacement pixmap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2012-01-10 23:39:33 +00:00