The idea being that they facilitate copying to and from the CPU, while
we also want to avoid stalling on any pixels held by the CPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Similar to the desire to flush the next batch after an overflow, we do
not want to incur any lag in the midst of drawing, even if that lag is
mitigated by GPU semaphores.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using the GPU to do the detiling just incurs extra latency and an extra
copy, so go back to using a fence and GTT mapping for the common path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Be sure the mask picture has a valid format even though it points to the
same pixels as the valid source, and be wary of the case where the
source was converted to a solid but the mask was not.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than pollute the cache with bo that are known not to be in the
GTT and are no longer useful, drop the bo after we read from it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Instead, derive a power-of-two alignment value for partial buffer
sizes from the mappable aperture size and use that during
kgem_create_buffer().
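A minimal sketch of how such an alignment could be derived; the
function name and the fraction of the aperture used here are
assumptions for illustration, not the driver's actual values:

    #include <stdint.h>

    /* Illustrative only: cap each partial buffer at a power-of-two
     * fraction of the mappable aperture so a single buffer cannot
     * monopolise it. The divisor of 64 is an assumption. */
    static uint32_t partial_buffer_alignment(uint64_t aperture_mappable_size)
    {
            uint64_t target = aperture_mappable_size / 64;
            uint32_t align = 4096;          /* never below one page */

            while ((uint64_t)align * 2 <= target)
                    align *= 2;

            return align;
    }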
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=44682
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Use a single idiom, reusing the check built into the state emission,
for both the spans and boxes paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Typically we will be bound to the RENDER ring as once engaged we try not
to switch. However, with semaphores enabled we may switch more freely
and there it is advantageous to use as much of the faster BLT as is
feasible.
The most contentious point here is the choice of whether to use BLT for
copies by default. Microbenchmarks (compwinwin) benefit from the
coalescing performed in the render batch, but the more complex traces
seem to prefer utilizing the blitter. The debate will continue...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the kernel uses GPU semaphores for its coherency mechanism between
rings rather than CPU waits, allow the ring to be chosen on the basis
of the subsequent operation following submission of a batch. (However,
since batches are likely to be submitted in the middle of a draw, the
likelihood is for the ddx to remain on one ring until forced to switch
for an operation or idle, which is the same situation as before and so
the difference is minuscule.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Initially, the batch->mode was only set upon an actual mode switch;
batch submission would not reset the mode. However, to facilitate fast
ring switching with semaphores, resetting the mode upon batch submission
is desired, which means that if we submit the batch in the middle of an
operation we must redeclare its mode before continuing.
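A toy model of the new behaviour; the enum and field names follow the
commit text but are assumptions, not the actual sources:

    enum kgem_mode { KGEM_NONE, KGEM_RENDER, KGEM_BLT };

    struct kgem { enum kgem_mode mode; };

    /* Every submission now resets the mode... */
    static void kgem_submit(struct kgem *kgem)
    {
            /* ... hand the batch to the kernel ... */
            kgem->mode = KGEM_NONE;
    }

    /* ...so an operation that flushes mid-draw must redeclare its
     * mode before emitting any further commands. */
    static void redeclare_mode(struct kgem *kgem, enum kgem_mode mode)
    {
            if (kgem->mode == KGEM_NONE)
                    kgem->mode = mode;
    }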
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The glyph cache grew to accommodate the fallback pixman image for mask
generation, and is equally applicable along the full fallback path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Processing more than a single rectangle using the CA path on ILK is
extremely hit-or-miss, often resulting in the absence of the second
primitive (i.e. the glyphs are cleared but not added). This is
reminiscent of the complete breakage of the BRW shaders, none of which
can handle more than a single rectangle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the common case, we expect a very small number of vertices which will
fit into the batch along with the commands. However, in full flow we
overflow the on-stack buffer and likely several continuation buffers.
Streaming those straight into the GTT seems like a good idea, with the
usual caveats over aperture pressure. (Since these are linear we could
use snoopable bo for the architectures that support such for vertex
buffers and if we had kernel support.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The goal is to reuse the most recently bound GTT mapping in the hope
that it is still mappable at the time of reuse.
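A toy illustration of the most-recently-used ordering; the structure
names are invented for the sketch:

    struct vma { void *ptr; struct vma *next; };
    struct vma_cache { struct vma *mru; };

    /* The newest mapping is pushed to the head... */
    static void vma_cache_add(struct vma_cache *cache, struct vma *v)
    {
            v->next = cache->mru;
            cache->mru = v;
    }

    /* ...so reuse starts from the mapping most likely still resident. */
    static struct vma *vma_cache_reuse(struct vma_cache *cache)
    {
            return cache->mru;
    }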
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This includes the condition where the pixmap is too large, as well as
too small, to be allocated on the GPU. It is only a hint set
during creation, and may be overridden if required.
This fixes the regression in ocitysmap which decided to render glyphs
into a GPU mask for a destination that does not fit into the aperture.
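A hedged sketch of such a creation-time hint; the function name and
size limits are illustrative, not the driver's real cutoffs:

    #include <stdbool.h>

    /* Hint at creation whether a pixmap is a plausible GPU target;
     * both extremes are excluded, and the hint may be overridden. */
    static bool pixmap_prefers_gpu(int width, int height,
                                   int min_gpu_size, int max_gpu_size)
    {
            if (width > max_gpu_size || height > max_gpu_size)
                    return false;   /* too large to fit in the aperture */
            if (width < min_gpu_size && height < min_gpu_size)
                    return false;   /* too small to be worth offloading */
            return true;
    }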
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Hide the noise under another level of debugging so that hopefully the
reason why it chose a particular path becomes clear.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In a few places, we can stream the source into the GTT and so upload in
place through the WC mapping. Notably, in many other places we want to
rasterise into a partial buffer in cacheable memory. So we need to notify
the backend of the intended usage for the buffer, and when we think it is
appropriate we can allocate a GTT mapped pointer for zero-copy upload.
The biggest improvement tends to be in the PutComposite style of
microbenchmark, yet throughput for trapezoid masks seems to suffer (e.g.
swfdec-giant-steps on i3 and gen2 in general). As expected, the culprit
of the regression is the aperture pressure causing eviction stalls, which
the pwrite path sidesteps by doing a cached copy when there is no GTT
space. This could be alleviated with an is-mappable ioctl predicting when
use of the buffer would block, falling back to pwrite in those cases.
However, I suspect that this will improve dispatch latency in
the common idle case, for which I have no good metric.
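A sketch of what the usage notification might look like; the enum and
decision function are assumptions drawn from the text above:

    #include <stdbool.h>

    /* Illustrative hint passed by callers requesting an upload buffer. */
    enum upload_usage {
            UPLOAD_STREAM,  /* written once, in order: WC/GTT mapping suits */
            UPLOAD_RASTER   /* CPU renders repeatedly: wants cacheable memory */
    };

    static bool want_gtt_mapping(enum upload_usage usage, bool aperture_congested)
    {
            /* Zero-copy through the GTT only pays off for streaming
             * writes, and only while we are unlikely to stall evicting
             * the aperture to bind the buffer. */
            return usage == UPLOAD_STREAM && !aperture_congested;
    }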
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than iterate endlessly trying to upload the same pixmap when
failing to flush dirty CPU damage, try again on the next flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the paths where we discard CPU damage, we also need to remove the
pixmap from the dirty list so that we do not iterate over it during flush.
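For illustration, a self-contained model of the fix; the list helpers
mirror kernel-style lists and the structure names are assumed:

    #include <stddef.h>

    struct list { struct list *next, *prev; };

    static void list_del(struct list *entry)
    {
            entry->prev->next = entry->next;
            entry->next->prev = entry->prev;
            entry->next = entry->prev = entry; /* safe against repeat removal */
    }

    struct pixmap_priv {
            void *cpu_damage;        /* stand-in for the damage tracker */
            struct list flush_list;  /* membership of the dirty list */
    };

    /* Discarding the damage must also unlink the pixmap, or the next
     * flush would iterate over stale state. */
    static void discard_cpu_damage(struct pixmap_priv *priv)
    {
            priv->cpu_damage = NULL;
            list_del(&priv->flush_list);
    }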
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
==7215== Invalid read of size 2
==7215== at 0x51A72F3: sna_poly_fill_rect_stippled_8x8_blt
(sna_accel.c:7340)
==7215== by 0x51A9CDF: sna_poly_fill_rect_stippled_blt
(sna_accel.c:8163)
==7215== by 0x51A3878: sna_poly_segment (sna_accel.c:6090)
==7215== by 0x216C02: damagePolySegment (damage.c:1096)
==7215== by 0x13F6E8: ProcPolySegment (dispatch.c:1771)
==7215== by 0x1436B4: Dispatch (dispatch.c:437)
==7215== by 0x131279: main (main.c:287)
==7215== Address 0x6f851e8 is 0 bytes after a block of size 32 alloc'd
==7215== at 0x4825DEC: malloc (vg_replace_malloc.c:261)
==7215== by 0x51A3558: sna_poly_segment (sna_accel.c:6049)
==7215== by 0x216C02: damagePolySegment (damage.c:1096)
==7215== by 0x13F6E8: ProcPolySegment (dispatch.c:1771)
==7215== by 0x1436B4: Dispatch (dispatch.c:437)
==7215== by 0x131279: main (main.c:287)
An example: the stippled outline in gimp, the yellow marching ants,
would randomly walk over the entire image.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is just for the unlikely event that we hit the delete-partial-upload
path, which prefers destroying the last bo first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By failing to account for certain paths which would create a damage elt
without fully initialising the damage region (only the damage extents),
we would later overwrite the damage extents with only the extents for
this operation (rather than the union of this operation with the current
damage). This fixes a regression from 098592ca5d
(sna: Remove the independent tracking of elts from boxes).
Also include the associated damage migration debugging code for the callers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we do not read back from the destination, we may prefer to utilize a
GTT mapping and perform the fallback in place. For the rare event that we
wish to fall back and do not already have a shadow...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By marking the scratch upload pixmap as damaged in both domains, we
confused the texture upload path and made it upload the pixmap a second
time. If either bo is all-damaged, use it!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to dump the batch contents before the maps are made by the
construction of the batch itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
miWideDash() no longer calls miZeroLineDash() when called with
gc->lineWidth==0, so we need to do so ourselves.
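A sketch of the dispatch, assuming the X server's mi layer with the
entry points as named above; the wrapper name is invented:

    #include "mi.h"  /* DrawablePtr, GCPtr, DDXPointPtr and the mi hooks */

    static void
    sna_poly_lines_dashed(DrawablePtr drawable, GCPtr gc,
                          int mode, int n, DDXPointPtr pt)
    {
            if (gc->lineWidth == 0)
                    miZeroLineDash(drawable, gc, mode, n, pt);
            else
                    miWideDash(drawable, gc, mode, n, pt);
    }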
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Confirmed as still being required for both gen3 and gen4. One day I will
get single-stream mode working, just not today apparently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The condition for being able to shrink a buffer is more severe than just
whether we are reading from the buffer: we also cannot swap the
handles if the existing handle remains exposed via a proxy.
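A minimal model of the stricter guard; the field names are assumptions:

    #include <stdbool.h>

    struct buffer {
            bool gpu_is_reading;  /* the GPU still reads via this handle */
            int proxy_count;      /* proxies still exposing this handle */
    };

    static bool can_shrink(const struct buffer *bo)
    {
            /* Not merely "no reads": a handle still exposed through a
             * proxy cannot be swapped for a smaller one. */
            return !bo->gpu_is_reading && bo->proxy_count == 0;
    }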
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To enable Daniel's faster pwrite paths. Only one step removed from using
whole page alignment...
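For reference, padding an upload to a 64-byte boundary is a one-liner;
substituting 4096 would give the whole-page alignment mentioned above
(the function name is illustrative):

    #include <stdint.h>

    static uint32_t pad_upload(uint32_t bytes)
    {
            return (bytes + 63) & ~(uint32_t)63; /* next 64-byte boundary */
    }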
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>