The goal is to reuse the most recently bound GTT mapping in the hope
that it is still mappable at the time of reuse.
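In sketch form (the structure and helpers here are illustrative, not
the driver's actual API): cache the pointer from the most recent GTT
mmap and hand it back while the bo remains bound, paying for a fresh
mapping only when it does not.

    #include <stdbool.h>
    #include <stddef.h>

    struct bo {
        void *gtt_map; /* pointer from the last GTT mmap, if any */
        bool bound;    /* hypothetical flag: still bound in the GTT? */
    };

    /* Stand-in for the real GTT mmap ioctl. */
    static void *gtt_mmap(struct bo *bo) { (void)bo; return NULL; }

    static void *reuse_gtt_map(struct bo *bo)
    {
        /* Reuse the most recently bound mapping if still valid... */
        if (bo->gtt_map && bo->bound)
            return bo->gtt_map;

        /* ...otherwise create a fresh mapping and cache it. */
        bo->gtt_map = gtt_mmap(bo);
        return bo->gtt_map;
    }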
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This includes the condition where the pixmap is too large, as well as
too small, to be allocatable on the GPU. It is only a hint, set during
creation, and may be overridden if required.
This fixes the regression in ocitysmap which decided to render glyphs
into a GPU mask for a destination that does not fit into the aperture.
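A minimal sketch of such a creation hint, with invented names
(CREATE_NO_GPU and the pixmap fields are placeholders, not the actual
driver structures): the flag is advisory, recorded at creation, and any
later decision is free to override it.

    #include <stdbool.h>

    #define CREATE_NO_GPU (1 << 0) /* hint: unsuitable for the GPU */

    struct pixmap {
        unsigned create_flags;
    };

    static bool prefer_gpu(const struct pixmap *p, bool forced)
    {
        /* Honour the creation hint by default... */
        if ((p->create_flags & CREATE_NO_GPU) && !forced)
            return false;
        /* ...but let the caller override it if required. */
        return true;
    }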
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Hide the noise under another level of debugging so that, hopefully, the
reason why it chose a particular path becomes clear.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In a few places we can stream the source into the GTT and so upload in
place through the WC mapping. Notably, in many other places we want to
rasterise into a partial buffer in cacheable memory. So we need to
notify the backend of the intended usage for the buffer, and when we
think it is appropriate we can allocate a GTT-mapped pointer for
zero-copy upload.
The biggest improvement tends to be in the PutComposite style of
microbenchmark, yet throughput for trapezoid masks seems to suffer (e.g.
swfdec-giant-steps on i3, and gen2 in general). As expected, the culprit
of the regression is the aperture pressure causing eviction stalls, which
the pwrite path sidesteps by doing a cached copy when there is no GTT
space. This could be alleviated with an is-mappable ioctl predicting when
use of the buffer would block, so that we could fall back to pwrite in
those cases. However, I suspect that this will improve dispatch latency
in the common idle case, for which I have no good metric.
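In sketch form, under the assumption of a simple usage enum (the names
are illustrative, not the backend's actual API): only a write-once
streaming upload earns the WC GTT mapping; anything the CPU will
rasterise and read stays in cacheable memory.

    #include <stdlib.h>

    enum buffer_usage {
        BUFFER_WRITE,      /* streamed in once, never read back */
        BUFFER_READ_WRITE, /* rasterised and read by the CPU */
    };

    /* Stand-ins for the real allocators. */
    static void *gtt_map_alloc(size_t size) { return malloc(size); }
    static void *cacheable_alloc(size_t size) { return malloc(size); }

    static void *alloc_upload(size_t size, enum buffer_usage usage)
    {
        /* Zero-copy upload through the WC mapping only pays off when
         * the CPU streams the data in and never reads it back. */
        if (usage == BUFFER_WRITE)
            return gtt_map_alloc(size);

        return cacheable_alloc(size);
    }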
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than iterate endlessly trying to upload the same pixmap when we
fail to flush its dirty CPU damage, try again on the next flush.
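Roughly (the flush loop below is a sketch, not the actual code): each
pixmap gets one upload attempt per flush, and a failure simply leaves
it queued for the next one.

    #include <stdbool.h>

    struct pixmap;

    /* Stand-in for the real upload; may fail under aperture pressure. */
    static bool upload_dirty(struct pixmap *p) { (void)p; return false; }

    static void flush_dirty(struct pixmap **dirty, int *count)
    {
        int kept = 0;

        for (int i = 0; i < *count; i++) {
            /* On failure, keep the pixmap queued for the next flush
             * rather than spinning on it now. */
            if (!upload_dirty(dirty[i]))
                dirty[kept++] = dirty[i];
        }
        *count = kept;
    }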
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the paths where we discard CPU damage, we also need to remove it
from the dirty list so that we do not iterate over it during flush.
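In sketch form, using the usual kernel-style list idiom (the damage
structure is illustrative): discarding the damage also unlinks it, so
the flush loop never sees a stale entry.

    struct list {
        struct list *next, *prev;
    };

    static void list_del(struct list *node)
    {
        node->prev->next = node->next;
        node->next->prev = node->prev;
        node->next = node->prev = node; /* safe against double unlink */
    }

    struct damage {
        struct list dirty_link; /* membership of the flush list */
    };

    static void discard_damage(struct damage *d)
    {
        /* Dropping the damage without unlinking would leave a stale
         * entry for the flush loop to trip over. */
        list_del(&d->dirty_link);
        /* ...then release the damage region itself... */
    }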
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
==7215== Invalid read of size 2
==7215== at 0x51A72F3: sna_poly_fill_rect_stippled_8x8_blt
(sna_accel.c:7340)
==7215== by 0x51A9CDF: sna_poly_fill_rect_stippled_blt
(sna_accel.c:8163)
==7215== by 0x51A3878: sna_poly_segment (sna_accel.c:6090)
==7215== by 0x216C02: damagePolySegment (damage.c:1096)
==7215== by 0x13F6E8: ProcPolySegment (dispatch.c:1771)
==7215== by 0x1436B4: Dispatch (dispatch.c:437)
==7215== by 0x131279: main (main.c:287)
==7215== Address 0x6f851e8 is 0 bytes after a block of size 32 alloc'd
==7215== at 0x4825DEC: malloc (vg_replace_malloc.c:261)
==7215== by 0x51A3558: sna_poly_segment (sna_accel.c:6049)
==7215== by 0x216C02: damagePolySegment (damage.c:1096)
==7215== by 0x13F6E8: ProcPolySegment (dispatch.c:1771)
==7215== by 0x1436B4: Dispatch (dispatch.c:437)
==7215== by 0x131279: main (main.c:287)
One example was the stippled outline in gimp, the yellow marching ants,
which would randomly walk over the entire image.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Just in the unlikely event that we hit the delete-partial-upload path,
which prefers destroying the last bo first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By failing to account for certain paths that create a damage elt
without fully initialising the damage region (only the damage extents),
we would later overwrite the damage extents with only the extents of
this operation (rather than the union of this operation with the current
damage). This fixes a regression from 098592ca5d
(sna: Remove the independent tracking of elts from boxes).
Also include the associated damage-migration debugging for the callers.
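The fix amounts to unioning rather than overwriting; in sketch form
(BoxRec follows the X server convention, the rest is illustrative):

    #include <stdbool.h>

    typedef struct { short x1, y1, x2, y2; } BoxRec;

    struct damage {
        BoxRec extents;
        bool have_extents; /* any damage recorded yet? */
    };

    static void damage_add_extents(struct damage *d, const BoxRec *box)
    {
        if (!d->have_extents) {
            d->extents = *box;
            d->have_extents = true;
            return;
        }

        /* Union with the current extents; overwriting was the bug. */
        if (box->x1 < d->extents.x1) d->extents.x1 = box->x1;
        if (box->y1 < d->extents.y1) d->extents.y1 = box->y1;
        if (box->x2 > d->extents.x2) d->extents.x2 = box->x2;
        if (box->y2 > d->extents.y2) d->extents.y2 = box->y2;
    }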
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we do not read back from the destination, we may prefer to utilize a
GTT mapping and perform the fallback in place. For the rare event that
we wish to fall back and do not already have a shadow...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By marking the scratch upload pixmap as damaged in both domains, we
confused the texture upload path and made it upload the pixmap a second
time. If either bo is all-damaged, use it!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to dump the batch contents before the maps are made by the
construction of the batch itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
miWideDash() no longer calls miZeroLineDash() when called with
gc->lineWidth==0, so we need to do so ourselves.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Confirmed as still being required for both gen3 and gen4. One day I will
get single-stream mode working, just not today apparently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The condition for being able to shrink a buffer is more severe than just
whether we are reading from the buffer: we also cannot swap the handles
if the existing handle remains exposed via a proxy.
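As a sketch, with illustrative fields, the condition reduces to:

    #include <stdbool.h>

    struct buffer {
        bool read_enabled; /* do we read back from the buffer? */
        bool has_proxy;    /* handle still exposed via a proxy? */
    };

    static bool can_shrink(const struct buffer *b)
    {
        /* Shrinking swaps in a smaller handle, so it is ruled out
         * both while we read from the buffer and while the old
         * handle remains visible through a proxy. */
        return !b->read_enabled && !b->has_proxy;
    }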
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To enable Daniel's faster pwrite paths. Only one step removed from using
whole page alignment...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow all generations to use the minimum alignment of 4 bytes again as
it appears to be working for me... Or at least, what remains broken
seems to be unaffected by this alignment.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we reuse a partial buffer for a read, we cannot shrink it during
upload to the device as we do not track how many bytes we actually need
for the read operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the partial bo may be coupled into the execlist, we may as well hang
onto the memory to service the next partial buffer request until it
expires in the next dispatch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Don't blithely assume that the incoming bytes are appropriately aligned
for the destination buffer. Indeed, we may be replacing the destination
bo with the shadow bytes out of another, larger, pixmap, in which case
we do need to create a stride that is appropriate for the upload and
perform the 2D copy.
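A sketch of that 2D copy, assuming nothing about the driver's helpers:
when the source stride (from the larger pixmap) differs from what the
upload needs, copy row by row.

    #include <stdint.h>
    #include <string.h>

    static void copy_2d(uint8_t *dst, int dst_stride,
                        const uint8_t *src, int src_stride,
                        int width_bytes, int height)
    {
        /* Only when both strides collapse to the row length can the
         * copy be flattened into a single memcpy. */
        if (dst_stride == width_bytes && src_stride == width_bytes) {
            memcpy(dst, src, (size_t)width_bytes * height);
            return;
        }

        for (int y = 0; y < height; y++)
            memcpy(dst + (size_t)y * dst_stride,
                   src + (size_t)y * src_stride,
                   width_bytes);
    }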
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
So that subsequent code resists performing CPU operations with them
(after they have been populated).
Marking both sides as wholly damaged breaks the rules, but should work
out so long as we check whether we can perform the operation within the
target damage first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We should be able to reduce this by disabling the dual-stream mode of
the GPU (which we want to achieve anyway for 2D performance). Artefacts
in small uploads demonstrate that we fail to do so.
References: https://bugs.freedesktop.org/show_bug.cgi?id=44150
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the absence of a transform is a special case of affine, we were
attempting to dereference the NULL transform in order to determine if
it was a simple no-rotation matrix. As the operation is extremely
simple, add a special-case vertex program to speed it up.
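The guard itself is trivial; as a sketch (the matrix layout here is
illustrative, not the actual transform type):

    #include <stdbool.h>

    typedef struct { float m[3][3]; } Transform;

    static bool is_no_rotation(const Transform *t)
    {
        /* A NULL transform means identity, hence trivially unrotated;
         * the bug was dereferencing t before making this check. */
        if (t == NULL)
            return true;

        return t->m[0][1] == 0.0f && t->m[1][0] == 0.0f;
    }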
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I long for the day when this code is obsolete... Until then, this gives
a nice boost in the fishtank.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As with the standard io paths, try to reuse an upload buffer for a
small replacement pixmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we can create the read buffer from an active cached bo, it may
already be in the GPU domain by the time we first finish it, so fix the
broken assertion.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
So that the relocation entries point into the contiguous surface/batch
and can be trivially fixed up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we have to upload the dirty data anyway, setting the
alpha-channel to 0xff should be free. Not so for firefox-asteroids on
Atom at least.
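A sketch of the idea, assuming the usual x8r8g8b8-to-a8r8g8b8 case (the
pixel format is my assumption, not stated by the patch): the alpha byte
is forced while the dirty pixels stream through anyway.

    #include <stddef.h>
    #include <stdint.h>

    /* Fill in the unused alpha byte while streaming the pixels into
     * the upload buffer; the copy has to happen regardless. */
    static void upload_with_alpha(uint32_t *dst, const uint32_t *src,
                                  size_t count)
    {
        for (size_t i = 0; i < count; i++)
            dst[i] = src[i] | 0xff000000u;
    }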
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are forced to perform a render operation to a bo too large to fit
in the pipeline, copy to an intermediate and split the operation into
tiles rather than falling back.
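In sketch form (the tile size and helpers are invented for
illustration): render each tile through an intermediate bo that does
fit the pipeline, then copy the result back.

    #define TILE 512

    struct bo;

    /* Stand-ins for the real render and blit helpers. */
    static void render_tile(struct bo *tmp, int x, int y, int w, int h)
    { (void)tmp; (void)x; (void)y; (void)w; (void)h; }
    static void copy_back(struct bo *dst, struct bo *tmp,
                          int x, int y, int w, int h)
    { (void)dst; (void)tmp; (void)x; (void)y; (void)w; (void)h; }

    static void render_tiled(struct bo *dst, struct bo *tmp,
                             int width, int height)
    {
        for (int y = 0; y < height; y += TILE) {
            int h = height - y < TILE ? height - y : TILE;
            for (int x = 0; x < width; x += TILE) {
                int w = width - x < TILE ? width - x : TILE;
                render_tile(tmp, x, y, w, h);
                copy_back(dst, tmp, x, y, w, h);
            }
        }
    }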
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>