If the operation does not replace existing CPU damage, we are likely to
want to reuse the pixmap again on the CPU, so avoid mixing CPU/GPU
operations.
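
A minimal sketch of the heuristic; the struct and helper names are
illustrative, not the driver's actual API:

    #include <stdbool.h>
    #include <stddef.h>

    struct sna_pixmap_sketch {
            void *cpu_damage; /* non-NULL while the CPU copy holds dirty pixels */
    };

    /* Stay on the CPU unless the write completely replaces the
     * existing CPU damage: surviving CPU-dirty pixels suggest the
     * pixmap will be used on the CPU again. */
    static bool prefer_cpu(const struct sna_pixmap_sketch *priv,
                           bool replaces_cpu_damage)
    {
            return priv->cpu_damage != NULL && !replaces_cpu_damage;
    }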
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On systems that incur painful overhead for ring switches, it is usually
better to create a large buffer and perform a sparse copy on the same
ring than to create a compact buffer and use the BLT.
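
The trade-off, as an illustrative cost comparison (the costs are
assumptions, not measured driver constants):

    #include <stdbool.h>

    /* On a system where switching rings is painful, a sparse copy
     * on the current ring wins whenever it undercuts the switch
     * penalty plus the (cheaper) compact BLT copy. */
    static bool prefer_sparse_copy_on_same_ring(unsigned ring_switch_cost,
                                                unsigned sparse_copy_cost,
                                                unsigned compact_blt_cost)
    {
            return sparse_copy_cost < ring_switch_cost + compact_blt_cost;
    }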
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we decide to defer the upload for this instance of the source pixmap,
mark it so. Then if we do use it again we will upload it to a GPU bo and
hopefully reuse those pixels.
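
A sketch of the bookkeeping, with hypothetical names:

    #include <stdbool.h>

    struct source_sketch {
            unsigned defer_upload : 1; /* set on the first deferred use */
    };

    /* First use: defer the upload but remember that we did. Second
     * use: the mark tells us the pixels are being reused, so upload
     * them to a GPU bo and keep them there. */
    static bool should_upload_now(struct source_sketch *priv)
    {
            if (!priv->defer_upload) {
                    priv->defer_upload = 1;
                    return false; /* defer this time */
            }
            return true; /* reused: upload and hopefully keep reusing */
    }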
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the pixmap is entirely within the current CPU damage, we can forgo
reducing either the GPU or CPU damage when checking whether we need to
upload dirty pixels for a source texture.
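
In essence this is a containment test, sketched here with a simple box
type rather than the driver's region machinery:

    #include <stdbool.h>

    struct box_sketch { int x1, y1, x2, y2; };

    /* If the sampled extents lie wholly inside the CPU damage, the
     * GPU copy holds nothing newer for this read, so the damage
     * reduction step can be skipped. (A real implementation tests
     * the region, not just its extents.) */
    static bool entirely_within(const struct box_sketch *sample,
                                const struct box_sketch *cpu_damage)
    {
            return sample->x1 >= cpu_damage->x1 &&
                   sample->y1 >= cpu_damage->y1 &&
                   sample->x2 <= cpu_damage->x2 &&
                   sample->y2 <= cpu_damage->y2;
    }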
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we think that the operation is better performed on the CPU, avoid the
overhead of manipulating our privates.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As demonstrated with oversized glyphs and the resulting chain of
catastrophes, when attaching our private to a pixmap after creation we
need to mark the entire CPU pixmap as dirty, as we never tracked exactly
which bits were dirtied.
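
In effect, late attachment has to assume the worst; a sketch with
illustrative names:

    struct damage_sketch {
            struct { int x1, y1, x2, y2; } extents;
            unsigned all : 1;
    };

    /* Nothing recorded which pixels were written before the private
     * was attached, so the only safe answer is "all of them". */
    static void mark_whole_pixmap_dirty(struct damage_sketch *damage,
                                        int width, int height)
    {
            damage->extents.x1 = 0;
            damage->extents.y1 = 0;
            damage->extents.x2 = width;
            damage->extents.y2 = height;
            damage->all = 1;
    }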
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We suspect that glyphs, even large ones, will be reused, and so the
deferred upload is counterproductive. Upload them immediately and mark
them as special creatures for later debugging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now explicitly create a CPU bo when wanted, we no longer want to
spontaneously create vmaps simply for uploading to the GPU bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to be careful to copy the boxes in strict LIFO order so as to
avoid overwriting the last boxes when reusing the array allocations.
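
The point is the iteration direction; a minimal sketch (box type
illustrative):

    struct box_sketch { int x1, y1, x2, y2; };

    /* Copy n boxes between overlapping ranges of a reused array.
     * Walking backwards is safe when dst >= src: each source box is
     * read before the copy can overwrite it. */
    static void copy_boxes_lifo(struct box_sketch *dst,
                                const struct box_sketch *src, int n)
    {
            while (n--)
                    dst[n] = src[n];
    }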
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid the overhead of tracking damage on small pixmaps when using CPU
rasterisation; the extra cost of sending the whole pixmap rather than
just the damage is negligible should it ever be required on the GPU.
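
The trade-off reduces to a size threshold; a sketch with an assumed
cut-off:

    #include <stdbool.h>

    /* Hypothetical threshold, purely for illustration: below this
     * many pixels, uploading the whole pixmap costs about the same
     * as uploading only the dirty boxes, so per-box damage tracking
     * is not worth its overhead. */
    #define SMALL_PIXMAP_PIXELS 4096

    static bool skip_damage_tracking(int width, int height)
    {
            return width * height <= SMALL_PIXMAP_PIXELS;
    }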
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As rasterisation will be performed on the CPU, we need to avoid
readbacks from uncached memory, and so we should restrict ourselves to
creating further damage only within the CPU pixmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we did not allocate the pixel data, such as for wedged pixmaps or
scratch buffers, then we cannot perform the pointer dance nor do we want
to create the GPU buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the pixmap was intended for scanout, then the GPU bo will be created
upon attachment to the fb.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the contents of the pixmap are now rubbish, we need to manually
destroy it rather than pass it to the normal sna_pixmap_destroy()
routines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The fast version of damage checking assumes that the damage has already
been determined to be non-NULL, so make sure it is.
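
The contract, sketched with illustrative types: the fast path
dereferences the pointer unconditionally, so the caller must guard:

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct box_sketch { int x1, y1, x2, y2; };
    struct damage_sketch { struct box_sketch extents; };

    /* Fast variant: the precondition is documented by the assert,
     * not protected by a runtime check. */
    static bool damage_contains_fast(const struct damage_sketch *damage,
                                     const struct box_sketch *box)
    {
            assert(damage != NULL); /* caller guarantees non-NULL */
            return box->x1 >= damage->extents.x1 &&
                   box->y1 >= damage->extents.y1 &&
                   box->x2 <= damage->extents.x2 &&
                   box->y2 <= damage->extents.y2;
    }

    /* Caller: establish the precondition before taking the fast path. */
    static bool damage_contains(const struct damage_sketch *damage,
                                const struct box_sketch *box)
    {
            return damage != NULL && damage_contains_fast(damage, box);
    }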
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When glamor is enabled, we have to route all the drawing functions to
glamor first. This adds a few more functions that previously just fell
back to swrast and passes them to glamor instead.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we are using GLAMOR, then a tile or stipple pixmap may be a pure
glamor pixmap that UXA does not know how to render to, so we need to
let glamor do the validation.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When glamor is enabled, a pixmap will not be accessed by UXA's
accelerated functions. Only unaccelerated functions may access those
pixmaps, and before each unaccelerated rendering operation
uxa_prepare_access is called, which does a glFlush. Combined with a
flush before sending to DRI clients, we no longer need to flush after
every operation.
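
Roughly, the flushing moves to the access boundary; glFlush is the real
GL call, the surrounding names are illustrative:

    #include <GL/gl.h>
    #include <stdbool.h>

    /* Flush pending GL work only when the CPU is about to touch the
     * pixels, instead of after every rendering operation. */
    static void prepare_access_sketch(bool pixmap_is_glamor)
    {
            if (pixmap_is_glamor)
                    glFlush(); /* make glamor's rendering visible to swrast */
    }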
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
valgrind was complaining about an overlapping memcpy on a 64-bit
platform as gcc padded the sna_damage_box to 28 bytes...
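
The rule the fix follows: memcpy is undefined for overlapping ranges,
memmove is not. For example:

    #include <string.h>

    struct elem { char bytes[28]; }; /* the padded size seen here */

    /* elems[0..n-1] = elems[1..n]: the ranges overlap by all but
     * one element, so memmove (defined for overlap) must be used. */
    static void shift_down(struct elem *elems, size_t n)
    {
            memmove(elems, elems + 1, n * sizeof(*elems));
    }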
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For gen3, we may reduce a source into a constant operator and so
dispense with keeping a bo. When duplicated into the mask channel, we
then need to be careful not to dereference the NULL pointer.
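
The guard, sketched with illustrative names:

    #include <stddef.h>

    struct channel_sketch {
            void *bo;       /* NULL once the source reduces to a constant */
            unsigned color; /* the constant, valid when bo is NULL */
    };

    /* When duplicating the source into the mask channel, a reduced
     * source carries no bo; take the constant path rather than
     * dereferencing NULL. */
    static void dup_source_into_mask(struct channel_sketch *mask,
                                     const struct channel_sketch *src)
    {
            *mask = *src;
            if (mask->bo == NULL)
                    return; /* constant-operator path: use mask->color */
            /* bo path: mask->bo is safe to dereference here */
    }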
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As this causes a significant regression when benchmarking firefox on SNB
with firefox-planet-gnome if we already have CPU buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 2934e778f0. The actual
cause of the bug I was seeing on my PNV box turned out to be
a1f585a3d0, so time to reinvestigate the alignment issues.