This reduces the amount of dancing required to call into the span
functions as we can pass the arguments in both the integer and floating
point registers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Sandybridge requires an elaborate dance to flush caches without
hanging the GPU. See the public docs, Vol2Part1 1.7.4.1 PIPE_CONTROL,
or the corresponding code in mesa/kernel.
This (together with the corresponding patch for the kernel) seems to
fix the hangs in cairo-perf-traces I'm seeing on my snb machine.
v2: Incorporate review from Chris Wilson. For paranoia keep all three
PIPE_CONTROL cmds in the same batchbuffer to avoid upsetting the gpu.
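
As an illustration of the "same batchbuffer" point only, a minimal sketch
follows. It assumes a toy batch structure: struct batch, batch_submit(),
emit_pipe_control(), the 4-dword command length and the flag values are all
hypothetical placeholders, not the driver's code or the real PIPE_CONTROL
encoding (see the docs cited above for that). The idea is to reserve room
for the whole sequence up front so it can never straddle two batches.

```c
#include <stddef.h>
#include <stdint.h>

#define BATCH_DWORDS       64  /* hypothetical batch size, tiny for illustration */
#define PIPE_CONTROL_LEN    4  /* assumed dwords per command in this sketch */
#define FLUSH_SEQUENCE_LEN (3 * PIPE_CONTROL_LEN)

/* Toy batch buffer; not the driver's real structure. */
struct batch {
	uint32_t data[BATCH_DWORDS];
	size_t used;        /* dwords already emitted */
	unsigned submits;   /* how many times the batch was submitted */
};

/* Stand-in for handing the batch to the kernel and starting a new one. */
static void batch_submit(struct batch *b)
{
	b->used = 0;
	b->submits++;
}

/* Emit one placeholder command; the real encoding is not modelled here. */
static void emit_pipe_control(struct batch *b, uint32_t placeholder_flags)
{
	b->data[b->used++] = 0;                 /* placeholder header */
	b->data[b->used++] = placeholder_flags; /* placeholder flags */
	b->data[b->used++] = 0;                 /* placeholder address */
	b->data[b->used++] = 0;                 /* placeholder write data */
}

/*
 * Emit the three commands back to back. Space for all three is checked
 * first, so the sequence always lands in a single batch.
 */
static void emit_flush_sequence(struct batch *b)
{
	if (b->used + FLUSH_SEQUENCE_LEN > BATCH_DWORDS)
		batch_submit(b);

	emit_pipe_control(b, 1);
	emit_pipe_control(b, 2);
	emit_pipe_control(b, 3);
}
```
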
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We attempt to skip uploading a source pixmap to the GPU in the event it is
used only once (for example during image upload by firefox). However, if
we continue to use the CPU source pixmap then it obviously was worth
uploading to the GPU. So if we use the CPU pixmap a second time, do the
upload and then blit.
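
The heuristic described above could be sketched roughly as follows. This
is a minimal illustration under stated assumptions, not the driver's code:
struct pixmap_state, its cpu_source_count field and should_upload() are
hypothetical names introduced for the example.

```c
#include <stdbool.h>

/* Hypothetical per-pixmap bookkeeping; not the driver's real structure. */
struct pixmap_state {
	bool on_gpu;               /* a GPU copy already exists */
	unsigned cpu_source_count; /* uses as a source while CPU-only */
};

/*
 * Decide whether to upload a CPU pixmap before using it as a source.
 * The first use is treated as a one-off and skips the upload; a second
 * use suggests the pixmap is being reused, so upload it and then blit.
 */
static bool should_upload(struct pixmap_state *p)
{
	if (p->on_gpu)
		return false;  /* already resident, nothing to upload */

	if (++p->cpu_source_count < 2)
		return false;  /* first use: copy straight from the CPU */

	p->on_gpu = true;      /* second use: upload now */
	return true;
}
```
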
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using spans has a tremendous effect (~100x) on x11perf, some good but
mostly bad. However, in reality operations are mixed and so preventing
migration on alternate operations is a win. In the x11perf slowdowns, we
appear to be CPU bound and so it seems like there should be plenty of
scope for recovering the lost performance.
However, for the time being, just go back to the old fallbacks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously we ignored updating the scanout in place, and so we were not
amortizing the shadow cost of common core rendering operations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the span code does not yet handle plane masks or stippling, it is
disadvantageous to convert to spans only to fall back.
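
A check of this kind might look like the sketch below. It is only an
illustration: struct gc_state and can_use_spans() are hypothetical names,
and the real GC carries far more state than is modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of graphics-context state; not the server's GC. */
struct gc_state {
	uint32_t planemask; /* bit planes the operation may write */
	unsigned depth;     /* drawable depth in bits */
	bool stippled;      /* fill style uses a stipple */
};

/* True if the plane mask covers every bit plane of the drawable. */
static bool planemask_is_solid(const struct gc_state *gc)
{
	uint32_t all = gc->depth >= 32 ? UINT32_MAX : (1u << gc->depth) - 1;

	return (gc->planemask & all) == all;
}

/*
 * Only convert to spans when the span code can finish the job itself;
 * otherwise go straight to the fallback and skip the conversion cost.
 */
static bool can_use_spans(const struct gc_state *gc)
{
	return planemask_is_solid(gc) && !gc->stippled;
}
```
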
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is actually trickier than it looks since miPolyArc() sometimes uses
an intermediate bitmap which performs worse than the fbPolyArc() fallback.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the beginning, I did perform a retire after every batch. Then I
decided that it was too much CPU overhead for too little gain. On
reflection, i.e. further benchmarking, we do see a performance
improvement for recycling active buffers faster.
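
To make the idea concrete, here is a toy model of retiring after every
batch. Everything in it is an assumption made for illustration: struct bo,
struct bo_cache, retire() and submit_batch() are hypothetical, and the busy
flag stands in for asking the kernel whether the GPU has finished with a
buffer.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BO 8 /* hypothetical limit, tiny for illustration */

/* Toy buffer object: busy while the GPU may still be using it. */
struct bo {
	bool busy;
};

/* Toy cache: buffers in flight, plus a count of reusable ones. */
struct bo_cache {
	struct bo *active[MAX_BO];
	size_t n_active;
	size_t n_inactive;
};

/* Move finished buffers off the active list; returns how many retired. */
static size_t retire(struct bo_cache *c)
{
	size_t kept = 0, retired = 0;

	for (size_t i = 0; i < c->n_active; i++) {
		struct bo *bo = c->active[i];

		if (bo->busy) {
			c->active[kept++] = bo; /* still in flight */
		} else {
			c->n_inactive++;        /* ready for reuse */
			retired++;
		}
	}
	c->n_active = kept;
	return retired;
}

/* Track a newly submitted buffer, then retire straight away. */
static bool submit_batch(struct bo_cache *c, struct bo *bo)
{
	if (c->n_active == MAX_BO)
		return false; /* toy limit reached */

	bo->busy = true;
	c->active[c->n_active++] = bo;
	retire(c); /* recycle finished buffers after every batch */
	return true;
}
```

The trade-off is the one described above: the extra walk over the active
list costs CPU time on every submission, but idle buffers become reusable
sooner.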
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The actual bug is a little involved as we don't damage the temporary
glyph mask correctly, presuming that we only hit GPU paths. However,
should we fail to prepare the composite operation that paints the mask
on to the destination, things fail horribly.
One particular example is that wine likes to create its own temporary a1
buffer for the glyphs (which we render to via another temporary mask...),
which triggers the delayed fallback and then sw compositing with a random
buffer.
Reported-by: Roman Jarosz <kedgedev@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41165
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>