Track the most recent ring each bo is executed on, and prefer to keep it
on that ring for the next operation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
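A minimal sketch of the ring-affinity idea. All names (`enum ring`, `struct bo`, `last_ring`, `bo_preferred_ring`) are hypothetical and not the driver's own. The assumed motivation is that staying on the previous ring avoids a cross-ring synchronisation.

```c
#include <assert.h>

/* Hypothetical ring identifiers; the driver's real names may differ. */
enum ring { RING_NONE = -1, RING_RENDER = 0, RING_BLT = 1 };

/* Hypothetical buffer object carrying only the field this sketch needs. */
struct bo {
	enum ring last_ring; /* ring the bo was most recently executed on */
};

/* Record which ring the bo was just submitted to. */
static void bo_mark_executed(struct bo *bo, enum ring ring)
{
	assert(ring != RING_NONE);
	bo->last_ring = ring;
}

/* Prefer the bo's previous ring for the next operation (assumed to avoid
 * a cross-ring wait); otherwise use the caller's default. */
static enum ring bo_preferred_ring(const struct bo *bo, enum ring fallback)
{
	return bo->last_ring != RING_NONE ? bo->last_ring : fallback;
}
```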
Oops, we never managed to reuse the cached location of the target
surface as we entered it into the cache with the wrong key.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
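A sketch of why the wrong key defeats the cache, using a hypothetical direct-mapped cache (`struct cache`, `cache_insert`, `cache_lookup` are illustrative names only). Insertion and lookup must derive both slot and tag from the same key, the target surface. Otherwise every lookup misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 16 /* hypothetical */

struct cache_entry {
	uintptr_t key; /* identifies the target surface */
	int location;  /* cached location (hypothetical payload) */
	int valid;
};

struct cache {
	struct cache_entry entry[CACHE_SIZE];
};

static size_t cache_slot(uintptr_t key)
{
	return (size_t)(key % CACHE_SIZE);
}

/* Insert under the target's key: the same key cache_lookup() will use. */
static void cache_insert(struct cache *c, uintptr_t target_key, int location)
{
	struct cache_entry *e = &c->entry[cache_slot(target_key)];

	e->key = target_key;
	e->location = location;
	e->valid = 1;
}

/* Returns 1 and fills *location on a hit, 0 on a miss. */
static int cache_lookup(const struct cache *c, uintptr_t target_key,
			int *location)
{
	const struct cache_entry *e = &c->entry[cache_slot(target_key)];

	if (!e->valid || e->key != target_key)
		return 0;
	*location = e->location;
	return 1;
}
```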
cell_list_alloc() is only called from one place, so the compiler should
already be inlining it, but it does not appear to be doing so. Hint harder.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
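One way to "hint harder" with GCC or Clang is the `always_inline` attribute. In the sketch below, the `force_inline` macro, `struct cell`, `struct cell_list` and the body of `cell_list_alloc()` are simplified, hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang extension; fall back to a plain inline hint elsewhere. */
#if defined(__GNUC__)
#define force_inline inline __attribute__((always_inline))
#else
#define force_inline inline
#endif

struct cell {
	int x, covered, uncovered;
	struct cell *next;
};

/* Hypothetical fixed-size pool of cells. */
struct cell_list {
	struct cell *cells;
	int count, size;
};

/* Single caller, so force the inlining the compiler declined to do. */
static force_inline struct cell *
cell_list_alloc(struct cell_list *list, int x)
{
	struct cell *cell;

	if (list->count == list->size)
		return NULL; /* pool exhausted; the caller must cope */

	cell = &list->cells[list->count++];
	cell->x = x;
	cell->covered = cell->uncovered = 0;
	cell->next = NULL;
	return cell;
}
```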
The goal is to reduce our preference for rendering to a SHM pixmap: only
if it is already active will we consider continuing to use it on the
GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
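A sketch of the decision described above, assuming a hypothetical `struct pixmap_state` with just two flags. Non-SHM pixmaps are simply treated as "use the GPU" here. That stands in for the driver's normal heuristics, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the state this decision depends on. */
struct pixmap_state {
	bool is_shm;     /* backed by a client SHM segment */
	bool gpu_active; /* GPU currently has outstanding work on it */
};

/* Keep rendering a SHM pixmap on the GPU only if the GPU is already
 * using it; otherwise prefer the CPU. */
static bool prefer_gpu(const struct pixmap_state *p)
{
	if (p->is_shm)
		return p->gpu_active;
	return true; /* placeholder for the usual (unmodelled) heuristics */
}
```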
So that we can prevent feeding back a stale bo when the DRI2 client
tries to swap an old buffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57212
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Further restrict the amount of fenced bo we try to fit into the batch to
make it easier for the kernel to accommodate the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
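A sketch of a stricter fence budget. The two limits (`max_count`, `max_size`) and the `fence_budget_try_add` helper are hypothetical. They show the shape of the check: refuse the bo and let the caller submit the batch first, rather than hand the kernel a request it struggles to fit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-batch accounting of fenced bo. */
struct fence_budget {
	unsigned count, max_count;     /* fenced bo in the batch / limit */
	unsigned long size, max_size;  /* aperture bytes they need / limit */
};

/* Returns false if adding the bo would exceed either limit; the caller
 * should then submit the current batch and retry in a fresh one. */
static bool fence_budget_try_add(struct fence_budget *b, unsigned long bo_size)
{
	if (b->count + 1 > b->max_count || b->size + bo_size > b->max_size)
		return false;

	b->count++;
	b->size += bo_size;
	return true;
}
```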
In case we hit a path where we avoid reusing the source for the mask and
so leave is_affine unset for a solid mask.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
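A sketch of the defensive fix, assuming a hypothetical `struct channel` and `channel_init_solid()`. Setting `is_affine` directly in the solid initialiser means no caller path can leave it unset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical composite channel with only the fields used here. */
struct channel {
	bool is_solid;
	bool is_affine;
	unsigned color;
};

/* Always set is_affine: a solid fill has no transform, so it is
 * trivially affine, whichever path reached this point. */
static void channel_init_solid(struct channel *c, unsigned color)
{
	c->is_solid = true;
	c->is_affine = true;
	c->color = color;
}
```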
Simplify the later checks by always populating the lists with a single,
albeit unpinned, bo in case we fail to create pinned batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
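A sketch of the fallback, using hypothetical helpers (`bo_create`, `populate_batch_list`) and a `pin_supported` flag standing in for whether the kernel grants pinning. If no pinned batch can be made, the list still receives one unpinned bo, so later checks need not handle an empty list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimal bo: only what this sketch inspects. */
struct bo {
	bool pinned;
	struct bo *next;
};

/* Hypothetical stand-in for the kernel allocation; pinning may be refused. */
static struct bo *bo_create(bool pinned, bool pin_supported)
{
	struct bo *bo;

	if (pinned && !pin_supported)
		return NULL;

	bo = calloc(1, sizeof(*bo)); /* zeroed, so next == NULL */
	if (bo)
		bo->pinned = pinned;
	return bo;
}

/* Try to create `count` pinned batches; if none succeed, fall back to a
 * single unpinned bo so that later code may assume a non-empty list. */
static struct bo *populate_batch_list(int count, bool pin_supported)
{
	struct bo *head = NULL;

	for (int i = 0; i < count; i++) {
		struct bo *bo = bo_create(true, pin_supported);
		if (bo == NULL)
			break;
		bo->next = head;
		head = bo;
	}

	if (head == NULL)
		head = bo_create(false, pin_supported);

	return head;
}
```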
If the output is unscaled, then we do not require pixel interpolation
(and planar formats are exactly subsampled).
References: https://bugs.freedesktop.org/show_bug.cgi?id=58185
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
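A sketch of the filter choice, with a hypothetical `choose_video_filter()`. When source and destination sizes match, each output pixel maps onto exactly one source texel (and onto an exact subsampled position in the chroma planes), so nearest sampling suffices.

```c
#include <assert.h>

enum filter { FILTER_NEAREST, FILTER_BILINEAR };

/* Unscaled output needs no interpolation; any scaling falls back to
 * bilinear filtering. */
static enum filter choose_video_filter(int src_w, int src_h,
				       int dst_w, int dst_h)
{
	if (src_w == dst_w && src_h == dst_h)
		return FILTER_NEAREST;
	return FILTER_BILINEAR;
}
```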
The DRI2 protocol is inherently racy. Fortuitously, this can be swept
under the carpet by forcing the serialisation between the DRI2 clients
by using a blit for the SwapBuffers.
References: https://bugs.freedesktop.org/show_bug.cgi?id=58005
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As Jesse pointed out, it is legal for the client to request that the
flip be some frame in the future even with no divisor.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If divisor is 0 but the current MSC is behind the target, we shouldn't
schedule a flip (which will occur at the next vblank) or we'll end up
displaying it early and returning the wrong timestamp.
Preserve the optimization though by allowing us to schedule a flip if
both the divisor is 0 and the current MSC is equal to or ahead of the
target; this avoids a round trip through the kernel.
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
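A sketch of the scheduling rule described in the last two messages, with a hypothetical `swap_target_msc()`. A flip is queued immediately only when the divisor is 0 and the target has already been reached. A future target is always waited for, even with no divisor. The divisor > 0 branch follows the GLX_OML_sync_control rule and assumes `remainder < divisor`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns the MSC at which the swap should complete. *flip_now is set
 * when the flip can be queued directly (it completes at the next vblank),
 * avoiding a round trip through the kernel for a vblank event. */
static uint64_t swap_target_msc(uint64_t current, uint64_t target,
				uint64_t divisor, uint64_t remainder,
				bool *flip_now)
{
	uint64_t next;

	*flip_now = false;

	if (divisor == 0) {
		if (current >= target) {
			*flip_now = true;
			return current + 1;
		}
		return target; /* target in the future: wait for it */
	}

	if (current < target)
		return target;

	/* Next MSC strictly after current with MSC % divisor == remainder. */
	assert(remainder < divisor);
	next = current - current % divisor + remainder;
	if (next <= current)
		next += divisor;
	return next;
}
```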
This can happen naturally with a 3-pipe configuration on Ivybridge or if
the outputs are rearranged whilst we slept. Instead of failing to change
the display on the VT, install at least an fb on the CompatOutput so that
hopefully the DE can take over, or give some control back to the user.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oops, I thought the 'busy' bit was now unused and apparently forgot it is
used to control the periodic flushing...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we submit a batch early (for example if the GPU is idle), then submit
whatever else the client drew immediately upon completion of its
blockhandler. This is required to prevent flashing due to visible delay
between the clear at the start of the cycle and then the overdraw later.
References: https://bugs.freedesktop.org/show_bug.cgi?id=51718
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
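A sketch of the block-handler behaviour, assuming a hypothetical `struct render_state` and assuming that otherwise the remaining work would wait for the periodic flush. If a batch already went out early this cycle, the rest is submitted as soon as the block handler runs, so the clear and the overdraw reach the screen together.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical batch state; the real driver tracks far more. */
struct render_state {
	bool submitted_early; /* a batch went out before this block handler */
	int pending_ops;      /* operations queued since then */
	int submissions;      /* batches submitted so far */
};

static void submit_batch(struct render_state *s)
{
	if (s->pending_ops) {
		s->submissions++;
		s->pending_ops = 0;
	}
}

/* Returns true if the remaining work was submitted straight away, false
 * if it is left for the (assumed) periodic flush. */
static bool block_handler(struct render_state *s)
{
	bool flushed = false;

	if (s->submitted_early && s->pending_ops) {
		/* The clear is already on the GPU; send the overdraw now so the
		 * two are not separated by a visible delay. */
		submit_batch(s);
		flushed = true;
	}
	s->submitted_early = false;
	return flushed;
}
```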
The aim is to improve GPU concurrency by keeping it busy. The possible
complication is that we incur more overhead due to small batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, before every operation we would look to see if the GPU was
idle and we were running under a DRI compositor. If the GPU was idle, we
would flush the batch in the hope that we reduce the cost of the context
switch and copy from the compositor (by completing the work earlier).
However, we would complete the work far too early and as a result
would need to flush the batch before every single operation, resulting in
extra overhead and reduced performance. For example, the gtkperf
circles benchmark under gnome-shell/compiz would be 2x slower on
Ivybridge.
Reported-by: Michael Larabel <michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>