Fortunately nobody had yet noticed that all videos were assumed to play
with a matching src/dst origin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Keeping a set of pinned batches in userspace is considerably faster as
we can avoid the blit overhead. However, combining the two approaches
yields even greater performance, as fast as without either workaround,
and yet stable.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ease debugging by allowing all acceleration or render acceleration to be
disabled through AccelMethod:
Option "AccelMethod" "off" -> disable all acceleration
Option "AccelMethod" "blt" -> disable render acceleration (only use BLT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we will undoubtedly flush and sync upon the SHM request very shortly
afterwards, we want to use the GPU for the SHM upload only if it is
currently busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Split the decision between where it is imperative to use the BLT to
avoid TLB misses and the second case where it is merely preferable to
switch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Track the most recent ring each bo is executed on, and prefer to keep it
on that ring for the next operation.
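
A minimal sketch of the idea, not the driver's actual code: each bo
remembers the ring it last ran on, and ring selection prefers that ring
whenever one has been recorded. The names sketch_bo, choose_ring and
mark_executed are hypothetical, as is the two-ring enum.

```c
#include <stdbool.h>

/* Hypothetical ring identifiers, for illustration only. */
enum sketch_ring { SKETCH_RING_RENDER, SKETCH_RING_BLT };

/* Hypothetical per-bo state: the ring it was most recently executed on. */
struct sketch_bo {
    bool has_ring;               /* false until the bo has been executed */
    enum sketch_ring last_ring;  /* valid only when has_ring is true */
};

/* Prefer the ring the bo last ran on; otherwise use the caller's fallback. */
static enum sketch_ring choose_ring(const struct sketch_bo *bo,
                                    enum sketch_ring fallback)
{
    return bo->has_ring ? bo->last_ring : fallback;
}

/* Record the ring after the bo has been submitted on it. */
static void mark_executed(struct sketch_bo *bo, enum sketch_ring ring)
{
    bo->has_ring = true;
    bo->last_ring = ring;
}
```

Keeping a bo on one ring avoids cross-ring synchronisation for the next
operation, which is presumably the motivation for the preference.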
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oops, we never managed to reuse the cached location of the target
surface as we entered it into the cache with the wrong key.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
cell_list_alloc() is only called from one place, and the compiler should
already be inlining it - but does not appear to be. Hint harder.
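
One common way to "hint harder" with GCC-compatible compilers is the
always_inline attribute. The sketch below assumes that approach; the
macro name sketch_force_inline and the helper sketch_alloc_slot are
hypothetical and are not taken from the patch.

```c
#include <stddef.h>

/* Assumption: GCC/Clang attribute syntax; plain inline elsewhere. */
#if defined(__GNUC__)
#define sketch_force_inline inline __attribute__((always_inline))
#else
#define sketch_force_inline inline
#endif

/* Hypothetical single-caller helper: hand out the next free slot index. */
static sketch_force_inline size_t sketch_alloc_slot(size_t *next_free)
{
    return (*next_free)++;
}
```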
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The goal is to reduce the preference for rendering to a SHM pixmap: only
if it is already active will we consider continuing to use it on the
GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
So that we can prevent feeding back a stale bo when the DRI2 client
tries to swap an old buffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57212
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Further restrict the amount of fenced bo we try to fit into the batch to
make it easier for the kernel to accommodate the request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In case we hit a path where we avoid reusing the source for the mask and
leave is_affine unset for a solid mask.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Simplify the later checks by always populating the lists with a single,
albeit unpinned, bo in the case we fail to create pinned batches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the output is unscaled, then we do not require pixel interpolation
(and planar formats are exactly subsampled).
References: https://bugs.freedesktop.org/show_bug.cgi?id=58185
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The DRI2 protocol is inherently racy. Fortuitously, this can be swept
under the carpet by forcing the serialisation between the DRI2 clients
by using a blit for the SwapBuffers.
References: https://bugs.freedesktop.org/show_bug.cgi?id=58005
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As Jesse pointed out, it is legal for the client to request that the
flip be some frame in the future even with no divisor.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If divisor is 0 but the current MSC is behind the target, we shouldn't
schedule a flip (which will occur at the next vblank) or we'll end up
displaying it early and returning the wrong timestamp.
Preserve the optimization though by allowing us to schedule a flip if
both the divisor is 0 and the current MSC is equal to or ahead of the
target; this avoids a round trip through the kernel.
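
The condition described above can be sketched as a single predicate.
This is an illustration only: the function name sketch_can_flip_now and
the 64-bit counter types are assumptions, and the real scheduling code
handles the non-zero divisor case separately.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when a flip may be scheduled directly: no divisor was
 * requested and the current MSC has already reached the target.
 * If the current MSC is still behind the target, flipping now would
 * display the frame early, so the caller must wait instead.
 */
static bool sketch_can_flip_now(uint64_t divisor,
                                uint64_t current_msc,
                                uint64_t target_msc)
{
    return divisor == 0 && current_msc >= target_msc;
}
```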
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This can happen naturally for 3-pipe config on Ivybridge or if the
outputs are rearranged whilst we slept. Instead of failing to change the
display on the VT, install at least a fb on the CompatOutput so that
hopefully the DE can take over, or give some control to the user.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oops, I thought the 'busy' bit was now used and apparently forgot it is
used to control the periodic flushing...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we submit a batch early (for example if the GPU is idle), then submit
whatever else the client drew immediately upon completion of its
blockhandler. This is required to prevent flashing due to visible delay
between the clear at the start of the cycle and then the overdraw later.
References: https://bugs.freedesktop.org/show_bug.cgi?id=51718
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>