Skip the filling of the whole pixmap if we have a small read and we
know the GPU bo is clear. Also choose to operate inplace on the GPU bo
if we meet the usual criteria.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It is useful to know and to receive confirmation that you have
successfully compiled and executed the driver with debugging enabled.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Part of the large buffer handling was to move the decision making about
whether to create a GPU bo for a pixmap to creation time. The single
instance where we change our minds later involves large glyphs, which
we choose not to cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ordinarily if the GPU is wedged, we just want to create a shadow buffer.
Except that we must ensure that we do allow a bo to be created for
attaching to the scanout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The joy of conditional compiles masked this compilation failure when
testing.
Reported-by: Reinhard Karcher <reinhard.karcher@gmx.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Do not attempt to further reduce the operator locally in each backend as
the reduction is already performed in the upper layer.
References: https://bugs.freedesktop.org/show_bug.cgi?id=42606
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Instead of keeping a virgin partial buffer around on its inactive list,
just transfer it to the global bo cache (in actuality destroy it since
it is just a kmalloc with no pages bound).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The root pixmap, for instance, may have unique DRI2Drawables for each
inferior window. We only want to clear the flush flag on the last
release, so we need to keep a count of how many DRI drawables remain
attached rather than a solitary flag.
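
A minimal sketch of the counting scheme described above. The names
(struct pixmap_priv, dri_attach, dri_release, needs_flush) are
hypothetical and not taken from the driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-pixmap state; the real driver structure differs. */
struct pixmap_priv {
	unsigned int flush; /* number of DRI2 drawables still attached */
};

/* Called when a DRI2 drawable attaches to the pixmap. */
static void dri_attach(struct pixmap_priv *priv)
{
	priv->flush++;
}

/* Called when a DRI2 drawable is released. Returns true only when
 * the last attachment is dropped, i.e. when flushing may stop. */
static bool dri_release(struct pixmap_priv *priv)
{
	assert(priv->flush > 0);
	return --priv->flush == 0;
}

/* Flushing is required as long as any attachment remains. */
static bool needs_flush(const struct pixmap_priv *priv)
{
	return priv->flush != 0;
}
```

With a solitary flag, the first release would clear the flush state
even though other drawables were still attached; the counter only
reaches zero on the last release.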
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we try to perform the operation with outstanding operations on the
source pixmaps, we will stall waiting for them to complete.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we may be holding on to them as an active mapping whilst they are
executing; resetting the used counter back to zero in this case can cause
corruption.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Whilst iterating the partial list and uploading the buffers, we need to
avoid triggering a recursive call into retire should we attempt to
shrink a buffer. Such a recursive call will modify the list beneath us
so that we chase a stale pointer and wreak havoc with memory corruption.
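
One way to picture the hazard and a guard against it is sketched
below. This is an illustration only: the structures and function names
(struct cache, cache_retire, buffer_shrink, cache_finish_partials) are
hypothetical, and the guard flag is an assumed technique rather than
the driver's actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the buffer list and cache state. */
struct buffer {
	struct buffer *next;
	bool uploaded;
};

struct cache {
	struct buffer *partial; /* head of the partial list */
	bool flushing;          /* set while the list is being walked */
	int retired;            /* how many times retire actually ran */
};

/* Retire may unlink buffers, so it must not run during the walk. */
static bool cache_retire(struct cache *c)
{
	if (c->flushing)
		return false; /* refuse the recursive call */
	c->retired++;
	return true;
}

/* Shrinking a buffer would normally try to retire first. */
static void buffer_shrink(struct cache *c, struct buffer *b)
{
	(void)b;
	cache_retire(c);
}

/* Upload every partial buffer with retire blocked for the duration. */
static void cache_finish_partials(struct cache *c)
{
	struct buffer *b;

	assert(!c->flushing); /* the walk itself is not re-entrant */
	c->flushing = true;
	for (b = c->partial; b != NULL; b = b->next) {
		buffer_shrink(c, b);
		b->uploaded = true;
	}
	c->flushing = false;
}
```

Because retire is refused while the flag is set, the next pointers
read by the loop stay valid for the whole walk.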
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47061
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In the refactoring to avoid repeatedly applying the singular
pCompositeClip, the check for the all-clipped state was lost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The idea was to reduce the number of unnecessary flushes by checking for
outgoing damage (could be refined further by inspecting the reply/event
callback for an XDamageNotifyEvent). However, it does not flush
sufficiently for the compositors' liking. As it doesn't appear to restore
performance to near uncomposited levels anyway, remove the complication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Try to reduce the amount of Add/Delete ping-pong, in particular around
the recreation of the DRI2 attachment to the scanout after pageflipping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After we are no longer sharing the bo with foreign clients, we no longer
need to keep flushing before every X_Reply and so we can remove the
callbacks to remove the overhead of having to check every time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The goal is simply to avoid the flush before going to sleep when we have
no pending events. That is, we only want to flush when we know there will
be at least one X_Reply sent to a Client. (Preferably, it would be a
Damage reply!) We can safely assume that every WriteToClient marks the
beginning of a new reply added to the Client output queue and thus know
that upon the next flush event we will be emitting a Reply and so need
to submit our batches.
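
The reply tracking described above could be sketched roughly as
follows. The state structure and hook names (struct flush_state,
on_write_to_client, on_flush) are hypothetical; how the hooks are
registered with the X server is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state; names are not from the driver. */
struct flush_state {
	bool reply_pending; /* a reply was queued since the last flush */
	int submitted;      /* number of batch submissions, for the demo */
};

/* Hooked on every write to a client: a new reply is being queued. */
static void on_write_to_client(struct flush_state *s)
{
	s->reply_pending = true;
}

/* Hooked on the flush event: submit only if a reply will go out. */
static void on_flush(struct flush_state *s)
{
	if (!s->reply_pending)
		return; /* nothing to send, skip the submit */
	s->submitted++;
	s->reply_pending = false;
}
```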
Second attempt to fix a438e4ac.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This fixes the regression in performance of fishietank on gen2. As
the texture atlas is too large to be tiled, one might presume that it
has the same performance characteristics as the snooped linear CPU
buffer. It does not. Therefore if we attempt to reuse a vmap bo, promote
it to a full GPU bo. This hopefully gains the benefit of avoiding the
copy for single shot sources, but still gives us the benefit of avoiding
the clflushes.
On the plus side, it does prove that gen2 handles snoopable memory from
both the blitter and the sampler!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A magic number required for so many functions of the GPU. In this
particular case it is likely to be that the offset of a texture in the
GTT has to have a minimum alignment of 64 bytes.
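
For illustration, rounding an offset up to such a boundary can be
sketched as below. The macro and helper names are hypothetical, and
the value 64 is only the likely alignment suggested above, not a
documented hardware constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum alignment, in bytes, taken from the text above. */
#define TEXTURE_OFFSET_ALIGN 64u

/* Round offset up to the next multiple of align (a power of two). */
static uint32_t align_up(uint32_t offset, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset + align - 1) & ~(align - 1);
}
```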
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46415
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I skipped a GCC warning about the implicit function declaration, which
of course results in a runtime silent death. Oops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We used to allow the backing pixmap to be created later in order to
accommodate ShmPixmaps and ShmPutImage. However, they are now correctly
handled upfront if we choose to accelerate those paths, and so all
choices over whether to attach to a pixmap are made during creation and
are invariant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The sampler just dies if it encounters a snoopable page, for no apparent
reason. Whilst I encountered the bug on Crestline, disable it for the
rest of gen4 just to be safe.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we wish to immediately map the vertex buffers, it is beneficial to
search the linear cache for an existing mapping to reuse first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
KGEM_BUFFER_WRITE_INPLACE is WRITE | INPLACE and so the typo prevented
uploading of partial data through the pwrite paths.
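
The exact typo is not reproduced here. The sketch below only shows why
a composite mask is easy to misuse: a bare AND against it also matches
plain WRITE. The numeric flag values and the helper names are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumed for illustration only. */
#define KGEM_BUFFER_WRITE   0x1
#define KGEM_BUFFER_INPLACE 0x2
#define KGEM_BUFFER_WRITE_INPLACE (KGEM_BUFFER_WRITE | KGEM_BUFFER_INPLACE)

/* True if any bit of the composite mask is set: also matches plain WRITE. */
static bool any_bit_set(unsigned int flags)
{
	return (flags & KGEM_BUFFER_WRITE_INPLACE) != 0;
}

/* True only if both WRITE and INPLACE are set. */
static bool all_bits_set(unsigned int flags)
{
	return (flags & KGEM_BUFFER_WRITE_INPLACE) == KGEM_BUFFER_WRITE_INPLACE;
}
```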
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When moving only a region to the CPU and we detect a pending clear, we
transform the operation into a move of the whole pixmap. In such
situations, we only have a partial damage area and so need to OR in
MOVE_READ to prevent the pending clear of the whole pixmap from being
discarded.
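
A minimal sketch of the flag adjustment, assuming made-up flag values
and a hypothetical helper (widen_move_flags) that is not part of the
driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumed for illustration only. */
#define MOVE_WRITE 0x1
#define MOVE_READ  0x2

/* Hypothetical helper: compute the flags to use once a region move
 * has been widened to a whole-pixmap move. */
static unsigned int widen_move_flags(unsigned int flags, bool clear_pending)
{
	/* Only part of the pixmap will be overwritten, so the pending
	 * clear must be applied rather than discarded. */
	if (clear_pending)
		flags |= MOVE_READ;
	return flags;
}
```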
References: https://bugs.freedesktop.org/show_bug.cgi?id=46792
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>