The idea was to reduce the number of unnecessary flushes by checking for
outgoing damage (could be refined further by inspecting the reply/event
callback for an XDamageNotifyEvent). However, it does not flush
sufficiently for the compositors' liking. As it doesn't appear to restore
performance to near uncomposited levels anyway, remove the complication.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Try to reduce the amount of Add/Delete ping-pong, in particular around
the recreation of the DRI2 attachment to the scanout after pageflipping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After we are no longer sharing the bo with foreign clients, we no longer
need to keep flushing before every X_Reply and so we can remove the
callbacks to remove the overhead of having to check every time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The goal is simply to avoid the flush before going to sleep when we have
no pending events. That is, we only want to flush when we know there will
be at least one X_Reply sent to a Client. (Preferably, it would be a Damage
reply!) We can safely assume that every WriteToClient marks the beginning
of a new reply added to the Client output queue and thus know that upon
the next flush event we will be emitting a Reply and so need to submit our
batches.
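
The flush-on-pending-reply idea can be sketched roughly as below. This is
only an illustration under assumptions: struct flush_state,
note_write_to_client() and flush_if_needed() are hypothetical names, not
the driver's real hooks, and "submitting" is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real sna structure. */
struct flush_state {
    bool reply_pending;   /* set once a reply has been queued for a client */
    int  batches_pending; /* batches built but not yet submitted */
    int  submitted;       /* batches handed off so far (illustrative counter) */
};

/* Would be called from a (hypothetical) WriteToClient hook. */
static void note_write_to_client(struct flush_state *s)
{
    s->reply_pending = true;
}

/* Would be called from the flush callback. Returns true if it submitted. */
static bool flush_if_needed(struct flush_state *s)
{
    if (!s->reply_pending)
        return false; /* nothing is going out to a client: skip the flush */

    s->submitted += s->batches_pending;
    s->batches_pending = 0;
    s->reply_pending = false;
    return true;
}
```

With no intervening write to a client, flush_if_needed() leaves the
pending batches alone, which is the saving the message describes.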
Second attempt to fix a438e4ac.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This fixes the regression in performance of fishietank on gen2. As
the texture atlas is too large to be tiled, one might presume that it
has the same performance characteristics as the snooped linear CPU
buffer. It does not. Therefore if we attempt to reuse a vmap bo, promote
it to a full GPU bo. This hopefully gains the benefit of avoiding the
copy for single shot sources, but still gives us the benefit of avoiding
the clflushes.
On the plus side, it does prove that gen2 handles snoopable memory from
both the blitter and the sampler!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A magic number required for so many functions of the GPU. In this
particular case it is likely to be that the offset of a texture in the
GTT has to have a minimum alignment of 64 bytes.
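
A minimal sketch of rounding an offset up to such an alignment, assuming
the 64-byte figure from the message. TEXTURE_OFFSET_ALIGN and align_up()
are names invented for this example, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the message above; the name is hypothetical. */
#define TEXTURE_OFFSET_ALIGN 64u

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    /* The mask trick below is only valid for powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

For instance, align_up(65, TEXTURE_OFFSET_ALIGN) yields 128, while an
offset that is already a multiple of 64 is returned unchanged.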
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46415
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I skipped a GCC warning about the implicit function declaration, which
of course results in a runtime silent death. Oops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We used to allow the backing pixmap to be created later in order to
accommodate ShmPixmaps and ShmPutImage. However, they are now correctly
handled upfront if we choose to accelerate those paths, and so all
choices over whether to attach to a pixmap are made during creation and
are invariant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The sampler just dies if it encounters a snoopable page, for no apparent
reason. Whilst I encountered the bug on Crestline, disable it for the
rest of gen4 just to be safe.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we wish to immediately map the vertex buffers, it is beneficial to
search the linear cache for an existing mapping to reuse first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
KGEM_BUFFER_WRITE_INPLACE is WRITE | INPLACE and so the typo prevented
uploading of partial data through the pwrite paths.
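
The message does not quote the typo itself, so the following is only a
guess at the general class of mistake: testing a composite flag with a
loose mask instead of requiring all of its bits. The flag values and
helper names are illustrative and may not match the driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the driver's real definitions may differ. */
#define BUFFER_WRITE         0x1u
#define BUFFER_INPLACE       0x2u
#define BUFFER_WRITE_INPLACE (BUFFER_WRITE | BUFFER_INPLACE)

/* Loose test: true if ANY bit of the composite flag is set. */
static bool wants_inplace_loose(unsigned flags)
{
    return (flags & BUFFER_WRITE_INPLACE) != 0;
}

/* Strict test: true only if ALL bits of the composite flag are set. */
static bool wants_inplace_strict(unsigned flags)
{
    return (flags & BUFFER_WRITE_INPLACE) == BUFFER_WRITE_INPLACE;
}
```

A plain BUFFER_WRITE request passes the loose test but not the strict
one, which is how a write meant for the pwrite path could be misrouted.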
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When moving only a region to the CPU and we detect a pending clear, we
transform the operation into a move of the whole pixmap. In such
situations, we only have a partial damage area and so need to OR in
MOVE_READ to prevent the pending clear of the whole pixmap from being
discarded.
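
As a rough sketch of the flag adjustment, assuming made-up flag values
and a hypothetical helper promote_region_move() (the real code path is
more involved):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; the driver's own MOVE_* definitions may differ. */
#define MOVE_WRITE 0x1u
#define MOVE_READ  0x2u

/*
 * Hypothetical helper: when a region move is widened to the whole pixmap
 * because a clear is pending, also request a read so that the clear is
 * applied rather than thrown away.
 */
static unsigned promote_region_move(unsigned flags, bool pending_clear)
{
    if (pending_clear)
        flags |= MOVE_READ;
    return flags;
}
```

Without a pending clear the flags pass through untouched, so a
write-only region move stays write-only.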
References: https://bugs.freedesktop.org/show_bug.cgi?id=46792
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Debug builds are excruciatingly slow as the compiler doesn't store the
temporary in a register but uses an uncached readback instead. Maybe
this will help...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now attempt to retain partial buffers after execution, we can
end up with lots of inactive buffers sitting on the partial buffer list.
In any one batch, we wish to minimise the number of buffers used, so
keep all the inactive buffers on a separate list and only pull from them
as required.
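
One way to picture the two-list arrangement, as a sketch only: struct
partial, struct cache and get_partial() are hypothetical, and a singly
linked list stands in for whatever list type the driver uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partial buffer: only the free space matters here. */
struct partial {
    size_t free_bytes;
    struct partial *next;
};

struct cache {
    struct partial *active;   /* buffers already used by the current batch */
    struct partial *inactive; /* retained buffers not yet used by this batch */
};

/* Prefer a buffer already in the batch; pull from inactive only if needed. */
static struct partial *get_partial(struct cache *c, size_t size)
{
    struct partial *p;
    struct partial **prev;

    for (p = c->active; p != NULL; p = p->next)
        if (p->free_bytes >= size)
            return p;

    for (prev = &c->inactive; *prev != NULL; prev = &(*prev)->next) {
        if ((*prev)->free_bytes >= size) {
            p = *prev;
            *prev = p->next;     /* unlink from the inactive list */
            p->next = c->active; /* and move it onto the active list */
            c->active = p;
            return p;
        }
    }
    return NULL; /* caller would have to create a new buffer */
}
```

Searching the active list first keeps the number of distinct buffers in
a batch down, since an inactive buffer is only promoted on a miss.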
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dust off the kernel patches and update to reflect the changes made to
support LLC CPU bo, in particular to support the unsynchronized shadow
buffers.
However, due to the forced synchronisation required for strict client
coherency we prefer not to use the vmap for shared pixmaps unless we are
already busy (i.e. sync afterwards rather than before in the hope that
we can squash a few operations into one). Being able to block the reply
to the client until the request is actually complete and so avoid the
sync remains a dream.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the buffer is cache-coherent, we can read as well as write to any
partial buffer so the distinction is irrelevant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 4adb6967a8.
Oops, this debugging commit was not intended to be pushed along with the
bugfix. :(
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
An artefact of retaining the mmapped partial buffers is that it
magnified the effect of stealing those for readback, causing extra
writes on non-llc platforms.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bo used for batch buffers are handled differently and not tracked
through the active cache, so we failed to notice when we might be able
to run retire and recover a suitable buffer for reuse. So simply always
run retire when we might need to create a new linear buffer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we change tiling on a bo, we are effectively discarding the cached
mmap so it is preferable to look for another.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We use the XF86DRI as a user configurable option to control whether to
build DRI support for i810, but it is also used internally within xorg
and there exists a public define in xorg-server.h which overrides our
configure option. So rename our define to HAVE_DRI1 to avoid the
conflict.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46590
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If you are suffering from regular X crashes and rendering corruption
with a flood of ENOSPC or even EFILE reported in the Xorg.log, try
adding this option to the Device section for this driver in your
xorg.conf:
  Section "Device"
    ...
    Option "BufferCache" "False"
  EndSection
References: https://bugs.freedesktop.org/show_bug.cgi?id=39552
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 9184af921b.
All X.Org modules must be able to be configured with autoconf 2.60.
In addition, version 2.63 has GPL licensing issues which prevent
some vendors from releasing software based on it.
The AM_SILENT_RULES are already handled by XORG_DEFAULT_OPTIONS.
All X.Org modules must be able to be configured with libtool 1.5.
AM_MAINTAINER_MODE default value is "enabled" already.
We use the same autogen script for all x.org modules.
There are proposals for changes which should be reviewed and eventually
applied to all modules together.
The lt*.m4 patterns are already included in the root .gitignore file.
This can be proposed as a change to all modules, but it involves
changing the toplevel .gitignore, the m4/.gitignore, the ACLOCAL_AMFLAGS
and the AC_CONFIG_MACRO_DIR together.
For more information on project wide configuration guidelines,
consult http://www.x.org/wiki/ModularDevelopersGuide
and http://www.x.org/wiki/NewModuleGuidelines.
Acked-by: Matthieu Herrb <matthieu.herrb@laas.fr>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
As gen3 only uses the single state emission block, and uniformly calls
get_rectangles(), we can move that caller protocol into the callee.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we prematurely end the batch if we bail on extending the vbo for CA
glyphs, we need to force the flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The heuristic of using the mapping only before the first use in an
execbuffer was suboptimal and broken by the change in bo initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Due to the w/a for its buggy shaders, gen4 is sufficiently different
that backporting the simple patch from gen5 was prone to failure. We
need to check that the vertices have not already been flushed prior to
flushing again.
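
The guard amounts to making the flush idempotent. A minimal sketch,
assuming hypothetical bookkeeping (struct vertex_state and both helpers
are invented for illustration and do not mirror the gen4 code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; the field names are illustrative only. */
struct vertex_state {
    unsigned pending; /* vertices emitted since the last flush */
    unsigned flushes; /* number of flushes actually performed */
};

static void emit_vertices(struct vertex_state *v, unsigned count)
{
    v->pending += count;
}

/* Flush only if something is outstanding. Returns true if it flushed. */
static bool flush_vertices(struct vertex_state *v)
{
    if (v->pending == 0)
        return false; /* already flushed: do not flush a second time */

    v->flushes++;
    v->pending = 0;
    return true;
}
```

Calling flush_vertices() twice in a row then performs a single flush,
which is the property the check is meant to restore.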
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>