As we prematurely end the batch if we bail on extending the vbo for CA
glyphs, we need to force the flush.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The heuristic of using the mapping only before the first use in an
execbuffer was suboptimal and broken by the change in bo initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Due to the w/a for its buggy shaders, gen4 is sufficiently different
that backporting the simple patch from gen5 was prone to failure. We
need to check that the vertices have not already been flushed prior to
flushing again.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Or upon actually closing the vertex buffer.
However, the underlying issue remains: we are failing to re-emit
the first pass for CA text after flushing the vertex buffer (and so
emitting the second pass for the flushed vertices).
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=42891
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we may wait upon the bo having finished rendering before we can
execute the flip, flushing the render cache as early as possible is
beneficial.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Before blocking and waiting for further input, we need to make sure
that we have not developed too large a queue of outstanding rendering.
As we render to the front-buffer with no natural throttling and allow
X clients to render as fast as they wish, it is entirely possible for
a large queue of outstanding rendering to develop. For an example,
watch firefox rendering the FishIE Tank demo and notice the delay
that can build up before the tooltips appear.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we change the Screen pixmap due to a change of mode, we lose the
flag that we've attached a DRI2 buffer to it. So the next time we try to
copy from/to it, reassert its DRI2 status.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
i965_video.c: In function 'gen6_create_cc_state':
i965_video.c:1374:12: warning: passing argument 4 of
'intel_bo_alloc_for_data' discards 'const' qualifier from pointer target
type [enabled by default]
Repeated ad nauseam.
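The fix follows the usual pattern of const-correcting a read-only data
parameter. A hypothetical sketch (the real intel_bo_alloc_for_data()
signature and name may differ; this stand-in only illustrates the idea):

```c
#include <stdlib.h>
#include <string.h>

/* The allocator only reads from 'data', so declaring the parameter
 * 'const void *' lets callers pass pointers to const state tables
 * without the compiler discarding a qualifier. */
static void *
alloc_for_data(size_t size, const void *data)
{
	void *bo = malloc(size);
	if (bo)
		memcpy(bo, data, size);
	return bo;
}
```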
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For a singular region, we want to use a value for nboxes of 0, not 1;
fortunately, if you pass in a box, it ignores the value of nboxes.
RegionInit() is a most peculiar API!
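To illustrate the quirk, here is a stub mirroring the documented
RegionInit() semantics; the types and implementation are stand-ins,
and the real code lives in the X server's region machinery:

```c
/* Stand-in types, purely for illustration. */
typedef struct { int x1, y1, x2, y2; } BoxRec;
typedef struct { BoxRec extents; int is_singular; } RegionRec;

static void
RegionInit(RegionRec *region, BoxRec *box, int nboxes)
{
	if (box) {
		/* A non-NULL box yields a singular region covering
		 * exactly that box; nboxes is ignored entirely. */
		(void)nboxes;
		region->extents = *box;
		region->is_singular = 1;
	} else {
		/* Only with box == NULL does nboxes matter, sizing
		 * the region's rectangle storage. */
		region->extents = (BoxRec){ 0, 0, 0, 0 };
		region->is_singular = 0;
	}
}
```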
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Two fixes in this commit. First, we only need to check the
front left buffer; the other attachments do not need to be
checked. Second, we should fix up the pixmap's drawable,
not the original drawable.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
To support easy buffer exchange at the glamor layer, glamor
added a new API, glamor_egl_exchange_buffers(), to exchange
two pixmaps' EGL images, fbos and textures without
recreating any of them. But this simple method requires
that there be two pixmaps. An exceptional case is:
if we are using a triple buffer when page flipping, we
will have an extra back_buffer which doesn't have a pixmap
attached to it. Then each time we set that buffer to a
pixmap, we have to call create_egl_textured_pixmap to
create the corresponding EGL image, fbo and texture
for it. This is not efficient.
To fix this issue, this commit introduces a new back_pixmap
in the intel structure to hold the back buffer and its
corresponding glamor resources. Then we just need to do the
lightweight buffer exchange at both the DDX and glamor
layers.
The new back pixmap is similar to the screen pixmap and
needs to be handled carefully at close screen. As the
glamor data structure is per-screen data and is released
in glamor's close screen method, that method must clean up
the glamor resources of both the screen pixmap and the
back pixmap. The screen pixmap is easy to get, but there
is no good way to store the back pixmap. So glamor adds a
new API, glamor_egl_create_textured_screen_ext(), to pass
the back pixmap's pointer to the glamor layer.
This commit makes us depend on glamor commit 4e58c4f,
and increases the required glamor version from 0.3.0 to
0.3.1.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add a new element, back_name, to the intel structure to
track the back bo's name and so avoid calling flink every
time.
Also, in I830DRI2ExchangeBuffers, after finishing the BO
exchange between info's front and back pixmaps, we set the
new front bo on the screen pixmap. But the screen pixmap
should be the same as the front's pixmap, so this is a
duplicate operation and can be removed.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In order to prevent walking upwards off the top of the pixmap when
rendering a clipped vertical edge, we need to tweak the boundary
conditions for the vertical edge walker.
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=46261
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On gen4, the tiling/fence constraints are fairly lax, only requiring
page alignment of the object and its size, and so we can switch
tiling modes without incurring a GPU stall on active bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The idea was that we could afford to allocate an active CPU bo for
the GPU to copy into, and then sync just before we need to write to
the shadow pixels. Having the sync inside the allocation function
potentially causes an unwanted stall.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since the hardware only handles a8 without tricky emulation, and
pixman insists on using a1 for sharp trapezoids, we need to ensure
that we convert the a1 to a8 for our trapezoidal mask.
More worryingly, this path should never be hit...
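The conversion itself is simple bit expansion. A hypothetical sketch
(MSB-first bit order is assumed here for illustration; the real code
follows pixman's a1 layout and handles strides):

```c
#include <stdint.h>
#include <stddef.h>

/* Expand one a1 scanline (1 bit per pixel, MSB-first in this sketch)
 * into a8 (1 byte per pixel): each set bit becomes a fully opaque
 * 0xff sample, each clear bit a transparent 0x00. */
static void
a1_to_a8(const uint8_t *src, uint8_t *dst, size_t width)
{
	for (size_t x = 0; x < width; x++) {
		int bit = (src[x >> 3] >> (7 - (x & 7))) & 1;
		dst[x] = bit ? 0xff : 0x00;
	}
}
```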
References: https://bugs.freedesktop.org/show_bug.cgi?id=46156
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After the flip chain is completed, any residual buffers are no longer in
use and so available for reuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Replace any existing definition with a correct version, since there are
broken container_of macros floating around the xorg includes.
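A correct version looks like the following sketch of the standard
definition (not necessarily character-for-character what the driver
carries):

```c
#include <stddef.h>

/* Recover a pointer to the enclosing structure from a pointer to one
 * of its members by subtracting the member's offset. #undef first so
 * this wins over any broken definition from the xorg includes. */
#undef container_of
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
```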
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
list_last_entry() needs to be defined if we are including the xorg
list.h as opposed to our standalone variant.
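For illustration, a minimal sketch of the definition in terms of the
usual list primitives (the struct and helper macros here are
stand-ins; the real ones live in the driver's list.h):

```c
#include <stddef.h>

/* Stand-in for the circular doubly-linked list with a dummy head. */
struct list { struct list *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* With a circular list, the last element hangs off head->prev. */
#define list_last_entry(ptr, type, member) \
	list_entry((ptr)->prev, type, member)
```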
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In 1.11.903, list.h was renamed to xorg-list.h, with a corresponding
rename of all its structures. As we carried local fixes and extended
functionality in list.h, just create our own list.h, with a bit of
handwaving to protect us during the brief existence of
xorg/include/list.h.
Reported-by: Armin K <krejzi@email.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45938
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Slower for fills, but on the current stack faster for copies, both large
and small. Hopefully, when we write some good shaders for SNB, we will
not only improve performance for copies but also make fills faster on
the render ring than the blt?
As the BLT copy routine is GPU bound for copywinpix10, and the RENDER
copy routine is CPU bound and faster, I believe that we have reached the
potential of the BLT ring and not yet saturated the GPU using the render
copy.
Note that we still do not casually switch rings, so the actual
routine chosen will still be selected by the preceding operations,
and so is unlikely to have any effect in practice during, for
example, cairo-traces.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The batch emission serves as a full stall, so we do not need to incur a
second before our first rendering.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Because one day we may actually start using VS! Copied from the addition
of the w/a to Mesa by Kenneth Graunke.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Upon reading, we encounter a serialisation point and so can retire
all requests. However, kgem_bo_retire() wasn't correctly detecting
that barrier and so we continued to use GPU detiling, thinking the
target was still busy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If both the source and destination are on the CPU, the thinking was
that it would be quicker to operate on them on the CPU rather than
copy both to the GPU and then perform the operation. This turns out
to be a false assumption if a transformation is involved -- something
to be reconsidered if pixman should ever be improved.
if pixman should ever be improved.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Copying between two objects that consume more than the available GATT
space is a painful experience due to the forced use of an intermediary
and eviction on every batch. The tiled upload paths are in comparison
remarkably efficient, so favour their use when handling extremely large
buffers.
This reverses the previous idea in that we now prefer large GPU bo
rather than large CPU bo, as the render pipeline is far more flexible
for handling those than the blitter is for handling the CPU bo (at least
for gen4+).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we go to the trouble of running retire before searching, we may as
well check that we retired something before proceeding to check all the
inactive lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The render pipeline is actually more flexible than the blitter for
dealing with large surfaces and so the BLT is no longer the limiting
factor on gen4+.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This depends upon glamor commit b5f8d, just after the 0.3.0 tag.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>