We were underestimating the height of X-tiled surfaces (and, less
harmfully, overestimating the height of Y-tiled surfaces).
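As a rough illustration of why the tile height matters, the sketch below rounds a surface height up to a whole number of tile rows. It assumes the gen4+ tile geometry (an X tile is 512 bytes by 8 rows, a Y tile is 128 bytes by 32 rows); the enum and function names are hypothetical, and the commit itself does not say which constant was wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tiling modes, mirroring the I915_TILING_* naming. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Rows per tile, assuming gen4+ geometry:
 * X tile = 512 bytes x 8 rows, Y tile = 128 bytes x 32 rows (4KiB each). */
static uint32_t tile_rows(enum tiling tiling)
{
	switch (tiling) {
	case TILING_X: return 8;
	case TILING_Y: return 32;
	default:       return 1; /* linear: no row alignment needed */
	}
}

/* Round the height up to a whole number of tile rows. */
static uint32_t aligned_height(uint32_t height, enum tiling tiling)
{
	uint32_t rows = tile_rows(tiling);

	assert(height > 0);
	return (height + rows - 1) / rows * rows;
}
```

Too small an aligned height means the allocation is shorter than the rows the GPU may touch; too large an aligned height only wastes memory.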
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Whilst searching for available space on the active partial buffer list,
if we discover an unreferenced one, reset its used counter to zero.
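A minimal sketch of that search, assuming a singly linked list and a plain reference count. The struct layout and the names `struct partial` and `find_partial` are hypothetical, not the driver's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upload buffer shared by several small writes. */
struct partial {
	struct partial *next;
	uint32_t size;   /* total bytes in the buffer */
	uint32_t used;   /* bytes already handed out */
	int refcnt;      /* outstanding users of the handed-out ranges */
};

/* Return the first buffer with room for 'need' bytes, or NULL. */
static struct partial *find_partial(struct partial *list, uint32_t need)
{
	struct partial *p;

	for (p = list; p != NULL; p = p->next) {
		assert(p->used <= p->size);
		if (p->refcnt == 0)
			p->used = 0; /* nobody references the old contents */
		if (need <= p->size - p->used)
			return p;
	}
	return NULL;
}
```

Without the reset, an idle but nominally full buffer would be skipped on every search and its space never reused.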
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we allocate a partial buffer and then fall back for the operation, the
buffer would remain on the partial list waiting for another user.
Discard any unused partials at the next batch submission or expiration
point.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
One deletion too many, unnoticed until the next reboot. Besides the
failure to disable logic op and enable colour buffer blending which
causes a hang if you subsequently try to enable both, you also need
to request texture caching...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(Note this only applies to 2D pixmaps.)
The rationale, borne out by experimentation with cairo-perf-trace, is
that on the pre-G33 devices we always need a fence region
for tiled surfaces, i.e. at least .5/1MiB in size, and that combined
with the smaller GTT on those devices, we lose the benefit of tiling to
the excessive GTT thrashing.
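To put numbers on the cost, here is a sketch of the fence region size for a tiled object on those devices. It assumes the minimum is 512KiB on gen2 and 1MiB on gen3 (the ".5/1MiB" above) and that the region is rounded up to a power of two; `fence_size` is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Aperture space consumed by a tiled object of 'size' bytes on a
 * pre-gen4 device: a power-of-two region with a per-generation minimum. */
static uint32_t fence_size(int gen, uint32_t size)
{
	uint32_t fence = gen < 3 ? 512 * 1024 : 1024 * 1024;

	assert(size > 0 && size <= (1u << 30));
	while (fence < size)
		fence <<= 1;
	return fence;
}
```

Under these assumptions a 64x64 32bpp pixmap, 16KiB of pixel data, occupies a full 1MiB of GTT on gen3 once tiled, so a handful of small tiled pixmaps can fill a small aperture.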
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This was trying to work around a kernel bug, and instead causes a
performance cliff for textures that *need* to be tiled.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the surface is so big that the 2x2 texel sampling will cause a TLB
miss every time, i.e. the row pitch exceeds 4096, then we need to
encourage tiling to prevent atrocious performance.
For example, try downscaling a 2560x1600 background image on a gen3
device using I915_TILING_NONE...
Using slideshow-demo /usr/share/backgrounds/cosmos/whirlpool.jpg, on a
PineView netbook, fps goes from under 4 to over 40.
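The pitch test can be sketched as below. It assumes a linear layout with no pitch padding and a 4096-byte page; `pitch_wants_tiling` is a hypothetical name and the real heuristic may weigh other factors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u

/* With a linear pitch above one page, the two rows of a 2x2 filter
 * footprint always fall in different pages, so prefer tiling. */
static bool pitch_wants_tiling(uint32_t width, uint32_t bpp)
{
	uint32_t pitch;

	assert(bpp % 8 == 0 && width <= 65536);
	pitch = width * (bpp / 8);
	return pitch > PAGE_BYTES;
}
```

For the 2560x1600 image at 32bpp the linear pitch is 10240 bytes, well past the threshold, whereas a 1024-pixel-wide surface at 32bpp sits exactly at 4096 and is left alone.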
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We don't need to warn the user that their hardware does not support the
video overlay plane (but Jesse is working on it!), but merely inform
them that its presence is lacking.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is quite trivial to hit given the 2k limits on gen2/gen3. We
compromise on image quality by pre-downscaling the source by a fixed
factor to make it fit into the pipeline in preference to performing the
entire operation on the CPU.
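One way to pick such a factor is sketched below: the smallest integer divisor that brings the larger dimension within the 2k limit. This is an assumption for illustration; the commit only says "a fixed factor" and does not describe how it is chosen, and `downscale_factor` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_3D_SIZE 2048u /* the 2k gen2/gen3 limit named above */

/* Smallest integer factor such that both dimensions, divided by it,
 * fit within MAX_3D_SIZE. A factor of 1 means no pre-scaling. */
static uint32_t downscale_factor(uint32_t width, uint32_t height)
{
	uint32_t larger = width > height ? width : height;

	assert(larger > 0);
	return (larger + MAX_3D_SIZE - 1) / MAX_3D_SIZE;
}
```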
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Remove the PCI ID device checks by using the simpler check on the
generation id for errata pertaining to 830/845.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
...so that CPU pixmap is correctly invalidated for the next readback.
For instance, if you were to take a screenshot on a composited desktop.
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use GTT mappings when writing directly into the tiled buffer and the
available aperture is reported by the kernel as the total GTT and not
limited to the fenceable/mappable region, we need to manually probe this
value and ensure that our creation and fenced routines observe this
distinct limit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Eeek, the wait-for-target-msc was using the immediate swap path, meaning
that for copy-swaps the copy was submitted immediately but the client was
throttled waiting upon the target vblank. What is actually intended is
for the presentation to be delayed until the target_msc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The secret is not to cheat and render directly to the front buffer, but
remember to mark the Window as damaged.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The front-buffer of a DRI2 drawable may not in fact be pointing to the
scanout pixmap. So override the destination for swapbuffers to update
the scanout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We either conflated bpp (which fails given a mixture of depth-24 and
depth-30 pixmaps) or neglected to check at all.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This prepares for Xv on Ivybridge. The difference from Sandybridge is
that on Ivybridge all message payloads must be in GRF registers instead
of MRF registers. We only redefine some M4 macros for Ivybridge.
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Fixes a regression from d0362a. In bypassing the is_wedged check, we
also ended up bypassing the checks that we could indeed render to the
target bo, with the result that we were creating GPU buffers for SHM
surfaces, something that requires Xserver fixes before we can actually
enable...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>