If, after projection onto the imprecise fast sample grid, the trapezoid
becomes a pixel-aligned box, treat it as such and send it down the fast
paths.
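A minimal sketch of the test, assuming 16.16 fixed-point coordinates and a hypothetical 4x4 sample grid (fixed_t, GRID_X, GRID_Y, to_grid, line_x_at and trap_to_box are all illustrative names, not the driver's). Each edge is evaluated at the trapezoid's top and bottom and snapped to the grid. If both edges are vertical after snapping and every side lands on a pixel boundary, the trapezoid is just a box.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point coordinates, modelled loosely on pixman. */
typedef int32_t fixed_t;
#define FIXED_ONE (1 << 16)

/* Assumed sample grid: GRID_X x GRID_Y samples per pixel. */
#define GRID_X 4
#define GRID_Y 4

struct point { fixed_t x, y; };
struct line { struct point p1, p2; };
struct trap { fixed_t top, bottom; struct line left, right; };
struct box { int x1, y1, x2, y2; };

/* Snap v to the nearest of n samples per pixel: floor(v * n + 1/2). */
static int to_grid(fixed_t v, int n)
{
    int64_t s = (int64_t)v * n + FIXED_ONE / 2;
    /* floor division that also rounds negative values downwards */
    return (int)(s >= 0 ? s / FIXED_ONE : -((-s + FIXED_ONE - 1) / FIXED_ONE));
}

/* x coordinate of the line l at height y. */
static fixed_t line_x_at(const struct line *l, fixed_t y)
{
    int64_t dy = (int64_t)l->p2.y - l->p1.y;
    if (dy == 0)
        return l->p1.x;
    return l->p1.x + (fixed_t)(((int64_t)y - l->p1.y) *
                               ((int64_t)l->p2.x - l->p1.x) / dy);
}

/* Returns true, and fills *out in whole pixels, if the projected trapezoid
 * is a non-empty rectangle whose sides all lie on pixel boundaries. */
static bool trap_to_box(const struct trap *t, struct box *out)
{
    int y1 = to_grid(t->top, GRID_Y);
    int y2 = to_grid(t->bottom, GRID_Y);
    int lx1 = to_grid(line_x_at(&t->left, t->top), GRID_X);
    int lx2 = to_grid(line_x_at(&t->left, t->bottom), GRID_X);
    int rx1 = to_grid(line_x_at(&t->right, t->top), GRID_X);
    int rx2 = to_grid(line_x_at(&t->right, t->bottom), GRID_X);

    if (lx1 != lx2 || rx1 != rx2)
        return false;   /* an edge is still slanted on the grid */
    if (y1 % GRID_Y || y2 % GRID_Y || lx1 % GRID_X || rx1 % GRID_X)
        return false;   /* a side falls inside a pixel */
    if (y1 >= y2 || lx1 >= rx1)
        return false;   /* nothing left after projection */

    out->x1 = lx1 / GRID_X;
    out->x2 = rx1 / GRID_X;
    out->y1 = y1 / GRID_Y;
    out->y2 = y2 / GRID_Y;
    return true;
}
```

A left edge sitting 1/64 of a pixel off x = 1 still snaps onto the boundary, so such a trapezoid would take the box path, while a genuinely slanted edge would not.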
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We cannot assume that the edge lies completely within the target, so we
must make sure that the initial prev_x is truly less than any possible
value whilst sorting intersections.
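The invariant can be illustrated with a small sketch (struct edge, edges_sorted and sort_edges are hypothetical names). A quick pass checks whether the active edges are already in x order, and prev_x is seeded with INT_MIN rather than with, say, the target's left edge. An edge that starts outside the target can have a negative x, and the very first comparison must never fire spuriously.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical active-edge record; only the current x intersection matters. */
struct edge { int x; };

/* Cheap check run before sorting. prev_x starts at INT_MIN because an edge
 * may extend beyond the target, so x can be negative (or below any clip
 * bound), and the first comparison must never fire spuriously. */
static bool edges_sorted(const struct edge *e, int n)
{
    int prev_x = INT_MIN;
    for (int i = 0; i < n; i++) {
        if (e[i].x < prev_x)
            return false;
        prev_x = e[i].x;
    }
    return true;
}

/* Insertion sort by x: cheap when the list is nearly sorted, which is
 * typical from one row of intersections to the next. */
static void sort_edges(struct edge *e, int n)
{
    if (edges_sorted(e, n))
        return;
    for (int i = 1; i < n; i++) {
        struct edge key = e[i];
        int j = i - 1;
        while (j >= 0 && e[j].x > key.x) {
            e[j + 1] = e[j];
            j--;
        }
        e[j + 1] = key;
    }
}
```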
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is optimising for the x11perf putimage benchmark, but nevertheless,
uploading the PutImage directly into the uncached scanout is between
2x and 20x slower than making a temporary copy in the shadow buffer and
doing a deferred update. Most of the overhead is in the kernel and
should be addressed there (rather than worked around); a portion is
due to the overdraw in the benchmark (which is not likely to be
realistic, but then again neither should PutImage be!).
The argument for uploading in place when possible is that the buffer
already existing on the GPU implies that it is likely to be used again
by the GPU in future, and so we will be uploading it at some point anyway.
Deferring that upload incurs an extra copy. The putimage benchmark does
not actually use the pixel data and so that extra cost is not being
measured.
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As demonstrated by the all-important trap300, using the BLT is 2x faster
than the RENDER ring for the simple case of solid fills. (Though note
that performing the relocations costs 3x as much CPU for 2x GPU
performance.) One case that may regress from this change is copywinpix,
which should benefit from the batching in the RENDER commands, and might
warrant revisiting in the future (with realistic and synthetic
benchmarks in hand!)
However, due to the forced stall when switching rings, we still want to
perform RENDER copies on behalf of DRI clients and before page-flips.
Checking against cairo-perf-trace indicated no major impact -- I had
worried that setting the BLT flag for some clears might have had a
knock-on effect causing too many operations that could be pipelined on
the RENDER ring to be sent to the BLT ring instead.
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If either of the edges is degenerate on the sample grid, then the trap
has zero height and must be skipped. (Otherwise, if just one edge becomes
degenerate, then the polygon becomes unbalanced and the rasteriser will
implode.)
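A sketch of the all-or-nothing rule, assuming 16.16 fixed point and a hypothetical 4-samples-per-pixel vertical grid (struct edge, edge_degenerate and add_trap_edges are illustrative). Both edges of the trapezoid are checked after snapping to the grid, and either both are added to the polygon or neither is.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;   /* hypothetical 16.16 fixed point */
#define FIXED_ONE (1 << 16)
#define GRID_Y 4           /* assumed vertical samples per pixel */

/* Hypothetical edge record: its vertical extent after clipping to the trap. */
struct edge { fixed_t ytop, ybot; };

/* Snap v to the nearest of n samples per pixel: floor(v * n + 1/2). */
static int to_grid(fixed_t v, int n)
{
    int64_t s = (int64_t)v * n + FIXED_ONE / 2;
    return (int)(s >= 0 ? s / FIXED_ONE : -((-s + FIXED_ONE - 1) / FIXED_ONE));
}

/* An edge is degenerate if it spans no sample rows once snapped. */
static bool edge_degenerate(const struct edge *e)
{
    return to_grid(e->ytop, GRID_Y) >= to_grid(e->ybot, GRID_Y);
}

/* Append both edges of a trapezoid to out, or neither: adding only one
 * would unbalance the polygon. Returns the number of edges added. */
static int add_trap_edges(const struct edge *left, const struct edge *right,
                          struct edge *out)
{
    if (edge_degenerate(left) || edge_degenerate(right))
        return 0;   /* zero height on the sample grid: skip the trap */
    out[0] = *left;
    out[1] = *right;
    return 2;
}
```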
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oops, a silly cut'n'paste error caused us to allocate an A1 pixmap for
mono traps instead of the A8 pixmap that we tried to write to; mayhem
ensued.
Reported-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to retain the GTT space without keeping hold of the memory used
for the upload buffer, we have to create a new bo and copy the relevant
details across.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
They do not appear to have been leaked per se, but we end up
accumulating the unused buffers. A more complicated solution would be to
reallocate the handle for retained buffers so that the GTT region can be
reused.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39184
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Similar to the previous commit, check that the Screen Pixmap is bound to
a bo before proceeding.
[Note that in this case, the absence of the bo would have been picked
up much later after doing all of the setup...]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Now, the pixmap being used is meant to be the Screen pixmap, and by rights
that has to exist in a GPU buffer! Evidence to the contrary exists, and
so we had better check that we have a bo before using it...
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40439
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take advantage of the faster mask computation available using the
imprecise tor scan converter for chipsets not yet supporting spans.
In doing so, limit full-stepping to rows containing only vertical edges,
as the small sample grid reduces the benefit of the computationally
more expensive full step.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Note this also revealed a subtle bug in the handling of degenerate
trapezoids after shrinking to the raster grid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The single pixel case is usually associated with synchronisation of perf
clients and so we do not want to incur extra complication along that
path. Also the cost of tracking a single pixel of non-damage outweighs
its benefit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Or in the case where a second command is received prior to the batch
being flushed, the vertex data is not flushed, which leads to a
miscomputation of the number of vertices emitted.
Reported-by: Elias Probst <mail@eliasprobst.eu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40332
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Under certain circumstances the shadow can be destroyed after being
allocated but before being created. The pixmap is a NULL pointer at that
time, but we know that its value should be data, so just use the data
pointer instead.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Otherwise we use the stale value when rendering CA glyphs directly to
the front-buffer, and subsequent rendering has a tendency to become
invisible. (Rendering via a temporary glyph mask has a fortunate
side-effect of resetting sufficient state to force the re-emission of the
blend state.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When clipping the sample region to the edge of the texture we can also
allow the GPU to use CLAMP_TO_EDGE (as well as CLAMP_TO_BORDER) to
emulate the RepeatPad mode of the parent texture. (Only RepeatNormal and
RepeatReflect need special treatment with regard to tiling, which is not
yet handled.)
This fixes the recent performance regression due to a slight change in
the fish benchmark that caused it to sample outside of the texture atlas
for one of its little fish.
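The mapping described above can be sketched as a small selector. The enum values follow the X Render repeat constants; the wrap-mode enum and subtexture_wrap are hypothetical names, and using CLAMP_TO_BORDER for RepeatNone assumes the sampler's border colour is transparent black.

```c
#include <stdbool.h>

/* Render repeat modes, with the values used by the X Render protocol. */
enum repeat { REPEAT_NONE = 0, REPEAT_NORMAL = 1, REPEAT_PAD = 2, REPEAT_REFLECT = 3 };

/* Hypothetical sampler wrap modes. */
enum wrap { WRAP_CLAMP_TO_BORDER, WRAP_CLAMP_TO_EDGE };

/* When the sample region has been clipped to the parent texture's edge,
 * pick a hardware wrap mode that reproduces the parent's repeat mode
 * outside that edge. Returns false if no single wrap mode will do. */
static bool subtexture_wrap(enum repeat r, enum wrap *out)
{
    switch (r) {
    case REPEAT_NONE:
        *out = WRAP_CLAMP_TO_BORDER;  /* assumes a transparent black border */
        return true;
    case REPEAT_PAD:
        *out = WRAP_CLAMP_TO_EDGE;    /* replicate the edge texels */
        return true;
    default:
        return false;  /* RepeatNormal/RepeatReflect need tiling */
    }
}
```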
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Return early from adding new damage regions if we know that we have
already marked it as all-damaged.
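A sketch of the early return, using a hypothetical damage tracker that keeps only a bounding box (real damage tracking accumulates a region). Once the tracker is marked all-damaged, further additions return immediately without touching the accumulated state.

```c
#include <stdbool.h>

/* Hypothetical damage tracker that keeps only a bounding box; a real one
 * would accumulate a region. */
struct damage {
    bool empty, all;
    int x1, y1, x2, y2;   /* bounding box of the damage so far */
    int width, height;    /* size of the pixmap being tracked */
    int unions;           /* how many times we actually merged a box */
};

static void damage_add(struct damage *d, int x1, int y1, int x2, int y2)
{
    if (d->all)
        return;   /* already all-damaged: nothing new can be added */

    if (d->empty) {
        d->x1 = x1; d->y1 = y1; d->x2 = x2; d->y2 = y2;
        d->empty = false;
    } else {
        if (x1 < d->x1) d->x1 = x1;
        if (y1 < d->y1) d->y1 = y1;
        if (x2 > d->x2) d->x2 = x2;
        if (y2 > d->y2) d->y2 = y2;
    }
    d->unions++;

    /* Once the (over-approximated) bounding box spans the whole pixmap,
     * mark it all-damaged so later calls take the early return. */
    if (d->x1 <= 0 && d->y1 <= 0 && d->x2 >= d->width && d->y2 >= d->height)
        d->all = true;
}
```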
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In reality, Mesa will be treating it as W-tiling, only we have no way of
communicating that requirement to the kernel (as not only does the
kernel not understand W-tiling, but also the GTT is incapable of fencing
a W-tiled region.)
Ported from Chad Versace's 3e55f3e88.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Slightly generalize the shared SF and CC code to accommodate both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
While we're at it, make the functions simply take an intel_screen_private
pointer directly instead of having to fetch it from ScrnInfoPtr.
Also coalesce some gen6/gen7 functions that were 98% identical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
These are exactly the same as the ones for Sandybridge, but with message
registers translated (hopefully) in the same way as Haihao's new
programs (m1 == g65).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Hanging the machine does indeed prevent video tearing. Just not quite
what the user expected...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39497
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
W-Major Tile Format is used for separate stencil.
The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.
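Decoding the layout in software amounts to computing a swizzled byte offset for every stencil pixel. A sketch follows, assuming 4 KiB W tiles that are 64 bytes wide and 64 rows tall, laid out row-major across a linear (untiled) allocation whose stride in bytes is a multiple of 64. The name w_tile_offset is hypothetical and this is not the code from the Mesa commit. Within a tile, the sketch walks 8-byte-wide columns, then 8x8 blocks, then interleaves the remaining y and x bits.

```c
#include <stdint.h>

/* Byte offset of stencil pixel (x, y) in a W-tiled surface, assuming 4 KiB
 * W tiles that are 64 bytes wide and 64 rows tall, stored row-major across
 * the surface, with stride in bytes (a multiple of 64). */
static uintptr_t w_tile_offset(uint32_t stride, uint32_t x, uint32_t y)
{
    const uint32_t tile_w = 64, tile_h = 64, tile_size = 4096;
    uint32_t tile_x = x / tile_w, tile_y = y / tile_h;
    uint32_t bx = x % tile_w, by = y % tile_h;   /* position within the tile */

    return (uintptr_t)tile_y * tile_h * stride    /* whole rows of tiles */
         + (uintptr_t)tile_x * tile_size          /* tiles to the left */
         + 512 * (bx / 8)                          /* 8-byte-wide column */
         +  64 * (by / 8)                          /* 8x8 block in column */
         +  32 * ((by / 4) % 2)                    /* then interleave the */
         +  16 * ((bx / 4) % 2)                    /* remaining y and x   */
         +   8 * ((by / 2) % 2)                    /* bits, y first       */
         +   4 * ((bx / 2) % 2)
         +   2 * (by % 2)
         +   1 * (bx % 2);
}
```

Every (x, y) inside a tile maps to a distinct byte of that tile's 4 KiB, which is the property a software decoder relies on.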
This commit mutually depends on the mesa commit:
intel: Fix stencil buffer to be W tiled
Author: Chad Versace <chad@chad-versace.us>
Date: Mon Jul 18 00:37:45 2011 -0700
Signed-off-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Ian Romanick <ian.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is causing a hard hang with 2.6.39+; we don't know why, so play safe
and disable it for the time being.
References: https://bugs.freedesktop.org/show_bug.cgi?id=38012
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
These are very common when compositing unclipped trapezoids, and the
majority of the overhead is in handling the arbitrary number of boxes
and misses out on the constant folding the compiler can do if it is
known we have just one box.
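The specialisation can be sketched as follows (struct box, fill_boxes and fill_one_box are hypothetical, and a solid fill stands in for compositing). The generic routine loops over a run-time count of boxes. The single-box entry point passes a constant 1, so once the generic routine is inlined, the compiler is free to drop the outer loop and fold the box indexing.

```c
#include <stdint.h>

struct box { int x1, y1, x2, y2; };   /* hypothetical, half-open */

/* Generic path: the number of boxes is only known at run time. */
static inline void fill_boxes(uint32_t *dst, int stride, const struct box *b,
                              int n, uint32_t pixel)
{
    for (int i = 0; i < n; i++)
        for (int y = b[i].y1; y < b[i].y2; y++)
            for (int x = b[i].x1; x < b[i].x2; x++)
                dst[y * stride + x] = pixel;
}

/* Single-box entry point: once fill_boxes is inlined with n == 1, the
 * compiler can drop the outer loop and fold the box indexing. */
static void fill_one_box(uint32_t *dst, int stride, const struct box *b,
                         uint32_t pixel)
{
    fill_boxes(dst, stride, b, 1, pixel);
}
```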
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>