Julien Cristau disliked my broadcasting of the git tree used to build
his distribution package as it bore little relevance to his users. As it
is only useful for people installing their own drivers (as a means of
sanity checking that they are running the right driver), we introduce
the --with-builderstring idiom borrowed from the xserver. This allows
the builder to override the use of `git describe` and either leave it
blank or to fill it with something useful for their own purposes.
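
As a rough illustration of the intended behaviour (not the actual
configure or driver code), the choice between a builder-supplied string
and the `git describe` output could look like the sketch below. The
function name, its parameters and the banner format are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. 'builder' is NULL when no builder string was
 * supplied at configure time, and "" when the builder chose to leave it
 * blank. 'git_describe' is the fallback for self-built drivers. */
static int format_banner(char *buf, size_t len, const char *version,
                         const char *builder, const char *git_describe)
{
    /* A builder string, even an empty one, overrides git describe. */
    const char *extra = builder ? builder : git_describe;

    if (extra && strlen(extra) > 0)
        return snprintf(buf, len, "driver %s (%s)", version, extra);

    return snprintf(buf, len, "driver %s", version);
}
```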
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Every time I do a transformation into pixmap space I like to include one
of these copy'n'paste errors.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40850
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Another piece of state we zap without marking as dirty when playing
video.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40842
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Or else we may try to clear the new framebuffer with an invalid batch,
because it will reuse the same bo as last time and that bo may still
think it is part of the old batch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We have yet to implement a yuv-shader that applies
contrast/brightness/saturation and so should not advertise such
features, potentially allowing the client to fall back and perform the
adjustments itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
For mono-rasteriser we can simply emit the composite spans without
requiring an opacity shader.
For single trapezoids, it will be more interesting to emit triangles
directly. However, we still need to compute the union of many
trapezoids, and this builds upon existing code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to mirror the kernel active lists in order to predict when it
will stall upon an access to a bo, and so we cannot clear the
needs_flush for our own MI_FLUSH.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Based on the patch by Konstantin Belousov.
Reported-by: Konstantin Belousov <konstantin.belousov@zoral.com.ua>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The whole purpose for that little dance was so that we could reuse the
bo. However, we left it marked as non-reusable in order for us not to
tie up memory with too many buffers and so defeated the purpose of
trying to place it into the inactive cache.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
And Ironlake also fails to live up to the promise that its GPU is fast
enough to run simple programs at memory rates.
x11perf -trap300 shows a 5-fold improvement. No obvious improvement
elsewhere yet.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We know we have compatible formats since we have a gpu_bo attached to
the pixmap, so we can use the simpler direct memcpy rather than calling
fbPutZImage/fbBlt.
On my i3-330m, this improves putimage500 from 730 to 1100 ops/s.
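
A minimal sketch of the direct copy, assuming both buffers are linear
and share the same bytes-per-pixel. copy_rect() and its parameters are
hypothetical and are not the driver's actual upload path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a width x height block of pixels from the
 * start of 'src' to position (dst_x, dst_y) in 'dst'. Strides are in
 * bytes. No format conversion is done, which is why the formats must
 * already be compatible. */
static void copy_rect(uint8_t *dst, size_t dst_stride,
                      const uint8_t *src, size_t src_stride,
                      size_t bytes_per_pixel,
                      size_t dst_x, size_t dst_y,
                      size_t width, size_t height)
{
    size_t row_bytes = width * bytes_per_pixel;

    assert(row_bytes <= src_stride);
    assert(dst_x * bytes_per_pixel + row_bytes <= dst_stride);

    dst += dst_y * dst_stride + dst_x * bytes_per_pixel;
    while (height--) {
        memcpy(dst, src, row_bytes);   /* one scanline at a time */
        dst += dst_stride;
        src += src_stride;
    }
}
```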
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Hopefully, I have all the dependencies correct for auto-updating and it
should continue to work with tarballs...
The next step is to perhaps include it in the usual version number,
perhaps as patch level?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Usually this will be to a CPU-only pixmap, but just on the off-chance
that we are stalling for a GPU pixmap, use the faster path developed for
Trapezoids.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In case the video is running async, then there may be subsequent
instructions within the batch and so we do need to mark the clobbered
state as dirty when setting up the video frame.
Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40693
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Unlike the later gen, we do not yet use a separate vertex buffer and so
when we can no longer fit a rectangle (and its CA ghost) we must flush the
batch. Due to the duplication required for the CA pass, the normal
checks to see whether we had sufficient space to add the new command
were passing as they failed to take into account the need to submit the
whole primitive again.
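
The corrected space check can be sketched roughly as follows. The
struct, its fields and the function name are hypothetical, and the
accounting is simplified to vertex dwords only. The point is that for
component-alpha the second pass replays everything queued for the
primitive, including the rectangle being added.

```c
#include <stdbool.h>

/* Hypothetical batch bookkeeping. All sizes are in dwords. */
struct batch {
    int used;   /* dwords already written to the batch */
    int size;   /* total capacity of the batch */
    int prim;   /* vertex dwords queued so far for the open primitive */
};

static bool batch_can_add_rect(const struct batch *b, int rect_dwords,
                               bool component_alpha)
{
    int need = rect_dwords;

    /* The CA pass resubmits the whole primitive: what is already queued
     * plus the rectangle we are about to add. */
    if (component_alpha)
        need += b->prim + rect_dwords;

    return b->used + need <= b->size;
}
```

When this returns false the caller would flush the batch before
emitting the rectangle.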
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If after projection onto the Imprecise fast sample grid, the trapezoid
becomes a pixel-aligned box, treat it as such and send it down the fast
paths.
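
A sketch of the test, assuming 16.16 fixed-point coordinates, a grid of
4 samples per pixel and projection by rounding down. The struct is a
simplified stand-in for a real trapezoid, and every name below is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;        /* 16.16 fixed point */
#define FIXED_ONE 65536
#define GRID 4                  /* assumed samples per pixel */

struct trap {                   /* simplified trapezoid */
    fixed_t top, bottom;
    fixed_t left_top, left_bottom;    /* x of the left edge at top/bottom */
    fixed_t right_top, right_bottom;  /* x of the right edge at top/bottom */
};

/* Project onto the sample grid: floor(v * GRID / FIXED_ONE). */
static int project(fixed_t v)
{
    int64_t n = (int64_t)v * GRID;
    int64_t q = n / FIXED_ONE;

    if (n % FIXED_ONE < 0)      /* C division truncates; make it a floor */
        q--;
    return (int)q;
}

static bool on_pixel(int g)
{
    return g % GRID == 0;
}

/* True if the projected trapezoid has vertical edges and all four sides
 * land on whole-pixel boundaries. */
static bool trap_is_box(const struct trap *t)
{
    int top = project(t->top), bottom = project(t->bottom);
    int l0 = project(t->left_top), l1 = project(t->left_bottom);
    int r0 = project(t->right_top), r1 = project(t->right_bottom);

    if (l0 != l1 || r0 != r1)
        return false;           /* an edge still slants on the grid */

    return on_pixel(top) && on_pixel(bottom) &&
           on_pixel(l0) && on_pixel(r0);
}
```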
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We cannot assume that the edge lies completely within the target, so we
must make sure that the initial prev_x is truly less than any possible
value whilst sorting intersections.
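
To illustrate the sentinel, here is a simplified sketch using a flat
array rather than the rasteriser's edge list. The function is
hypothetical. What matters is that prev_x starts at INT_MIN, not at 0 or
at the left edge of the target, so a negative first intersection is
never mistaken for being out of order.

```c
#include <limits.h>
#include <stddef.h>

/* Sort x intersections in place, only doing work when an element is
 * found to be smaller than its predecessor. Returns the number of
 * elements shifted. */
static int sort_intersections(int *x, size_t n)
{
    int prev_x = INT_MIN;   /* truly less than any possible value */
    int shifted = 0;

    for (size_t i = 0; i < n; i++) {
        if (x[i] < prev_x) {
            /* Out of order: insert x[i] into the sorted prefix. */
            int v = x[i];
            size_t j = i;

            while (j > 0 && x[j - 1] > v) {
                x[j] = x[j - 1];
                j--;
                shifted++;
            }
            x[j] = v;
        }
        prev_x = x[i];      /* x[i] is now the largest value seen */
    }
    return shifted;
}
```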
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is optimising for the x11perf putimage benchmark, but nevertheless,
uploading the PutImage directly into the uncached scanout is between
2x and 20x slower than making a temporary copy in the shadow buffer and
doing a deferred update. Most of the overhead is in the kernel, and
should be addressed there (rather than worked around) and a portion is
due to the overdraw in the benchmark (which is not likely to be
realistic, but then again neither should PutImage be!).
The argument for uploading inplace when possible is that the buffer
already existing on the GPU implies that it is likely to be used again
in future by the GPU and so we will be uploading it at some point.
Deferring that upload incurs an extra copy. The putimage benchmark does
not actually use the pixel data and so that extra cost is not being
measured.
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As demonstrated by the all-important trap300, using the BLT is 2x faster
than the RENDER ring for the simple case of solid fills. (Though note
that performing the relocations costs 3x as much CPU for 2x GPU
performance.) One case that may regress from this change is copywinpix
which should benefit from the batching in the RENDER commands, and might
warrant revisiting in the future (with realistic and synthetic
benchmarks in hand!)
However, due to the forced stall when switching rings, we still want to
perform RENDER copies on behalf of DRI clients and before page-flips.
Checking against cairo-perf-trace indicated no major impact -- I had
worried that setting the BLT flag for some clears might have had a
knock-on effect causing too many operations that could be pipelined on
the RENDER ring to be sent to the BLT ring instead.
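
The resulting policy can be summarised by the sketch below. The enums
and the function are hypothetical. Leaving other copies on whichever
ring is currently active is an assumption made for the example, not
something stated above.

```c
#include <stdbool.h>

enum ring { RING_RENDER, RING_BLT };
enum op { OP_SOLID_FILL, OP_COPY };

/* Hypothetical ring selection following the reasoning above. */
static enum ring pick_ring(enum op op, bool for_dri_client,
                           bool before_page_flip, enum ring current)
{
    /* Copies for DRI clients and before page-flips stay on RENDER to
     * avoid the forced stall when switching rings. */
    if (op == OP_COPY && (for_dri_client || before_page_flip))
        return RING_RENDER;

    /* Simple solid fills measured faster on the BLT. */
    if (op == OP_SOLID_FILL)
        return RING_BLT;

    /* Assumption: anything else avoids a ring switch. */
    return current;
}
```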
Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If either of the edges is degenerate on the sample grid, then the trap
has zero height and must be skipped. (Otherwise if just one edge becomes
degenerate then the polygon becomes unbalanced and the rasteriser will
implode.)
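
A sketch of the check, assuming 16.16 fixed-point input, 4 sample rows
per pixel and projection by rounding down. The structs are simplified
and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_t;        /* 16.16 fixed point */
#define FIXED_ONE 65536
#define GRID_Y 4                /* assumed sample rows per pixel */

struct edge { fixed_t y1, y2; };            /* vertical extent, y1 <= y2 */
struct trap { fixed_t top, bottom; struct edge left, right; };

/* Project onto the sample grid: floor(v * GRID_Y / FIXED_ONE). */
static int project_y(fixed_t v)
{
    int64_t n = (int64_t)v * GRID_Y;
    int64_t q = n / FIXED_ONE;

    if (n % FIXED_ONE < 0)
        q--;
    return (int)q;
}

/* An edge is degenerate if, once projected and clipped to the trap's
 * rows, it no longer covers a single sample row. */
static bool edge_is_degenerate(const struct edge *e, int top, int bottom)
{
    int y1 = project_y(e->y1), y2 = project_y(e->y2);

    if (y1 < top)
        y1 = top;
    if (y2 > bottom)
        y2 = bottom;
    return y1 >= y2;
}

/* Skip the trap if either edge is degenerate, so that the rasteriser
 * never sees a left edge without its matching right edge. */
static bool trap_should_skip(const struct trap *t)
{
    int top = project_y(t->top), bottom = project_y(t->bottom);

    if (top >= bottom)
        return true;
    return edge_is_degenerate(&t->left, top, bottom) ||
           edge_is_degenerate(&t->right, top, bottom);
}
```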
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oops, a silly cut'n'paste caused us to allocate an A1 pixmap for
mono traps instead of the A8 pixmap that we tried to write to; mayhem
ensued.
Reported-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to retain the GTT space without keeping hold of the memory used
for the upload buffer, we have to create a new bo and copy the relevant
details across.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
They do not appear to have been leaked per se, but we end up
accumulating the unused buffers. A more complicated solution would be to
reallocate the handle for retained buffers so that the GTT region can be
reused.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39184
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Similar to the previous commit, check that the Screen Pixmap is bound to
a bo before proceeding.
[Note that in this case, the absence of the bo would have been picked
up much later after doing all of the setup...]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Now, the pixmap being used is meant to be the Screen pixmap and by rights
that has to exist in a GPU buffer! Evidence contrary to the above
exists and so we had better check that we have a bo before using...
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40439
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take advantage of the faster mask computation available using the
imprecise tor scan converter for chipsets not yet supporting spans.
In doing so, restrict full-stepping to rows containing only vertical
edges, as the small sample grid reduces the benefit of the
computationally more expensive full-step.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Note this also revealed a subtle bug in the handling of degenerate
trapezoids after shrinking to the raster grid.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The single pixel case is usually associated with synchronisation of perf
clients and so we do not want to incur extra complication along that
path. Also the cost of tracking a single pixel of non-damage outweighs
its benefit.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>