3725 Commits

Author SHA1 Message Date
Chris Wilson 72fafdfd37 gitignore: add git_version.h
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 23:15:58 +01:00
Chris Wilson ac4d4cdbc1 sna: Mark the bo as reusable after extracting the handle from the buffer
The whole purpose for that little dance was so that we could reuse the
bo. However, we left it marked as non-reusable in order for us not to
tie up memory with too many buffers and so defeated the purpose of
trying to place it into the inactive cache.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 23:15:14 +01:00
Chris Wilson b3429cf12d sna/gen3: Use a clear pattern for ill-defined radial gradients
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 18:09:55 +01:00
Chris Wilson b6837c21b4 sna/gen5: Prefer BLT for solids
And Ironlake also fails to live up to the promise that its GPU is fast
enough to run simple programs at memory rates.

x11perf -trap300 shows a 5x improvement. No obvious improvement
elsewhere yet.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 15:48:30 +01:00
Chris Wilson 27e42b4e12 sna: Prefer memcpy_blt over fbBlt
We know we have compatible formats since we have a gpu_bo attached to
the pixmap, so we can use the simpler direct memcpy rather than calling
fbPutZImage/fbBlt.

On my i3-330m, this improves putimage500 from 730 to 1100 ops/s.

Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 15:26:04 +01:00
Chris Wilson 2e1bf7e1b4 sna: Record git-tree used for compilation
Hopefully, I have all the dependencies correct for auto-updating and it
should continue to work with tarballs...

The next step is to perhaps include it in the usual version number,
perhaps as patch level?

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 15:26:04 +01:00
Chris Wilson f73cd955e7 sna/trapezoids: Hook up Imprecise AddTraps in lieu of spans
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 11:37:32 +01:00
Chris Wilson adde6eab5d sna/trapezoids: Fast upload path for gpu busy bo
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-08 23:35:33 +01:00
Chris Wilson e9ca05331d sna/traps: Use the trapezoid path for AddTraps
Usually this will be to a CPU-only pixmap, but just on the off-chance that
we are stalling for a GPU pixmap, just use the faster path developed for
Trapezoids.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-08 22:54:22 +01:00
Chris Wilson afdb8aa89a sna/gen3: Do not assume video updates are always vsync'ed
In case the video is running async, then there may be subsequent
instructions within the batch and so we do need to mark the clobbered
state as dirty when setting up the video frame.

Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40693
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-07 20:27:36 +01:00
Chris Wilson 6aee152cb8 sna/gen2: Flush the batch when we run out of vertex space
Unlike the later gen, we do not yet use a separate vertex buffer and so
when we can no longer fit a rectangle (and its CA ghost) we must flush the
batch. Due to the duplication required for the CA pass, the normal
checks to see whether we had sufficient space to add the new command
were passing as they failed to take into account the need to submit the
whole primitive again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-04 12:57:17 +01:00
Chris Wilson 48bfe4e6de sna/gen2: Improve batch decoder.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-04 12:46:32 +01:00
Chris Wilson 2cda0aaf39 sna/trapezoids: Check for alignment after projection
If after projection onto the Imprecise fast sample grid, the trapezoid
becomes a pixel-aligned box, treat it as such and send it down the fast
paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-04 11:20:07 +01:00
Chris Wilson db0fb368c1 sna: Add missing implementation for Triangles
Feed both into spans and as a mask fallback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-04 11:19:31 +01:00
Chris Wilson 695e7115ef sna/trapezoids: Edges may lie out of bounds
We cannot assume that the edge lies completely within the target, so we
must make sure that the initial prev_x is truly less than any possible
value whilst sorting intersections.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-01 15:51:31 +01:00
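The point about the initial prev_x in the commit above can be illustrated with a minimal sketch. The helper name and its signature are hypothetical and are not taken from the driver; the sketch only shows why the starting value must be smaller than any coordinate an out-of-bounds edge could produce.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not the driver's code): report whether the x
 * intersections are already in ascending order. prev_x starts at INT_MIN
 * so that an edge lying to the left of the target (negative x) still
 * compares correctly against the initial value. */
static bool intersections_sorted(const int *x, size_t n)
{
    int prev_x = INT_MIN;

    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (x[i] < prev_x)
            return false; /* out of order: a sort is required */
        prev_x = x[i];
    }
    return true;
}
```

Had prev_x started at a value inside the target, such as 0, a first intersection at a negative x would be misjudged as out of order.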
Chris Wilson 9a563ea03b sna: Use the shadow buffer for PutImage
This is optimising for the x11perf putimage benchmark, but nevertheless,
uploading the PutImage directly into the uncached scanout is between
2-20x slower than making a temporary copy in the shadow buffer and
doing a deferred update. Most of the overhead is in the kernel, and
should be addressed there (rather than worked around) and a portion is
due to the overdraw in the benchmark (which is not likely to be
realistic, but then again neither should PutImage be!).

The argument for uploading inplace when possible is that a buffer which
already exists on the GPU is likely to be used again in future by the
GPU and so we will be uploading it at some point.
Deferring that upload incurs an extra copy. The putimage benchmark does
not actually use the pixel data and so that extra cost is not being
measured.

Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-31 23:58:39 +01:00
Chris Wilson 32fc0c896e sna/gen6: Prefer the BLT ring, except for copies on behalf of DRI
As demonstrated by the all-important trap300, using the BLT is 2x faster
than the RENDER ring for the simple case of solid fills. (Though note
that performing the relocations costs 3x as much CPU for 2x GPU
performance.) One case that may regress from this change is copywinpix
which should benefit from the batching in the RENDER commands, and might
warrant revisiting in the future (with realistic and synthetic
benchmarks in hand!)

However, due to the forced stall when switching rings, we still want to
perform RENDER copies on behalf of DRI clients and before page-flips.

Checking against cairo-perf-trace indicated no major impact -- I had
worried that setting the BLT flag for some clears might have had a
knock-on effect causing too many operations that could be pipelined on
the RENDER ring to be sent to the BLT ring instead.

Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-31 23:58:39 +01:00
Chris Wilson 5586dd729b sna/trapezoids: Refactor to project the trapezoid only once
And doing so means that we can go back to using the common validity
check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-31 09:55:05 +01:00
Chris Wilson 3507437cdb sna/trapezoids: Reject invalid traps after projecting onto the sample grid
If either of the edges is degenerate on the sample grid, then the trap
has zero height and must be skipped. (Otherwise if just one edge becomes
degenerate then the polygon becomes unbalanced and the rasteriser will
implode.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-30 22:13:37 +01:00
Chris Wilson 150a0612d5 sna/trapezoids: Allocate sufficient space for a8 mask for mono traps
Oops, a silly cut'n'paste caused us to allocate an A1 pixmap for
mono traps instead of the A8 pixmap that we tried to write to; mayhem
ensued.

Reported-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-30 19:49:47 +01:00
Chris Wilson aafe03d3d1 sna: Retain the GTT space used for an upload buffer
In order to retain the GTT space without keeping hold of the memory used
for the upload buffer, we have to create a new bo and copy the relevant
details across.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 16:50:12 +01:00
Chris Wilson 28c8c5ca14 sna: Free the buffers immediately upon release
They do not appear to have been leaked per-se, but we end up
accumulating the unused buffers. A more complicated solution would be to
reallocate the handle for retained buffers so that the GTT region can be
reused.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39184
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 16:49:09 +01:00
Chris Wilson 4f2fc00944 sna: Cleanup up the cache upon close
To help with leak-chasing under valgrind.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 15:14:41 +01:00
Chris Wilson 0ac4b974b9 sna/video: Defend against PutImage to a broken screen
Similar to the previous commit, check that the Screen Pixmap is bound to
a bo before proceeding.

[Note that in this case, the absence of the bo would have been picked
up much later after doing all of the setup...]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 10:47:45 +01:00
Chris Wilson 0a74cd77a3 video: check that the pixmap exists before use
Now, the pixmap being used is meant to be the Screen pixmap and by rights
that has to exist in a GPU buffer! Evidence contrary to the above
exists and so we had better check that we have a bo before using...

Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40439
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 10:41:26 +01:00
Chris Wilson 8216c92d5c sna/trapezoids: Use the tor scan converter to compute the low precision mask
Take advantage of the faster mask computation available using the
imprecise tor scan converter for chipsets not yet supporting spans.
In doing so, limit the ability to full-step to vertical-only rows,
as the small sample grid reduces the benefits of the computationally
more expensive full-step.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-29 10:32:53 +01:00
Chris Wilson aeee6db798 sna/trapezoids: Reduce imprecise sampling to 4x4
Note this also revealed a subtle bug in the handling of degenerate
trapezoids after shrinking to the raster grid.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-27 11:44:13 +01:00
Chris Wilson ac1b83240e sna/accel: Simplify single pixel read-back
The single pixel case is usually associated with synchronisation of perf
clients and so we do not want to incur extra complication along that
path. Also the cost of tracking a single pixel of non-damage outweighs
its benefit.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-27 11:44:13 +01:00
Chris Wilson 786a770f52 sna/video: Flush the video state at the end of the operation
Or in the case where a second command is received prior to the batch
being flushed, the vertex data is not flushed, which leads to a
miscomputation of the number of vertices emitted.

Reported-by: Elias Probst <mail@eliasprobst.eu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40332
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-25 19:55:49 +01:00
Chris Wilson bd98001a49 sna: Clear structures across server reset
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-25 14:50:37 +01:00
Chris Wilson 0865acb3ad sna/dri2: Add some debug around the use of the Resource database
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-25 14:50:37 +01:00
Chris Wilson 98b67457ca sna/display: Destroy shadow data
Under certain circumstances the shadow can be destroyed after being
allocated but before being created. The pixmap is a NULL pointer at that
time, but we know that its value should be data, so just use the data
pointer instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-25 14:50:33 +01:00
Chris Wilson b9ae4e7e71 sna/gen3: reset blend state after applying CA pass
Otherwise we use the stale value when rendering CA glyphs directly to
the front-buffer and subsequent rendering has a tendency to become
invisible. (Rendering via a temporary glyph mask has a fortunate
side-effect of resetting sufficient state to force the re-emission of the
blend state.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-24 21:38:54 +01:00
Chris Wilson ef52f6c8c3 sna/render: allow CLAMP_TO_EDGE for outside samples of extract regions as well
When clipping the sample region to the edge of the texture we can also
allow the GPU to use CLAMP_TO_EDGE (as well as CLAMP_TO_BORDER) to
emulate the RepeatPad mode of the parent texture. (Only RepeatNormal
and RepeatReflect need special treatment with regard to tiling, and
that is not yet handled.)

This fixes the recent performance regression due to a slight change in
the fish benchmark that caused it to sample outside of the texture atlas
for one of its little fish.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-24 11:07:23 +01:00
Stefan Dirsch d330f3751e Fix array size calculation for intel_pci_probe().
2011-08-18 08:10:52 -07:00
Chris Wilson ccddff087d sna/trapezoids: Speedup tor rasteriser
Faster sorts for the win.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-11 19:42:42 +01:00
Chris Wilson bfbe36cfea sna/gradient: Use a high-precision ramp for a color step rather than fallback
Slightly less precise, but the difference should not be observable in
practice...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-11 19:42:42 +01:00
Chris Wilson 0e61e235bf sna/damage: Take advantage of marking all-damaged
Return early from adding new damage regions if we know that we have
already marked it as all-damaged.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-11 19:42:42 +01:00
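The early return described in the sna/damage commit above can be sketched roughly as follows. The structure and function names are hypothetical simplifications for illustration, not the driver's actual damage-tracking code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified damage tracker; not the driver's real structure. */
struct damage {
    bool all;         /* true once the whole drawable is marked damaged */
    size_t num_boxes; /* number of individually recorded regions */
};

/* Record one more damaged region, skipping the bookkeeping entirely when
 * the drawable has already been marked as all-damaged. */
static void damage_add(struct damage *d)
{
    assert(d != NULL);
    if (d->all)
        return; /* already all-damaged: nothing new to track */
    d->num_boxes++;
}

/* Mark the whole drawable as damaged and drop the per-region records. */
static void damage_all(struct damage *d)
{
    assert(d != NULL);
    d->all = true;
    d->num_boxes = 0;
}
```

Once damage_all() has been called, later damage_add() calls return immediately, which is the saving the commit message refers to.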
Chris Wilson 3a81bb6baf NEWS: 2.16.0 release
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-09 09:42:40 +01:00
Edward Sheldrake f4bbbd1dfe Fix man page formatting
Two option sections were not starting at the beginning of a new line.
2011-08-01 15:37:29 +01:00
Chris Wilson 63518c4223 dri: Build fix for xserver-1.7.7
Back in the olden days before the introduction of dixRegisterPrivate().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-08-01 13:37:31 +01:00
Chris Wilson 7976f5144d NEWS: 2.15.901 snapshot
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-30 09:26:23 +01:00
Chris Wilson 2cfb703bbe Fix typos for distcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-30 09:26:23 +01:00
Chris Wilson 6f919264da sna: Mark the stencil as untiled
In reality, Mesa will be treating it as W-tiling, only we have no way of
communicating that requirement to the kernel (as not only does the
kernel not understand W-tiling, but also the GTT is incapable of fencing
a W-tiled region.).

Ported from Chad Versace's 3e55f3e88.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-30 09:06:47 +01:00
Chris Wilson 326a84e832 sna: Port IVB acceleration code (Xrender + Xv)
Based on the superlative work by Kenneth Graunke and Xiang, Haihao.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-30 08:50:19 +01:00
Chris Wilson 1079092157 sna: Include the pixmap size in the debug info for moving to cpu
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-30 08:47:20 +01:00
Kenneth Graunke 5691c8cdec render: Enable RENDER acceleration on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-28 15:01:09 -07:00
Kenneth Graunke 0d92612b2a render: Update pixel shader state for Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-28 15:01:07 -07:00
Kenneth Graunke 7460ee73d1 render: Use Ivybridge variants for 3D pipeline setup.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-28 15:01:06 -07:00
Kenneth Graunke e3a0960871 render: Refactor to use newly shared pipeline setup code in i965_3d.c.
Slightly generalize the shared SF and CC code to accommodate both.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-28 15:01:03 -07:00