3148 Commits

Author SHA1 Message Date
Chris Wilson dc402334f4 i915: Centre sampling.
Use centre sampling of textures to match pixman, and remove numerous
off-by-one and visual artefacts when rendering. The classic example for
this is cairo/text/xcomposite-projection where the edge of the rotated
rectangle is jaggy due to the incorrect sample position.

Fixes:

  Bug 16917  - [i915] Blur on y-axis also when only x-axis is scaled
               billiear
  https://bugs.freedesktop.org/show_bug.cgi?id=16917

And about 15 tests from the Cairo test suite.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-01 23:15:02 +01:00
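
As an illustration of the centre-sampling idea in the commit above (not the driver's actual code), the normalised coordinate of a texel can be taken at its centre rather than its corner. The helper name texel_centre is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the driver: returns the normalised
 * texture coordinate of the centre of texel `index` in a texture that is
 * `size` texels wide. Corner sampling would use index / size instead. */
float texel_centre(int index, int size)
{
    assert(size > 0);
    return ((float)index + 0.5f) / (float)size;
}
```

For a 4-texel-wide texture the first texel centre lies at 0.125 rather than 0.0, which is the half-texel shift that separates the two conventions.
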
Chris Wilson f74b3f82ba i915: Avoid the implicit flush on changing BUF_INFO
3DSTATE_BUF_INFO is an implicit flush of the pipeline, so avoid emitting
that and associated state unless the destination pixmap has actually
changed. This is a win of around 3-5% for cairo-perf-trace, notably for
firefox.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-06-01 23:15:02 +01:00
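
A minimal sketch of the redundant-state check described above, assuming a driver that remembers the last destination it emitted state for. The struct and function names are hypothetical and stand in for the real driver structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's render state. */
struct render_state {
    const void *last_dest;  /* destination the state was last emitted for */
    unsigned emit_count;    /* number of times destination state was emitted */
};

/* Emit destination state only when the destination has actually changed.
 * Returns 1 if state was emitted, 0 if the emission was skipped. */
int set_destination(struct render_state *state, const void *dest)
{
    assert(state != NULL);
    if (state->last_dest == dest)
        return 0;               /* unchanged: skip the implicit flush */
    state->last_dest = dest;
    state->emit_count++;        /* a driver would write the packet here */
    return 1;
}
```

Repeated operations on the same destination then cost one emission instead of one per operation.
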
Jesse Barnes f227240203 DRI2: fix new buffer exchange check
Chris's new buffer exchange check is a good one, but we don't want to
hit the immediate blit fallback path if it fails.  We still want to
schedule a blit for sometime in the future, and we need to use it
wherever an exchange might occur (like the secondary flip check or the
currently disabled CanExchange check).

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=28252.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-06-01 13:46:15 -07:00
Chris Wilson a386a003e7 uxa: Spans, try again to get the early break correct.
Trigger happy bug fixing. The sign *was* right, the endpoint was wrong.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 22:19:49 +01:00
Chris Wilson 1672ee0421 uxa: Sign reversal on early break from spans passing the YXband
Introduced with e5c971e763.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 22:08:43 +01:00
Chris Wilson cd38b705be Disable acceleration if we detect a hardware error.
This is wildly optimistic, but it should work in a surprising number of
error situations and some output in those cases will hopefully be
better than none...

If we submit a batchbuffer and the kernel reports the GPU is hung (which
will be caused by an earlier execbuffer, and so the kernel should have
had enough time to determine whether or not it could reset the GPU) then
disable any further attempt to accelerate gfx and force fallbacks to map
the buffers and use the CPU. We cannot normally map any more buffers if
the GPU is hung, so only those already mapped prior to the hang can be
written to, or those allocated in system memory. However, we can expect
that the framebuffer is already mapped, and so have a reasonable
expectation to continue to see the display update.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 18:00:11 +01:00
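
The behaviour described above can be sketched as a sticky flag that is set once a submission reports a hang. This is an assumption-laden illustration: the names are hypothetical, and the use of -EIO as the hang indication is an assumption about the kernel interface rather than something stated in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical acceleration state; not the driver's structure. */
struct accel_state {
    int wedged;  /* non-zero once a hang has been reported */
};

/* Record the result of a batch submission. -EIO is assumed here to mean
 * "GPU hung"; once seen, acceleration stays disabled. */
void note_submit_result(struct accel_state *state, int result)
{
    assert(state != NULL);
    if (result == -EIO)
        state->wedged = 1;
}

/* Callers would check this before choosing an accelerated path and fall
 * back to CPU rendering when it returns 0. */
int can_accelerate(const struct accel_state *state)
{
    assert(state != NULL);
    return !state->wedged;
}
```
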
Chris Wilson 5fff430046 uxa: Mega-Glyphs!
Rewrite glyph rendering to avoid the intermediate buffer, accumulating
the glyph rectangles directly in the backend composite routines. And
modify the glyph cache routines to fully utilise the allocated size of
the tiled buffer on older hardware. To do this we alias all glyph sizes
into the same texture using a technique suggested by Keith Packard.

PineView:
  885/856 -> 1150/1110 kglyphs/s (aa/rgb)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 14:03:42 +01:00
Chris Wilson d31abccd41 i915: Support textured video on an extended desktop.
Handle rendering textured video onto an extended desktop (>2048) by
using a temporary pixmap. Note that we still cannot handle rendering to
a greater than 2048 destination region, for that we will need to tile.
Hmm, time to request a 2560x1600, 10bpc monitor...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-31 12:23:29 +01:00
Chris Wilson 2cfd5bc134 dri: Compilation fix.
17:53 < arekm> ickle: i830_dri.c:630:28: error: ‘DrawableRec’ has no member named ‘bpp’
17:53 < arekm> ickle: i830_dri.c:630:57: error: ‘DrawableRec’ has no member named ‘bpp’

* sigh. I need to fix this machine to have the right version of the
* headers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-29 17:55:19 +01:00
Chris Wilson e2615cdeef dri: Only flip if the front and back pixmaps match.
An unredirected window (thanks Michel for the reminder) is backed by the
Screen pixmap, and so uses a reference of that as its front buffer. The
back buffer is a pixmap appropriately sized for the drawable. When the
application requests to swap its buffers, obviously we cannot simply
exchange the front and back buffer as they do not match, but need to copy
the appropriate region from the back to the front.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-29 16:40:06 +01:00
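
A rough sketch of the kind of check the commit above describes, assuming that "match" means equal geometry. The struct, its fields, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a pixmap's geometry. */
struct pixmap_geometry {
    int width;
    int height;
    int bits_per_pixel;
};

/* Returns 1 when front and back could simply be exchanged, 0 when the
 * swap has to be done by copying the region from back to front. */
int can_exchange(const struct pixmap_geometry *front,
                 const struct pixmap_geometry *back)
{
    assert(front != NULL && back != NULL);
    return front->width == back->width &&
           front->height == back->height &&
           front->bits_per_pixel == back->bits_per_pixel;
}
```

An unredirected window whose front buffer is the Screen pixmap would fail this check whenever the window is smaller than the screen, so the copy path is taken.
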
Chris Wilson 8b2039187f Revert "dri: Use size from backing pixmap when creating buffers."
This reverts commit 44d45d3fa5.

Michel Dänzer pointed out the flaw in using the pixmap size instead of
the drawable size:

  Using the backing pixmap dimensions for this is not desirable. In
  particular, it means that the DRI2 buffers of non-redirected windows
  always have the same size as the screen. But even for redirected windows
  it wastes some graphics memory with a re-parenting window manager, that
  is if it doesn't break in various ways due to the top left corner of the
  DRI2 buffers no longer corresponding to the top left corner of the window.
2010-05-29 12:14:55 +01:00
Chris Wilson 44d45d3fa5 dri: Use size from backing pixmap when creating buffers.
This avoids using the garbage values stored in the Screen drawable,
instead of the true values which are only maintained in its backing
pixmap. The consequence of using the wrong size was to hand a 1x1
pixmap to metacity/mutter and have it believe it was a full screen
drawable; GPU hangs ensued if using page flipping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-29 10:39:28 +01:00
Chris Wilson 90c74a4314 i915: Don't re-emit vertex size unless it has changed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-28 21:50:04 +01:00
Eric Anholt a94ae175d6 uxa: Fix prepare_solid being called without check_solid first.
Fixes GPU hang on gen6.
2010-05-28 12:40:46 -07:00
Chris Wilson 66c90158e4 uxa: Skip the redundant miComputeCompositeRects() when adding to the mask
As we are in full control of the destination (the temporary glyph mask)
and the source (the glyph cache) we know that there are no clip regions
on either and so can skip computing the composite rectangles. (We trust
the device clipping to prevent compositing outside the target.)

x11perf on PineView:
701/686 -> 881/856 kglyphs/s [aa/rgb]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-28 17:13:30 +01:00
Chris Wilson 5b2254838e uxa: Make the glyph caches' fixed size explicit.
Until we actually resize the glyph cache dynamically, make it obvious to
the reader and the compiler that the size is fixed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-28 12:47:26 +01:00
Chris Wilson 11581dda99 uxa: Use a glyph private rather than a hash table.
Store the cache position directly on the glyph using a devPrivate rather
than through an auxiliary hash table.

x11perf on PineView:
650/638 kglyphs/s -> 701/686 kglyphs/s [aa/rgb]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-28 12:44:34 +01:00
Chris Wilson 73111cf2a2 Decouple non-reusable pixmaps from batch lists on unref.
==7596== Invalid write of size 4
==7596==    at 0x491ACA8: intel_batch_teardown (i830_batchbuffer.c:118)
==7596==    by 0x491C9D6: I830CloseScreen (i830_driver.c:1419)
==7596==    by 0x8103A9C: RRCloseScreen (randr.c:105)
==7596==    by 0x80DE794: xf86CrtcCloseScreen (xf86Crtc.c:759)
==7596==    by 0x80BEBA3: DGACloseScreen (xf86DGA.c:268)
==7596==    by 0x80D044B: DPMSClose (xf86DPMS.c:134)
==7596==    by 0x488B050: XvCloseScreen (xvmain.c:320)
==7596==    by 0x81841B1: VidModeClose (xf86VidMode.c:110)
==7596==    by 0x80EB12F: CursorCloseScreen (cursor.c:191)
==7596==    by 0x810CA17: AnimCurCloseScreen (animcur.c:108)
==7596==    by 0x816937E: compCloseScreen (compinit.c:86)
==7596==    by 0x48D39B9: glxCloseScreen (glxscreens.c:221)
==7596==  Address 0x49c1a50 is 24 bytes inside a block of size 52 free'd
==7596==    at 0x4024866: free (vg_replace_malloc.c:325)
==7596==    by 0x80B023C: Xfree (utils.c:1096)
==7596==    by 0x4927CFD: i830_set_pixmap_bo (i830_uxa.c:647)
==7596==    by 0x491C9B4: I830CloseScreen (i830_driver.c:1413)
==7596==    by 0x8103A9C: RRCloseScreen (randr.c:105)
==7596==    by 0x80DE794: xf86CrtcCloseScreen (xf86Crtc.c:759)
==7596==    by 0x80BEBA3: DGACloseScreen (xf86DGA.c:268)
==7596==    by 0x80D044B: DPMSClose (xf86DPMS.c:134)
==7596==    by 0x488B050: XvCloseScreen (xvmain.c:320)
==7596==    by 0x81841B1: VidModeClose (xf86VidMode.c:110)
==7596==    by 0x80EB12F: CursorCloseScreen (cursor.c:191)
==7596==    by 0x810CA17: AnimCurCloseScreen (animcur.c:108)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 21:07:46 +01:00
Chris Wilson a6fb6aa5f9 Add vertex bo to the list of buffers to be torn down.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 20:31:45 +01:00
Chris Wilson 5dce69002d i965: Remove ATOMIC_BATCH.
This paranoid check is deceased; pining for the fjords.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 20:27:29 +01:00
Eric Anholt 06ebb55d30 Add a workaround for Ironlake errata relating to disabling the clipper.
2010-05-26 12:21:09 -07:00
Eric Anholt 158a158dad Add a workaround for Ironlake errata regarding blits and other engines.
2010-05-26 12:21:09 -07:00
Eric Anholt 3461f8f4bc Remove remaining REG_DUMPER build stuff.
2010-05-26 12:21:09 -07:00
Chris Wilson 03bbb4c896 uxa: Perform manual damage for CompositeRects
[xserver-1.8] The damage layer doesn't wrap CompositeRects, so we need to
manually append the damaged region ourselves. This works for
miCompositeRects since that translates the call into multiple invocations
of either PolyFillRectangle or Composite, which themselves cause damage.

Fixes:

  Bug 28120 - Tint2's tooltip borders end up at 0,0 and do not disappear
  https://bugs.freedesktop.org/show_bug.cgi?id=28120

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 10:21:03 +01:00
Chris Wilson b9ada52a30 uxa: Force the alpha value to 0xffff when treating Over as Src
Since we have at most 8 bits of alpha, we treat >= 0xff00 as opaque.
However, being paranoid we should set the alpha value to 0xffff in case
something unexpected happens when converting from the xRenderColor to
the pixel value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 10:21:03 +01:00
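
The threshold described above can be sketched as follows. This is an illustration only, with a hypothetical function name: a 16-bit alpha at or above 0xff00 is treated as opaque, and the value is then forced to 0xffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decide whether Over can be treated as Src for a
 * solid colour with the given 16-bit alpha. With at most 8 bits of alpha
 * in the pixel, anything >= 0xff00 is opaque. When it is, the alpha is
 * forced to 0xffff so later conversion cannot lose opacity.
 * Returns 1 if Over reduces to Src, 0 otherwise. */
int over_reduces_to_src(uint16_t *alpha)
{
    assert(alpha != NULL);
    if (*alpha >= 0xff00) {
        *alpha = 0xffff;
        return 1;
    }
    return 0;
}
```
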
Chris Wilson 3055d40164 uxa: Use Composite rather than solid blitter for PolyRect
Due to the relocation overhead, using a single composite with many
rectangles outperforms many solid blits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 10:21:03 +01:00
Chris Wilson ec2437f958 uxa: Add PICT format mapping for depth 4 pixmaps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 10:21:03 +01:00
Chris Wilson 309bd3a299 i830: Skip an empty fill.
In the extremely unlikely event that the higher layer erroneously gave us
an empty fill, skip it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-26 10:21:03 +01:00
Chris Wilson b645ec83e0 uxa: Apply the drawable offset to the solid rects
Fixes:

  Bug 28120 - Tint2's tooltip borders end up at 0,0 and do not disappear
  https://bugs.freedesktop.org/show_bug.cgi?id=28120

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-25 09:49:20 +01:00
Chris Wilson 9d8ac27140 Merge branch 'glyphs'
Tweak glyphs to improve x11perf on i915 by about 33%.
PineView, aa10text:  460 -> 617 kglyphs/s.
PineView, rgb10text: 434 -> 610 kglyphs/s.

Speedups
========
  xcb                    poppler    18.636 -> 13.958:  1.34x speedup
 xlib          firefox-talos-gfx    71.905 -> 56.232:  1.28x speedup
  xcb          firefox-talos-gfx    72.882 -> 57.969:  1.26x speedup
 xlib         gnome-terminal-vim    38.126 -> 34.472:  1.11x speedup
  xcb         gnome-terminal-vim    35.164 -> 32.573:  1.08x speedup
 xlib                    poppler    19.634 -> 18.246:  1.08x speedup

Note the lack of significant improvement for firefox-planet-gnome.
2010-05-24 18:31:45 +01:00
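
The speedup column above appears to be the ratio of the time before to the time after. A small sketch under that assumption, with a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper: speedup factor from elapsed times, assuming the
 * table reports before / after. */
double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

For the xcb poppler row, 18.636 / 13.958 is roughly 1.335, consistent with the 1.34x shown after rounding.
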
Chris Wilson ea07535240 i915: Emit CA over using OutReverse + Add passes
On PineView:
  578/621 -> 610/617 kglyphs/sec [rgb/aa]
2010-05-24 18:31:16 +01:00
Chris Wilson 80a9e64f50 uxa: Use temporary dest when target is too large for compositor
If the destination cannot fit into the 3D pipeline when we need to
composite, we fallback to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.

An alternative is to extract the extents of the operation from the
destination using the blitter which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.

For x11perf this boosts glyph rendering on PineView from 38 kglyphs/s to
480 kglyphs/s, just a little shy of the native performance of 601 kglyphs/s.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 18:31:16 +01:00
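
The decision described above might be sketched like this. The names and the max_3d_size parameter are hypothetical, and the final CPU fallback for operations whose extents are themselves too large is an assumption rather than something the commit states.

```c
#include <assert.h>

enum composite_path {
    COMPOSITE_DIRECT,    /* destination fits the 3D pipeline */
    COMPOSITE_VIA_TEMP,  /* only the operation extents fit: use a temporary */
    COMPOSITE_FALLBACK   /* neither fits: fall back to the CPU (assumed) */
};

/* Hypothetical helper choosing how to composite, given the destination
 * size, the extents of the operation, and the largest surface dimension
 * the 3D pipeline accepts. */
enum composite_path choose_composite_path(int dst_w, int dst_h,
                                          int op_w, int op_h,
                                          int max_3d_size)
{
    assert(max_3d_size > 0);
    if (dst_w <= max_3d_size && dst_h <= max_3d_size)
        return COMPOSITE_DIRECT;
    if (op_w <= max_3d_size && op_h <= max_3d_size)
        return COMPOSITE_VIA_TEMP;
    return COMPOSITE_FALLBACK;
}
```

With an illustrative limit of 2048, a small glyph run on a 3200x1200 extended desktop would go through the temporary rather than the CPU.
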
Chris Wilson 91f560034f uxa: Composite glyphs directly onto dst when possible.
Compositing directly onto the destination, without using a mask,
takes us from 580 kglyphs/s to 850 kglyphs/s on i945 [x11perf -aa10text].

However, the extra intersection check almost entirely cancels out the
speed up and we discover that the glyphs in x11perf are always
overlapping. Nothing is ever easy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 18:31:15 +01:00
Chris Wilson e3ece83f57 i915: compute normalized texcoords using a scale factor.
500 -> 580kglyphs/s on i945.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:42:18 +01:00
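
One way to read "using a scale factor" is to compute the reciprocal of the texture size once and multiply per vertex instead of dividing. The sketch below assumes that reading; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-texture scale factors, computed once per operation. */
struct tex_scale {
    float sx;  /* 1 / width */
    float sy;  /* 1 / height */
};

struct tex_scale tex_scale_init(int width, int height)
{
    struct tex_scale scale;
    assert(width > 0 && height > 0);
    scale.sx = 1.0f / (float)width;
    scale.sy = 1.0f / (float)height;
    return scale;
}

/* Normalise a texel-space coordinate with a multiply, not a divide. */
float normalise(float coord, float scale)
{
    return coord * scale;
}
```
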
Chris Wilson c2abf8d659 uxa: translate the region in line for composites
When compositing, we need to convert the box into a rect and so the
advantages of using REGION_TRANSLATE are lost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:40:28 +01:00
Chris Wilson 2adf823b80 i915: Add special case primitive emitters for glyphs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:40:26 +01:00
Chris Wilson f64ab9e0d9 i915: Move vertices into a vertex buffer object.
In theory this should allow us to pack far more operations into a single
batch buffer, and reduce our overheads.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:36:23 +01:00
Chris Wilson 2b050f330f Use pwrite to upload the batch buffer
By using pwrite() instead of dri_bo_map() we can write to the batch buffer
through the GTT and not be forced to map it back into the CPU domain and
out again, eliminating a double clflush.

Measuring x11perf text performance on PineView:

Before:
16000000 trep @   0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @   0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @   0.0019 msec (532000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @   0.0020 msec (496000.0/sec): Char in 80-char rgb line (Charter 10)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:33:37 +01:00
Chris Wilson dcef703a7c Kill paranoid assertions on every write into the batchbuffer.
On my PineView box these represent ~5% overhead on x11perf text:

Before:
16000000 trep @   0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @   0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)

After:
16000000 trep @   0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @   0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:33:35 +01:00
Chris Wilson bc41f84e01 i915: Emit composite primitive with specialised functions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:32:30 +01:00
Chris Wilson 4a3476ea09 i915: amalgamate composite into a single primitive list
Combine all the calls to composite between prepare_composite and
done_composite into a single primitive list, rather than a primitive
call per composite().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-23 18:52:15 +01:00
Chris Wilson e5c971e763 uxa: Spans! OMG!
Use composite rather than solid blits in order to bring performance on
a par with the CPU when using GEM and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-23 18:43:29 +01:00
Kristian Høgsberg 509df27c74 dri: Clean up DRI2 API #ifdefs a bit
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
2010-05-18 10:01:52 -04:00
Chris Wilson 5e04a81369 i830: Remove vestigial debugging ALWAYS_FLUSH and ALWAYS_SYNC
These are now debugging options exposed in Xorg.conf, and now unused in
the source code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-17 15:16:25 +01:00
Chris Wilson 723cc45b27 dri: Check error code from GetScratchGC()
It may fail so be prepared, and do use the right drawable!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-17 15:14:55 +01:00
Chris Wilson 2c00297bc3 uxa: Replace solid planemask [0xffffffff] with FB_ALLONES
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 20:19:22 +01:00
Chris Wilson 2c69709d8a i830: Encode surface bpp into format
References:

  Bug 28135 - [855GM] Slowdown/High CPU-Usage after Git-Commit
              926fbc7d90
  https://bugs.freedesktop.org/show_bug.cgi?id=28135

The simple answer is that I had assumed that 0 was a reserved value.
However, without the bpp encoded into the format 0 was used for a8r8g8b8
and r5g6b5, which are very common formats!

The other possibility for the slowdown is that gtkperf is using one of the
now verboten xrgb formats -- which would in fact be valid if the source
covers the clip and we could fixup the alpha value in the fixed function
combine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 18:41:52 +01:00
Chris Wilson 21b5fd427f uxa: Tidy uxa_solid_rects()
Move the operator reduction after a few fallbacks, closer to its use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 13:52:35 +01:00
Chris Wilson 61835701fd uxa: Patterns are acquired at 0,0
Set the correct offset for the gradient patterns after rendering to a
local Picture.

Fixes cairo/test/huge-radial and friends

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 13:51:35 +01:00
Chris Wilson 89f43f69a9 uxa: Force an alpha channel when rendering source fallbacks
As the source may not cover the extents, we need to represent those
areas as transparent in the fallback picture, ergo we need an alpha
channel. We could be smarter and force a format conversion when
necessary, and we could let the backend choose the most appropriate
format.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 18:34:54 +01:00