Commit Graph

3114 Commits

Author SHA1 Message Date
Chris Wilson c2abf8d659 uxa: translate the region in line for composites
When compositing, we need to convert the box into a rect and so the
advantages of using REGION_TRANSLATE are lost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:40:28 +01:00
Chris Wilson 2adf823b80 i915: Add special case primitive emitters for glyphs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:40:26 +01:00
Chris Wilson f64ab9e0d9 i915: Move vertices into a vertex buffer object.
In theory this should allow us to pack far more operations into a single
batch buffer, and reduce our overheads.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:36:23 +01:00
Chris Wilson 2b050f330f Use pwrite to upload the batch buffer
By using pwrite() instead of dri_bo_map() we can write to the batch buffer
through the GTT and not be forced to map it back into the CPU domain and
out again, eliminating a double clflush.

Measing x11perf text performance on PineView:

Before:
16000000 trep @   0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @   0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @   0.0019 msec (532000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @   0.0020 msec (496000.0/sec): Char in 80-char rgb line (Charter 10)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:33:37 +01:00
Chris Wilson dcef703a7c Kill paranoid assertions on every write into the batchbuffer.
On my PineView box these represent ~5% overhead on x11perf text:

Before:
16000000 trep @   0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @   0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)

After:
16000000 trep @   0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @   0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:33:35 +01:00
Chris Wilson bc41f84e01 i915: Emit composite primitive with specialised functions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-24 09:32:30 +01:00
Chris Wilson 4a3476ea09 i915: amalgamate composite into a single primitive list
Combine all the calls to composite between prepare_composite and
done_composite into a single primitive list, rather than a primitive
call per composite().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-23 18:52:15 +01:00
Chris Wilson e5c971e763 uxa: Spans! OMG!
Use composite rather than solid blits in order to bring performance on
a par with the CPU when using GEM and relocations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-23 18:43:29 +01:00
Kristian Høgsberg 509df27c74 dri: Clean up DRI2 API #ifdefs a bit
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
2010-05-18 10:01:52 -04:00
Chris Wilson 5e04a81369 i830: Remove vestigal debugging ALWAYS_FLUSH and ALWAYS_SYNC
These are now debugging options exposed in Xorg.conf, and now unused int
the source code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-17 15:16:25 +01:00
Chris Wilson 723cc45b27 dri: Check error code from GetScratchGC()
It may fail so be prepared, and do use the right drawable!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-17 15:14:55 +01:00
Chris Wilson 2c00297bc3 uxa: Replace solid planemask [0xffffffff] with FB_ALLONES
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 20:19:22 +01:00
Chris Wilson 2c69709d8a i830: Encode surface bpp into format
References:

  Bug 28135 - [855GM] Slowdown/High CPU-Usage after Git-Commit
              926fbc7d90
  https://bugs.freedesktop.org/show_bug.cgi?id=28135

The simple answer is that I had assumed that 0 was a reserved value.
However, without the bbp encoded into the format 0 was used for a8r8g8b8
and r5g6b5, which are very common formats!

The other possibility for the slowdown is that gtkperf is using of the
now verboten xrgb formats -- but would in fact be valid if the source
covers the clip and we could fixup the alpha value in the fixed function
combine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 18:41:52 +01:00
Chris Wilson 21b5fd427f uxa: Tidy uxa_solid_rects()
Move the operator reduction after a few fallbacks, closer to its use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 13:52:35 +01:00
Chris Wilson 61835701fd uxa: Patterns are acquired at 0,0
Set the correct offset for the gradients patterns after rendering to a
local Picture.

Fixes cairo/test/huge-radial and friends

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-16 13:51:35 +01:00
Chris Wilson 89f43f69a9 uxa: Force an alpha channel when rendering source fallbacks
As the source may not cover the extents, we need to represent those
areas as transparent in the fallback picture, ergo we need an alpha
channel. We could be smarter and force a format conversion when
necessary, and we could let the backend choose the most appropriate
format.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 18:34:54 +01:00
Chris Wilson 524fd2dd0d uxa: Apply clip for solid rectangles.
References:

  Bug 28120 - Tint2's tooltip borders end up at 0,0 and do not disappear
  https://bugs.freedesktop.org/show_bug.cgi?id=28120

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 18:28:05 +01:00
Chris Wilson 58b089febc uxa: Avoid using blits when with PictFilterConvolution
References:

  Bug 28098 Compiz renders shadows wrong, garbage line of pixels along left
            and top edge of windows
  https://bugs.freedesktop.org/show_bug.cgi?id=28098

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 09:11:46 +01:00
Chris Wilson ef95899f5b uxa: Check the w-scaling component is 1 for an translation matrix
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 09:02:07 +01:00
Chris Wilson 9c3da71349 i830: Remove xrgb conversion to argb, no longer required.
All textures are now properly declared so that the alpha swizzling
occurs in the sampler or not at all. The downside is that for quite a
few composite operations we have to fallback to software on older
hardware. There is scope for more performing the alpha expansion in
shaders or combiners when we know the picture covers the clip - which is
almost all of the time for normal operations especially those
constructed by Cairo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 01:09:26 +01:00
Chris Wilson 926fbc7d90 i830: Remove incorrectly mapped tex formats.
We no longer workaround the lack of alpha expansion for xrgb textures as
this interferes with EXTEND_NONE, though we could if we know the source
covers the clip...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 01:09:13 +01:00
Chris Wilson 95654cffa8 uxa: Fix order of conditionals to only run fill_region for SRC or opaque
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 00:50:42 +01:00
Chris Wilson f67b45965b uxa: Expand the range of compatible formats to cover all bpp.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 00:50:20 +01:00
Chris Wilson 82d07fdf10 uxa: Only use 1x1R as a solid with an opaque format or SRC
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 00:49:39 +01:00
Chris Wilson 3bca186a7e uxa: Call check_solid before running the solid blitter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 00:48:31 +01:00
Chris Wilson 213816c30b i915: Load texture into directly into OC when possible.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-15 00:48:19 +01:00
Chris Wilson 737de9a779 uxa: Disable compatible src xrgb and dst argb
I'm seeing garbage alpha for rendercheck blend:

  x8r8g8b8a 10x10 SRC ar8g8b8a

so disable blitting until I work out if we can fast-path it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 23:56:26 +01:00
Chris Wilson 271240fd47 i915: Remove a couple of unsupported 16bpp no-alpha tex formats
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 23:56:05 +01:00
Chris Wilson a7c318d21c uxa: Parse BGRA pixel formats.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 23:32:44 +01:00
Chris Wilson f7bbcc492a Split the prepare blitter functions into check + prepare.
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 23:31:57 +01:00
Chris Wilson 4be8d7eb89 i915: Don't force alpha=1 for RGB drawables in the shader.
I was blindly fixing rendercheck without thinking. We need to force the
alpha value to be in the blend unit and not before -- otherwise we
generate the incorrect result whilst blending. D'oh.
2010-05-14 21:16:51 +01:00
Chris Wilson b9a5e36f95 uxa: enable solid rects for backends that require pixmaps
Convert the color into a (cached) pixmap if the backend cannot handle
the SolidFill natively.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 21:16:50 +01:00
Chris Wilson a21297d7cc drm: Remove pin(); unpin() sync
GEM handles serialisation of the new front buffer with respect to page
flipping and rendering and reports back when the flip is complete.
Adding a sync point here is then redundant.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 17:52:22 +01:00
Chris Wilson 7ee73d2c6f drm: Remove unused old_front parameter from drmmode_do_pageflip.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 17:51:40 +01:00
Chris Wilson 030d56279b drm: don't overwrite the old intel->front_buffer
It's now handled in the common ExchangeBuffers() path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 17:30:23 +01:00
Chris Wilson 5bd0227395 i830: Teardown batch entries on reset.
By not cleaning up the batch entries when resetting the X server, we left
the pointers in an inconsistent state and caused X to crash.
2010-05-14 15:50:05 +01:00
Chris Wilson 0d2392d44a dri: Hold reference to buffers across swap
As we schedule swaps for some time in the future and may process a
detachment prior to receiving the vblank notification from the kernel,
we need to hold a reference to the buffers for our swap event handler.

Fixes:
  Bug 28080 - "glresize" causes X server segfault with indirect rendering.
  https://bugs.freedesktop.org/show_bug.cgi?id=28080

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 10:32:12 +01:00
Chris Wilson 8de09a0707 uxa: Convert 1x1R back to solid_fill
In the change to prevent blitting between incompatible sources, we also
prevented 1x1R pixmaps from being used for solid fills. Reorder the
sequence of conditions to enable this fast path again.
2010-05-13 17:17:54 +01:00
Chris Wilson 92e9cf8af7 uxa: Only use solid_fill for SRC. 2010-05-13 17:17:54 +01:00
Chris Wilson d1bd14e8b6 uxa: Replace source for CLEAR with a transparent solid
This means that we will hit the faster try_solid_fill path instead.
2010-05-13 17:17:54 +01:00
Chris Wilson cdab72c405 uxa: Fallback early if compositing with alphaMaps 2010-05-13 17:17:54 +01:00
Chris Wilson 25811dc7b7 i915: Force output alpha to 1. if dst has no alpha channel.
Ensure that garbage is not stored in the unused alpha channel so that
we can rely on it being currently initialiased when used as a source or
returning via GetImage.

Partial fix for rendercheck -t blend
2010-05-13 17:17:10 +01:00
Chris Wilson 0e726b85ca i915: Add a2r10g10b10 format and friends
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-13 09:40:27 +01:00
Chris Wilson 9f54107f86 dri2: Handle reference counting across page flipping
1. Instead of swapping bos, swap the entire private structure.

2. If we update the pixmap bo for the Screen, make sure we update the
reference inside intel->front_buffer so that xrandr still functions.

Fixes:

  Bug 27922 - i965: Rapidly resizing OpenGL window causes GPU to hang.
  https://bugs.freedesktop.org/show_bug.cgi?id=27922

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 21:37:49 +01:00
Chris Wilson 6c27f6e4f7 uxa: Avoid glyph ping-pong with !offscreen destination
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson d5383c2073 uxa: Avoid ping-pong with !offscreen destination and traps
If we are destined to target an !offscreen drawable, then uploading the
trapezoid mask to a bo is the last thing we actually want to do...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson 00664b8f9d uxa: Fallback when compositing to a !offscreen destination
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson 0c6372a77f i830: Prevent allocation of bo larger than half the aperture
We need to prevent overcommitting the aperture, and in particular if we
allocate a buffer larger than available space we will fail to mmap it in
and rendering will fail. Trying to allocate multiple large buffers in
the aperture, often the case when falling back, causes thrashes and
eviction of useful buffers. So from the outset simply do not allocate a
bo if the the required size is more than half the available aperture
space.

Fixes allocation failure in ocitymap.trace for instance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson 244b7cbfff uxa: Use accelerated PutImage for uploading pixman images.
Short-circuits the current use of PutImage from CopyArea, bypassing all
the temporary allocations.
2010-05-12 12:50:31 +01:00
Chris Wilson cb887cfc67 uxa: solid rects
The cost of performing relocations outweigh the advantages of using the
blitter for solids with lots of rectangles.

References:

  Bug 22127 - [UXA] 50% performance regression for XRenderFillRectangles
  https://bugs.freedesktop.org/show_bug.cgi?id=22127

By using the 3D pipeline we improve our performance by around 4x on
i945, measured by the jxbench microbenchmark, and a factor of 10x by
short-cutting to the 3D pipeline for blended rectangles.

Before, on a i945GME:
  19982.412060 Ops/s; rects (!); 15x15
  9599.131693 Ops/s; rects (!); 75x75
  3803.654743 Ops/s; rects (!); 250x250
  6836.743772 Ops/s; rects blended; 15x15
  1443.750000 Ops/s; rects blended; 75x75
  495.335821 Ops/s; rects blended; 250x250
  23247.933884 Ops/s; rects composition (!); 15x15
  10993.073048 Ops/s; rects composition (!); 75x75
  3595.905172 Ops/s; rects composition (!); 250x250

After:
  87271.145975 Ops/s; rects (!); 15x15
  32347.744361 Ops/s; rects (!); 75x75
  5884.177215 Ops/s; rects (!); 250x250
  73500.000000 Ops/s; rects blended; 15x15
  33580.882353 Ops/s; rects blended; 75x75
  5858.811749 Ops/s; rects blended; 250x250
  25582.317073 Ops/s; rects composition (!); 15x15
  6664.728682 Ops/s; rects composition (!); 75x75
  14965.909091 Ops/s; rects composition (!); 250x250 [suspicious]

This has no impact on Cairo, but I have a suspicion from watching xtrace
that Qt likes to blit thousands of 1x1 rectangles with the same colour.
However, we are still around 2-3x slower than the reported figures for
EXA!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00