Commit Graph

3080 Commits

Author SHA1 Message Date
Chris Wilson 030d56279b drm: don't overwrite the old intel->front_buffer
It's now handled in the common ExchangeBuffers() path.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 17:30:23 +01:00
Chris Wilson 5bd0227395 i830: Teardown batch entries on reset.
By not cleaning up the batch entries when resetting the X server, we left
the pointers in an inconsistent state and caused X to crash.
2010-05-14 15:50:05 +01:00
Chris Wilson 0d2392d44a dri: Hold reference to buffers across swap
As we schedule swaps for some time in the future and may process a
detachment prior to receiving the vblank notification from the kernel,
we need to hold a reference to the buffers for our swap event handler.

Fixes:
  Bug 28080 - "glresize" causes X server segfault with indirect rendering.
  https://bugs.freedesktop.org/show_bug.cgi?id=28080

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-14 10:32:12 +01:00
Chris Wilson 8de09a0707 uxa: Convert 1x1R back to solid_fill
In the change to prevent blitting between incompatible sources, we also
prevented 1x1R pixmaps from being used for solid fills. Reorder the
sequence of conditions to enable this fast path again.
2010-05-13 17:17:54 +01:00
Chris Wilson 92e9cf8af7 uxa: Only use solid_fill for SRC. 2010-05-13 17:17:54 +01:00
Chris Wilson d1bd14e8b6 uxa: Replace source for CLEAR with a transparent solid
This means that we will hit the faster try_solid_fill path instead.
2010-05-13 17:17:54 +01:00
Chris Wilson cdab72c405 uxa: Fallback early if compositing with alphaMaps 2010-05-13 17:17:54 +01:00
Chris Wilson 25811dc7b7 i915: Force output alpha to 1. if dst has no alpha channel.
Ensure that garbage is not stored in the unused alpha channel so that
we can rely on it being currently initialised when used as a source or
returning via GetImage.

Partial fix for rendercheck -t blend
2010-05-13 17:17:10 +01:00
Chris Wilson 0e726b85ca i915: Add a2r10g10b10 format and friends
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-13 09:40:27 +01:00
Chris Wilson 9f54107f86 dri2: Handle reference counting across page flipping
1. Instead of swapping bos, swap the entire private structure.

2. If we update the pixmap bo for the Screen, make sure we update the
reference inside intel->front_buffer so that xrandr still functions.

Fixes:

  Bug 27922 - i965: Rapidly resizing OpenGL window causes GPU to hang.
  https://bugs.freedesktop.org/show_bug.cgi?id=27922

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 21:37:49 +01:00
Chris Wilson 6c27f6e4f7 uxa: Avoid glyph ping-pong with !offscreen destination
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson d5383c2073 uxa: Avoid ping-pong with !offscreen destination and traps
If we are destined to target an !offscreen drawable, then uploading the
trapezoid mask to a bo is the last thing we actually want to do...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson 00664b8f9d uxa: Fallback when compositing to a !offscreen destination
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson 0c6372a77f i830: Prevent allocation of bo larger than half the aperture
We need to prevent overcommitting the aperture, and in particular if we
allocate a buffer larger than available space we will fail to mmap it in
and rendering will fail. Trying to allocate multiple large buffers in
the aperture, often the case when falling back, causes thrashing and
eviction of useful buffers. So from the outset simply do not allocate a
bo if the required size is more than half the available aperture
space.

Fixes allocation failure in ocitymap.trace for instance.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
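
The half-aperture rule from 0c6372a77f can be sketched roughly as below. This is an illustrative Python model rather than the driver's C code; the function name `can_allocate_bo` and its parameters are hypothetical.

```python
def can_allocate_bo(required_size: int, aperture_size: int) -> bool:
    """Decide whether a buffer of required_size bytes should get a bo.

    Hypothetical helper: refuses any buffer that needs more than half
    of the available aperture space, as the commit message describes.
    """
    if required_size <= 0 or aperture_size <= 0:
        return False
    # "more than half" is rejected, so exactly half is still allowed.
    return 2 * required_size <= aperture_size
```

With a 256-byte aperture in this model, a 128-byte request is accepted and a 129-byte request is refused, which leaves the caller to take the fallback path.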
Chris Wilson 244b7cbfff uxa: Use accelerated PutImage for uploading pixman images.
Short-circuits the current use of PutImage from CopyArea, bypassing all
the temporary allocations.
2010-05-12 12:50:31 +01:00
Chris Wilson cb887cfc67 uxa: solid rects
The cost of performing relocations outweighs the advantages of using the
blitter for solids with lots of rectangles.

References:

  Bug 22127 - [UXA] 50% performance regression for XRenderFillRectangles
  https://bugs.freedesktop.org/show_bug.cgi?id=22127

By using the 3D pipeline we improve our performance by around 4x on
i945, measured by the jxbench microbenchmark, and a factor of 10x by
short-cutting to the 3D pipeline for blended rectangles.

Before, on a i945GME:
  19982.412060 Ops/s; rects (!); 15x15
  9599.131693 Ops/s; rects (!); 75x75
  3803.654743 Ops/s; rects (!); 250x250
  6836.743772 Ops/s; rects blended; 15x15
  1443.750000 Ops/s; rects blended; 75x75
  495.335821 Ops/s; rects blended; 250x250
  23247.933884 Ops/s; rects composition (!); 15x15
  10993.073048 Ops/s; rects composition (!); 75x75
  3595.905172 Ops/s; rects composition (!); 250x250

After:
  87271.145975 Ops/s; rects (!); 15x15
  32347.744361 Ops/s; rects (!); 75x75
  5884.177215 Ops/s; rects (!); 250x250
  73500.000000 Ops/s; rects blended; 15x15
  33580.882353 Ops/s; rects blended; 75x75
  5858.811749 Ops/s; rects blended; 250x250
  25582.317073 Ops/s; rects composition (!); 15x15
  6664.728682 Ops/s; rects composition (!); 75x75
  14965.909091 Ops/s; rects composition (!); 250x250 [suspicious]

This has no impact on Cairo, but I have a suspicion from watching xtrace
that Qt likes to blit thousands of 1x1 rectangles with the same colour.
However, we are still around 2-3x slower than the reported figures for
EXA!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:50:31 +01:00
Chris Wilson c8e10f7791 debug: Add names for operators
Most useful for confirming my worst fears: unwarranted use of
OutReverse + Add.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-12 12:48:21 +01:00
Chris Wilson 6ea8ce640f xvmc: Build fix with -pedantic
Fixes:

  Bug 27352 - RPMLINT error causes build breakage
  https://bugs.freedesktop.org/show_bug.cgi?id=27352

Reported-by: Johannes Obermayr <johannesobermayr@gmx.de>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 19:39:01 +01:00
Chris Wilson e1b7e8bf1d drmmode: Reorder i830_set_pixmap_bo() so that the correct stride is used.
The pitch needs to be set on the pixmap prior to the private
intel_pixmap structure being created so that it can record the correct
value from the pixmap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 15:54:18 +01:00
Chris Wilson dfbaf9aab8 i830: Never create a bo for depth=1 pixmaps.
As we can not accelerate these either as a destination or a source,
don't bother allocating a buffer object for them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 15:01:00 +01:00
Chris Wilson 5b7efe375a i830: Use set_pixmap_bo() instead of open-coding.
The advantage is that this enables in-flight reuse of the old pixmap if
possible.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 15:00:59 +01:00
Chris Wilson ad8af95dd3 i830: Do not cache in-flight non-reusable buffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 15:00:59 +01:00
Chris Wilson f1048e14d5 i965: Add texformats mapping for additional pixman formats
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 13:07:19 +01:00
Chris Wilson a35afd4a2d uxa: Recheck texture after acquiring pattern.
As the first step to handling unsupported texture formats, double check
that the converted pattern can be used as a texture by the card.

Fixes: rendercheck -t repeat

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-11 13:07:03 +01:00
Keith Packard d745cab6c4 Must call ValidateGC in i830_uxa_put_image for scratch GC
Always need to call ValidateGC or the scratch GC will not get the
right composite clip.

Signed-off-by: Keith Packard <keithp@keithp.com>
2010-05-10 22:59:52 -07:00
Chris Wilson 3eded4202e i915: Fix pixmap based masks.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 23:38:17 +01:00
Chris Wilson 1ecd89be03 uxa: Protect against valid SourcePict in uxa_acquire_mask()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 23:33:52 +01:00
Chris Wilson a8761585ef i830: Minor cleanup
Remove some extraneous prototypes and unused variables.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 19:38:24 +01:00
Chris Wilson 9e9b0d85da i830: Update stride when swapping bo for PutImage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 18:37:26 +01:00
Chris Wilson 0d4dd00aea uxa,i915: Handle SourcePict through uxa_composite()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 12:29:26 +01:00
Chris Wilson 21c1c3c7f6 i915: Use 1x1R pixmap for solid drawables
x11perf has a regression
  https://bugs.freedesktop.org/show_bug.cgi?id=25068

caused by

  commit e581ceb738
  i915: Use the color channels to pass along solid sources and masks.

Do not convert 1x1R pixmaps into a solid color as the readback from the
bo negates all the performance advantages of using a smaller vertex
buffer and fewer samplers.

Before (PineView):
  aa=66800 glyphs/s, rgb=28800 glyphs/s

Now:
  aa=96800 glyphs/s, rgb=48500 glyphs/s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 10:36:15 +01:00
Chris Wilson f52b6e8322 uxa: Rearrange checking and preparing of composite textures.
x11perf regression caused by 2D driver
  https://bugs.freedesktop.org/show_bug.cgi?id=28047

caused by

  commit a7b800513f
  uxa: Extract sub-region from in-memory buffers.

The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.

Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyphs/s, rgb=28800 glyphs/s

The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More worrying is the extremely low performance,
however the results are consistent so the improvement looks real...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-10 10:36:14 +01:00
Chris Wilson 848ab66384 uxa: Transform composites with a simple translation into a blit
We can also convert a composite with an integer translation into a
blit, so long as the sample extents remain within the source.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-08 19:35:28 +01:00
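
The condition in 848ab66384 (integer translation, sample extents inside the source) might be checked along these lines. This is a Python sketch under assumptions: the 3x3 row-major matrix layout and the name `translation_for_blit` are hypothetical and not taken from the driver.

```python
from typing import Optional, Sequence, Tuple


def translation_for_blit(
    matrix: Sequence[Sequence[float]],
    src_w: int, src_h: int,
    x: int, y: int, w: int, h: int,
) -> Optional[Tuple[int, int]]:
    """Return (dx, dy) if the composite can be done as a blit, else None.

    matrix is assumed to be a 3x3 row-major transform. (x, y, w, h) is
    the sampled rectangle before the translation is applied.
    """
    (a, b, tx), (c, d, ty), (p, q, r) = matrix
    # Anything other than a pure translation needs the render path.
    if (a, b, c, d, p, q, r) != (1, 0, 0, 1, 0, 0, 1):
        return None
    # Fractional offsets would require filtering, so only integers qualify.
    if not (float(tx).is_integer() and float(ty).is_integer()):
        return None
    dx, dy = int(tx), int(ty)
    sx, sy = x + dx, y + dy
    # The translated sample extents must stay within the source.
    if sx < 0 or sy < 0 or sx + w > src_w or sy + h > src_h:
        return None
    return dx, dy
```

A caller would use the returned offset to issue a plain copy and fall through to the normal composite path on `None`.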
Chris Wilson a7b800513f uxa: Extract sub-region from in-memory buffers.
If the buffer is too large or not suitable for a GPU operation, we
currently fallback and perform the composite on the CPU. An alternative
is to extract the small region out of the source (as usually the
sample extents are much smaller than the actual surface size) and try
the composite with the new surface.

The effect is particularly noticeable on pathological websites that use
very large background images. For example, http://www.woodtv.com/ uses a
1299x15000 pattern that is obscured by another opaque pattern.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-05-08 19:35:07 +01:00
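
The sub-region extraction in a7b800513f can be pictured with a small model: crop the sample extents out of a large in-memory source and composite from the crop. The Python below is only a sketch; representing a buffer as a list of rows and the name `extract_subregion` are assumptions made for illustration.

```python
from typing import List


def extract_subregion(pixels: List[List[int]],
                      x: int, y: int, w: int, h: int) -> List[List[int]]:
    """Copy the w x h sample extents at (x, y) out of a larger source.

    The rectangle is clipped to the source, so an empty list is
    returned when the extents lie entirely outside it.
    """
    src_h = len(pixels)
    src_w = len(pixels[0]) if pixels else 0
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, src_w), min(y + h, src_h)
    if x0 >= x1 or y0 >= y1:
        return []
    # The copy is small even when the source is, say, 1299x15000.
    return [row[x0:x1] for row in pixels[y0:y1]]
```

The point of the design is that the cost scales with the sampled area rather than with the size of the full source surface.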
Chris Wilson 8562b7bc67 i830: prepare the uxa pixmap for fbCopyArea.
Complete the prepare access for the PutImage fallback via fbCopyArea(),
by remembering to set the private pointer to the GTT mapping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-04-27 10:29:16 +01:00
Chris Wilson 9a5cd65b59 i830: if pixman_blt() fails fallback to fbCopyArea()
On older versions of pixman, pixman_blt() can return false if the images
are <= 8bpp. If we are being called from CopyArea, then we cannot return
FALSE here as that will trigger an infinite recursion. Instead we must
manually perform the fallback using fbCopyArea().

Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-04-26 09:14:17 +01:00
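
The control flow described in 9a5cd65b59 can be modelled as follows. This is a Python sketch, not the driver code: `blt` stands in for pixman_blt(), `direct_copy` stands in for fbCopyArea(), and all names and the list-based buffers are hypothetical.

```python
from typing import List


def blt(dst: List[int], src: List[int], bpp: int) -> bool:
    """Stand-in for pixman_blt(): modelled as failing for <= 8bpp images."""
    if bpp <= 8:
        return False
    dst[:] = src
    return True


def direct_copy(dst: List[int], src: List[int]) -> None:
    """Stand-in for fbCopyArea(): a plain copy that never re-enters put_image."""
    dst[:] = src


def put_image(dst: List[int], src: List[int], bpp: int) -> str:
    """Try the fast blit first; on failure copy directly.

    Returning failure to a CopyArea caller would make it call back into
    put_image, so the fallback is performed here instead.
    """
    if blt(dst, src, bpp):
        return "blt"
    direct_copy(dst, src)
    return "fallback"
```

In this model a 32bpp copy takes the `blt` path and an 8bpp copy takes the `fallback` path, and in both cases the destination ends up holding the source data.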
Chris Wilson 86d349aa7b i830: tidy in flight bo reuse.
A left-over cleanup patch for c374c94. *sigh*

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-04-26 09:13:54 +01:00
Daniel Vetter 72fd7d191c Fix "make dist"
This is some fallout from my xvmc cleanup.

Original-Patch-by: Rico Tzschichholz <ricotz@t-online.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2010-04-19 21:56:57 +02:00
Daniel Vetter 9494f4e91f i810: adjust the pitch for DRI rendering
Current code forgot to adjust the pitch of the frontbuffer.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=16729
2010-04-16 22:24:01 +02:00
Chris Wilson c374c94e41 uxa: Reuse in-flight bo
When we need to allocate a new bo for use as a gpu target, first check
if we can reuse a pixmap that has already been relocated into the
aperture as a temporary target, for instance a glyph mask or a clip mask.

Before:
backend                      test   min(s) median(s) stddev.
xlib         firefox-planet-gnome   50.568   50.873   0.30%
 xcb         firefox-planet-gnome   49.686   53.003   3.92%
xlib                    evolution   40.115   40.131   0.86%
 xcb                    evolution   28.241   28.285   0.18%

After:
backend                      test   min(s) median(s) stddev.
xlib         firefox-planet-gnome   47.759   48.233   0.80%
 xcb         firefox-planet-gnome   48.611   48.657   0.87%
xlib                    evolution   38.954   38.991   0.05%
 xcb                    evolution   26.561   26.654   0.19%

And even more dramatic improvements when using a font size larger than
the maximum size of the glyph cache:
 xcb firefox-36-20090611:  1.79x speedup
xlib firefox-36-20090611:  1.74x speedup
 xcb firefox-36-20090609:  1.62x speedup
xlib firefox-36-20090609:  1.59x speedup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-04-15 21:37:32 +01:00
Chris Wilson 96aa7a236a i830: Allocate bo's for glyphs larger than 32x32.
As we only use the glyph cache for small glyphs, those larger than 32x32
will first be copied to a bo and used as a mask in a composite
operation. We can avoid the allocation and upload per use by allocating
a bo for the over-sized glyph from the start. As the glyph is large
anyway, the excess memory allocation is less significant.

Using normal font sizes, firefox shows no change - as expected. However,
using the 36 font size traces, we see around a 10% improvement on g45.

Before:
      xcb          firefox-36-20090609  127.333  127.897   0.22%
      xcb          firefox-36-20090611   87.456   88.624   0.66%
      xcb             firefox-20090601   19.522   20.194   1.69%
     xlib          firefox-36-20090609  201.054  201.780   0.18%
     xlib          firefox-36-20090611  133.468  133.717   0.09%
     xlib             firefox-20090601   23.740   23.975   0.49%

With large glyphs in bo:
      xcb          firefox-36-20090609  117.256  118.254   0.42%
      xcb          firefox-36-20090611   79.462   79.962   0.31%
      xcb             firefox-20090601   19.658   20.024   0.92%
     xlib          firefox-36-20090609  185.645  188.202   0.68%
     xlib          firefox-36-20090611  123.592  124.940   0.54%
     xlib             firefox-20090601   23.917   24.098   0.38%

Thanks to Owain G. Ainsworth for the suggestion!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-04-14 17:10:09 +01:00
Chris Wilson 2d17bd50af Revert "Revert "uxa: Try using put_image when copying from a memory buffer.""
This reverts commit 6d50553e8f.

Now we have taught the fallback path not to infinitely recurse,
re-enable the accelerated path for ShmPutImage and friends.
2010-04-14 17:10:09 +01:00
Chris Wilson 1cc2c2c44a i830: Use pixman_blt directly for performing the in-memory copy
In order to avoid an infinite recursion after enabling CopyArea to use
the put_image acceleration to either stream a blit or to copy in-place,
we cannot call CopyArea from put_image for the fallback path. Instead,
we can simply call pixman_blt directly, which coincidentally is a tiny
bit faster.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2010-04-14 17:10:05 +01:00
Daniel Vetter 324a2810da i830 render: check aperture space requirements
No point not doing this.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2010-04-13 08:39:43 +02:00
Daniel Vetter 804263c10d render: tell the kernel explicitly when fences are needed
This slightly improves xrender performance on fence reg starved
i8xx hw.

I've also changed a few function calls to the new names from the
compat ones while looking at the code.

The i915 textured video path is not converted because atm the xv
code does not use tiled surfaces.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2010-04-13 08:34:20 +02:00
Daniel Vetter a619a78312 i915 render: use tiling bits where possible
This is in preparation to explicit fence allocation with execbuf2.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2010-04-13 08:34:20 +02:00
Daniel Vetter 55cd36046e i830 render: use tiling bits where possible
This is in preparation to explicit fence allocation with execbuf2.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2010-04-13 08:34:20 +02:00
Eric Anholt 6d50553e8f Revert "uxa: Try using put_image when copying from a memory buffer."
This reverts commit 27195d7dba.
put_image often calls copy_area, which calls put_image. Exhaustion of
the stack follows.
2010-04-12 13:46:24 -07:00
Chris Wilson 28024f6c5f Revert "uxa: Add fallback warnings for PutImage."
This reverts commit 299b0338d0.
A debugging patch, it was never intended to go into master.
2010-04-12 13:44:01 +01:00
Chris Wilson 27195d7dba uxa: Try using put_image when copying from a memory buffer.
Often, for example in the fallback for ShmPutImage, we will attempt to
use uxa_copy_area() copying to a normal pixmap from a memory buffer.
This triggers a fallback, and maps the destination pixmap back into the
GTT. The accelerated put_image path will attempt to stream a blit to the
destination pixmap if it is currently active, avoiding the stall.
2010-04-10 18:50:26 +01:00