Commit Graph

11 Commits

Author SHA1 Message Date
Chris Wilson 95866bd6bd sna/video: Use EXTEND_PAD to avoid mixing in the border color
...which is 0 and appears green around an unaligned YUV-video.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38723
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-27 21:08:52 +01:00
Chris Wilson 2c73b4651a sna/gen4+: Use the drawable rectangle offset for copy boxes
Saves a little bit of work whilst emitting the rectangles.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-25 16:32:30 +01:00
Chris Wilson 471115a980 sna: Also allow BLT copies to discard the alpha channel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-24 13:35:22 +01:00
Chris Wilson 33b2ea0392 sna: Avoid using the BLT to copy between mismatching depths
We either conflated bpp (which fails given a mixture of depth-24 and
depth-30 pixmaps) or neglected to check at all.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-24 11:46:36 +01:00
Chris Wilson ea71133da7 sna/video: Use pwrite for upload of unclipped, unrotated frames
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-22 11:04:56 +01:00
Chris Wilson 1f364c6d24 sna: Reset the kgem state on server regen
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-10 10:37:37 +01:00
Chris Wilson 021209d5d3 sna: Remove the stubs from sna_render.c
These only existed to work around an include order problem, when kgem
was intended to be entirely separable from sna. Moving the function
pointer into kgem simplifies matters.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-09 10:32:48 +01:00
Chris Wilson 790f90a277 sna/gen6: Initialise a couple more composite op members for copy_boxes
Valgrind detected that I missed initialised a couple of fields for
use with the generic state emission paths:

==28683== Conditional jump or move depends on uninitialised value(s)
==28683==    at 0x83BE646: gen6_get_blend (gen6_render.c:251)
==28683==    by 0x83BF769: gen6_emit_state (gen6_render.c:818)
==28683==    by 0x83C38ED: gen6_emit_copy_state (gen6_render.c:2280)
==28683==    by 0x83C3C89: gen6_render_copy_boxes (gen6_render.c:2356)
==28683== Conditional jump or move depends on uninitialised value(s)
==28683==    at 0x83C15C3: gen6_rectangle_begin (gen6_render.c:1458)
==28683==    by 0x83C177D: gen6_get_rectangles (gen6_render.c:1502)
==28683==    by 0x83C3D16: gen6_render_copy_boxes (gen6_render.c:2363)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-07 08:58:59 +01:00
Chris Wilson 407257570f sna/gen6: Flush the pipeline before effecting a change of blend modes
... also make sure that we flush if we change the blend mode for the CA pass.

Reported-by: Ivan Bulatovic <combuster@archlinux.us>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37946
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-05 15:40:00 +01:00
Chris Wilson 0260c4ce32 sna: Fallback if presented with mask under NO_COMPOSITE
Just making sure that the debug paths actually work...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-05 14:39:34 +01:00
Chris Wilson bcef98af56 sna: Introduce a new acceleration model.
The premise is that switching between rings (i.e. the BLT and
RENDER rings) on SandyBridge imposes a large latency overhead whilst
rendering. The cause is that in order to switch rings, we need to split
the batch earlier than is desired and to add serialisation between the
rings. Both of which incur large overhead.

By switching to using a pure 3D blit engine (ok, not so pure as the BLT
engine still has uses for the core drawing model which can not be easily
represented without a combinatorial explosion of shaders) we can take
advantage of additional efficiencies, such as relative relocations, that
have been incorporated into recent hardware advances. However, even
older hardware performs better from avoiding the implicit context
switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for
the operations that are not worth accelerating in hardware, we introduce a
shadow buffer mechanism through out and reintroduce pixmap migration.
Doing this efficiently is the cornerstone of ensuring that we do exploit
the increased potential of recent hardware for running old applications and
environments (i.e. so that the latest and greatest chip is actually faster
than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are
running older chipsets and welcome the performance increase offered by
this patch, then you may choose to call it Snazzy instead.

Speedups
========
 gen3           firefox-fishtank  1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
 gen5             grads-heat-map  3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%):  9.66x speedup
 gen3          xfce4-terminal-a1  4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%):  8.29x speedup
 gen4             grads-heat-map  2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%):  7.05x speedup
 gen3             grads-heat-map  1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%):  4.83x speedup
 gen3             swfdec-youtube  3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%):  4.31x speedup
 gen6             grads-heat-map  742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%):  4.30x speedup
 gen3          firefox-talos-svg  71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%):  3.27x speedup
 gen5                       gvim  8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%):  3.11x speedup
 gen6                    poppler  3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%):  3.10x speedup
 gen6         gnome-terminal-vim  9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%):  2.63x speedup
 gen5              midori-zoomed  9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%):  2.60x speedup
 gen5         gnome-terminal-vim  38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%):  2.56x speedup
 gen5                    poppler  13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%):  2.44x speedup
 gen5         swfdec-giant-steps  8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%):  2.32x speedup
 gen5          xfce4-terminal-a1  18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%):  2.27x speedup
 gen5           firefox-fishtank  88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%):  2.27x speedup
 gen3              midori-zoomed  2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%):  2.16x speedup
 gen6                       gvim  2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%):  2.09x speedup
 gen5       firefox-planet-gnome  40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%):  2.06x speedup
 gen5       gnome-system-monitor  10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%):  2.01x speedup
 gen3                    poppler  2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%):  2.00x speedup
 gen6          firefox-talos-gfx  7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%):  1.89x speedup
 gen5                  evolution  8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%):  1.81x speedup
 gen3                  evolution  1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%):  1.68x speedup
 gen3         gnome-terminal-vim  4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%):  1.58x speedup
 gen5             swfdec-youtube  5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%):  1.53x speedup
 gen4                    poppler  7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%):  1.46x speedup
 gen4         gnome-terminal-vim  21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%):  1.45x speedup
 gen5          firefox-talos-svg  99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%):  1.41x speedup
 gen4       firefox-planet-gnome  28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%):  1.41x speedup
 gen5          firefox-talos-gfx  93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%):  1.37x speedup
 gen4                  evolution  6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%):  1.35x speedup
 gen3         swfdec-giant-steps  2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%):  1.34x speedup
 gen4                       gvim  4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%):  1.33x speedup
 gen6                  evolution  1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%):  1.30x speedup
 gen6       firefox-planet-gnome  4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%):  1.21x speedup
 gen3          firefox-talos-gfx  6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%):  1.19x speedup
 gen4              midori-zoomed  4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%):  1.18x speedup
 gen6         swfdec-giant-steps  1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%):  1.17x speedup
 gen4          firefox-talos-gfx  80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%):  1.16x speedup
 gen6              midori-zoomed  1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%):  1.10x speedup
Slowdowns
=========
 gen6          xfce4-terminal-a1  2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%):  1.06x slowdown
 gen4             swfdec-youtube  3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%):  1.07x slowdown
 gen4          firefox-talos-svg  66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%):  1.08x slowdown
 gen4       gnome-system-monitor  5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%):  1.18x slowdown
 gen3                  ocitysmap  3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%):  1.24x slowdown
 gen4                  ocitysmap  3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%):  1.43x slowdown
 gen5                  ocitysmap  4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%):  1.43x slowdown
 gen6                  ocitysmap  1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%):  1.69x slowdown

[Note the performance regression for ocitysmap comes from that we now
attempt to support rendering to and (more importantly) from large
surfaces. By enabling such operations is the only way to one day be
faster than purely using the CPU, in the meantime we suffer regression
due to the increased migration and aperture thrashing. The other couple
of regressions will be eliminated with improved span and shader support,
now that the framework for such is in place.]

The performance increase for Cairo completely overlooks the other
critical aspects of the architecture:

World of Padman:
gen3 (800x600):   57.5 ->  96.2
gen4 (800x600):   47.8 ->  74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6);
aa10text:     3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking
of the fallback paths give, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last
scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by
the dix to implement SHM operations, used by chromium and gtk+ pixbufs.

PPS: ./configure --enable-sna

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-04 09:19:46 +01:00