4 Commits

Author SHA1 Message Date
Chris Wilson d4c82a16bc test: Remove the blit through a temporary Pixmap
Originally this was in place so that we wouldn't simply migrate the
target away from the GPU whenever we inspected results. That is no
longer a problem and so we can speed up the tests by skipping the
temporary.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2015-05-18 09:16:12 +01:00
Chris Wilson 8d7e7010e3 test: Increase number of tiled sources
Significantly increase the stress imposed upon the tiled BLT operations.
Also start dumping pngs of the failures.

References: https://bugs.freedesktop.org/show_bug.cgi?id=80033
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2014-06-20 15:08:56 +01:00
Chris Wilson c76714c29d test: Add a basic line tester
Starting with exercising drawing of a single segment.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-11-06 09:43:47 +00:00
Chris Wilson bcef98af56 sna: Introduce a new acceleration model.
The premise is that switching between rings (i.e. the BLT and
RENDER rings) on SandyBridge imposes a large latency overhead whilst
rendering. The cause is that in order to switch rings, we need to split
the batch earlier than is desired and to add serialisation between the
rings, both of which incur large overhead.

By switching to using a pure 3D blit engine (ok, not so pure as the BLT
engine still has uses for the core drawing model which cannot be easily
represented without a combinatorial explosion of shaders) we can take
advantage of additional efficiencies, such as relative relocations, that
have been incorporated into recent hardware advances. However, even
older hardware performs better from avoiding the implicit context
switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for
the operations that are not worth accelerating in hardware, we introduce a
shadow buffer mechanism throughout and reintroduce pixmap migration.
Doing this efficiently is the cornerstone of ensuring that we do exploit
the increased potential of recent hardware for running old applications and
environments (i.e. so that the latest and greatest chip is actually faster
than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are
running older chipsets and welcome the performance increase offered by
this patch, then you may choose to call it Snazzy instead.

Speedups
========
 gen3           firefox-fishtank  1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
 gen5             grads-heat-map  3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%):  9.66x speedup
 gen3          xfce4-terminal-a1  4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%):  8.29x speedup
 gen4             grads-heat-map  2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%):  7.05x speedup
 gen3             grads-heat-map  1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%):  4.83x speedup
 gen3             swfdec-youtube  3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%):  4.31x speedup
 gen6             grads-heat-map  742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%):  4.30x speedup
 gen3          firefox-talos-svg  71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%):  3.27x speedup
 gen5                       gvim  8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%):  3.11x speedup
 gen6                    poppler  3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%):  3.10x speedup
 gen6         gnome-terminal-vim  9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%):  2.63x speedup
 gen5              midori-zoomed  9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%):  2.60x speedup
 gen5         gnome-terminal-vim  38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%):  2.56x speedup
 gen5                    poppler  13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%):  2.44x speedup
 gen5         swfdec-giant-steps  8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%):  2.32x speedup
 gen5          xfce4-terminal-a1  18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%):  2.27x speedup
 gen5           firefox-fishtank  88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%):  2.27x speedup
 gen3              midori-zoomed  2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%):  2.16x speedup
 gen6                       gvim  2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%):  2.09x speedup
 gen5       firefox-planet-gnome  40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%):  2.06x speedup
 gen5       gnome-system-monitor  10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%):  2.01x speedup
 gen3                    poppler  2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%):  2.00x speedup
 gen6          firefox-talos-gfx  7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%):  1.89x speedup
 gen5                  evolution  8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%):  1.81x speedup
 gen3                  evolution  1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%):  1.68x speedup
 gen3         gnome-terminal-vim  4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%):  1.58x speedup
 gen5             swfdec-youtube  5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%):  1.53x speedup
 gen4                    poppler  7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%):  1.46x speedup
 gen4         gnome-terminal-vim  21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%):  1.45x speedup
 gen5          firefox-talos-svg  99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%):  1.41x speedup
 gen4       firefox-planet-gnome  28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%):  1.41x speedup
 gen5          firefox-talos-gfx  93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%):  1.37x speedup
 gen4                  evolution  6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%):  1.35x speedup
 gen3         swfdec-giant-steps  2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%):  1.34x speedup
 gen4                       gvim  4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%):  1.33x speedup
 gen6                  evolution  1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%):  1.30x speedup
 gen6       firefox-planet-gnome  4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%):  1.21x speedup
 gen3          firefox-talos-gfx  6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%):  1.19x speedup
 gen4              midori-zoomed  4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%):  1.18x speedup
 gen6         swfdec-giant-steps  1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%):  1.17x speedup
 gen4          firefox-talos-gfx  80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%):  1.16x speedup
 gen6              midori-zoomed  1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%):  1.10x speedup

Slowdowns
=========
 gen6          xfce4-terminal-a1  2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%):  1.06x slowdown
 gen4             swfdec-youtube  3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%):  1.07x slowdown
 gen4          firefox-talos-svg  66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%):  1.08x slowdown
 gen4       gnome-system-monitor  5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%):  1.18x slowdown
 gen3                  ocitysmap  3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%):  1.24x slowdown
 gen4                  ocitysmap  3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%):  1.43x slowdown
 gen5                  ocitysmap  4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%):  1.43x slowdown
 gen6                  ocitysmap  1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%):  1.69x slowdown

[Note that the performance regression for ocitysmap arises because we now
attempt to support rendering to and (more importantly) from large
surfaces. Enabling such operations is the only way to one day be
faster than purely using the CPU; in the meantime we suffer a regression
due to the increased migration and aperture thrashing. The other couple
of regressions will be eliminated with improved span and shader support,
now that the framework for such is in place.]
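
For readers checking the figures, the final column of the Speedups and
Slowdowns tables appears to be the ratio of the first timing on each side
of the arrow (lower is better). The sketch below reproduces that column
under this assumption; the function name describe_change is hypothetical
and is not part of the driver or of any benchmark tool.

```python
def describe_change(before: float, after: float) -> str:
    """Format a before/after timing pair like the tables' last column.

    Assumes both values are elapsed times where lower is better, and
    that the ratio is always reported as a factor of at least 1.
    """
    if before <= 0 or after <= 0:
        raise ValueError("timings must be positive")
    if after <= before:
        # The new code is faster: report how many times faster.
        return f"{before / after:.2f}x speedup"
    # The new code is slower: report how many times slower.
    return f"{after / before:.2f}x slowdown"


if __name__ == "__main__":
    # gen3 firefox-fishtank row from the Speedups table
    print(describe_change(1203584.56, 85561.71))
    # gen6 ocitysmap row from the Slowdowns table
    print(describe_change(1401.61, 2365.74))
```

The parenthesised values in each row (a second timing and a percentage)
are left out here, since the commit message does not say how they are
derived.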

The performance increase for Cairo completely overlooks the other
critical aspects of the architecture:

World of Padman:
gen3 (800x600):   57.5 ->  96.2
gen4 (800x600):   47.8 ->  74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6):
aa10text:     3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking
of the fallback paths gives, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last
scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by
the dix to implement SHM operations, used by chromium and gtk+ pixbufs).

PPS: ./configure --enable-sna

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-04 09:19:46 +01:00