Commit Graph

20 Commits

Author SHA1 Message Date
Chris Wilson 7a695c9f6b sna: Fast-path single span boxes
These are very common when compositing unclipped trapezoids. Most of
the overhead lies in handling an arbitrary number of boxes, which also
misses out on the constant folding the compiler can do when it knows
there is just one box.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-13 17:43:13 +01:00
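
As an illustration of the single-box fast path described in the commit above, here is a minimal sketch in C. It is not the driver's code: `struct box`, `emit_box`, `emit_boxes` and `emit_span` are hypothetical names, and the "work" done per box is reduced to summing areas so the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical box type; the driver's own box structure may differ. */
struct box { int16_t x1, y1, x2, y2; };

/* Hypothetical per-box worker: returns the area the box covers. */
static inline long emit_box(const struct box *b)
{
    return (long)(b->x2 - b->x1) * (b->y2 - b->y1);
}

/* General path: loop over an arbitrary number of boxes. */
static long emit_boxes(const struct box *boxes, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += emit_box(&boxes[i]);
    return total;
}

/* Entry point: take the fast path when there is exactly one box, so the
 * compiler sees a fixed count and can inline and fold constants. */
static long emit_span(const struct box *boxes, size_t n)
{
    if (n == 1)
        return emit_box(&boxes[0]);
    return emit_boxes(boxes, n);
}
```

The design point is only the early `n == 1` branch; everything else stands in for whatever the real per-box emission does.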
Chris Wilson 3b5971bd23 sna/gen2: Restore invariant ENABLES
One deletion too many, unnoticed until the next reboot. Besides the
failure to disable logic op and enable colour buffer blending which
causes a hang if you subsequently try to enable both, you also need
to request texture caching...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-05 22:22:41 +01:00
Chris Wilson 5fa3e73f2c sna/gen[23]: Do as the comments suggest and prefer the BLT
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-05 21:07:10 +01:00
Chris Wilson 9eceddf69f sna/gen2: fix batch buffer accounting
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-04 17:13:57 +01:00
Chris Wilson 5c8a108d2c sna/gen2: Recompute blend pipeline for component-alpha pass
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-01 21:41:23 +01:00
Chris Wilson 121511d3bd sna/gen2: Pack solid sources into the default diffuse component
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-01 21:41:23 +01:00
Chris Wilson a303f85c16 sna/gen2: Remove unused state from invariant setup
... and also some state that gets clobbered when we install the
composite pipelines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-01 21:41:23 +01:00
Chris Wilson f6c8c3bb6f sna/gen2: Use specular component for solid spans
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-01 21:41:23 +01:00
Chris Wilson de14e3c859 sna/gen2: Add missing render fallbacks for blt ops
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-01 21:41:23 +01:00
Chris Wilson ecbf6bbd27 sna/gen2: Implement composite-spans
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-07-01 21:41:23 +01:00
Chris Wilson c0434ab490 sna: Distinguish 830/845 vs 855/865 using the generation id
Remove the PCI ID device checks by using the simpler check on the
generation id for errata pertaining to 830/845.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-30 16:31:28 +01:00
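
To illustrate the change described in the commit above, the sketch below replaces a list of PCI device IDs with a single comparison on a generation id. The numeric encoding (`GEN_830_845`, `GEN_855_865`), the `struct device` type and the function name are assumptions made for this example; the driver's actual values and structures are not shown in this log.

```c
#include <stdbool.h>

/* Hypothetical generation ids: the only property relied on here is that
 * 830/845 and 855/865 report different values. */
enum { GEN_830_845 = 20, GEN_855_865 = 21 };

/* Hypothetical device record carrying the generation id. */
struct device { int gen; };

/* One comparison on the generation id stands in for a list of
 * PCI device ID checks when applying 830/845-only errata. */
static bool needs_830_845_errata(const struct device *dev)
{
    return dev->gen == GEN_830_845;
}
```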
Chris Wilson 1f364c6d24 sna: Reset the kgem state on server regen
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-10 10:37:37 +01:00
Chris Wilson 021209d5d3 sna: Remove the stubs from sna_render.c
These only existed to work around an include order problem, when kgem
was intended to be entirely separable from sna. Moving the function
pointer into kgem simplifies matters.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-09 10:32:48 +01:00
Chris Wilson ad5ead8257 sna/gen2: Support covered xrgb sources on 830/845
830/845 cannot directly sample from an x8r8g8b8 source, but if we know
that we are only sampling from within the confines of the source then we
can force the alpha channel to one. (Outside of the source we require the
sampler to return a==0.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-07 11:16:54 +01:00
Chris Wilson 8f97157d2e sna/gen2: Replicate alpha for non-CA masks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-06 10:21:17 +01:00
Chris Wilson c8a2fa3a2e sna/gen2: Correct command length for CA LOAD_IMMEDIATE_STATE_1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-06 08:43:34 +01:00
Chris Wilson a89fc7181b sna/gen2: Only emit the mask texcoord if there is a mask
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-06 08:39:17 +01:00
Chris Wilson d9344ab8d0 sna/gen2: Set op->floats_per_vertex
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-06 08:25:53 +01:00
Chris Wilson c76ec69660 sna/gen2: The inline primitive takes a length, not a vertex count
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-06 08:12:33 +01:00
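
The distinction in the commit above, a length rather than a vertex count, can be sketched as simple arithmetic. This assumes the length is counted in 32-bit words of vertex data and that each vertex occupies `floats_per_vertex` words; any bias the hardware applies to the length field is deliberately not modelled, and the function name is hypothetical.

```c
#include <stddef.h>

/* Number of 32-bit words of vertex data for an inline primitive.
 * Assumption: one float per word, no hardware length bias applied. */
static size_t inline_primitive_length(size_t num_vertices,
                                      size_t floats_per_vertex)
{
    return num_vertices * floats_per_vertex;
}
```

For example, three vertices carrying four floats each (say x, y, u, v) give a length of 12 words, whereas the vertex count alone would be 3.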
Chris Wilson bcef98af56 sna: Introduce a new acceleration model.
The premise is that switching between rings (i.e. the BLT and
RENDER rings) on SandyBridge imposes a large latency overhead whilst
rendering. The cause is that in order to switch rings, we need to split
the batch earlier than is desired and to add serialisation between the
rings. Both of which incur large overhead.

By switching to using a pure 3D blit engine (ok, not so pure as the BLT
engine still has uses for the core drawing model which can not be easily
represented without a combinatorial explosion of shaders) we can take
advantage of additional efficiencies, such as relative relocations, that
have been incorporated into recent hardware advances. However, even
older hardware performs better from avoiding the implicit context
switches and from the batching efficiency of the 3D pipeline...

But this is X, and PolyGlyphBlt still exists and remains in use. So for
the operations that are not worth accelerating in hardware, we introduce a
shadow buffer mechanism throughout and reintroduce pixmap migration.
Doing this efficiently is the cornerstone of ensuring that we do exploit
the increased potential of recent hardware for running old applications and
environments (i.e. so that the latest and greatest chip is actually faster
than gen2!)

For the curious, sna is SandyBridge's New Acceleration. If you are
running older chipsets and welcome the performance increase offered by
this patch, then you may choose to call it Snazzy instead.

Speedups
========
 gen3           firefox-fishtank  1203584.56 (1203842.75 0.01%) -> 85561.71 (125146.44 14.87%): 14.07x speedup
 gen5             grads-heat-map  3385.42 (3489.73 1.44%) -> 350.29 (350.75 0.18%):  9.66x speedup
 gen3          xfce4-terminal-a1  4179.02 (4180.09 0.06%) -> 503.90 (531.88 4.48%):  8.29x speedup
 gen4             grads-heat-map  2458.66 (2826.34 4.64%) -> 348.82 (349.20 0.29%):  7.05x speedup
 gen3             grads-heat-map  1443.33 (1445.32 0.09%) -> 298.55 (298.76 0.05%):  4.83x speedup
 gen3             swfdec-youtube  3836.14 (3894.14 0.95%) -> 889.84 (979.56 5.99%):  4.31x speedup
 gen6             grads-heat-map  742.11 (744.44 0.15%) -> 172.51 (172.93 0.20%):  4.30x speedup
 gen3          firefox-talos-svg  71740.44 (72370.13 0.59%) -> 21959.29 (21995.09 0.68%):  3.27x speedup
 gen5                       gvim  8045.51 (8071.47 0.17%) -> 2589.38 (3246.78 10.74%):  3.11x speedup
 gen6                    poppler  3800.78 (3817.92 0.24%) -> 1227.36 (1230.12 0.30%):  3.10x speedup
 gen6         gnome-terminal-vim  9106.84 (9111.56 0.03%) -> 3459.49 (3478.52 0.25%):  2.63x speedup
 gen5              midori-zoomed  9564.53 (9586.58 0.17%) -> 3677.73 (3837.02 2.02%):  2.60x speedup
 gen5         gnome-terminal-vim  38167.25 (38215.82 0.08%) -> 14901.09 (14902.28 0.01%):  2.56x speedup
 gen5                    poppler  13575.66 (13605.04 0.16%) -> 5554.27 (5555.84 0.01%):  2.44x speedup
 gen5         swfdec-giant-steps  8941.61 (8988.72 0.52%) -> 3851.98 (3871.01 0.93%):  2.32x speedup
 gen5          xfce4-terminal-a1  18956.60 (18986.90 0.07%) -> 8362.75 (8365.70 0.01%):  2.27x speedup
 gen5           firefox-fishtank  88750.31 (88858.23 0.14%) -> 39164.57 (39835.54 0.80%):  2.27x speedup
 gen3              midori-zoomed  2392.13 (2397.82 0.14%) -> 1109.96 (1303.10 30.35%):  2.16x speedup
 gen6                       gvim  2510.34 (2513.34 0.20%) -> 1200.76 (1204.30 0.22%):  2.09x speedup
 gen5       firefox-planet-gnome  40478.16 (40565.68 0.09%) -> 19606.22 (19648.79 0.16%):  2.06x speedup
 gen5       gnome-system-monitor  10344.47 (10385.62 0.29%) -> 5136.69 (5256.85 1.15%):  2.01x speedup
 gen3                    poppler  2595.23 (2603.10 0.17%) -> 1297.56 (1302.42 0.61%):  2.00x speedup
 gen6          firefox-talos-gfx  7184.03 (7194.97 0.13%) -> 3806.31 (3811.66 0.06%):  1.89x speedup
 gen5                  evolution  8739.25 (8766.12 0.27%) -> 4817.54 (5050.96 1.54%):  1.81x speedup
 gen3                  evolution  1684.06 (1696.88 0.35%) -> 1004.99 (1008.55 0.85%):  1.68x speedup
 gen3         gnome-terminal-vim  4285.13 (4287.68 0.04%) -> 2715.97 (3202.17 13.52%):  1.58x speedup
 gen5             swfdec-youtube  5843.94 (5951.07 0.91%) -> 3810.86 (3826.04 1.32%):  1.53x speedup
 gen4                    poppler  7496.72 (7558.83 0.58%) -> 5125.08 (5247.65 1.44%):  1.46x speedup
 gen4         gnome-terminal-vim  21126.24 (21292.08 0.85%) -> 14590.25 (15066.33 1.80%):  1.45x speedup
 gen5          firefox-talos-svg  99873.69 (100300.95 0.37%) -> 70745.66 (70818.86 0.05%):  1.41x speedup
 gen4       firefox-planet-gnome  28205.10 (28304.45 0.27%) -> 19996.11 (20081.44 0.56%):  1.41x speedup
 gen5          firefox-talos-gfx  93070.85 (93194.72 0.10%) -> 67687.93 (70374.37 1.30%):  1.37x speedup
 gen4                  evolution  6696.25 (6854.14 0.85%) -> 4958.62 (5027.73 0.85%):  1.35x speedup
 gen3         swfdec-giant-steps  2538.03 (2539.30 0.04%) -> 1895.71 (2050.62 62.43%):  1.34x speedup
 gen4                       gvim  4356.18 (4422.78 0.70%) -> 3276.31 (3281.69 0.13%):  1.33x speedup
 gen6                  evolution  1242.13 (1245.44 0.72%) -> 953.76 (954.54 0.07%):  1.30x speedup
 gen6       firefox-planet-gnome  4554.23 (4560.69 0.08%) -> 3758.76 (3768.97 0.28%):  1.21x speedup
 gen3          firefox-talos-gfx  6264.13 (6284.65 0.30%) -> 5261.56 (5370.87 1.28%):  1.19x speedup
 gen4              midori-zoomed  4771.13 (4809.90 0.73%) -> 4037.03 (4118.93 0.85%):  1.18x speedup
 gen6         swfdec-giant-steps  1557.06 (1560.13 0.12%) -> 1336.34 (1341.29 0.32%):  1.17x speedup
 gen4          firefox-talos-gfx  80767.28 (80986.31 0.17%) -> 69629.08 (69721.71 0.06%):  1.16x speedup
 gen6              midori-zoomed  1463.70 (1463.76 0.08%) -> 1331.45 (1336.56 0.22%):  1.10x speedup
Slowdowns
=========
 gen6          xfce4-terminal-a1  2030.25 (2036.23 0.25%) -> 2144.60 (2240.31 4.29%):  1.06x slowdown
 gen4             swfdec-youtube  3580.00 (3597.23 3.92%) -> 3826.90 (3862.24 0.91%):  1.07x slowdown
 gen4          firefox-talos-svg  66112.25 (66256.51 0.11%) -> 71433.40 (71584.31 0.14%):  1.08x slowdown
 gen4       gnome-system-monitor  5691.60 (5724.03 0.56%) -> 6707.56 (6747.83 0.33%):  1.18x slowdown
 gen3                  ocitysmap  3494.05 (3502.44 0.20%) -> 4321.99 (4524.42 2.78%):  1.24x slowdown
 gen4                  ocitysmap  3628.42 (3641.66 9.37%) -> 5177.16 (5828.74 8.38%):  1.43x slowdown
 gen5                  ocitysmap  4027.77 (4068.11 0.80%) -> 5748.26 (6282.25 7.38%):  1.43x slowdown
 gen6                  ocitysmap  1401.61 (1402.24 0.40%) -> 2365.74 (2379.14 4.12%):  1.69x slowdown
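
The factors in the two tables above appear to be the ratio of the two leading figures in each row: before divided by after for a speedup, after divided by before for a slowdown. The sketch below checks that reading against two rows; the function names are hypothetical and the meaning of the parenthesised figures is not assumed.

```c
/* Speedup factor: how many times smaller the "after" figure is. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Slowdown factor: how many times larger the "after" figure is. */
static double slowdown(double before, double after)
{
    return after / before;
}
```

With the gen3 firefox-fishtank row, `speedup(1203584.56, 85561.71)` is about 14.07; with the gen6 ocitysmap row, `slowdown(1401.61, 2365.74)` is about 1.69.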

[Note that the performance regression for ocitysmap arises because we now
attempt to support rendering to and (more importantly) from large
surfaces. Enabling such operations is the only way to one day be
faster than purely using the CPU; in the meantime we suffer a regression
due to the increased migration and aperture thrashing. The other couple
of regressions will be eliminated with improved span and shader support,
now that the framework for such is in place.]

The performance increase for Cairo completely overlooks the other
critical aspects of the architecture:

World of Padman:
gen3 (800x600):   57.5 ->  96.2
gen4 (800x600):   47.8 ->  74.6
gen6 (1366x768): 100.4 -> 140.3 [F15]
                 144.3 -> 146.4 [drm-intel-next]

x11perf (gen6):
aa10text:     3.47 -> 14.3 Mglyphs/s [unthrottled!]
copywinwin10: 1.66 -> 1.99 Mops/s
copywinpix10: 2.28 -> 2.98 Mops/s

And we do not have a good measure for how much improvement the reworking
of the fallback paths gives, except that xterm is now over 4x faster...

PS: This depends upon the Xorg patchset "Remove the cacheing of the last
scratch PixmapRec" for correct invalidations of scratch Pixmaps (used by
the dix to implement SHM operations, used by chromium and gtk+ pixbufs).

PPS: ./configure --enable-sna

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-06-04 09:19:46 +01:00