xf86-video-intel


Latest commit: 32fc0c896e, Chris Wilson, 2011-08-31 23:58:39 +01:00

    sna/gen6: Prefer the BLT ring, except for copies on behalf of DRI

    As demonstrated by the all-important trap300, using the BLT is 2x faster
    than the RENDER ring for the simple case of solid fills. (Though note that
    performing the relocations costs 3x as much CPU for 2x GPU performance.)
    One case that may regress from this change is copywinpix which should
    benefit from the batching in the RENDER commands, and might warrant
    revisiting in the future (with realistic and synthetic benchmarks in hand!)

    However, due to the forced stall when switching rings, we still want to
    perform RENDER copies on behalf of DRI clients and before page-flips.

    Checking against cairo-perf-trace indicated no major impact -- I had
    worried that setting the BLT flag for some clears might have had a
    knock-on effect causing too many operations that could be pipelined on
    the RENDER ring to be sent to the BLT ring instead.

    Reported-by: Michael Larabel <Michael@phoronix.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Makefile.am	Fix typos for distcheck	2011-07-30 09:26:23 +01:00
README	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
blt.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
gen2_render.c	sna: Fast-path single span boxes	2011-07-13 17:43:13 +01:00
gen2_render.h	sna/gen2: Use specular component for solid spans	2011-07-01 21:41:23 +01:00
gen3_render.c	sna/gen3: reset blend state after applying CA pass	2011-08-24 21:38:54 +01:00
gen3_render.h	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
gen4_render.c	sna/video: Flush the video state at the end of the operation	2011-08-25 19:55:49 +01:00
gen4_render.h	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
gen5_render.c	sna/video: Flush the video state at the end of the operation	2011-08-25 19:55:49 +01:00
gen5_render.h	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
gen6_render.c	sna/gen6: Prefer the BLT ring, except for copies on behalf of DRI	2011-08-31 23:58:39 +01:00
gen6_render.h	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
gen7_render.c	sna/video: Flush the video state at the end of the operation	2011-08-25 19:55:49 +01:00
gen7_render.h	sna: Port IVB acceleration code (Xrender + Xv)	2011-07-30 08:50:19 +01:00
kgem.c	sna: Retain the GTT space used for an upload buffer	2011-08-29 16:50:12 +01:00
kgem.h	sna: Cleanup up the cache upon close	2011-08-29 15:14:41 +01:00
kgem_debug.c	sna: Force tiled modes for large pitches	2011-07-04 15:27:40 +01:00
kgem_debug.h	sna: Downsample sources 2x too large to fit in the 3D pipeline	2011-07-01 21:41:23 +01:00
kgem_debug_gen2.c	sna/gen2: Add missing stub debug files	2011-07-02 07:51:15 +01:00
kgem_debug_gen3.c	sna: Force tiled modes for large pitches	2011-07-04 15:27:40 +01:00
kgem_debug_gen4.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
kgem_debug_gen5.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
kgem_debug_gen6.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna.h	sna: Port IVB acceleration code (Xrender + Xv)	2011-07-30 08:50:19 +01:00
sna_accel.c	sna: Cleanup up the cache upon close	2011-08-29 15:14:41 +01:00
sna_blt.c	sna: Also allow BLT copies to discard the alpha channel	2011-06-24 13:35:22 +01:00
sna_composite.c	sna: Clamp results for computing BoxRec coords from xRectangle	2011-07-09 14:58:35 +01:00
sna_damage.c	sna/damage: Take advantage of marking all-damaged	2011-08-11 19:42:42 +01:00
sna_damage.h	sna/damage: Avoid testing against a completey damaged region	2011-07-13 17:43:13 +01:00
sna_display.c	sna/display: Destroy shadow data	2011-08-25 14:50:33 +01:00
sna_dri.c	sna/gen6: Prefer the BLT ring, except for copies on behalf of DRI	2011-08-31 23:58:39 +01:00
sna_driver.c	sna: Clear structures across server reset	2011-08-25 14:50:37 +01:00
sna_glyphs.c	sna/glyphs: Discard GLYPH_PICTURE hint if the glyph doesn't fit into the cache	2011-07-13 17:41:02 +01:00
sna_gradient.c	sna/gradient: Use a high-precision ramp for a color step rather than fallback	2011-08-11 19:42:42 +01:00
sna_io.c	sna: Don't change tiling modes on replace	2011-07-04 15:27:54 +01:00
sna_module.h	sna: Add zaphod support	2011-06-07 16:54:57 +01:00
sna_reg.h	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna_render.c	sna: Clear structures across server reset	2011-08-25 14:50:37 +01:00
sna_render.h	sna: Port IVB acceleration code (Xrender + Xv)	2011-07-30 08:50:19 +01:00
sna_render_inline.h	sna: Also allow BLT copies to discard the alpha channel	2011-06-24 13:35:22 +01:00
sna_stream.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna_tiling.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna_transform.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna_trapezoids.c	sna/trapezoids: Refactor to project the trapezoid only once	2011-08-31 09:55:05 +01:00
sna_video.c	sna: Distinguish 830/845 vs 855/865 using the generation id	2011-06-30 16:31:28 +01:00
sna_video.h	sna/video: Use pwrite for upload of unclipped, unrotated frames	2011-06-22 11:04:56 +01:00
sna_video_hwmc.c	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna_video_hwmc.h	sna: Introduce a new acceleration model.	2011-06-04 09:19:46 +01:00
sna_video_overlay.c	sna/video: Downgrade severity of "overlay not found" message	2011-07-02 09:53:11 +01:00
sna_video_textured.c	sna/video: Defend against PutImage to a broken screen	2011-08-29 10:47:45 +01:00

README

SandyBridge's New Acceleration
------------------------------

The guiding principle behind the design is to avoid GPU context switches.
On SandyBridge (and beyond), these are especially pernicious because the
RENDER and BLT engines are now on different rings, and switching between
them requires synchronising the various execution units. Context switches
were not cheap on earlier generations either, but with the increasing
complexity of the GPU, avoiding such serialisations is important.

Furthermore, we try very hard to avoid migrating pixmaps between the CPU
and GPU. Every pixmap (apart from temporary "scratch" surfaces which we
intend to use on the GPU) is created in system memory. All operations are
then performed upon this shadow copy until we are forced to move it onto
the GPU. Only three events trigger that first migration: setting the
pixmap as the scanout (we obviously need a GPU buffer here), using the
pixmap as a DRI buffer (the client expects to perform hardware
acceleration and we do not want to disappoint) and, lastly, using the
pixmap as a RENDER target. The last is chosen because once we know we are
going to perform hardware acceleration, and will continue to do so
without fallbacks, using the GPU is much, much faster than the CPU. The
heuristic I chose is therefore that if the application uses RENDER (i.e.
cairo), it will only be using those paths, will not intermix core drawing
operations, and so is unlikely to trigger a fallback.

The complicating case is front-buffer rendering. Without a composite
manager redirecting every window to its own backing pixmap, an
application using RENDER and an xterm using core drawing both render into
the same front buffer. To accommodate this, we have to perform damage
tracking so that only the modified portions of the buffer are migrated,
rather than the whole buffer each time.