4213 Commits

Author SHA1 Message Date
Chris Wilson d5456e40d9 uxa/glamor: Silence a compiler warning for some unused code
intel_glamor.c: In function 'intel_glamor_create_screen_image':
intel_glamor.c:192:12: warning: variable 'pixmap' set but not used
[-Wunused-but-set-variable]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-14 10:41:33 +00:00
Chris Wilson 4f1a99a70e sna: Protect against deferred malloc failures for pixel data
As we now defer the allocation of pixel data until first use, it can
fail in the middle of a rendering routine. In order to prevent us
passing a NULL pointer into the fallback routines, we need to propagate
the failure from the malloc and suppress the operation by discarding
it, which is less than ideal.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-14 10:35:04 +00:00
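
The deferred allocation and failure propagation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `struct lazy_pixmap`, `lazy_pixmap_ensure_pixels()` and `fallback_fill()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pixmap whose CPU-side pixel data is allocated lazily. */
struct lazy_pixmap {
    size_t stride;   /* bytes per row */
    size_t height;   /* number of rows */
    void *pixels;    /* NULL until the first CPU access */
};

/* Allocate the pixel data on first use and report failure to the caller. */
static bool lazy_pixmap_ensure_pixels(struct lazy_pixmap *p)
{
    if (p->pixels)
        return true;

    assert(p->stride > 0 && p->height > 0);
    p->pixels = calloc(p->height, p->stride);
    return p->pixels != NULL;
}

/* A fallback routine: if the deferred allocation fails, the operation is
 * discarded instead of passing a NULL pointer further down. */
static bool fallback_fill(struct lazy_pixmap *p, unsigned char value)
{
    if (!lazy_pixmap_ensure_pixels(p))
        return false; /* operation dropped */

    memset(p->pixels, value, p->stride * p->height);
    return true;
}
```

A caller would check the return value of `fallback_fill()` and simply skip the drawing operation on failure, which matches the "less than ideal" trade-off the message mentions.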
Chris Wilson d2c6d950ed sna: Mark upload buffers as unaccessible upon submission
Use valgrind to catch use-after-finish bugs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-14 08:58:31 +00:00
Chris Wilson e39ea29bcc sna: Allow the debugger to map bo from the batch during kgem_submit()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 13:18:56 +00:00
Chris Wilson 2fabb5068d sna: Debug fixup for non-LLC systems
The cpu bo is only allocated on LLC systems, so avoid the NULL deref
when debugging on others.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 13:15:51 +00:00
Chris Wilson e037379c8e sna: Fix a debugging assert
The bo is allowed to be NULL, so defer the assert until after it is
known to be non-NULL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 13:14:15 +00:00
Chris Wilson 5d5b2b8ee2 uxa: Cap the maximum number of VMA cached
Since we cannot keep an unlimited number of vma cached due to the hard
per-process limits on the number of mappings and recreating mappings is
slow due to excruciatingly slow GTT pagefaults, we need to compromise
and keep a small MRU cache of inactive mmaps.

This uses the new API in libdrm-2.4.29 to specify the limit upon the VMA
cache maintained by libdrm.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 11:30:05 +00:00
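
As a rough illustration of a small, capped MRU cache like the one this commit describes, the sketch below keeps the most recently used entry at the front and evicts the least recently used one once the cap is reached. It does not use or reproduce the libdrm API; `struct vma_cache`, `vma_cache_touch()` and the cap of 4 are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define VMA_CACHE_MAX 4 /* hypothetical cap, chosen only for the example */

/* Handles ordered from most recently used (index 0) to least. */
struct vma_cache {
    int handle[VMA_CACHE_MAX];
    int count;
};

/* Mark a handle as most recently used, inserting it if necessary.
 * Returns the handle evicted to make room, or -1 if nothing was evicted. */
static int vma_cache_touch(struct vma_cache *c, int handle)
{
    int i, evicted = -1;

    assert(c->count >= 0 && c->count <= VMA_CACHE_MAX);

    for (i = 0; i < c->count; i++)
        if (c->handle[i] == handle)
            break;

    if (i == c->count) { /* not cached yet */
        if (c->count == VMA_CACHE_MAX) {
            evicted = c->handle[VMA_CACHE_MAX - 1];
            i = VMA_CACHE_MAX - 1;
        } else {
            i = c->count++;
        }
    }

    /* Shift entries [0, i) down by one slot and put the handle in front. */
    memmove(&c->handle[1], &c->handle[0], (size_t)i * sizeof(c->handle[0]));
    c->handle[0] = handle;
    return evicted;
}
```

In a real driver the evicted entry would have its mapping torn down; here the caller just receives the handle.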
Chris Wilson 1128825efb uxa: Wakeup 3s after the last rendering to reap the bo-cache
libdrm expires its bo 2s after entry into the cache, but we need to free
a buffer to trigger the reaper. So schedule a timer event to trigger 3s
after the last rendering is submitted to free any resident bo during
long periods of idleness.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 11:27:17 +00:00
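
The "wake up 3s after the last rendering" idea can be sketched as a deadline that is pushed back on every submission. This is only an illustration of the timing logic under assumed names (`struct reap_timer`, `reap_timer_on_submit()`, `reap_timer_expired()`); the actual driver hooks into the X server's timer facilities, which are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REAP_DELAY_MS 3000 /* 3s after the last rendering, per the commit */

/* Hypothetical one-shot timer state. */
struct reap_timer {
    bool armed;
    uint64_t deadline_ms;
};

/* Called whenever rendering is submitted: (re)arm the timer. */
static void reap_timer_on_submit(struct reap_timer *t, uint64_t now_ms)
{
    assert(t != NULL);
    t->armed = true;
    t->deadline_ms = now_ms + REAP_DELAY_MS;
}

/* Returns true once when the deadline has passed without new rendering,
 * which is the point at which the bo cache would be reaped. */
static bool reap_timer_expired(struct reap_timer *t, uint64_t now_ms)
{
    assert(t != NULL);
    if (!t->armed || now_ms < t->deadline_ms)
        return false;

    t->armed = false;
    return true;
}
```

Because each submission moves the deadline, the reaper only fires during genuine idle periods.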
Chris Wilson db7c9e8561 configure: Link the extra valgrind debugging to --enable-debug
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 10:05:25 +00:00
Chris Wilson d02dc0fd84 sna: Set the refcnt on the replacement bo
The paranoia wasn't in vain.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson 7472db8c8c sna: Double-check that the submitted buffers were not purged
More paranoia is good for the soul.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson 0bbd6a08fe sna/gen2: Tidy checking against too large pixmaps for the 3D pipeline
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson b392474f3a sna: Force a suitable minimum stride for 3D temporaries
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson 3c22baaba9 sna/gen2: Check for unhandled pitches in the render pipeline
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson f6a30df8dc sna: Enable memcpy uploads to SHM pixmaps
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson 3c163d105e sna: Use the CPU bo as a render source if compatible and no GPU bo
This is principally to catch the cases of compositing after a fresh
PutImage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson c481bec356 sna: Experiment with creating the CPU pixmap using an LLC BO
A poor cousin to vmap is to instead allocate snooped bo and use a CPU
mapping for zero-copy uploads into GPU resident memory. For maximum
performance, we still need tiled GPU buffers so CPU bo are only useful
in situations where we are frequently migrating data.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-13 01:38:09 +00:00
Chris Wilson 6c9aa6f9cf sna: Defer allocation of memory for larger pixmap until first use
In the happy scenario where the pixmap only resides upon the GPU we can
forgo the CPU allocation entirely. The goal is to reduce the number of
needless mmaps performed by the system memory allocator and reduce
overall memory consumption.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 16:03:11 +00:00
Chris Wilson 4b48d28f6e sna: Fix a typo, end statements with semi-colons
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 10:52:34 +00:00
Chris Wilson 4d20798c78 sna: We need to remap the gpu_only mmap prior to every use
Since the VMA may be reaped at any time whilst the mapping is idle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 10:13:20 +00:00
Chris Wilson 2682308c10 sna: Remove bo transference for whole XCopyArea
In benchmarking firefox this performs worse - it would appear the
sources are indeed used more often than not.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 09:24:11 +00:00
Chris Wilson 7703424222 sna/gen6: Only use CPU bo for a render target if untiled
For large render targets, we prefer to use tiled bo in order to avoid
severe performance degradation. However, if we don't have a GPU bo but
do have a CPU bo and the operation would be untiled, then simply use the
CPU bo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-12 00:15:10 +00:00
Chris Wilson a92a41ba32 sna/gen6: Tidy the usage of the max pipeline size
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 19:03:54 +00:00
Chris Wilson e9e6d6f7c8 sna/gen3: Move the video dst_bo to make the conditional clearer
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 17:25:46 +00:00
Chris Wilson 118ef0781c sna/composite: Make the check for a no-op earlier and clearer
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 16:56:53 +00:00
Chris Wilson 2674ef864c sna: Enable hooking up of valgrind during debugging
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 16:23:13 +00:00
Chris Wilson c83fd4e24d sna: Add some more debug messages for VMA caching
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 16:14:38 +00:00
Chris Wilson 3ae7fb918a sna: Restrict pitch alignment on 945gm to 64 bytes
In theory we should be able to disable dual-stream mode and so be
subject to much looser restrictions (such as the pitch need only be
dword aligned). However, achieving single-stream mode seems quite
difficult!

Reported-by: Paul Neumann <paul104x@yahoo.de>
References: https://bugs.freedesktop.org/show_bug.cgi?id=43706
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 13:52:42 +00:00
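
For readers unfamiliar with pitch alignment, the following sketch shows how a row pitch might be rounded up to a 64-byte boundary versus a looser dword (4-byte) boundary. The helper names `align_up()` and `untiled_pitch()` are hypothetical and the computation is a generic illustration, not the driver's actual routine.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Bytes per row for an untiled surface, rounded up to the given alignment. */
static uint32_t untiled_pitch(uint32_t width, uint32_t bits_per_pixel,
                              uint32_t alignment)
{
    uint32_t row_bytes = (width * bits_per_pixel + 7) / 8;
    return align_up(row_bytes, alignment);
}
```

For example, a 100-pixel-wide 32bpp row occupies 400 bytes, which stays at 400 with dword alignment but grows to 448 with 64-byte alignment.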
Chris Wilson 2f35d77cd0 sna: Update computation of untiled pitch to cater for CREATE_SCANOUT
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 13:37:18 +00:00
Chris Wilson 5a0139487f sna/gen3: Ensure that depth read/writes are disabled before first use
Our goal is to achieve "single-stream" rendering where the entire
RenderCache is allocated to the colour buffer (rather than split between
colour and depth). In theory all that is required is for the pipeline
not to reference the depth buffer at all, however it is not made clear
when that evaluation is made.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 12:37:22 +00:00
Chris Wilson a02bbd8700 sna: Only transfer the bo if the src/dst are of matching size
If the src replaces the dst, it could just be a much larger pixmap!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 10:34:37 +00:00
Chris Wilson 43a9964863 sna: Only transfer unpinned buffers
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 10:30:48 +00:00
Chris Wilson eb859f6446 uxa/video: Correct the offset of the binding table in the surface buffer
The binding table is intended to be after all the surface descriptions,
so make sure we write it with the appropriate offset into the buffer.

Fixes regression from 699888a64 (uxa/video: Use the common bo
allocations and upload)

Reported-by: Cyril Brulebois <kibi@debian.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43704
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 01:38:51 +00:00
Chris Wilson 051a18063d sna: Implement a VMA cache
A VMA cache appears unavoidable thanks to compiz and an excruciatingly
slow GTT pagefault, though it does look like it will be ineffectual
during everyday usage. Compiz (and presumably other compositing
managers) appears to be undoing all the pagefault minimisation as
demonstrated on gen5 with large XPutImage. It also appears the CPU to
memory bandwidth ratio plays a crucial role in determining whether
going straight to GTT or through the CPU cache is a win - so no trivial
heuristic.

x11perf -putimage10 -putimage500 on i5-2467m:
Before:
  bare:  1,150,000   2,410
  compiz:  438,000   2,670
After:
  bare:  1,190,000   2,730
  compiz:  437,000   2,690
UXA:
  bare:    658,000   2,670
  compiz:  389,000   2,520

On i3-330m
Before:
  bare:    537,000   1,080
  compiz:  263,000     398
After:
  bare:    606,000   1,360
  compiz:  203,000     985
UXA:
  bare:    294,000   1,070
  compiz:  197,000     821

On pnv:
Before:
  bare:    179,000   213
  compiz:  106,000   123
After:
  bare:    181,000   246
  compiz:  103,000   197
UXA:
  bare:    114,000   312
  compiz:   75,700   191

Reported-by: Michael Larabel <Michael@phoronix.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-11 00:52:54 +00:00
Chris Wilson 735a15208d sna/gen5: Remove a redundant format check
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:52:25 +00:00
Chris Wilson c5584252c3 sna: Remember to assign a new unique id for the replaced bo
Missed from the previous patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:34:51 +00:00
Chris Wilson 9c764dc13b sna: Be more pessimistic with CPU sources
Try to avoid a few more unnecessary context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:34:51 +00:00
Chris Wilson 358aaef6db sna/dri: Prefer using the BLT for DRICopyRegion on pre-SNB
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:34:51 +00:00
Chris Wilson c295ad8da9 sna: Transfer the whole bo for a replacement XCopyArea
If we are copying over the entire source onto the destination, just copy
across the GPU bo. This is often used for caching images as pixmaps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:34:51 +00:00
Chris Wilson ece7fc8afe sna: Only use the 64-byte pitch alignment for scanout
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:34:51 +00:00
Chris Wilson b3816cf3a9 sna: Remove assertions that external bo are not busy
We have to be careful not to assume that externally exposed bo are under
our full control, and in particular not to assert their state. :(

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 23:34:51 +00:00
Chris Wilson b5a6bc9e33 sna/gen[23]: Fixup render targets with pitches below hw minimum
gen2/3 have a restriction that the 3D pipeline cannot render to a pixmap
with a pitch less than 8/16 respectively. Rather than mandating all
pixmaps to be created with a stride greater than 16, fixup the bo for
the rare occasions when it is necessary.

Reported-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43688
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 13:18:44 +00:00
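
The "fix up only when necessary" approach can be illustrated with a check like the one below. The minimum values 8 (gen2) and 16 (gen3) come from the commit message; the unit is assumed to be bytes, and the function names `min_render_pitch()` and `needs_pitch_fixup()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum render-target pitch per generation, as stated in the commit
 * message (assumed to be in bytes); 0 means no limit is modelled here. */
static uint32_t min_render_pitch(int gen)
{
    assert(gen > 0);
    switch (gen) {
    case 2: return 8;
    case 3: return 16;
    default: return 0;
    }
}

/* True if a render target with this pitch would need a temporary copy
 * with a larger stride before the 3D pipeline can render to it. */
static bool needs_pitch_fixup(int gen, uint32_t pitch)
{
    return pitch < min_render_pitch(gen);
}
```

Checking at render time keeps small pixmaps compact in the common case and pays the cost of a fixup only for the rare undersized target.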
Chris Wilson c0dab7b1cf sna/trapezoids: Try to render traps onto a8 destinations in place
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 12:46:46 +00:00
Chris Wilson c73b14cabb sna/trapezoids: First try the scan converter for fallbacks
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-10 11:41:18 +00:00
Chris Wilson 22d9bc0bc1 sna: Use a single definition for the inactive cache timeout
And share it between the timer and the expiration function, just to
simplify the code.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-09 23:51:02 +00:00
Chris Wilson eb3e04d960 sna: Fallback to ordinary monotonic clock if coarse is not supported
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-09 23:51:02 +00:00
Chris Wilson 1c202cc074 sna: s/MONOTONICE/MONOTONIC/
A late addition to be flexible for compiling on different systems
heralded its doom.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-09 17:25:19 +00:00
Chris Wilson c51626ccb6 sna: Use the coarse monotonic clock to coalesce wakeup events
For the long interval events (such as expiring the caches), we do not
need precise timing and so can use a coarse timer to allow the system
to coalesce and reduce wakeup events.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-09 17:14:38 +00:00
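
A minimal sketch of reading a coarse monotonic clock on Linux, falling back to the ordinary monotonic clock when the coarse one is unavailable, might look like this. `coarse_now_ms()` is a hypothetical helper; `clock_gettime()`, `CLOCK_MONOTONIC` and the Linux-specific `CLOCK_MONOTONIC_COARSE` are the standard interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in milliseconds, preferring the cheaper coarse
 * clock, which is precise enough for long-interval events. */
static uint64_t coarse_now_ms(void)
{
    struct timespec ts;
    int ret = -1;

#ifdef CLOCK_MONOTONIC_COARSE
    /* May fail at runtime on kernels without the coarse clock. */
    ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
#endif
    if (ret != 0)
        ret = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(ret == 0);
    (void)ret;

    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}
```

The coarse clock trades resolution for a cheaper read, which is acceptable for events such as cache expiry that are measured in seconds.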
Chris Wilson c22197f25b sna: Discard bo for idle private pixmaps
If a pixmap lies around for a couple of minutes not being used, it is
unlikely to be used again in the near future. Reap the GPU buffers of
any of those idle pixmaps (copying to a more compact buffer in system
memory) in order to free up resources for use elsewhere. Any object
that is exposed via DRI is obviously exempt from this reaping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-09 17:14:38 +00:00
Chris Wilson 429a36f748 uxa: Fix clip processing for uxa_fill_spans()
Fixes regression from e0066e77e0
(uxa: Simplify Composite solid acceleration for spans by only clipping
once) [2.15.901]

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43649
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-12-09 09:54:22 +00:00