This gives us a few more standard modes on eDP panels that have only a
single fixed timing in the VBT, just as on older, LVDS-attached panels.
Fixes FDO bug https://bugs.freedesktop.org/show_bug.cgi?id=30069.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The current backlight value is clamped to the valid range [0, max]. As
we queried the value before setting the max, the current backlight was
clamped to 0, and so we set the brightness to zero when initialising the
display.
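A minimal C sketch of the ordering problem, assuming a hypothetical
struct backlight with max and current fields (not the driver's actual
structures): clamping against a max that has not yet been read collapses
any level to 0, whereas reading the max first preserves the real level.

```c
#include <assert.h>

/* Hypothetical model of the backlight state, not the driver's own struct. */
struct backlight {
    int max;     /* maximum level; 0 until it has been queried */
    int current; /* current level, always kept within [0, max] */
};

static int clamp_level(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Buggy order: the current level is clamped while max is still 0. */
static void backlight_init_buggy(struct backlight *b, int hw_current, int hw_max)
{
    b->current = clamp_level(hw_current, 0, b->max);
    b->max = hw_max;
}

/* Fixed order: establish the valid range first, then read the level. */
static void backlight_init_fixed(struct backlight *b, int hw_current, int hw_max)
{
    b->max = hw_max;
    b->current = clamp_level(hw_current, 0, b->max);
}
```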
Fixes:
Bug 30063 - start X will modify brightness value to zero
https://bugs.freedesktop.org/show_bug.cgi?id=30063
which is a regression due to 38f940dfea.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Quoting Adam Jackson:
"But the X driver looks like
it never sets MONITOR_EDID_COMPLETE_RAWDATA, which means the X core
doesn't know that any sections beyond the first are present, so it won't
ever hand back more than 128 bytes to clients. Boo."
This patch is based on his.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the buffer object is tiled, we need to use the fence registers to
perform the appropriate untiling for CPU access. Ensure that we always
take this path for tiled objects, regardless of their size.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 9c3e34703d.
This commit is not ready, as first the driver needs to handle all
controllers, especially those that ignore the BLC and require their own
interface. Fortunately, by moving that discovery into the kernel - where
it just means finding which ACPI device is attached to the video and has a
backlight interface - the userspace code should become much more sane,
and work even with multi-gpu, multi-lid systems.
But that is for tomorrow.
If the i915 driver exposes a native ACPI interface to modify the panel
backlight, use it in preference to the generic interfaces. On multi-GPU
systems, the panel backlight is meant to be connected via the IGP and
this ensures that we always find the right interface.
Fixes:
Bug 29273 - XORG Intel driver chooses wrong acpi_video to control
brightness in multi-GPU system
https://bugs.freedesktop.org/show_bug.cgi?id=29273
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
During -configure we would attempt to query the availability of KMS
before the DRI module was loaded; thus we were unable to create a valid
bus identifier, the query failed and we disowned the device.
Fixes:
Bug 29611 - Xorg -configure fails
https://bugs.freedesktop.org/show_bug.cgi?id=29611
Reported-by: Sergey Samokhin <prikrutil@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Marty Jack reported an issue he found where the page-flipping handler
was being lost on server reset. This results in the swap completion
notification being lost, causing sporadic hangs of full-screen
applications like Compiz, Flash and even glxgears!
Fixes:
Bug 29584 - Server in compute loop
https://bugs.freedesktop.org/show_bug.cgi?id=29584
There are also several possibly related bugs with similar symptoms, i.e.
OpenGL applications hanging on missed swap notifications.
Reported-by: Marty Jack <martyj19@comcast.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
When an output is attached to a crtc and that crtc is enabled, the
output is automatically enabled so we can remove the redundant manual
dpms on.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dave Airlie advised that hotplug detection can be unreliable and that
mode caching, in general, should be done in the kernel in any case.
This reverts commit 622e600069.
Remember for the detection cycle whether we have already probed for the
EDID -- as this can be slow.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The kernel may know about more types than we do, so protect ourselves
from reading from beyond the end of the string array.
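A minimal C sketch of the guard, using a hypothetical name table (the
entries below are illustrative and are not the driver's actual list):
any type index at or beyond the end of the table maps to a fallback
entry instead of being used to index past the array.

```c
#include <assert.h>

/* Hypothetical table of names the driver knows, indexed by kernel type. */
static const char *const known_type_names[] = {
    "Unknown", "VGA", "DVI-I", "DVI-D",
};

#define N_KNOWN_TYPES (sizeof(known_type_names) / sizeof(known_type_names[0]))

/*
 * Map a kernel-reported type to a name.  A newer kernel may report types
 * beyond the end of our table, so fall back to the "Unknown" entry rather
 * than reading past the end of the array.
 */
static const char *type_name(unsigned int type)
{
    if (type >= N_KNOWN_TYPES)
        return known_type_names[0];
    return known_type_names[type];
}
```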
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
And fix up all the drmmode_* functions to have an intel prefix and
categorise them into intel_mode, intel_crtc, intel_output and
intel_property so that the functions are a little more self-descriptive
and, more importantly, consistent.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Now that we submit from the flush callback chain, we know we'll always
submit before the client receives the reply or event that blocks it from
rendering the next frame.
There are a few cases where the server will flush client output buffers
but our block handler only catches the most common (before going into select).
If the server flushes client buffers before we submit our batch buffer,
the client may receive a damage event for rendering that hasn't happened yet.
Instead, we can hook into the flush callback chain, which the server will
invoke just before flushing output. This lets us submit batch buffers
before sending out events, preserving ordering.
Fixes 28438: [bisected] incorrect character in gnome-terminal under compiz
https://bugs.freedesktop.org/show_bug.cgi?id=28438
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Chris Wilson likes to sprinkle these all over, but in this
case it's just misleading. libdrm already does this for us.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The kernel overlay code does asynchronous overlay flips. So keep hold of
two old buffers; otherwise the rendering of the next frame might
overwrite the contents of the one that is still being displayed. With
~25fps videos and ~50 Hz screens that's rather unlikely, but fix it
anyway.
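A minimal C sketch of the buffering scheme, assuming a hypothetical
three-entry ring (ints stand in for buffer objects; this is not the
driver's overlay code): rendering always goes into the oldest buffer, so
neither the buffer just flipped to nor the one it replaced is touched
while the asynchronous flip may still be pending.

```c
#include <assert.h>

/* The buffer on screen, the one it replaced, and one free to render into. */
#define N_OVERLAY_BUFS 3

/* Hypothetical ring of overlay buffers; ints stand in for buffer objects. */
struct overlay_ring {
    int bufs[N_OVERLAY_BUFS];
    int displayed; /* index of the buffer most recently flipped to */
};

/*
 * The next slot in the ring is the oldest buffer.  Neither the buffer just
 * flipped to nor the one it replaced (which may still be scanned out until
 * the asynchronous flip completes) is handed out for rendering.
 */
static int overlay_next_index(const struct overlay_ring *r)
{
    return (r->displayed + 1) % N_OVERLAY_BUFS;
}

/* Record that a flip to the buffer at index has been queued. */
static void overlay_flip(struct overlay_ring *r, int index)
{
    assert(index >= 0 && index < N_OVERLAY_BUFS);
    r->displayed = index;
}
```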
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Per-target compilation flags (libIntelXvMC_la_CFLAGS) are required
when multiple targets which require different compiler flags
are built in the same makefile.
Automake issues a command with -c and -o flags, which not all compilers
support. The object files are prefixed with libIntelXvMC_la.
The macro AM_PROG_CC_C_O must then be used to provide this feature
on compilers that do not have it. If not, a warning is issued at make time.
This macro checks for compiler support and, if it is missing, uses a
"compile" script it generates in the package root directory.
Currently the driver uses per-target flags but the macro is missing.
Rather than adding the macro, this patch stops using per-target flags
by using the AM_CFLAGS variable for all targets in the makefile, as
there is only one.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
Order is important. Also ensure that the scratch GC clips by
children.
Fixes:
Bug 29213 - video artifacts if used dualscreen mode
https://bugs.freedesktop.org/show_bug.cgi?id=29213
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After passing the new buffer to the kernel, the old buffer is unpinned
and becomes available for re-use. So keep hold of the old buffer and
swap after a PutImage. This greatly reduces the amount of CPU time
consumed by the kernel on behalf of the video overlay -- by only
allocating two buffers for an entire sequence, we avoid clflushing and
page allocation on every frame.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to cleanup all CRTCs and outputs on shutdown, we need to keep a
list of the individual structures and iterate over that list on
shutdown.
Also, the outputs and CRTCs are configured just once and not for each
screen generation, so move the shutdown to the termination and not to
CloseScreen. Oops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use the static DummyEncoding and may attempt to modify it for each
adaptor (on each device), we should use copies instead.
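A minimal C sketch of the copy-per-adaptor approach, assuming a
hypothetical Encoding struct and template (not the X server's actual
XF86VideoEncodingRec or the driver's DummyEncoding): each adaptor gets
its own heap copy to adjust, so the shared template is never modified.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an Xv encoding description. */
typedef struct {
    int id;
    const char *name;
    unsigned short width, height;
} Encoding;

/* Shared, read-only template: never modified. */
static const Encoding DummyEncodingTemplate = { 0, "XV_IMAGE", 2048, 2048 };

/* Give each adaptor its own copy that it is free to adjust. */
static Encoding *encoding_copy_for_adaptor(unsigned short max_w,
                                           unsigned short max_h)
{
    Encoding *enc = malloc(sizeof(*enc));

    if (enc == NULL)
        return NULL;
    *enc = DummyEncodingTemplate; /* struct copy of the template */
    enc->width = max_w;
    enc->height = max_h;
    return enc;
}
```

The caller owns each copy and frees it when the adaptor is torn down.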
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The dri code is much more careful in ensuring that the scan lines that
it waits for are valid. Copy this code to video; with a bit of work this
can be refactored, and perhaps we can even teach dri how to handle
rotated front buffers.
References:
Bug 28964 - [i965gm] GPU infinite MI_WAIT_FOR_EVENT while watching video
in Totem
https://bugs.freedesktop.org/show_bug.cgi?id=28964
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we reject the front buffer because it has too large a stride, repeat
the allocation using untiled for the cases where we can utilize laxer
hardware restrictions.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid a potential use-after-free of the copied mode string by reusing
the converted kernel mode on resize.
==19897== Invalid read of size 8
==19897== at 0x661C330: ??? (strcpy.S:1308)
==19897== by 0x8618AE7: drmmode_set_mode_major (drmmode_display.c:293)
==19897== by 0x8618E6F: drmmode_xf86crtc_resize (drmmode_display.c:1299)
==19897== by 0x529A77: xf86RandR12ScreenSetSize (xf86RandR12.c:708)
==19897== by 0x4BD528: ProcRRSetScreenSize (rrscreen.c:301)
==19897== by 0x42B820: Dispatch (dispatch.c:432)
==19897== by 0x4254C9: main (main.c:289)
==19897== Address 0x72e91e0 is 0 bytes inside a block of size 9 free'd
==19897== at 0x4C23DBC: free (vg_replace_malloc.c:325)
==19897== by 0x48424F: xf86DeleteMode (xf86Mode.c:1921)
==19897== by 0x4942B7: xf86ProbeOutputModes (xf86Crtc.c:1572)
==19897== by 0x5290BB: xf86RandR12GetInfo12 (xf86RandR12.c:1551)
==19897== by 0x5313AE: RRGetInfo (rrinfo.c:202)
==19897== by 0x4BCCAA: rrGetScreenResources (rrscreen.c:337)
==19897== by 0x42B820: Dispatch (dispatch.c:432)
==19897== by 0x4254C9: main (main.c:289)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Unwind the array of Pixmaps already allocated and report failure for the
old dri GetBuffers() path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After splitting out the i810 driver into its own legacy directory, we
can identify the common routines not as i830 but as intel. This
clarifies the code which *is* i830 specific.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The driver is still built but is no longer under active development so
move it and supporting files to a new directory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tiling on gen 2/3 hardware is only supported for pitches up to 8192
bytes, so above this limit the surface will be untiled and we will no
longer have to comply with the power-of-two pitch alignment. So
disabling tiling for these overly wide surfaces should roughly halve the
memory requirement for the full surface.
Also, the absolute limit for the 2D blitter is 32,768 bytes. The
documentation says "up to 32,768 bytes", but my PineView box was
malfunctioning with a surface stride of 32,768, so set the limit to be
32,767.
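A minimal C sketch of the pitch selection described above. The 8192-byte
tiling limit and the 32,767-byte blitter limit come from this message;
the 512-byte minimum tiled pitch, the 64-byte untiled alignment and the
helper name choose_pitch() are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define GEN3_MAX_TILED_PITCH 8192  /* tiling on gen 2/3: pitch up to 8192 */
#define BLT_MAX_PITCH        32767 /* 2D blitter: stay below 32,768 bytes */

/* Round up to the next power of two (x > 0). */
static unsigned int next_pow2(unsigned int x)
{
    unsigned int p = 1;

    assert(x > 0);
    while (p < x)
        p <<= 1;
    return p;
}

/*
 * Hypothetical helper: a tiled surface needs a power-of-two pitch (with an
 * assumed 512-byte minimum); a surface wider than the tiling limit is left
 * untiled and only padded to an assumed 64-byte alignment.  Returns 0 if
 * even the blitter cannot handle the resulting pitch.
 */
static unsigned int choose_pitch(unsigned int width_bytes, bool *tiled)
{
    unsigned int pitch;

    if (width_bytes <= GEN3_MAX_TILED_PITCH) {
        *tiled = true;
        pitch = next_pow2(width_bytes < 512 ? 512 : width_bytes);
    } else {
        *tiled = false;
        pitch = (width_bytes + 63) & ~63u;
    }
    return pitch <= BLT_MAX_PITCH ? pitch : 0;
}
```

For an 8200-byte row this gives an untiled pitch of 8256 bytes rather
than the 16384 bytes a power-of-two tiled pitch would need.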
References:
Bug 28497 - Graphics corruption after opening a specific website
https://bugs.freedesktop.org/show_bug.cgi?id=28497
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Oops, I spent more time discussing these flushing bugs than I spent
paying attention to what I was actually doing.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This situation should not be possible: need_mi_flush being true while
the list of pending flush pixmaps is empty. However, an earlier bug in
doing just that revealed this minor bug. So, for correctness, be careful
not to clear need_mi_flush without emitting an MI_FLUSH.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The key difference between i965 and earlier chipsets is that the
surfaces are passed to the samplers through an indirect table, and so
the batch and render target were not being marked dirty by the
relocation (since the relocation only happens within prepare_composite(),
which may have been in another batch). Simply call
intel_pixmap_mark_dirty() when binding the sampler table into the batch
to ensure that the dirty state is tracked appropriately.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the batch submit may not trigger further drawing through flushing the
vertices, pass the requirement to emit the flush down to the submission
routine so that the flush can be appended after the final commands.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The presumption is that we wish to keep the target hot, so
copy to a new bo and move that to the CPU in preference to
causing ping-pong of the original.
Also, the GPU is much faster at detiling.
Before (PineView):
400000 trep @ 0.1128 msec ( 8860.0/sec): GetImage 10x10 square
18000 trep @ 1.3839 msec ( 723.0/sec): GetImage 100x100 square
800 trep @ 30.0987 msec ( 33.2/sec): GetImage 500x500 square
After (PineView):
180000 trep @ 0.1478 msec ( 6770.0/sec): GetImage 10x10 square
60000 trep @ 0.4545 msec ( 2200.0/sec): GetImage 100x100 square
4000 trep @ 8.0739 msec ( 124.0/sec): GetImage 500x500 square
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Use a single code path to upload the image data after selecting the
right bo, and take advantage of pwrite() when possible.
Fixes:
Bug 28569 - [i965] IGN's flash-based video player crashes X
https://bugs.freedesktop.org/show_bug.cgi?id=28569
Bug 28573 - [i965] Fullscreen flash and windowed SDL games fail to
update the screen
https://bugs.freedesktop.org/show_bug.cgi?id=28573
Reported-and-tested-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We should be able to eliminate these as the drawable remains unchanged.
However, the implicit flush of BUF_INFO fixes the rendering in KDE.
Alternatively, we need an MI_FLUSH | INHIBIT_RENDER_CACHE_FLUSH between
composites. (Note that it is not stale cache data causing the rendering
corruption and that a pipelined flush is not sufficient either.) Also,
having tried various points at which to flush, the only place where the
flush is effective seems to be between composite operations; that is, a
flush after 2D is not sufficient.
Reported-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
After d41684d545 we now allocate all framebuffers as tiled bo, and so
we must be careful to use the appropriate stride as returned from the
allocation, instead of assuming that it is just an aligned width.
Fixes:
Bug 28461 - screen rotation results in corrupted output.
https://bugs.freedesktop.org/show_bug.cgi?id=28461
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Till Matthiesen <entropy@everymail.net>
Fixes:
Bug 28446 - Garbled Font with Mathematica 7
https://bugs.freedesktop.org/show_bug.cgi?id=28446
Rewriting the glyphs to render to the destination directly, and removing
the more expensive multiple invocations of CompositePicture per picture,
was a great performance boost -- except that it needs special handling
in the backend in order not to fall back. Having done so for i915, I
neglected to ensure the sanity checking in i965_prepare_composite() was
sufficient. As it turns out, it was not, and so we misrendered CA-glyphs
when rendering directly to the destination. This causes us to fall back
properly, but is a performance regression as we no longer try the 2-pass
magic helper before resorting to s/w. At the moment, I'd rather live
with the temporary regression and fix i965 to do the same magic as i915,
as that is critical to fixing the severe performance issues currently
crippling i965, and I believe that this regression only affects the
minority of applications (incorrect, as it turns out, as the glyphs are
overlapping) rendering directly to the destination.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Following a conversation with Owain G. Ainsworth, it was decided that
the second best approach to handling a wedged GPU was to hope that the
kernel could successfully reset it, which currently is only possible for
i965 and later chipsets.
The best approach is of course to prevent such hangs from ever occurring
in the first place.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
But emit the warning about rendering corruption every time for the
transient errors like out-of-memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When I made libdrm stop overallocating so much memory for the purpose
of bo caching, things started scribbling on the bottom of my
frontbuffer (and vice versa, leading to GPU hangs). We had the usual
mistake of size = tiled_pitch * height instead of size = tiled_pitch *
tile_aligned_height.
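A minimal C sketch of the corrected size calculation. The helper names
are hypothetical, and the 8-row tile height used in the example is an
assumption (X-tiling on i915-class hardware; other tiling modes use
different heights).

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to a multiple of align (align > 0). */
static size_t align_up(size_t value, size_t align)
{
    assert(align > 0);
    return (value + align - 1) / align * align;
}

/*
 * Size of a tiled buffer: the GPU addresses whole tile rows, so the height
 * must be rounded up to the tile height as well as using the tiled pitch.
 * Using the raw height undersizes the allocation, and the last partial
 * tile row then overlaps whatever lies after the buffer.
 */
static size_t tiled_bo_size(size_t tiled_pitch, size_t height,
                            size_t tile_height)
{
    return tiled_pitch * align_up(height, tile_height);
}
```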
We need to install the acceleration functions so that they are wrapped
by the Damage layer. This fixes the corruption under a compositing WM
introduced in commit 8700673157.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
A trivial change, I thought, having tested it before rebasing, unworthy
even of a perfunctory compile test. How wrong I was.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The i915 textured video routines know how to handle drawing on an output
larger than the 3D pipe, so allow them to do so.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
gcc is horribly bad at collapsing the constants:
text data bss dec hex filename
282336 8720 256 291312 471f0 intel_drv.so.old
269280 8720 256 278256 43ef0 intel_drv.so
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Use centre sampling of textures to match pixman, and remove numerous
off-by-one and visual artefacts when rendering. The classic example for
this is cairo/text/xcomposite-projection where the edge of the rotated
rectangle is jaggy due to the incorrect sample position.
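To illustrate why the sample position matters, here is a 1-D C sketch
(not the driver's shader code; all function names are hypothetical).
With an identity scale, centre sampling lands exactly on texel centres,
so a bilinear filter returns each texel unchanged; corner sampling lands
halfway between two texels, and the filter blends them. That blending is
the kind of blur on an unscaled axis described in bug 16917.

```c
#include <assert.h>

/*
 * Where destination pixel dst_x is sampled in the source, in source pixel
 * units, under a pure scale transform (src = scale * dst).  Centre sampling
 * transforms the centre of the destination pixel, as pixman does; corner
 * sampling transforms its left edge.
 */
static double sample_pos_centre(int dst_x, double scale)
{
    return (dst_x + 0.5) * scale;
}

static double sample_pos_corner(int dst_x, double scale)
{
    return dst_x * scale;
}

/*
 * 1-D linear filter over texels whose centres lie at i + 0.5, clamping to
 * the edge texels.  A position exactly on a texel centre returns that texel.
 */
static double linear_sample(const double *texels, int n, double pos)
{
    double t = pos - 0.5; /* texel centres now sit on integers */
    int i0 = (int)t;
    double frac;
    int i1;

    assert(n > 0);
    if ((double)i0 > t) /* (int) truncates toward zero; make it a floor */
        i0--;
    frac = t - i0;
    i1 = i0 + 1;
    if (i0 < 0)
        i0 = 0;
    if (i0 > n - 1)
        i0 = n - 1;
    if (i1 < 0)
        i1 = 0;
    if (i1 > n - 1)
        i1 = n - 1;
    return texels[i0] * (1.0 - frac) + texels[i1] * frac;
}
```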
Fixes:
Bug 16917 - [i915] Blur on y-axis also when only x-axis is scaled
billiear
https://bugs.freedesktop.org/show_bug.cgi?id=16917
And about 15 tests from the Cairo test suite.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
3DSTATE_BUF_INFO is an implicit flush of the pipeline, so avoid emitting
that and associated state unless the destination pixmap has actually
changed. This is a win of around 3-5% for cairo-perf-trace, notably for
firefox.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Chris's new buffer exchange check is a good one, but we don't want to
hit the immediate blit fallback path if it fails. We still want to
schedule a blit for sometime in the future, and we need to use it
wherever an exchange might occur (like the secondary flip check or the
currently disabled CanExchange check).
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=28252.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is wildly optimistic, but it should work in a surprising number of
error situations, and some output in those cases will hopefully be
better than none...
If we submit a batchbuffer and the kernel reports the GPU is hung (which
will be caused by an earlier execbuffer, and so the kernel should have
had enough time to determine whether or not it could reset the GPU) then
disable any further attempt to accelerate gfx and force fallbacks to map
the buffers and use the CPU. We cannot normally map any more buffers if
the GPU is hung, so only those already mapped prior to the hang can be
written to, or those allocated in system memory. However, we can expect
that the framebuffer is already mapped, and so have a reasonable
expectation to continue to see the display update.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rewrite glyph rendering to avoid the intermediate buffer, accumulating
the glyph rectangles directly in the backend composite routines. And
modify the glyph cache routines to fully utilise the allocated size of
the tiled buffer on older hardware. To do this we alias all glyph sizes
into the same texture using a technique suggested by Keith Packard.
PineView:
885/856-> 1150/1110 kglyph/s (aa/rgb)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Handle rendering textured video onto an extended desktop (>2048) by
using a temporary pixmap. Note that we still cannot handle rendering to
a destination region greater than 2048; for that we will need to tile.
Hmm, time to request a 2560x1600, 10bpc monitor...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
17:53 < arekm> ickle: i830_dri.c:630:28: error: ‘DrawableRec’ has no member named ‘bpp’
17:53 < arekm> ickle: i830_dri.c:630:57: error: ‘DrawableRec’ has no member named ‘bpp’
* sigh. I need to fix this machine to have the right version of the
* headers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
An unredirected window (thanks Michel for the reminder) is backed by the
Screen pixmap, and so uses a reference of that as its front buffer. The
back buffer is a pixmap appropriately sized for the drawable. When the
application requests to swap its buffers, obviously we cannot simply
exchange the front and back buffer as they do not match, but need to copy
the appropriate region from the back to the front.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This reverts commit 44d45d3fa5.
Michel Dänzer pointed out the flaw in using the pixmap size instead of
the drawable size:
Using the backing pixmap dimensions for this is not desirable. In
particular, it means that the DRI2 buffers of non-redirected windows
always have the same size as the screen. But even for redirected windows
it wastes some graphics memory with a re-parenting window manager, that
is if it doesn't break in various ways due to the top left corner of the
DRI2 buffers no longer corresponding to the top left corner of the window.
This avoids using the garbage values stored in the Screen drawable, and
uses instead the true values which are only maintained in its backing
pixmap. The consequence of using the wrong size was to hand a 1x1
pixmap to metacity/mutter and have it believe it was a full screen
drawable; GPU hangs ensued if using page flipping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
==7596== Invalid write of size 4
==7596== at 0x491ACA8: intel_batch_teardown (i830_batchbuffer.c:118)
==7596== by 0x491C9D6: I830CloseScreen (i830_driver.c:1419)
==7596== by 0x8103A9C: RRCloseScreen (randr.c:105)
==7596== by 0x80DE794: xf86CrtcCloseScreen (xf86Crtc.c:759)
==7596== by 0x80BEBA3: DGACloseScreen (xf86DGA.c:268)
==7596== by 0x80D044B: DPMSClose (xf86DPMS.c:134)
==7596== by 0x488B050: XvCloseScreen (xvmain.c:320)
==7596== by 0x81841B1: VidModeClose (xf86VidMode.c:110)
==7596== by 0x80EB12F: CursorCloseScreen (cursor.c:191)
==7596== by 0x810CA17: AnimCurCloseScreen (animcur.c:108)
==7596== by 0x816937E: compCloseScreen (compinit.c:86)
==7596== by 0x48D39B9: glxCloseScreen (glxscreens.c:221)
==7596== Address 0x49c1a50 is 24 bytes inside a block of size 52 free'd
==7596== at 0x4024866: free (vg_replace_malloc.c:325)
==7596== by 0x80B023C: Xfree (utils.c:1096)
==7596== by 0x4927CFD: i830_set_pixmap_bo (i830_uxa.c:647)
==7596== by 0x491C9B4: I830CloseScreen (i830_driver.c:1413)
==7596== by 0x8103A9C: RRCloseScreen (randr.c:105)
==7596== by 0x80DE794: xf86CrtcCloseScreen (xf86Crtc.c:759)
==7596== by 0x80BEBA3: DGACloseScreen (xf86DGA.c:268)
==7596== by 0x80D044B: DPMSClose (xf86DPMS.c:134)
==7596== by 0x488B050: XvCloseScreen (xvmain.c:320)
==7596== by 0x81841B1: VidModeClose (xf86VidMode.c:110)
==7596== by 0x80EB12F: CursorCloseScreen (cursor.c:191)
==7596== by 0x810CA17: AnimCurCloseScreen (animcur.c:108)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the destination cannot fit into the 3D pipeline when we need to
composite, we fall back to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.
An alternative is to extract the extents of the operation from the
destination using the blitter which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.
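A minimal C sketch of the decision described above. The 2048 limit
matches the extended-desktop limit mentioned elsewhere in this log for
i915; the box and enum types and choose_composite_path() are
hypothetical and are not the driver's UXA code.

```c
#include <assert.h>

/* Render-target limit of the 3D pipeline on i915-class parts. */
#define MAX_3D_SIZE 2048

struct box {
    int x1, y1, x2, y2; /* half-open: [x1, x2) x [y1, y2) */
};

enum composite_path {
    COMPOSITE_DIRECT,   /* destination fits the 3D pipeline */
    COMPOSITE_VIA_TEMP, /* blit extents out, render, blit back */
    COMPOSITE_SOFTWARE, /* even the extents are too large */
};

/*
 * A destination wider or taller than the 3D limit need not force a CPU
 * fallback if the region actually touched by the operation fits: the
 * blitter can copy that region into a temporary pixmap, the 3D pipe
 * renders into it, and the blitter copies the result back.
 */
static enum composite_path choose_composite_path(int dst_width, int dst_height,
                                                 const struct box *extents)
{
    int w = extents->x2 - extents->x1;
    int h = extents->y2 - extents->y1;

    assert(w >= 0 && h >= 0);
    if (dst_width <= MAX_3D_SIZE && dst_height <= MAX_3D_SIZE)
        return COMPOSITE_DIRECT;
    if (w <= MAX_3D_SIZE && h <= MAX_3D_SIZE)
        return COMPOSITE_VIA_TEMP;
    return COMPOSITE_SOFTWARE;
}
```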
For x11perf this boosts glyph rendering on PineView from 38kglyphs/s to
480kglyphs/s, just a little shy of the native performance of 601kglyphs/s.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In theory this should allow us to pack far more operations into a single
batch buffer, and reduce our overheads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By using pwrite() instead of dri_bo_map() we can write to the batch buffer
through the GTT and not be forced to map it back into the CPU domain and
out again, eliminating a double clflush.
Measuring x11perf text performance on PineView:
Before:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0019 msec (532000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0020 msec (496000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On my PineView box these represent ~5% overhead on x11perf text:
Before:
16000000 trep @ 0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @ 0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Combine all the calls to composite between prepare_composite and
done_composite into a single primitive list, rather than a primitive
call per composite().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References:
Bug 28135 - [855GM] Slowdown/High CPU-Usage after Git-Commit 926fbc7d90
https://bugs.freedesktop.org/show_bug.cgi?id=28135
The simple answer is that I had assumed that 0 was a reserved value.
However, without the bpp encoded into the format, 0 was used for
a8r8g8b8 and r5g6b5, which are very common formats!
The other possibility for the slowdown is that gtkperf is using one of
the now verboten xrgb formats -- which would in fact be valid if the
source covers the clip and we could fix up the alpha value in the
fixed-function combine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All textures are now properly declared so that the alpha swizzling
occurs in the sampler or not at all. The downside is that for quite a
few composite operations we have to fallback to software on older
hardware. There is scope for performing more of the alpha expansion in
shaders or combiners when we know the picture covers the clip, which is
almost all of the time for normal operations, especially those
constructed by Cairo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We no longer workaround the lack of alpha expansion for xrgb textures as
this interferes with EXTEND_NONE, though we could if we know the source
covers the clip...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I was blindly fixing rendercheck without thinking. We need to force the
alpha value to be in the blend unit and not before -- otherwise we
generate the incorrect result whilst blending. D'oh.
GEM handles serialisation of the new front buffer with respect to page
flipping and rendering and reports back when the flip is complete.
Adding a sync point here is then redundant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we schedule swaps for some time in the future and may process a
detachment prior to receiving the vblank notification from the kernel,
we need to hold a reference to the buffers for our swap event handler.
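A minimal C sketch of the reference-holding pattern, assuming a
hypothetical refcounted buffer and pending-swap record (not the DRI2 or
driver structures): the scheduled swap takes its own references, so a
detachment that drops the drawable's references before the vblank
arrives cannot free the buffers out from under the event handler.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer. */
struct buffer {
    int refcount;
};

static struct buffer *buffer_create(void)
{
    struct buffer *b = calloc(1, sizeof(*b));

    if (b != NULL)
        b->refcount = 1;
    return b;
}

static void buffer_reference(struct buffer *b)
{
    b->refcount++;
}

/* Drop one reference; returns 1 if this freed the buffer, 0 otherwise. */
static int buffer_unreference(struct buffer *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        free(b);
        return 1;
    }
    return 0;
}

/* A swap scheduled for a future vblank keeps its own references. */
struct pending_swap {
    struct buffer *front;
    struct buffer *back;
};

static void swap_schedule(struct pending_swap *s, struct buffer *front,
                          struct buffer *back)
{
    buffer_reference(front);
    buffer_reference(back);
    s->front = front;
    s->back = back;
}

/* Called when the kernel reports the vblank; the buffers are still valid. */
static void swap_event_handler(struct pending_swap *s)
{
    /* ... exchange or copy front/back here ... */
    buffer_unreference(s->front);
    buffer_unreference(s->back);
    s->front = s->back = NULL;
}
```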
Fixes:
Bug 28080 - "glresize" causes X server segfault with indirect rendering.
https://bugs.freedesktop.org/show_bug.cgi?id=28080
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ensure that garbage is not stored in the unused alpha channel so that
we can rely on it being correctly initialised when used as a source or
returned via GetImage.
Partial fix for rendercheck -t blend
1. Instead of swapping bos, swap the entire private structure.
2. If we update the pixmap bo for the Screen, make sure we update the
reference inside intel->front_buffer so that xrandr still functions.
Fixes:
Bug 27922 - i965: Rapidly resizing OpenGL window causes GPU to hang.
https://bugs.freedesktop.org/show_bug.cgi?id=27922
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to prevent overcommitting the aperture; in particular, if we
allocate a buffer larger than the available space, we will fail to mmap
it in and rendering will fail. Trying to allocate multiple large buffers
in the aperture, often the case when falling back, causes thrashing and
eviction of useful buffers. So from the outset simply do not allocate a
bo if the required size is more than half the available aperture
space.
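A minimal C sketch of the allocation guard, assuming a hypothetical
helper that is given the currently available aperture size (how that
figure is obtained from the kernel is left out here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical allocation guard: refuse a buffer object larger than half
 * of the aperture space currently available, so that a single large
 * fallback buffer cannot thrash the aperture or fail to be mapped.
 */
static bool bo_size_acceptable(size_t size, size_t aperture_available)
{
    return size <= aperture_available / 2;
}
```

A caller that gets false would skip creating the bo and use a
system-memory fallback instead.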
Fixes allocation failure in ocitymap.trace for instance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pitch needs to be set on the pixmap prior to the private
intel_pixmap structure being created so that it can record the correct
value from the pixmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we can not accelerate these either as a destination or a source,
don't bother allocating a buffer object for them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf has a regression
https://bugs.freedesktop.org/show_bug.cgi?id=25068
caused by
commit e581ceb738
i915: Use the color channels to pass along solid sources and masks.
Do not convert 1x1R pixmaps into a solid color, as the readback from the
bo negates all the performance advantages of using a smaller vertex
buffer and fewer samplers.
Before (PineView):
aa=66800 glyphs/s, rgb=28800 glyphs/s
Now:
aa=96800 glyphs/s, rgb=48500 glyphs/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513f
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the code so that
check_composite() is called prior to acquiring the sources, and to only
extract the subregion if the render path cannot actually handle the
texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyphs/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More worrying is the extremely low performance;
however, the results are consistent, so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Complete the prepare access for the PutImage fallback via fbCopyArea(),
by remembering to set the private pointer to the GTT mapping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>