==7596== Invalid write of size 4
==7596== at 0x491ACA8: intel_batch_teardown (i830_batchbuffer.c:118)
==7596== by 0x491C9D6: I830CloseScreen (i830_driver.c:1419)
==7596== by 0x8103A9C: RRCloseScreen (randr.c:105)
==7596== by 0x80DE794: xf86CrtcCloseScreen (xf86Crtc.c:759)
==7596== by 0x80BEBA3: DGACloseScreen (xf86DGA.c:268)
==7596== by 0x80D044B: DPMSClose (xf86DPMS.c:134)
==7596== by 0x488B050: XvCloseScreen (xvmain.c:320)
==7596== by 0x81841B1: VidModeClose (xf86VidMode.c:110)
==7596== by 0x80EB12F: CursorCloseScreen (cursor.c:191)
==7596== by 0x810CA17: AnimCurCloseScreen (animcur.c:108)
==7596== by 0x816937E: compCloseScreen (compinit.c:86)
==7596== by 0x48D39B9: glxCloseScreen (glxscreens.c:221)
==7596== Address 0x49c1a50 is 24 bytes inside a block of size 52 free'd
==7596== at 0x4024866: free (vg_replace_malloc.c:325)
==7596== by 0x80B023C: Xfree (utils.c:1096)
==7596== by 0x4927CFD: i830_set_pixmap_bo (i830_uxa.c:647)
==7596== by 0x491C9B4: I830CloseScreen (i830_driver.c:1413)
==7596== by 0x8103A9C: RRCloseScreen (randr.c:105)
==7596== by 0x80DE794: xf86CrtcCloseScreen (xf86Crtc.c:759)
==7596== by 0x80BEBA3: DGACloseScreen (xf86DGA.c:268)
==7596== by 0x80D044B: DPMSClose (xf86DPMS.c:134)
==7596== by 0x488B050: XvCloseScreen (xvmain.c:320)
==7596== by 0x81841B1: VidModeClose (xf86VidMode.c:110)
==7596== by 0x80EB12F: CursorCloseScreen (cursor.c:191)
==7596== by 0x810CA17: AnimCurCloseScreen (animcur.c:108)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the destination cannot fit into the 3D pipeline when we need to
composite, we fall back to doing the operation on the CPU. This is very
slow, and quite easy to trigger on i915 by plugging in an external
display.
An alternative is to extract the extents of the operation from the
destination using the blitter which can usually handle much larger
operations. This gives us a temporary target that can fit into the 3D
pipeline and thus be accelerated, before copying back into the larger
real destination.
For x11perf this boosts glyph rendering on PineView, from 38kglyphs/s to
480kglyphs/s, just a little shy of the native performance of 601kglyphs/s.
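For illustration only, a minimal sketch of the path selection described
above. The names (struct box, choose_composite_path) and the max_3d
parameter are hypothetical and not taken from the driver; the hardware
limit is passed in rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rectangle in destination pixel coordinates. */
struct box { int x1, y1, x2, y2; };

enum path { PATH_DIRECT, PATH_TEMPORARY, PATH_CPU_FALLBACK };

/* True if a surface of this size could be a 3D render target.
 * max_3d is an assumed hardware limit supplied by the caller. */
bool fits_3d(int width, int height, int max_3d)
{
    return width <= max_3d && height <= max_3d;
}

/* Decide how to composite: directly, via a temporary target that
 * covers only the extents of the operation, or on the CPU. */
enum path choose_composite_path(int dst_w, int dst_h,
                                struct box extents, int max_3d)
{
    assert(extents.x2 >= extents.x1 && extents.y2 >= extents.y1);

    if (fits_3d(dst_w, dst_h, max_3d))
        return PATH_DIRECT;
    if (fits_3d(extents.x2 - extents.x1, extents.y2 - extents.y1, max_3d))
        return PATH_TEMPORARY;   /* blit out, composite, blit back */
    return PATH_CPU_FALLBACK;
}
```

The temporary path only pays off when the extents are much smaller than
the destination, which is the common case for glyphs.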
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In theory this should allow us to pack far more operations into a single
batch buffer, and reduce our overheads.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
By using pwrite() instead of dri_bo_map() we can write to the batch buffer
through the GTT and not be forced to map it back into the CPU domain and
out again, eliminating a double clflush.
Measuring x11perf text performance on PineView:
Before:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0019 msec (532000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0020 msec (496000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On my PineView box these represent ~5% overhead on x11perf text:
Before:
16000000 trep @ 0.0020 msec (495000.0/sec): Char in 80-char aa line (Charter 10)
12000000 trep @ 0.0022 msec (461000.0/sec): Char in 80-char rgb line (Charter 10)
After:
16000000 trep @ 0.0020 msec (511000.0/sec): Char in 80-char aa line (Charter 10)
16000000 trep @ 0.0021 msec (480000.0/sec): Char in 80-char rgb line (Charter 10)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Combine all the calls to composite between prepare_composite and
done_composite into a single primitive list, rather than a primitive
call per composite().
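A rough sketch of the batching idea, not the driver code: rectangles are
accumulated between prepare and done and flushed as one primitive. The
struct layout, MAX_RECTS and the primitives_emitted counter are
assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECTS 256   /* hypothetical capacity of the vertex buffer */

struct rect { int x, y, w, h; };

struct composite_state {
    struct rect rects[MAX_RECTS];
    size_t count;               /* rectangles queued but not yet emitted */
    size_t primitives_emitted;  /* number of primitive calls issued */
};

/* Emit everything queued so far as a single primitive. */
void flush_primitive(struct composite_state *s)
{
    assert(s != NULL);
    if (s->count > 0) {
        s->primitives_emitted++;
        s->count = 0;
    }
}

void prepare_composite(struct composite_state *s)
{
    assert(s != NULL);
    s->count = 0;
}

/* Queue one rectangle; only flush early if the buffer is full. */
void composite(struct composite_state *s, struct rect r)
{
    assert(s != NULL);
    if (s->count == MAX_RECTS)
        flush_primitive(s);
    s->rects[s->count++] = r;
}

void done_composite(struct composite_state *s)
{
    flush_primitive(s);
}
```

With this shape, ten composite() calls between prepare and done result
in one primitive rather than ten.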
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
References:
Bug 28135 - [855GM] Slowdown/High CPU-Usage after Git-Commit 926fbc7d90
https://bugs.freedesktop.org/show_bug.cgi?id=28135
The simple answer is that I had assumed that 0 was a reserved value.
However, without the bpp encoded into the format, 0 was used for a8r8g8b8
and r5g6b5, which are very common formats!
The other possibility for the slowdown is that gtkperf is using one of
the now verboten xrgb formats -- which would in fact be valid if the
source covers the clip and we could fix up the alpha value in the fixed
function combine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
All textures are now properly declared so that the alpha swizzling
occurs in the sampler or not at all. The downside is that for quite a
few composite operations we have to fall back to software on older
hardware. There is scope for performing more of the alpha expansion in
shaders or combiners when we know the picture covers the clip - which is
almost all of the time for normal operations especially those
constructed by Cairo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We no longer work around the lack of alpha expansion for xrgb textures as
this interferes with EXTEND_NONE, though we could if we know the source
covers the clip...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Allow us to check whether we can handle the operation using the blitter
prior to doing any work.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I was blindly fixing rendercheck without thinking. We need to force the
alpha value to be in the blend unit and not before -- otherwise we
generate the incorrect result whilst blending. D'oh.
GEM handles serialisation of the new front buffer with respect to page
flipping and rendering and reports back when the flip is complete.
Adding a sync point here is then redundant.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we schedule swaps for some time in the future and may process a
detachment prior to receiving the vblank notification from the kernel,
we need to hold a reference to the buffers for our swap event handler.
Fixes:
Bug 28080 - "glresize" causes X server segfault with indirect rendering.
https://bugs.freedesktop.org/show_bug.cgi?id=28080
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ensure that garbage is not stored in the unused alpha channel so that
we can rely on it being correctly initialised when used as a source or
returning via GetImage.
Partial fix for rendercheck -t blend
1. Instead of swapping bos, swap the entire private structure.
2. If we update the pixmap bo for the Screen, make sure we update the
reference inside intel->front_buffer so that xrandr still functions.
Fixes:
Bug 27922 - i965: Rapidly resizing OpenGL window causes GPU to hang.
https://bugs.freedesktop.org/show_bug.cgi?id=27922
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We need to prevent overcommitting the aperture, and in particular if we
allocate a buffer larger than available space we will fail to mmap it in
and rendering will fail. Trying to allocate multiple large buffers in
the aperture, often the case when falling back, causes thrashing and
eviction of useful buffers. So from the outset simply do not allocate a
bo if the required size is more than half the available aperture
space.
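As a sketch of the rule just stated (the function name and parameters
are hypothetical; the real driver obtains the aperture size from the
kernel):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if a buffer of 'required' bytes may be allocated, i.e. it
 * needs no more than half of the available aperture space. */
bool bo_size_is_allowed(size_t required, size_t aperture_available)
{
    assert(aperture_available > 0);
    return required <= aperture_available / 2;
}
```

A caller would refuse the bo and take the fallback path when this
returns false, instead of failing later at mmap time.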
Fixes allocation failure in ocitymap.trace for instance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The pitch needs to be set on the pixmap prior to the private
intel_pixmap structure being created so that it can record the correct
value from the pixmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we can not accelerate these either as a destination or a source,
don't bother allocating a buffer object for them.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf has a regression
https://bugs.freedesktop.org/show_bug.cgi?id=25068
caused by
commit e581ceb738
i915: Use the color channels to pass along solid sources and masks.
Do not convert 1x1R pixmaps into a solid color as the readback from the
bo negates all the performance advantages of using a smaller vertex
buffer and fewer samplers.
Before (PineView):
aa=66800 glyphs/s, rgb=28800 glyphs/s
Now:
aa=96800 glyphs/s, rgb=48500 glyphs/s
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
x11perf regression caused by 2D driver
https://bugs.freedesktop.org/show_bug.cgi?id=28047
caused by
commit a7b800513f
uxa: Extract sub-region from in-memory buffers.
The issue is that as we extract the region prior to checking whether the
composite can in fact be accelerated, we perform expensive surplus
operations. This is particularly noticeable for ComponentAlpha text,
such as rgb10text. The solution here is to rearrange the
check_composite() prior to acquiring the sources, and only extracting
the subregion if the render path can not actually handle the texture.
Performance (on PineView):
a7b800513^: aa=68600 glyphs/s, rgb=29900 glyphs/s
a7b800513: aa=65700 glyphs/s, rgb=13200 glyphs/s
now: aa=66800 glyphs/s, rgb=28800 glyphs/s
The residual lossage seems to be from the extra function call and
dixPrivate lookups. Hmm. More worrying is the extremely low performance,
however the results are consistent so the improvement looks real...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Complete the prepare access for the PutImage fallback via fbCopyArea(),
by remembering to set the private pointer to the GTT mapping.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On older versions of pixman, pixman_blt() can return false if the images
are <= 8bpp. If we are being called from CopyArea, then we cannot return
FALSE here as that will trigger an infinite recursion. Instead we must
manually perform the fallback using fbCopyArea().
Reported-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is some fallout from my xvmc cleanup.
Original-Patch-by: Rico Tzschichholz <ricotz@t-online.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we need to allocate a new bo for use as a gpu target, first check
if we can reuse a pixmap that has already been relocated into the
aperture as a temporary target, for instance a glyph mask or a clip mask.
Before:
backend  test                   min(s)  median(s)  stddev.
xlib     firefox-planet-gnome   50.568     50.873    0.30%
xcb      firefox-planet-gnome   49.686     53.003    3.92%
xlib     evolution              40.115     40.131    0.86%
xcb      evolution              28.241     28.285    0.18%
After:
backend  test                   min(s)  median(s)  stddev.
xlib     firefox-planet-gnome   47.759     48.233    0.80%
xcb      firefox-planet-gnome   48.611     48.657    0.87%
xlib     evolution              38.954     38.991    0.05%
xcb      evolution              26.561     26.654    0.19%
And even more dramatic improvements when using a font size larger than
the maximum size of the glyph cache:
xcb firefox-36-20090611: 1.79x speedup
xlib firefox-36-20090611: 1.74x speedup
xcb firefox-36-20090609: 1.62x speedup
xlib firefox-36-20090609: 1.59x speedup
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we only use the glyph cache for small glyphs, those larger than 32x32
will first be copied to a bo and used as a mask in a composite
operation. We can avoid the allocation and upload per use by allocating
a bo for the over-sized glyph from the start. As the glyph is large
anyway, the excess memory allocation is less significant.
Using normal font sizes, firefox shows no change - as expected. However,
using the 36 font size traces, we see around a 10% improvement on g45.
Before:
xcb firefox-36-20090609 127.333 127.897 0.22%
xcb firefox-36-20090611 87.456 88.624 0.66%
xcb firefox-20090601 19.522 20.194 1.69%
xlib firefox-36-20090609 201.054 201.780 0.18%
xlib firefox-36-20090611 133.468 133.717 0.09%
xlib firefox-20090601 23.740 23.975 0.49%
With large glyphs in bo:
xcb firefox-36-20090609 117.256 118.254 0.42%
xcb firefox-36-20090611 79.462 79.962 0.31%
xcb firefox-20090601 19.658 20.024 0.92%
xlib firefox-36-20090609 185.645 188.202 0.68%
xlib firefox-36-20090611 123.592 124.940 0.54%
xlib firefox-20090601 23.917 24.098 0.38%
Thanks to Owain G. Ainsworth for the suggestion!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to avoid an infinite recursion after enabling CopyArea to use
the put_image acceleration to either stream a blit or to copy in-place,
we cannot call CopyArea from put_image for the fallback path. Instead,
we can simply call pixman_blt directly, which coincidentally is a tiny
bit faster.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This slightly improves xrender performance on fence reg starved
i8xx hw.
I've also changed a few function calls to the new names from the
compat ones while looking at the code.
The i915 textured video path is not converted because atm the xv
code does not use tiled surfaces.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We appear to have a confusion of stride in terms of pixels, pitch in
terms of bytes and the actual width of the surface.
i830_pad_drawable_width() appears to be operating aligning *pixels* to a
64 pixel boundary and has never used the chars-per-pixel causing
considerable confusion in its callers. Remove the parameter and ensure
that the callers are expecting a value in pixels returned, multiplying
by cpp where necessary to get the pitch.
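To make the units explicit, a small sketch under the assumptions stated
in this message: widths are padded in pixels to a 64 pixel boundary, and
the pitch in bytes is that padded width times cpp. The function names
here are illustrative, not the driver's.

```c
#include <assert.h>

#define PAD_PIXELS 64   /* alignment in pixels, as described above */

/* Round a width in pixels up to the next multiple of PAD_PIXELS.
 * The result is still in pixels. */
int pad_width_pixels(int width)
{
    assert(width >= 0);
    return (width + PAD_PIXELS - 1) & ~(PAD_PIXELS - 1);
}

/* Pitch in bytes: padded width in pixels multiplied by chars-per-pixel. */
int pitch_bytes(int width, int cpp)
{
    assert(cpp > 0);
    return pad_width_pixels(width) * cpp;
}
```

For example, a 1366 pixel wide surface pads to 1408 pixels, which at
4 bytes per pixel gives a pitch of 5632 bytes.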
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Caught by a malloc library assert.
Note to self: Don't just copy&paste codelines around :(
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27540
Tested-by: Nick Bowler <nbowler@draconx.ca>
Tested-by: Calvin Walton <calvin.walton@gmail.com>
For some reason I've made a mess out of the overlay stride constraints.
Fix it up.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Calvin Walton <calvin.walton@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27453
In my recent fix for the chroma pitch for i915 xvmc I've forgotten about
i965 class hw. For videos with a non-even sized stride (measured in dwords)
the chroma pitch was internally inconsistent and one dword off.
Fix this by using pitch2 for the chroma pitch in i965 textured video like
everywhere else.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=27417
Tested-by: Nick Bowler <nbowler@draconx.ca>
Tested-by: Sven Arvidsson <sa@whiz.se>
Simply store the desired bo size in intel_xvmc_context and initialize
it in the driver's create_context function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... by putting struct intel_xvmc_surface at the beginning. Also kill
the common context handling code and simply keep a pointer in the
surface private to the context.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
It's unused. Also drop all related generic code that tries to do
clever stuff with this callback. These are all remnants from a
pre-gem world.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
All of these are also stored in the context. Also kill the context
reference counting. Doesn't serve a purpose besides occupying a
pointer to the context in the private surface struct.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
... by putting struct intel_xvmc_surface at the beginning. This
will allow to consolidate surface and bo handling.
Also kill some now dead code used to handle the common surface
structure.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We only passed around and actually used the gem handle. Don't
need a struct for one field alone ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And kill all the static structures. This way it's clearer what's
common and what's specific. And the code is shorter too.
Also clean up src/i830_hwmc.c - kill the nonstandard surface types
for i915 and the associated code.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Doing the same with the i965 code will allow us to share the
create_context function.
src/i915_hwmc.h is now almost empty. Move the last #defines to
src/xvmc/i915_xvmc.c where they are actually used and delete the
file.
Also rename the ddx context struct to something sane.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Like for the subpicture stuff, share the "do-nothing" functions ...
And fix function name spelling, too.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Like for i915. Also drop that now totally superfluous limit on the
available surfaces.
Move the surface struct into the userspace library header now that
the ddx doesn't use it anymore.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The XvMC driver api in the server is insane. Even for optional stuff
like subpicture support it doesn't check for NULL-pointers. So we
have to retain some dummy functions.
Wonder how many copies of these things exist on fdo ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Both xvmc are handing in the bo in the exact same way. So move the code
to src/i830_video.c and kill this great oeuvre of spaghetti-code.
The xvmc driver init and fini also lost their last use, kill them, too.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After unifying i915 and i965, not much will be left of these files.
Therefore merge them to make the following changes easier.
This creates some warnings about some redefined macros, but when this
is all cleaned up they'll all be gone.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Pauli pointed out that we take a ref on the front buffer when exchanging
but forget to release it. The ref is necessary since the set functions
will drop refs as necessary, but once we set the front buffer to point
at the back pixmap, we need to release our private ref again, or we'll
leak buffers.
Reported-by: Pauli Nieminen <suokkos@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After reports of segmentation faults caused by
d6b7f96fde and vmware, the most obvious
cause would be illegally writing to the src data when performing the alpha
fill inline. So force the image upload to go via a fresh buffer whenever
we need to modify the incoming data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
On memory constrained hardware, tiling is vital for good performance as
it minimizes cache misses. The downside is that older hardware (which
often suffers from a lack of bandwidth) requires the use of fences for
many operations, which are in short supply and so may cause shorter
batchbuffers. However, our batch buffers are typically short, so this is
unlikely to be a concern and should not affect the performance wins.
A quick bit of testing suggests the effect is inconclusive on
firefox/i945:
                  linear     tiled
xcb              205.470   206.219
xcb-render-0.0   404.704   388.413
xlib             166.410   170.805
A secondary effect of the patch is to work around a G31 specific hang
when attempting to use linear 2048x2048 surfaces. Bonus!
Fixes:
Bug 25375 - Performance issue using texture from pixmap (tfp) glx extension on 945
http://bugs.freedesktop.org/show_bug.cgi?id=25375
Bug 27100 - GPU Hung copying a 2048x1152 pixmap
http://bugs.freedesktop.org/show_bug.cgi?id=27100
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: John <jvinla@gmail.com>
Otherwise it would be a random value and drmmode_page_flip_handler()
won't have a chance to call I830DRI2FlipEventHandler() and indicate
a full page flip is complete.
Signed-off-by: Li Peng <peng.li@intel.com>
Fixes:
http://bugs.freedesktop.org/show_bug.cgi?id=27123
Fatal server error:
i915_emit_composite_setup: ADVANCE_BATCH: under-used allocation 100/104
Introduced with commit d6b7f96fde.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Do not try to fixup the alpha in the ff/shaders as this has the
side-effect of overriding the alpha value of the border color, causing
images to be padded with black rather than transparent. This can
generate large and obnoxious visual artefacts.
Fixes:
Bug 17933 - x8r8g8b8 doesn't sample alpha=0 outside surface bounds
http://bugs.freedesktop.org/show_bug.cgi?id=17933
and many related cairo test suite failures.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes a number of cairo test suite failures.
Also affects:
Bug 16917 - Blur on y-axis also when only x-axis is scaled bilinear
http://bugs.freedesktop.org/show_bug.cgi?id=16917
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
My cleanup accidentally created an inconsistency in the YUV plane ordering.
I think we can safely assume that I'm colorblind ;)
As Carl Worth rightly pointed out, this change deserves a more elaborate
explanation:
For Xv planar formats, the three planes are stored consecutively in
memory, ordered Y U V. Now for some totally odd reason (= none at all),
i915 xvmc stored it in Y V U order. Right after the release of 2.10, with
commit "Xv: consolidate xmvc passthrough handling" I've inadvertently
broken xvmc support (which started this whole odyssey into xvmc). When
fixing stuff up, I neglected this special plane ordering and simply
assumed it to be the same as Xv and dropped that special case for i915 in
src/i830_video.c. This patch completes the change to standard YUV plane
ordering by making the corresponding change in src/xvmc/i915_xvmc.c.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Just make it mirror ScheduleSwap: complete the wait on any error
condition so as not to crash the client if the kernel is misbehaving.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We can only handle 32 bit values unless we totally virtualize the count,
since the kernel only handles 32 bits itself. Rather than adding all
that overhead, just tolerate the occasional missed event every time the
counter runs over.
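A small illustration of what the 32 bit limit means, with hypothetical
helper names: the value handed to the kernel is the low 32 bits of the
requested count, and a request that crosses a 2^32 boundary is the case
where an event may be missed.

```c
#include <assert.h>
#include <stdint.h>

/* The low 32 bits of a 64 bit count, i.e. what a 32 bit kernel
 * interface can represent. */
uint32_t msc_for_kernel(uint64_t target_msc)
{
    return (uint32_t)target_msc;
}

/* Non-zero if the 32 bit counter runs over between the current count
 * and the target, which is the tolerated failure case. */
int counter_wraps_before(uint64_t current_msc, uint64_t target_msc)
{
    assert(target_msc >= current_msc);
    return (target_msc >> 32) != (current_msc >> 32);
}
```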
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
A couple more niggles: make sure we return a target_msc that at least
matches the current count; this is a little more friendly to clients
that missed an event. Also check for >= when calculating the remainder
so we'll catch the *next* vblank event when the calculation is
satisfied, rather than the current one as might happen at times.
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
My merge of Mario's patch for this was botched. Fix it up so that OML
waits work correctly, and remove a bogus warning from ScheduleSwap.
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The current code in I830DRI2ScheduleSwap() only schedules the correct
vblank events for the case divisor == 0, i.e., the simple
glXSwapBuffers() case.
In a glXSwapBuffersMscOML() request, divisor can be > 0, which would go
wrong.
This modified code should handle target_msc, divisor, remainder and the
different cases defined in the OML_sync_control extension correctly for
the divisor > 0 case.
It also tries to make sure that the effective framecount of swap
satisfies all constraints, taking the 1 frame delay in pageflipping mode
and possible delays in blitting/exchange mode due to
DRM_VBLANK_NEXTONMISS into account.
The swap_interval logic in the X server's DRI2SwapBuffers() call expects
the returned swap_target from the DDX to be reasonably accurate,
otherwise implementation of swap_interval for the glXSwapBuffers() as
defined in the SGI_swap_interval extension may become unreliable.
For non-pageflipped mode, the returned swap_target is always correct due
to the adjustments done by drmWaitVBlank(), as DRM_VBLANK_NEXTONMISS is
set.
In pageflipped mode, DRM_VBLANK_NEXTONMISS can't be used without severe
impact on performance, so the code in I830DRI2ScheduleSwap() must make
manual adjustments to the returned vbl.reply.sequence number.
This patch adds the needed adjustments.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Previous code only handled divisor == 0 case correctly. This should
honor a given target_msc for the divisor > 0 case and handle the
(msc % divisor) == remainder constraint correctly.
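A minimal sketch of one way to pick the vblank count to wait for, written
from the constraint above rather than copied from the driver. The
function name is hypothetical, and the choice to always return a count
strictly after the current one in the divisor > 0 case is an assumption
of this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the msc at which a swap should happen.
 *  - If target_msc is still in the future, honor it.
 *  - Otherwise, with divisor == 0, use the current count.
 *  - Otherwise, return the next msc > current_msc for which
 *    (msc % divisor) == remainder. */
uint64_t pick_swap_msc(uint64_t current_msc, uint64_t target_msc,
                       uint64_t divisor, uint64_t remainder)
{
    if (current_msc < target_msc)
        return target_msc;
    if (divisor == 0)
        return current_msc;

    assert(remainder < divisor);

    /* Candidate within the current divisor period. */
    uint64_t candidate = current_msc - (current_msc % divisor) + remainder;

    /* Already passed (or exactly now): move to the next period. */
    if (candidate <= current_msc)
        candidate += divisor;
    return candidate;
}
```

For example, with current_msc = 100, divisor = 8 and remainder = 3 the
candidate in the current period is 99, which has passed, so the result
is 107.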
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
If a drawable isn't visible due to DPMS or redirection, we'll just blit
it rather than schedule a swap event. However, we didn't reset the
target_msc, so the swap target we receive from the server could get out
of sync with the vblank count of the drawable's display. So at DPMS on
time, the swap target would be the last good vblank count plus some
large number (since the swaps won't have been throttled).
Solve this by zeroing out the swap target like we should when we fall
back to a blit. Also make the kernel error cases more friendly by
making them fall back to blits too.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Once we hit this error it's unlikely that we're coming back - so don't
flood the logs with redundant information.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This kills one wip remnant from my i830_memory cleanup and the last
remains of the subpicture support.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the long long ago, fbOffset was used for DGA. The server now has
only one reference to fbOffset, a leftover setting of it in fbdevhw.
We can safely ignore it now, which is good since we weren't updating
it in other places where the front buffer offset could change.
We know that it's clobbered at each batchbuffer, anyway. And even if
this server isn't running DRI2, it can still be clobbered at batch
start in the KMS world.
The previous code made no sense, (multiplying an offset by 4 is
meaningless). It could have only worked with the offset being
fortuitously 0.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Like with the per context stuff, also drop the now artificial limit
on surfaces. Again, with that gone, a lot of code can be deleted.
Reviewed-by: Carl Worth <cworth@cworth.org>
There's now not a reason anymore to limit the number of active contexts.
So kill this accounting, too.
With that all gone, per-context state in the ddx is nil, so rip out
all associated code.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Proper bo management ensures that the cpu doesn't step on buffers
used by the gpu. Drop the now unnecessary synchronization.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Cache coherency is now fully under the control of gem.
For lack of hw documentation, I had to find out the correct cache
placements by trial and error:
Backward and forward surfaces: I915_GEM_DOMAIN_RENDER
Correlation data: I915_GEM_DOMAIN_SAMPLER
Changing any of them leads to visual corruptions, so I think these
are the correct ones.
Reviewed-by: Carl Worth <cworth@cworth.org>
Now the last user of the fixed buffers provided by the ddx is gone!
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
It works!
v2: Correlation data needs to be in the render cache!
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
I've decided to allocate a new buffer for every render command, to
prevent stalling for the gpu. libdrm bo reuse should take care of
not wasting memory in case the buffer is not busy.
Also always emit the full state, it's not worth it to complicate
the code over a few stores to wc memory.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Like with one_time_state_emit, this preps for relocatable bo's.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
This also starts to kill the last remnants of the support for
physical addresses for the indirect state buffers. With gem this
would need kernel support (in the form of a new reloc type in
execbuf2).
This does not change the ABI between ddx and client libIntelXvMC.
I've decided to do this in one swoop when all the buffer rework is
done.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Seems to be a remnant from i810 XvMC support. last_flip is always 0,
so serves no real purpose anymore. Kill it and the associated code.
With last_flip gone, last_render also lost its purpose. Kill it, too.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
This is in preparation for real relocatable drm_bo's instead
of memory at a fixed address. By switching to the batchbuffer
macros (like i965 xvmc) we can use the nice OUT_RELOC macro.
Also align the code more with coding-style elsewhere, i.e. bitops
instead of bitfield structures. The bitfield structures are
quite a mess to work with the batchbuffer macros, so they were
getting in the way, anyway.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
WIP code that hasn't changed for over two years is unlikely to
suddenly start progressing. Drop it. After all, git can easily
resurrect it in case it's needed.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Yes, this breaks binary compat of the struct passed around between
X ddx and the client libXvMC. But we always ship both, so they should
not get out of sync.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Carl Worth <cworth@cworth.org>
Kill the corresponding !bo path in i830_free_memory.
Also kill another remnant of the pre-kms era in the same file, while I
was looking at the code.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
It doesn't bind anything anymore, but does a few random things.
Give it a hopefully vague enough name to cover all cases ;)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Besides the debug stuff that went away in the previous patch,
this stuff was totally unused ...
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Totally useless debug function from the pre-gem era. No point
to occasionally spam Xorg.log with a bogus "No memory allocations"
message.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's a left-over from the non-gem era and no longer used at all.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
On i965 class hw, kernel_exec_fencing was 1 always, anyway. And on
i945, this patch kills a memory leak (dunno how, but it does).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I've accidentally broken i915 xvmc due to alignment constraints that
break my assumption that Y-pitch == UV-pitch*2. Fix this up by consistently
using dstPitch2 for the Y-pitch. This also unifies the dst pitch
computation slightly, now that the i915 xvmc special case is gone.
Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=25949
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(Minor edit to support compilation without INTEL_XVMC defined by
Carl Worth <cworth@cworth.org>)
In my previous cleanup I've inadvertently dropped the offset adjustment
code for the xvmc passthrough case. Fix this up.
Also reimplement that ugly hack I've accidentally killed to keep i915 class
xvmc a tad bit longer on life support.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: xunx.fang@intel.com
Rather than mangle the EDID block and hope the server does the right
thing, just build a sensible mode list up front. Do this for LVDS where
there is no EDID or where it does not claim to be continuous-frequency
(since in the latter case, the server will add reasonable modes for us).
Signed-off-by: Adam Jackson <ajax@redhat.com>
On 965 and up, if we detect a full height blit, we should just wait for
vblank, rather than try to do a scanline wait for the whole display.
On pre-965, doing a scanline wait followed by a blit works, but in the
full height case we need to give the blitter time to start up, so we
wait until the bottom line of the blit minus 2 padding scanlines to
accommodate.
Fixes FDO bug #22475.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This keeps us from trying to set tiling on it while pinned, which also
keeps us from trying to unpin it in the kernel, causing an error.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Check for page flipping support before enabling flip and vblank event
support needed for the new DRI2 APIs.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
With DRI2 supporting multiple subsystems, the video driver must
initialize the list of driver names instead of just passing the single
driver name used by Mesa. Without this, the X server will fail to
initialize DRI2 as the numDrivers field in this structure will be
uninitialized.
Signed-off-by: Keith Packard <keithp@keithp.com>
Of course, it's still fail since you can't correctly composite
colorkey overlay, but at least this doesn't spam colorkey to the root
window.
Tested-by: Daniel Vetter <daniel@ffwll.ch>
If we get to the point where we check the divisor/remainder equation and
it's satisfied, we should complete the swap immediately.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
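For reference, a sketch of the divisor/remainder rule as I understand it from OML_sync_control: once the target MSC has passed, the swap happens at the first MSC where msc % divisor == remainder, which may be the current one. The function name is hypothetical and remainder < divisor is assumed:

```c
#include <stdint.h>

/* Return the MSC at which a swap should complete. */
static uint64_t swap_target_msc(uint64_t current, uint64_t target,
                                uint64_t divisor, uint64_t remainder)
{
    uint64_t next;

    if (current < target)
        return target;  /* target not reached yet */
    if (divisor == 0)
        return current; /* no equation to satisfy */
    if (current % divisor == remainder)
        return current; /* equation satisfied: swap immediately */

    /* Otherwise pick the next MSC that satisfies the equation. */
    next = current - (current % divisor) + remainder;
    if (next <= current)
        next += divisor;
    return next;
}
```

For example, with divisor 4 and remainder 2, a current MSC of 10 swaps at once, while a current MSC of 11 waits until MSC 14.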
The new interfaces allow for improved buffer swap, and support for the
SGI_swap_control, SGI_video_sync and OML_sync_control GLX extensions.
The Intel implementation allows page flipping to occur for swaps that
are full screen and not rotated.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The PRM (Vol 1, p32) specifies that the URB_FENCE command must not cross
a cache-line boundary (64-bytes) in order to work around a silicon issue.
Ensure that it does not by inserting an alignment point before the atomic
section.
This is a slightly too large hammer, but the easiest method to work with
the current BEGIN_BATCH/ADVANCE_BATCH protections.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
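To illustrate the difference between the minimal fix and the "slightly too large hammer", here is a sketch of both padding calculations. The helper names are hypothetical; the real code works through the BEGIN_BATCH/ADVANCE_BATCH macros rather than raw offsets:

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64u

/* Minimal fix: pad only if a command of cmd_bytes starting at offset
 * would straddle a 64-byte boundary. Returns the padding in bytes. */
static uint32_t pad_to_avoid_crossing(uint32_t offset, uint32_t cmd_bytes)
{
    uint32_t room = CACHE_LINE_BYTES - (offset % CACHE_LINE_BYTES);

    return cmd_bytes > room ? room : 0;
}

/* Larger hammer: always start the atomic section on a cache line. */
static uint32_t pad_to_cache_line(uint32_t offset)
{
    return (CACHE_LINE_BYTES - (offset % CACHE_LINE_BYTES)) % CACHE_LINE_BYTES;
}
```

Taking a 12-byte command purely as an example length, an offset of 56 needs 8 bytes of padding either way, whereas an offset of 48 needs none under the minimal rule but 16 bytes under the unconditional alignment.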
The mapping type to use is determined by the tiling of the underlying
object, not by whether or not we control the vt. This was a
left-over wart that was intended to mean that we had GEM and so could
use GTT mappings.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Add a small wrapper function so that the callsites need only call the
single function when checking the available aperture size for
determining the maximum viable size for operations. This will allow us
to easily extend this set in the future by only needing to add the
check to a single location.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This just makes it _really_ clear, what's supported. No other changes.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
It's now all in I830PutImageTextured. Also kill some leftovers
from XVMC-on-overlay support and ums-XVMC-on-i915 support. Plus
a small comment as a reminder for where to add i915 xvmc support
back in.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
I'm still curious as to why fixed-point semantics are necessary
for this generic XV helper function that's been causing all this.
Can modern X really run on hw without floating-point support?
Anyway, the ugliness is now all nicely under the carpet (in
i830_clip_video_helper).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
After this there are no other external users of these strange variables,
so we can nicely hide them somewhere in the next changeset.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
This is the first part of my small crusade to rip out x1, x2, y1, y2
from I830PutImage*. These variables have strange semantics (they
change from simple integers to fixed-point values somewhere in
the middle) and don't really seem to be what we actually need.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
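As a reminder of what the integer/fixed-point switch means for x1, x2, y1, y2, here is a small sketch assuming the common 16.16 layout (the commit does not state the format) and non-negative coordinates. The type and helper names are hypothetical:

```c
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits integer part, lower 16 bits fraction. */
typedef int32_t fixed16_16;

/* Convert a small non-negative integer coordinate to 16.16. */
static fixed16_16 fixed_from_int(int32_t v)
{
    return v * 65536;
}

/* Round a non-negative 16.16 value down to an integer. */
static int32_t fixed_to_int_floor(fixed16_16 f)
{
    return f >> 16;
}

/* Round a non-negative 16.16 value up to an integer. */
static int32_t fixed_to_int_ceil(fixed16_16 f)
{
    return (f + 0xffff) >> 16;
}
```

The same variable holding 3 before the conversion holds 196608 afterwards, which is why mixing the two interpretations in one function is error-prone.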
We always pass a non-null pointer for crtc_ret, no point to check
for this.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
This wasn't making much sense anymore, and further cleanups will
make this even more apparent. This change just makes two copies of
I830PutImage and kills the not-applicable if-clauses in both
versions.
There is one small functional change in here: The textured video
path doesn't munch around with adaptor_priv->videoStatus anymore,
which is only used by the overlay. This could prevent the overlay
from being switched off if someone would use textured video at the
same time.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eric Anholt <eric@anholt.net>
The variable "intel" is unused when building i830_video.c without XvMC
support which results in a compiler warning:
i830_video.c: In function 'i830_copy_video_data':
i830_video.c:1443: warning: unused variable `intel'
Trivial fix via #ifdef.
Now that libdrm 2.4.16 is released (and already required) we can
unconditionally enable this.
Please add something like this to the release-notes/NEWS file:
* Overlay support for kernel modesetting. This needs at least kernel
v2.6.33 to work. A backport to 2.6.32 is available at:
http://gitorious.org/daniel-s-linux-stuff/linux-kernel/commits/intel-kms-overlay-for-2.6.32
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit 3f11bbec42.
For unknown reasons, enabling tiling for the glyph cache is causing
glyph corruption both across suspend and resume and VT switching, on a
wide range of chipsets (reports include both i8xx and gm45).
This strongly suggests that we are handling tiling, or updates to tiled
buffers, incorrectly across i915_gem_idle(). However, until we can find
the root cause, we want to fix this regression before the next stable
release, so simply revert this patch. :(
Fixes:
[Bug 25406] fonts garbled after resuming from suspend since 6729b508
http://bugs.freedesktop.org/show_bug.cgi?id=25406
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This should restore the previous level of synchronisation between
textures and pixmaps, but *does not* guarantee that a texture will be
flushed before use. tfp should be fixed so that the ddx can submit the
batch if required to flush the pixmap.
A side-effect of this patch is to rename intel_batch_flush() to
intel_batch_submit() to reduce the confusion of executing a batch buffer
with that of emitting a MI_FLUSH.
Should fix the remaining rendering corruption involving tfp [inc compiz]:
Bug 25431 [i915 bisected] piglit/texturing_tfp regressed
http://bugs.freedesktop.org/show_bug.cgi?id=25431
Bug 25481 Wrong cursor format and cursor blink rate with compiz enabled
http://bugs.freedesktop.org/show_bug.cgi?id=25481
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In commit 98e11210
Remove flush parameter from intel_batch_flush()
Maxi spotted that I had broken screen updating. It appears in my haste
to eliminate the extra parameter I removed a call to intel_batch_flush()
when throttling, i.e. when pushing the updates to the screen before
idling.
Should fix:
Bug 25409 [bisected] rendering corruption since a938673e
https://bugs.freedesktop.org/show_bug.cgi?id=25409
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we wedge the GPU then we will return -EIO for the current batch and
then attempt to reset the GPU. Meanwhile the X server detects the error,
throws a FatalError and to all intents and purposes appears to crash to
the user - whereas before it often just appeared to momentarily freeze.
Of course, on older hardware the server remains frozen until we can find
a way to reset those GPUs at runtime.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There is only a single caller that wishes to forcibly append a flush
into the batch: intel_sync(). So move the logic there.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
During shutdown from a FatalError during batchbuffer submission, it is
possible for the batch_ptr to be NULL, so we must be careful not to
append a flush on this error path.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since drm may not actually set the appropriate errno after a failure, we
must use the return code instead when determining the cause of failure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
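A sketch of the pattern, using a stand-in for the submission call since the real one needs a device. fake_submit() and the result enum are hypothetical; the only convention assumed is that failure is reported as a negative errno code in the return value:

```c
#include <errno.h>

/* Hypothetical stand-in for a submission call: returns 0 on success or a
 * negative errno code on failure, and deliberately leaves errno untouched. */
static int fake_submit(int fail_with)
{
    return fail_with ? -fail_with : 0;
}

enum submit_result { SUBMIT_OK, SUBMIT_GPU_HUNG, SUBMIT_OTHER_ERROR };

/* Classify the failure from the return code, never from errno. */
static enum submit_result classify_submit(int ret)
{
    if (ret == 0)
        return SUBMIT_OK;
    if (ret == -EIO)
        return SUBMIT_GPU_HUNG;
    return SUBMIT_OTHER_ERROR;
}
```

Checking errno after fake_submit(EIO) would report no error at all, while the return code identifies the hang correctly.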
Reduce the 3 conditions into the 2 distinct cases. This has the
secondary benefit of also distinguishing between the reported errors.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The kernel will only emit a flush iff the buffer is currently owned by
the GPU. Instead of presuming that the kernel must emit a flush, it is
safer to assume that it does not and so we cannot rely on mapping the buffer
on to the CPU as a synchronisation point. The most obvious counter-example is
when we map the same buffer twice without using it in a batch.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
These files have been dropped from the generated tar file since the
removal of UMS support. However, the bios_reader code still includes
these, so "make distcheck" fails unless these are distributed.
There's probably a cleaner fix possible, but this at least fixes the
build so that the snapshot can be pushed out.
On older chipsets (i.e. pre-i965) tiling is very restrictive and imposes
severe size and alignment constraints. Combine that with relatively
small apertures and it is very easy to create a batch buffer that
cannot be mapped into the aperture (but would otherwise fit based purely
on total object size). To prevent this we need to not use tiling for large
buffers (the very same buffers where tiling would be of most benefit!).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
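A rough sketch of the size check. It assumes, to my understanding of pre-i965 hardware, that a tiled object occupies a power-of-two sized fence region with a minimum size (1 MiB is used here). The quarter-of-aperture threshold and the function names are purely illustrative, not what the driver uses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the fence region needed to tile an object: the smallest power of
 * two that is >= object_size and >= min_fence (min_fence a power of two). */
static uint64_t fence_size_pre965(uint64_t object_size, uint64_t min_fence)
{
    uint64_t size = min_fence;

    while (size < object_size)
        size <<= 1;
    return size;
}

/* Hypothetical policy: refuse tiling when the fence region would take up
 * more than a quarter of the aperture. */
static bool should_tile_pre965(uint64_t object_size, uint64_t aperture_size)
{
    uint64_t fence = fence_size_pre965(object_size, 1ull << 20);

    return fence <= aperture_size / 4;
}
```

A 5 MiB buffer rounds up to an 8 MiB region, and a 20 MiB buffer to 32 MiB, which shows how quickly large tiled buffers can exhaust a small aperture.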
make dist failed due to missing i2c_vid.h
Commit b9b159c498 Remove UMS support.
The above commit did not remove this header file from the makefile.
Signed-off-by: Gaetan Nadon <memsize@videotron.ca>
When updating a buffer object for the framebuffer, we may need to
allocate a fresh pixmap private structure, for example if the pixmap is
replaced due to resize. When doing so it is then imperative to
initialise the circularly linked lists correctly.
Should fix the fault:
#0 i830_set_pixmap_bo (pixmap=0x24ab380, bo=0x24ab780) at i830_uxa.c:524
#1 0x00007f8615c629fd in drmmode_xf86crtc_resize (scrn=0x247a320, width=1280, height=800) at drmmode_display.c:1345
#2 0x000000000051246c in xf86RandR12ScreenSetSize (pScreen=0x24824f0, width=<value optimized out>, height=<value optimized out>, mmWidth=<value optimized out>, mmHeight=<value optimized out>) at xf86RandR12.c:709
#3 0x0000000000512aa8 in xf86RandR12CreateScreenResources (pScreen=<value optimized out>) at xf86RandR12.c:839
#4 0x0000000000514ec0 in xf86CrtcCreateScreenResources (screen=0x24824f0) at xf86Crtc.c:727
#5 0x0000000000424fb3 in main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at main.c:215
as reported by 'buscher'.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
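The reason the initialisation matters can be seen in a minimal circular list sketch. The structure and field names below are hypothetical stand-ins for the pixmap private; a zero-filled node has NULL links, so unlinking it would dereference NULL:

```c
#include <stdlib.h>

/* Minimal circular doubly linked list. */
struct list {
    struct list *next, *prev;
};

/* An empty list (or unlinked node) points at itself. */
static void list_init(struct list *l)
{
    l->next = l->prev = l;
}

static int list_is_empty(const struct list *l)
{
    return l->next == l;
}

static void list_add(struct list *entry, struct list *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Safe only on initialised nodes: follows both links unconditionally. */
static void list_del(struct list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    list_init(entry);
}

/* Hypothetical pixmap private with two list memberships. */
struct pixmap_priv_sketch {
    struct list batch;
    struct list flush;
};

static struct pixmap_priv_sketch *pixmap_priv_create(void)
{
    struct pixmap_priv_sketch *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return NULL;
    /* calloc leaves next/prev NULL, so the links must be set up here. */
    list_init(&priv->batch);
    list_init(&priv->flush);
    return priv;
}
```

With the list_init() calls in place, list_del() on a freshly allocated private is a harmless no-op instead of a crash.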
As the copy uses the 2D blitter, it uses the render cache so the source
should not require flushing if it has previously been used as a
destination.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I still have no idea how this is triggering failures, but it is. So
revert until the problem is solved.
Should fix once again:
Bug 23803 [bisected i915] gnome characters disappear
http://bugs.freedesktop.org/show_bug.cgi?id=23803
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
I incorrectly changed the logic in 285f286 and caused the batch to
always be flushed when debugging, instead of merely inserting a MI_FLUSH
between operations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The compile cleanup was not without fault... Apparently I don't have
XVMC enabled anymore and so missed that this variable is actually used.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Avoid waiting on a dirty buffer object by streaming the upload to a fresh,
non-GPU hot buffer and blitting to the destination.
This should help to redress the regression reported in bug 18075:
[UXA] XPutImage performance regression
https://bugs.freedesktop.org/show_bug.cgi?id=18075
Using the particular synthetic benchmark in question on a g45:
Before:
9542.910448 Ops/s; put composition (!); 15x15
5623.271889 Ops/s; put composition (!); 75x75
1685.520362 Ops/s; put composition (!); 250x250
After:
40173.865300 Ops/s; put composition (!); 15x15
28670.280612 Ops/s; put composition (!); 75x75
4794.368601 Ops/s; put composition (!); 250x250
which while not stellar performance is at least an improvement. As
anticipated this has little impact on the non-fallback RENDER paths, for
instance the current cairo-xlib backend is unaffected by this change.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we track when a pixmap is active inside a batch buffer, we can avoid
unnecessary flushes of the batch when mapping a pixmap back to the CPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ensure that the render caches and texture caches are appropriately
flushed when switching a pixmap from a target to a source.
This should fix bug 24315,
[855GM] Rendering corruption in text (usually)
https://bugs.freedesktop.org/show_bug.cgi?id=24315
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
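The target-to-source rule can be sketched as a tiny state machine. The structure and function names are hypothetical, and a single dirty flag stands in for whatever per-pixmap state the driver actually tracks:

```c
#include <stdbool.h>

/* Hypothetical per-pixmap state. */
struct pixmap_state_sketch {
    bool dirty; /* written as a render target and not yet flushed */
};

/* Record that the pixmap was rendered to. */
static void use_as_target(struct pixmap_state_sketch *p)
{
    p->dirty = true;
}

/* Returns true if a cache flush must be emitted before sampling from the
 * pixmap; the pixmap is clean again once that flush has been requested. */
static bool use_as_source(struct pixmap_state_sketch *p)
{
    bool need_flush = p->dirty;

    p->dirty = false;
    return need_flush;
}
```

Only the first read after a write asks for a flush; repeated reads of an unchanged pixmap do not.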
In order to detect when we require cache flushes we need to track which
domains the pixmap currently belongs to. To do so, we create a device
private structure to hold the extra information and hook it up.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using common defaults will reduce errors and maintenance.
Only the very small or nonexistent custom section needs periodic maintenance
when the structure of the component changes. Do not edit defaults.
Using common defaults will reduce errors and maintenance.
Only the very small or nonexistent custom section needs periodic maintenance
when the structure of the component changes. Do not edit defaults.
Particularly noting to route alpha to the green channel when blending
with a8 destinations.
Fixes:
rendercheck/repeat/triangles regressed
http://bugs.freedesktop.org/show_bug.cgi?id=25047
introduced with commit 14109a.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
RENDER specifies that texels should be sampled from the pixel centre. This
corrects a number of failures in the cairo test suite and a few
off-by-one bug reports.
Grey border around images
https://bugs.freedesktop.org/show_bug.cgi?id=21523
Note that the earlier attempt to fix this was subverted by the buggy use
of 1x1R textures for solid sources -- which caused the majority of text
to disappear.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
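One way to express pixel-centre sampling is as a normalised texture coordinate offset by half a texel. This is only a sketch with a hypothetical helper name; the actual fix may apply the offset elsewhere in the coordinate setup:

```c
/* Normalised coordinate of the centre of texel `pixel` in a texture that is
 * `size` texels wide (or high). */
static float texel_centre_coord(int pixel, int size)
{
    return ((float)pixel + 0.5f) / (float)size;
}
```

For a 4-texel-wide texture the centres fall at 0.125, 0.375, 0.625 and 0.875. Sampling at the texel edge (0.0, 0.25, ...) instead sits exactly between two texels, which is where off-by-one results and borders blended in from neighbouring texels tend to come from.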
Instead of allocating and utilising the texture samplers for 1x1R
solid sources and masks we can simply use the default diffuse and
specular colour channels and adjust the fragment shader appropriately.
The big advantage is the reduction in size of batches which should give
a good boost to glyph performance, irrespective of the additional boost
from using simpler shaders.
However, the motivating factor behind the switch is that our use of 1x1
textures turns out to be buggy...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As the immediate victim of the overflow would be to overwrite the maximum
permissible value, the test was optimistic.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since batch buffers are rarely emitted by themselves but as part of a
sequence of state and vertices, the whole sequence is emitted atomically.
Here we just enforce that batches are marked as being part of an atomic
sequence as appropriate.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It can go up to 32k. Upping this lets me use my 2560x1600 and 1920x1200
monitors in an extended desktop configuration.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>