The call into intel_batch_flush() will invalidate the pI830->batch_bo
stored in bo_table[0]. Fix it by re-reading the refreshed value.
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
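The hazard can be sketched roughly as follows. This is a simplified,
self-contained model rather than the driver code: fake_bo, fake_ctx,
fake_ctx_init(), fake_batch_flush() and fill_bo_table_with_flush() are
hypothetical stand-ins, and the assumption that a flush swaps in a
different batch buffer is made for illustration only.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real types. */
struct fake_bo { int id; };

struct fake_ctx {
    struct fake_bo *batch_bo;   /* models pI830->batch_bo */
    struct fake_bo pool[2];     /* two batch buffers to alternate between */
    int current;                /* index of the buffer currently in use */
};

static void fake_ctx_init(struct fake_ctx *ctx)
{
    ctx->pool[0].id = 0;
    ctx->pool[1].id = 1;
    ctx->current = 0;
    ctx->batch_bo = &ctx->pool[0];
}

/* Models intel_batch_flush(): after the flush the context points at a
 * different batch buffer, so any cached copy of batch_bo is stale. */
static void fake_batch_flush(struct fake_ctx *ctx)
{
    ctx->current ^= 1;
    ctx->batch_bo = &ctx->pool[ctx->current];
}

/* Fill slot 0, flush, then re-read batch_bo so that slot 0 refers to the
 * buffer that is current after the flush. */
static void fill_bo_table_with_flush(struct fake_ctx *ctx,
                                     struct fake_bo *bo_table[1])
{
    bo_table[0] = ctx->batch_bo;   /* value cached before the flush */
    fake_batch_flush(ctx);         /* invalidates the cached value */
    bo_table[0] = ctx->batch_bo;   /* the fix: re-read the refreshed value */
}
```

Without the final assignment, bo_table[0] would still point at the
buffer that was current before the flush.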
This required reordering the relocation emits for surface/binding table
so that we didn't add new relocations to things that had already been
relocated at (the check_aperture requirement).
Instead of having a static array for these and doing an ugly sync
every time we recycle the array, we now simply allocate short-lived
buffer objects for this dynamic state. The dri layer, in turn, can
take care of efficiently reusing objects as necessary.
On a GM965 this change was tested to improve the performance of
x11perf -aa10text from roughly 120000 to 154000 glyphs/sec.
We don't actually plan to put any other data in this structure, so it
doesn't make sense to have a generic name, (since we'll only be using
it for our vertex buffer).
This function is the new name for _allocate_dynamic_state now that
it also emits everything to the batch necessary for setting up a
composite operation. This happens in prepare_composite() every
time and in composite() whenever our vertex buffer fills up.
It's not yet strictly necessary to be redoing this setup in
composite() but it will be soon when the setup starts referring
to buffer objects for surface state and binding table. This
move prepares for that.
This begins the process of separating the dynamic data from the
static data, (still to move are the surface state and binding
table objects). The new dynamic_state is stored in a buffer
object, so this patch restores the buffer-object-for-vertex-buffer
functionality originally in commit 1abf4d3a7a and later reverted
in 5c9a62a29f.
A notable difference is that this time we actually do use
check_aperture_space to ensure things will fit, (assuming
there's a non-empty implementation under that).
This follows naturally from the structure rename.
Also we make things less muddled by having this function
actually accept a pointer to a gen4_static_state_t rather
than a gen4_state_t, (and then fetching the desired pointer
out from that).
Again, no intended change in functionality here.
It doesn't contain only static data yet, but it will soon, so
this renaming prepares for that. Also, this helps make things
more clear between gen4_render_state_t and gen4_state_t which
were muddled before, (particularly because the corresponding
identifiers were render_state and card_state). The card_state
identifier is now known as static_state which should be less
confusing.
This change is strictly search-and-replace with no functional
changes.
It's very convenient that the hardware supports this non-default
mode since it's exactly what is specified by the Render extension.
This provides a more efficient means of fixing bug #16820:
[EXA] Composition result in black for areas outside of source-surface bounds
https://bugs.freedesktop.org/show_bug.cgi?id=16820
without the software fallback we had in the earlier fix,
(commit 76c9ece36e).
We wish it wouldn't, but the hardware ignores the alpha in the
BorderColor we set when the source picture format has no alpha
in it, (and it uses alpha of 1.0 where we want 0.0). For now,
fall back for these cases. This gives a correct result, but
obviously is not as fast as we would like.
This fixes bug #16820:
[EXA] Composition result in black for areas outside of source-surface bounds
https://bugs.freedesktop.org/show_bug.cgi?id=16820
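A minimal sketch of the kind of check described above, assuming the
border color only matters for RepeatNone sources. The function name
source_needs_fallback() and its parameters are hypothetical and are
not taken from the driver.

```c
#include <stdbool.h>

/* Repeat types with the values the Render protocol assigns them. */
enum { REPEAT_NONE = 0, REPEAT_NORMAL = 1, REPEAT_PAD = 2, REPEAT_REFLECT = 3 };

/* Hypothetical helper: returns true when the composite should take the
 * software path. alpha_bits is the number of alpha bits in the source
 * picture format. Assumption: the transparent border is only sampled
 * for RepeatNone, so that is the only case where the ignored alpha in
 * BorderColor can produce a wrong result. */
static bool source_needs_fallback(unsigned int alpha_bits,
                                  unsigned int repeat_type)
{
    return repeat_type == REPEAT_NONE && alpha_bits == 0;
}
```

For example, an x8r8g8b8 source with RepeatNone would fall back, while
an a8r8g8b8 source would stay on the hardware path.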
Eric informed me that the repeat field exists only for backwards
compatibility with old drivers that weren't prepared for values
other than 0 or 1 here. Since we are, we can just ignore that
field and examine only repeatType. So the code's a (tiny) bit
simpler this way.
It's quite simple to support these modes---we simply need to
turn on the support for them in the hardware.
These changes have been verified with the extend-pad and
extend-reflect tests in cairo's test suite. However, this
currently requires using a custom-modified version of cairo.
The issue is that released versions of cairo, (and even
cairo master so far), don't pass RepeatPad and RepeatReflect
to Render, (due to various bugs and workarounds in cairo
and pixman). I do plan to fix those issues in cairo, so that
in a future release of cairo, (1.8.2 perhaps?), the cairo
test suite will usefully test these new repeat modes in our
driver.
The existing switch statement was switching on the Boolean
repeat field rather than the correct repeatType field. This
had not caused any problem before as only two possible repeat
values were supported (RepeatNone = 0 and RepeatNormal = 1)
so they were always the same as the repeat field.
Soon, however, we'll be supporting more repeat types, so we'll
need to switch on the correct value.
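The difference between the two fields can be illustrated with a small
sketch. The fake_picture struct, the wrap_mode_t enum and both
functions are hypothetical stand-ins (the real picture structure and
sampler settings are not reproduced here), and the mapping assumes
all four repeat types are handled.

```c
/* Repeat types with the values the Render protocol assigns them. */
enum { REPEAT_NONE = 0, REPEAT_NORMAL = 1, REPEAT_PAD = 2, REPEAT_REFLECT = 3 };

/* Hypothetical wrap modes standing in for the hardware sampler settings. */
typedef enum {
    WRAP_BORDER,
    WRAP_REPEAT,
    WRAP_CLAMP,
    WRAP_MIRROR,
    WRAP_UNSUPPORTED
} wrap_mode_t;

/* Hypothetical cut-down picture holding only the two fields discussed. */
struct fake_picture {
    unsigned int repeat : 1;       /* Boolean kept for older drivers */
    unsigned int repeatType : 2;   /* the full repeat mode */
};

/* Correct variant: switch on repeatType. */
static wrap_mode_t wrap_mode_for(const struct fake_picture *pict)
{
    switch (pict->repeatType) {
    case REPEAT_NONE:    return WRAP_BORDER;
    case REPEAT_NORMAL:  return WRAP_REPEAT;
    case REPEAT_PAD:     return WRAP_CLAMP;
    case REPEAT_REFLECT: return WRAP_MIRROR;
    }
    return WRAP_UNSUPPORTED;
}

/* Old variant: switching on the Boolean cannot tell RepeatPad or
 * RepeatReflect apart from RepeatNormal. */
static wrap_mode_t wrap_mode_from_boolean(const struct fake_picture *pict)
{
    switch (pict->repeat) {
    case 0: return WRAP_BORDER;
    case 1: return WRAP_REPEAT;
    }
    return WRAP_UNSUPPORTED;
}
```

With only RepeatNone and RepeatNormal in play the two functions agree,
which is why the old switch had not caused problems.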
This allows us to only call i830WaitSync once every 128 calls to composite
rather than on every call. However, we do need to also call MI_FLUSH to
avoid the vertex cache getting in our way, (since our "separate" buffers
are all allocated as one contiguous chunk).
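The bookkeeping can be sketched as a slot counter that only
synchronizes when it wraps. Everything here is a hypothetical model:
fake_vb_state and next_vertex_slot() are invented names, the counters
stand in for the real i830WaitSync and MI_FLUSH calls, and issuing the
flush on every composite is an assumption of this sketch.

```c
#define VB_SLOTS 128   /* from the text: sync once every 128 composites */

/* Hypothetical bookkeeping for the vertex buffer slots. */
struct fake_vb_state {
    unsigned int next_slot;   /* next free slot in the contiguous chunk */
    unsigned int syncs;       /* counts stand-in i830WaitSync calls */
    unsigned int flushes;     /* counts stand-in MI_FLUSH emissions */
};

/* Returns the slot to use for this composite. The expensive sync only
 * happens when all slots have been handed out and slot 0 is about to
 * be reused. */
static unsigned int next_vertex_slot(struct fake_vb_state *s)
{
    if (s->next_slot == VB_SLOTS) {
        s->syncs++;           /* wait for the hardware before reuse */
        s->next_slot = 0;
    }
    s->flushes++;             /* keep the vertex cache from serving stale data */
    return s->next_slot++;
}
```

Starting from a zeroed state, the first 128 calls return slots 0 to
127 without any sync, and the first sync happens on call 129.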
The gen4_render_state is now always called "render_state" (i965_render.c
bookkeeping) and gen4_state_t is now always called "card_state" (the buffer
for state used by the chip).
We have a collection of wm_state objects for each ps kernel,
(one for each combination of src and mask extend and repeat
values).
Thanks to Dave Airlie for noticing an errant write through a
wild wm_state pointer in an early version of this commit.
(cherry picked from 7763706a93d3021907273f9b330750ba110e2fc3 commit)
This cherry-pick required more reformatting than most, due to the
projective texturing merge.
This will eventually allow for the elimination of sampler state
updates while compositing---and initializing everything in the
initialization function.
(cherry picked from commit d0874697be8086cd64740c24698df8cd4d31c76f)
We need one for each possible combination of src and dst
blend_factors. Again, as with recent changes, this eliminates
state updates from prepare_composite and allows that function
to instead simply reference an existing object initialized
within gen4_state_init.
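The idea of initializing one object per combination up front and then
only referencing it can be sketched like this. The names
fake_cc_state, fake_static_state, fake_state_init() and
cc_state_offset(), as well as the factor count of 4, are hypothetical
and chosen for illustration; the real structures and counts differ.

```c
#include <stddef.h>

#define FAKE_BLENDFACTOR_COUNT 4   /* hypothetical; not the hardware count */

/* Hypothetical stand-in for one pre-built state object. */
struct fake_cc_state {
    int src_factor;
    int dst_factor;
};

/* One object for every (src, dst) blend factor combination. */
struct fake_static_state {
    struct fake_cc_state cc[FAKE_BLENDFACTOR_COUNT][FAKE_BLENDFACTOR_COUNT];
};

/* Done once at init time: fill in every combination. */
static void fake_state_init(struct fake_static_state *st)
{
    int src, dst;

    for (src = 0; src < FAKE_BLENDFACTOR_COUNT; src++) {
        for (dst = 0; dst < FAKE_BLENDFACTOR_COUNT; dst++) {
            st->cc[src][dst].src_factor = src;
            st->cc[src][dst].dst_factor = dst;
        }
    }
}

/* Done per composite: no state is written, only the byte offset of the
 * matching pre-built object inside the state buffer is computed. */
static size_t cc_state_offset(int src, int dst)
{
    return offsetof(struct fake_static_state, cc)
         + ((size_t)src * FAKE_BLENDFACTOR_COUNT + (size_t)dst)
           * sizeof(struct fake_cc_state);
}
```

The same indexing scheme would apply to a table of objects keyed on
any other pair of values, such as extend and repeat modes.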
Thanks to Dave Airlie (and git-bisect) for pointing out that with
gnome-terminal all text was appearing as solid black with an early
version of this commit. As expected the bug was an alignment issue.
(cherry picked from 0c0ab52c2d100c47f38c7ef826ef585c8b9815e9 commit)
Performance is approximately equivalent on text tests, but may be
around +2%.
This reverts commit 346cf57deabb4c336612df4c13650a87b5ef6775.
Mixing randr transforms and video caused screen corruption for Render
operations. No, I don't understand why.
Instead of leaving pixel values in src_sample registers, compute the pixel
values directly to the data port to save 8 moves. This cannot work when no
computation is done, both because there is no way to wait for the sampler
to finish and because the sampler returns data in a different order from
that required by the data port (sigh).