Commits on Source (68)
-
Marek Olšák authored
It could load an NGG shader when we want a legacy shader and vice versa.
-
Marek Olšák authored
-
Marek Olšák authored
ported from PAL
-
Marek Olšák authored
-
Marek Olšák authored
-
Marek Olšák authored
Legacy GS only works with Wave64.
-
Marek Olšák authored
-
Marek Olšák authored
Only gfx9 and older use it to get InstanceID in VGPR1.
-
Marek Olšák authored
The best way to prevent GDS hangs is not to use GDS.
-
Marek Olšák authored
-
Marek Olšák authored
-
Marek Olšák authored
-
Marek Olšák authored
-
Marek Olšák authored
It varies depending on si_shader_key::as_ngg.
-
Marek Olšák authored
We need two different values of the register, one for NGG and one for legacy, in order to fix edge flags for the legacy pipeline. Passing the ngg flag to emit_clip_regs would be too complicated, so CONTEXT_REG_RMW is used for partial register updates.
-
Marek Olšák authored
-
Ilia Mirkin authored
The compute paths in vl are a bit AMD-specific. For example, on nouveau they try to use a BGRX8 image format, which is not supported. Fixing all this is probably possible, but since the compute paths aren't in any way better, it's difficult to care. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213 Fixes: 9364d66c (gallium/auxiliary/vl: Add video compositor compute shader render) Signed-off-by:
Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 958390a9)
-
Bas Nieuwenhuizen authored
Should take the max of the 2. Fixes: ea337c8b "radv/gfx10: fix VS input VGPRs with the legacy path" Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit 2e763f7c)
-
Bas Nieuwenhuizen authored
Otherwise hangs are possible. This register was already set for GS and NGG. Fixes: 5eaed7ec "radv/gfx10: enable support for NAVI10, NAVI12 and NAVI14" Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit e04761d0)
-
Danylo Piliaiev authored
Without loop_prepare_for_unroll loops are losing phis. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111411 Fixes: 5db98195 "nir: add loop unroll support for wrapper loops" Signed-off-by:
Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by:
Timothy Arceri <tarceri@itsqueeze.com> (cherry picked from commit 84b3ef6a)
-
Samuel Pitoiset authored
Scans aren't implemented on SI/CIK. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit e73d863a)
-
Samuel Pitoiset authored
Shader ballot will be enabled by default for Wolfenstein Youngblood. This follows what we did for sisched. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit f202ac27)
-
Samuel Pitoiset authored
This gives a nice boost, +20% at this time on my Vega 56. Shader ballot should be enabled by default at some point but it reduces performance a bit (-6%) with Wolfenstein II. Enable it only for Youngblood at the moment, like what we did for Talos in the past. As a bonus point, it gets rid of some minor artifacts that only happen when ballot is disabled for some reason. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit a6ad9e8c)
-
Jose Maria Casanova Crespo authored
In compressed_tex_sub_image we can only obtain the tex_object after compressed_subtexture_target_check is validated for TEX_MODE_CURRENT. So if the target is wrong the error is raised to the user. This completes the fix for the regression introduced on "mesa: refactor compressed_tex_sub_image function" of the pending failing tests: dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d dEQP-GLES31.functional.debug.negative_coverage.get_error.texture.compressedtexsubimage3d v2: Fix warning that texObj might be used uninitialized (Gert Wollny) Fixes: 7df233d6 ("mesa: refactor compressed_tex_sub_image function") Reviewed-By:
Gert Wollny <gert.wollny@collabora.com> (cherry picked from commit 74a7e3ed)
-
Kenneth Graunke authored
Fixes: 0346b700 ("gallium/screen: Add pipe_screen::resource_get_param") Reviewed-by:
Jordan Justen <jordan.l.justen@intel.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 0e6b573a)
-
Kenneth Graunke authored
Fixes: 0346b700 ("gallium/screen: Add pipe_screen::resource_get_param") Reviewed-by:
Jordan Justen <jordan.l.justen@intel.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit c43a4479)
-
Kenneth Graunke authored
Fixes: 0346b700 ("gallium/screen: Add pipe_screen::resource_get_param") Reviewed-by:
Jordan Justen <jordan.l.justen@intel.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit f02d1a0b)
-
Kenneth Graunke authored
v2: Pass through to oscreen rather than faking it (review from Marek). Fixes: 0346b700 ("gallium/screen: Add pipe_screen::resource_get_param") Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit bc844d92)
-
Tapani Pälli authored
Commit fixes current crashes with Vulkan applications on Android. Fixes: c0376a12 "util: add anon_file.h for all memfd/temp file usage" Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Eric Engestrom <eric@engestrom.ch> (cherry picked from commit ce8fd042)
-
Samuel Pitoiset authored
This fixes a regression introduced with scan&reduce operations on GFX10. Note that some subgroups CTS still fail on GFX10 but I assume it's a different issue. This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive*. Fixes: 227c29a8 "amd/common/gfx10: implement scan & reduce operations" Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit 2d9f401a)
-
Tapani Pälli authored
Fixes: 0fd43597 "iris/perf: implement routines to return counter info" Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Kenneth Graunke <kenneth@whitecape.org> (cherry picked from commit 728ebcde)
-
Lionel Landwerlin authored
We added this utility for vulkan where all timeouts are given as uint64_t values. We can switch from signed to unsigned as this is the only user and if we ever deal with signed integers somewhere else we'll have to be careful to use the corresponding timespec_(add|sub)_msec and always pass absolute values. v2: Forgot to drop the test calling add_nsec() with a negative number Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by:
Juan A. Suarez Romero <jasuarez@igalia.com> Fixes: d2d70c3b ("util: add a timespec helper") Acked-by:
Daniel Stone <daniels@collabora.com> (cherry picked from commit 5833f433)
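The unsigned variant described in this commit can be sketched as below. This is a minimal illustration under stated assumptions, not the Mesa helper itself: the name `timespec_add_nsec_sketch` and the `NSEC_PER_SEC` macro are hypothetical, and the input is assumed to be normalized (`tv_nsec` in `[0, 1e9)`).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000

/* Hypothetical sketch: add an unsigned nanosecond count to a timespec.
 * Because `b` is unsigned, only a carry (never a borrow) has to be handled. */
static void timespec_add_nsec_sketch(struct timespec *r,
                                     const struct timespec *a,
                                     uint64_t b)
{
    /* Assumption: the input timespec is already normalized. */
    assert(a->tv_nsec >= 0 && a->tv_nsec < NSEC_PER_SEC);

    r->tv_sec = a->tv_sec + (time_t)(b / NSEC_PER_SEC);
    r->tv_nsec = a->tv_nsec + (long)(b % NSEC_PER_SEC);

    /* The sum of two values below 1e9 is below 2e9, so one carry suffices. */
    if (r->tv_nsec >= NSEC_PER_SEC) {
        r->tv_sec++;
        r->tv_nsec -= NSEC_PER_SEC;
    }
}
```

With a signed argument the helper would also need a borrow path for negative values, which is the case the commit message says callers must handle separately.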
-
Bas Nieuwenhuizen authored
A bunch of remaining issues including some that affect users. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111248 Fixes: ee21bd74 "radv/gfx10: implement NGG support (VS only)" Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> (cherry picked from commit c037fe5a)
-
Alyssa Rosenzweig authored
shader-db regression in the scheduler. Fixes: dff4986b ("pan/midgard: Emit store_output branch just-in-time")

total bundles in shared programs: 2055 -> 2019 (-1.75%)
bundles in affected programs: 1055 -> 1019 (-3.41%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 20.00% x̄: 6.71% x̃: 5.16%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -8.45% -4.97%
Bundles are helped.

total quadwords in shared programs: 3444 -> 3408 (-1.05%)
quadwords in affected programs: 1897 -> 1861 (-1.90%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 14.29% x̄: 3.97% x̃: 2.99%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -5.08% -2.86%
Quadwords are helped.

Signed-off-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (cherry picked from commit 272ce6f5)
-
Kenneth Graunke authored
This is genxml, we can compile out this code. Fixes: 26606672 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.") Reviewed-by:
Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit f6c44549)
-
Kenneth Graunke authored
For renderable surfaces, we allocate SURFACE_STATEs for each bit in res->aux.possible_usages. Sampler views use res->aux.sampler_usages. When pinning buffers, we call surf_state_offset_for_aux() to calculate the offset to the desired surface state. surf_state_offset_for_aux() took an aux_modes parameter, which should be one of those two fields. However...it was not using that parameter. It always used the broader res->aux.possible_usages field directly. One of the callers, update_clear_value(), was passing incorrect masks for this parameter. It iterated through the bits in order, using u_bit_scan(), which destructively modifies the mask. So each time we called it, the count of bits before our selected mode was 0, which would cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE, rather than updating each in turn. This was hidden by the earlier bug where surf_state_offset_for_aux() ignored the parameter. Fixes: 7339660e ("iris: Add aux.sampler_usages.") Reviewed-by:
Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit 117a0368)
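The interaction between the two bugs described above can be sketched in C. This is an illustration only, not the iris code: `bit_scan`, `surf_state_index`, and the two `sum_indices_*` helpers are hypothetical names, and the sketch assumes the surface state for a mode lives at the index equal to the number of enabled modes below it in the mask. It relies on the GCC/Clang builtins `__builtin_ctz` and `__builtin_popcount`.

```c
#include <assert.h>
#include <stdint.h>

/* Pop the lowest set bit and return its index. Like u_bit_scan(),
 * this destructively clears that bit in *mask. */
static int bit_scan(uint32_t *mask)
{
    assert(*mask != 0);
    int i = __builtin_ctz(*mask);
    *mask &= *mask - 1;
    return i;
}

/* Index of the surface state for `mode`: how many enabled modes
 * precede it in `modes`. */
static unsigned surf_state_index(uint32_t modes, int mode)
{
    return (unsigned)__builtin_popcount(modes & ((1u << mode) - 1u));
}

/* Buggy pattern: the partially consumed mask is passed to the lookup,
 * so the count of bits below the selected mode is always 0. */
static unsigned sum_indices_buggy(uint32_t modes)
{
    unsigned sum = 0;
    uint32_t remaining = modes;
    while (remaining) {
        int mode = bit_scan(&remaining);
        sum += surf_state_index(remaining, mode);
    }
    return sum;
}

/* Fixed pattern: iterate over a copy, look up against the full mask. */
static unsigned sum_indices_fixed(uint32_t modes)
{
    unsigned sum = 0;
    uint32_t remaining = modes;
    while (remaining) {
        int mode = bit_scan(&remaining);
        sum += surf_state_index(modes, mode);
    }
    return sum;
}
```

For a mask with three enabled modes, the buggy loop resolves every mode to index 0, while the fixed loop visits indices 0, 1, and 2.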
-
Kenneth Graunke authored
Gen11 stores the fast clear color in an "indirect clear buffer", as a packed pixel value. Gen9 hardware stores it as a float or integer value, which is interpreted via the format. We were trying to store that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM it from there to the actual SURFACE_STATE bytes where it's stored. This unfortunately doesn't work for blorp_copy(), which does bit-for-bit copies, and overrides the format to a CCS-compatible UINT format. This causes the clear color to be interpreted in the overridden format. Normally, we provide the clear color on the CPU, and blorp_blit.c:2611 converts it to a packed pixel value in the original format, then unpacks it in the overridden format, so the clear color we use expands to the bits we originally desired. However, BLORP doesn't support this pack/unpack with an indirect clear buffer, as it would need to do the math on the GPU. On Gen11+, it isn't necessary, as the hardware does the right thing. This patch changes Gen9 to stop using an indirect clear buffer and simply do PIPE_CONTROLs with post-sync write immediate operations to store the new color over the surface states for regular drawing. BLORP continues streaming out surface states, and handles fast clear colors on the CPU. Fixes: 53c484ba ("iris: blorp using resolve hooks") Reviewed-by:
Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit 1cd13cce)
-
Kenneth Graunke authored
This doesn't work for compressed formats, as the source texture and temporary texture would have different block sizes. (Forcing the driver to always take the GPU path would expose the bug.) Instead, just use the source format for the temporary, and let blorp_copy deal with overrides. The one case where we can't do this is ASTC, because isl won't let us create a linear ASTC surface. Fall back to the CPU paths there for now. Fixes: 9d1334d2 ("iris: Use copy_region and staging resources to avoid transfer stalls") Reviewed-by:
Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit 136629a1)
-
Kenneth Graunke authored
We were always resolving the buffer as if we were accessing it via CPU maps, which don't understand any auxiliary surfaces. But we often copy to a temporary using BLORP, which understands compression just fine. So we can avoid the resolve, and accelerate the copy as well. Fixes: 9d1334d2 ("iris: Use copy_region and staging resources to avoid transfer stalls") Reviewed-by:
Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit 2d799250)
-
Tapani Pälli authored
Fixes errors seen with eglSetBlobCacheFuncsANDROID on Android when running dEQP that terminates and reinitializes a display. Fixes: 6f5b5709 "egl: add support for EGL_ANDROID_blob_cache" Signed-off-by:
Tapani Pälli <tapani.palli@intel.com> Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com> (cherry picked from commit 3e03a3fc)
-
Samuel Pitoiset authored
Only gfx9 and older use it to get InstanceID in VGPR1. Ported from RadeonSI. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit 0813c27d)
-
Samuel Pitoiset authored
Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit a4e6e59d)
-
Kenneth Graunke authored
...by copying the implementation of anv_get_absolute_timeout(). Appears to fix a CTS test with 32-bit builds: GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush Fixes: f459c56b ("iris: Add fence support using drm_syncobj") Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by:
Eric Engestrom <eric@engestrom.ch> (cherry picked from commit 7ee7b0ec)
-
Andres Rodriguez authored
Make sure we read the updated data from the GPU in cases where WAIT_BIT is not set. Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit a410823b)
-
Lionel Landwerlin authored
timespec_get() is not available on macos, we need to pull in the include/c11/threads_posix.h helper. Signed-off-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103674 Fixes: e2d761de ("util: drop final reference to p_compiler.h") Reviewed-by:
Eric Engestrom <eric.engestrom@intel.com> (cherry picked from commit 9d3fc737)
-
Rafael Antognolli authored
On commit f6e7de41, we started emitting 3DSTATE_LINE_STIPPLE as part of the non-dynamic state. That gets re-emitted every time we bind a new VkPipeline. But that instruction is non-pipelined, and it caused a perf regression of about 9-10% on Dota2. This commit makes anv_dynamic_state_copy() return a mask with only the state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be emitted anymore unless it has changed, fixing the problem above. v2: Improve commit message and add documentation about skipped checks (Jason) Fixes: f6e7de41 ("anv: Implement VK_EXT_line_rasterization") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Reviewed-by:
Lionel Landwerlin <lionel.g.landwerlin@intel.com> (cherry picked from commit 2b7ba9f2)
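The idea of a copy routine that reports only what changed can be sketched as follows. This is a simplified illustration, not anv_dynamic_state_copy() itself: the struct, the flag names, and `dyn_state_copy` are hypothetical, and only two pieces of state are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dirty bits for two pieces of dynamic state. */
enum {
    DYN_LINE_WIDTH   = 1u << 0,
    DYN_LINE_STIPPLE = 1u << 1,
};

struct dyn_state {
    float    line_width;
    uint32_t stipple_factor;
    uint16_t stipple_pattern;
};

/* Copy the states selected by copy_mask from src to dst and return a
 * mask of the states whose values actually differed. */
static uint32_t dyn_state_copy(struct dyn_state *dst,
                               const struct dyn_state *src,
                               uint32_t copy_mask)
{
    uint32_t changed = 0;
    assert(dst && src);

    if ((copy_mask & DYN_LINE_WIDTH) && dst->line_width != src->line_width) {
        dst->line_width = src->line_width;
        changed |= DYN_LINE_WIDTH;
    }

    if ((copy_mask & DYN_LINE_STIPPLE) &&
        (dst->stipple_factor != src->stipple_factor ||
         dst->stipple_pattern != src->stipple_pattern)) {
        dst->stipple_factor = src->stipple_factor;
        dst->stipple_pattern = src->stipple_pattern;
        changed |= DYN_LINE_STIPPLE;
    }

    return changed;
}
```

A caller would OR the returned mask into its dirty bits, so binding a pipeline with identical stipple state leaves the stipple bit clear and the non-pipelined packet is not re-emitted.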
-
Alex Smith authored
Put the uncached GTT type at a higher index than the visible VRAM type, rather than having GTT first. When we don't have dedicated VRAM, we don't have a non-visible VRAM type, and the property flags for GTT and visible VRAM are identical. According to the spec, for types with identical flags, we should give the one with better performance a lower index. Previously, apps which follow the spec guidance for choosing a memory type would have picked the GTT type in preference to visible VRAM (all Feral games will do this), and end up with lower performance. On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of other (Feral) games and saw similar improvement on those as well. Signed-off-by:
Alex Smith <asmith@feralinteractive.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: 19.2 <mesa-stable@lists.freedesktop.org> (Bas: CCing this to 19.2-rc due to high impact and limited complexity) (cherry picked from commit fe0ec41c)
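The effect of the ordering can be illustrated with the usual first-match search that applications use to choose a memory type. The sketch below avoids the Vulkan headers so that it stays self-contained: `struct mem_type`, `find_memory_type`, the `kind` field, and the flag values are all hypothetical stand-ins for the corresponding Vulkan structures and property flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Vulkan memory property flags. */
#define PROP_HOST_VISIBLE  0x2u
#define PROP_HOST_COHERENT 0x4u

enum mem_kind { KIND_GTT, KIND_VISIBLE_VRAM };

struct mem_type {
    uint32_t      property_flags;
    enum mem_kind kind; /* illustration only; apps cannot see this */
};

/* Return the lowest index allowed by type_bits whose flags contain all
 * required flags, or -1 if there is none. */
static int find_memory_type(const struct mem_type *types, uint32_t count,
                            uint32_t type_bits, uint32_t required)
{
    assert(count <= 32);
    for (uint32_t i = 0; i < count; i++) {
        if ((type_bits & (1u << i)) &&
            (types[i].property_flags & required) == required)
            return (int)i;
    }
    return -1;
}
```

Because the search stops at the first match, two types with identical flags are distinguished only by their position, which is why the faster type needs the lower index.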
-
Dave Airlie authored
The virgl formats are fixed in time snapshots of the gallium ones, we just need to provide a translation table between them when we enter the hardware. This fixes a regression since Eric renumbered the gallium table. Fixes: c45c33a5 (gallium: Remove manual defining of PIPE_FORMAT enum values.) Bugzilla: https://bugs.freedesktop.org/111454 v1 by Dave Airlie <airlied@redhat.com> v2: virgl: Add a number of formats to the table that are used, e.g. for vertex attributes v3: cover some more missing formats from a piglit run Signed-off-by:
Gert Wollny <gert.wollny@collabora.com> (cherry picked from commit bba4d2f4)
-
Samuel Pitoiset authored
16-bit and 32-bit values match hardware values but 8-bit doesn't. This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index. Fixes: 372c3dcf ("radv: implement VK_EXT_index_type_uint8") Signed-off-by:
Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by:
Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (cherry picked from commit 89671ef2)
-
Kenneth Graunke authored
Jason suggested I remove this in review, and he's right. AFAICT this affects blending, and that just isn't going to happen on buffers. Fixes: f741de23 ("isl: Enable Unorm Path in Color Pipe") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit 1b090f06)
-
Kenneth Graunke authored
This fixes dEQP-GLES3.functional.texture.specification subtests on iris:
- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array

Here, something appears to be going wrong with having this bit set during blorp_copy operations for texture upload, which override the format to R8G8B8A8_UINT. AFAICT this bit should have no effect for integer surfaces, as it has to do with blending, and integer blending is not a thing. So it should be harmless to disable it. The Windows driver appears to be setting this bit universally, so I am unclear why we would need to. Perhaps they simply haven't run into this issue. Fixes: f741de23 ("isl: Enable Unorm Path in Color Pipe") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> (cherry picked from commit 2e1be771)
-
Paulo Zanoni authored
Looks like a copy/paste error. This patch prevents a segfault when running the following on BDW:

INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
    dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4

For the curious, the message we're getting is:

CS compile failed: Failure to register allocate. Reduce number of live scalar values to avoid this.

Fixes: 864737ce ("i965/fs: Build 32-wide compute shader when needed.") Reviewed-by:
Jason Ekstrand <jason@jlekstrand.net> Signed-off-by:
Paulo Zanoni <paulo.r.zanoni@intel.com> (cherry picked from commit 848d5e44)
-
Dave Airlie authored
Not sure how I missed this before, but compswap was hitting an assert here as it is its own special case. Fixes: b5ac381d ("gallivm: add buffer operations to the tgsi->llvm conversion.") Reviewed-by:
Roland Scheidegger <sroland@vmware.com> (cherry picked from commit 1eda49cc)
-
Marek Olšák authored
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111414 Fixes: b758eed9 ("radeonsi: make sure that blend state != NULL and remove all NULL checking") Cc: 19.2 <mesa-stable@lists.freedesktop.org> Tested-by:
Edmondo Tommasina <edmondo.tommasina@gmail.com> Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (cherry picked from commit f95a28d3)
-
Marek Olšák authored
Cc: 19.2 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by:
Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (cherry picked from commit 360cf3c4)
-
Kenneth Graunke authored
This always returns an int64_t, translating to _mesa_lroundevenf on systems where long is 64-bit, and llrintf where "long long" is needed. Fixes: 594fc0f8 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().") Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Reviewed-by:
Matt Turner <mattst88@gmail.com> (cherry picked from commit b59914e1)
-
Kenneth Graunke authored
This fixes the following CTS test on 32-bit systems: GTF-GL46.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_init

It does glGetTexImage of a 16-bit SNORM image, requesting 32-bit UNORM data. In get_tex_rgba_uncompressed, we round trip through float to handle image transfer ops for clamping. _mesa_format_convert does:

_mesa_float_to_unorm(0.571428597f, 32)

which translated to:

_mesa_lroundevenf(0.571428597f * 0xffffffffu)

which produced different results on 64-bit and 32-bit systems:

64-bit: result = 0x92492500
32-bit: result = 0x80000000

This is because the size of "long" varies between the two systems, and 0x92492500 is too large to fit in a signed 32-bit integer. To fix this, we switch to the new _mesa_i64roundevenf function which always does the 64-bit operation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104395 Fixes: 594fc0f8 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().") Reviewed-by:
Marek Olšák <marek.olsak@amd.com> Reviewed-by:
Matt Turner <mattst88@gmail.com> (cherry picked from commit e18cd545)
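The overflow described in this commit can be reproduced on paper with a small C sketch. It is an approximation under stated assumptions, not the Mesa implementation: `float_to_unorm32_sketch` and `fits_in_int32` are hypothetical names, and the rounding is done as round-half-up through a 64-bit integer rather than with a round-to-even routine, which keeps the example free of libm calls.

```c
#include <assert.h>
#include <stdint.h>

/* True if a scaled value can be held by a signed 32-bit integer,
 * i.e. by "long" on a typical 32-bit system. */
static int fits_in_int32(float scaled)
{
    return scaled >= -2147483648.0f && scaled < 2147483648.0f;
}

/* Hypothetical sketch: convert a float in [0, 1] to a 32-bit UNORM value
 * using a 64-bit intermediate, so the result never depends on sizeof(long). */
static uint32_t float_to_unorm32_sketch(float x)
{
    if (!(x > 0.0f))          /* also catches NaN */
        return 0;
    if (x >= 1.0f)
        return UINT32_MAX;

    /* 0xffffffffu converted to float becomes 2^32, as in the commit text. */
    float scaled = x * (float)UINT32_MAX;
    assert(scaled < 4294967296.0f);

    /* Round half up via double; every float in this range is exactly
     * representable as a double, and the result fits in an int64_t. */
    int64_t rounded = (int64_t)((double)scaled + 0.5);
    return (uint32_t)rounded;
}
```

For the value quoted in the commit message the scaled result is 0x92492500, which is larger than INT32_MAX and therefore cannot be returned through a 32-bit `long`.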
-
Ian Romanick authored
This caused a problem on Sandybridge where an open-coded bitfieldReverse() function could be optimized to a nir_op_bitfield_reverse that would generate an unsupported BFREV instruction in the backend. This was encountered in some Unreal4 tech demos in shader-db. The bug was not previously noticed because we don't actually try to run those demos on Sandybridge. The fixes tag is a bit of a lie. The actual bug was introduced about 26,000 commits earlier in 371c4b3c ("nir: Recognize open-coded bitfield_reverse."). Without the NIR lowering pass, the flag needed to avoid the optimization does not exist. Hopefully nobody will care to fix this on an earlier Mesa release. Reviewed-by:
Matt Turner <mattst88@gmail.com> Fixes: 7afa26d4 ("nir: Add lowering for nir_op_bitfield_reverse.") (cherry picked from commit d3fd1c76)
-
Ian Romanick authored
See the previous commit for the explanation of the Fixes tag. Hurts 21 shaders in shader-db. All of the hurt shaders are in Unreal Engine 4 tech demos. Reviewed-by:
Matt Turner <mattst88@gmail.com> Fixes: 7afa26d4 ("nir: Add lowering for nir_op_bitfield_reverse.") (cherry picked from commit b418269d)
-
Kenneth Graunke authored
We enabled fast clears at level > 0, but didn't minify the dimensions when comparing the box size, so we always thought it was a partial clear and as a result never actually enabled any. This eliminates some slow clears in Civilization VI, but they are mostly during initialization and not the main rendering. Thanks to Dan Walsh for noticing we had too many slow clears. Fixes: 393f659e ("iris: Enable fast clears on other miplevels and layers than 0.") Reviewed-by:
Rafael Antognolli <rafael.antognolli@intel.com> (cherry picked from commit 30b9ed92)
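The missing minification can be sketched as follows. This is an illustration only, not the iris clear code: `minify`, `struct box`, and `box_covers_level` are hypothetical names, and the sketch handles just two dimensions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size of one dimension at a given miplevel: halve per level, clamp to 1. */
static uint32_t minify(uint32_t size, unsigned level)
{
    assert(level < 32);
    uint32_t v = size >> level;
    return v ? v : 1;
}

struct box {
    int x, y;
    int width, height;
};

/* A clear covers the whole miplevel only if the box matches the level's
 * minified dimensions, not the dimensions of level 0. */
static bool box_covers_level(uint32_t width0, uint32_t height0,
                             unsigned level, const struct box *b)
{
    return b->x == 0 && b->y == 0 &&
           (uint32_t)b->width == minify(width0, level) &&
           (uint32_t)b->height == minify(height0, level);
}
```

Comparing a level-2 box against the level-0 size, as the code effectively did before this fix, classifies every such clear as partial.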
-
Ian Romanick authored
This didn't fix bug #111308, but it was found while trying to find the actual cause of that bug. Fixes piglit tests (new in piglit!110):
- fs-fract-of-NaN.shader_test
- fs-lt-nan-tautology.shader_test
- fs-ge-nan-tautology.shader_test

No shader-db changes on any Intel platform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308 Fixes: b77070e2 ("nir/algebraic: Use value range analysis to eliminate tautological compares") Reviewed-by:
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (cherry picked from commit ccb236d1)
-
Ian Romanick authored
Fixes piglit tests (new in piglit!110): - fs-underflow-exp2-compare-zero.shader_test Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308 Fixes: 405de7cc ("nir/range-analysis: Rudimentary value range analysis pass") Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Most of the shaders affected are, unsurprisingly, in Unigine Heaven. All Gen6+ platforms had similar results. (Ice Lake shown)

total instructions in shared programs: 16278207 -> 16278465 (<.01%)
instructions in affected programs: 11374 -> 11632 (2.27%)
helped: 0
HURT: 58
HURT stats (abs) min: 2 max: 13 x̄: 4.45 x̃: 4
HURT stats (rel) min: 0.54% max: 4.11% x̄: 2.42% x̃: 2.82%
95% mean confidence interval for instructions value: 3.77 5.13
95% mean confidence interval for instructions %-change: 2.19% 2.64%
Instructions are HURT.

total cycles in shared programs: 367134284 -> 367135159 (<.01%)
cycles in affected programs: 81207 -> 82082 (1.08%)
helped: 17
HURT: 36
helped stats (abs) min: 6 max: 356 x̄: 90.35 x̃: 6
helped stats (rel) min: 0.69% max: 21.45% x̄: 5.71% x̃: 0.78%
HURT stats (abs) min: 4 max: 235 x̄: 66.97 x̃: 16
HURT stats (rel) min: 0.35% max: 27.58% x̄: 5.34% x̃: 1.09%
95% mean confidence interval for cycles value: -20.36 53.38
95% mean confidence interval for cycles %-change: -1.08% 4.67%
Inconclusive result (value mean confidence interval includes 0).

No changes on any earlier platforms. (cherry picked from commit 33ad2bab)
-
Ian Romanick authored
Fixes piglit tests (new in piglit!110): - fs-underflow-fma-compare-zero.shader_test - fs-underflow-mul-compare-zero.shader_test v2: Add back part of comment accidentally deleted. Noticed by Caio. Remove is_not_zero function as it is no longer used. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111308 Fixes: fa116ce3 ("nir/range-analysis: Range tracking for ffma and flrp") Fixes: 405de7cc ("nir/range-analysis: Rudimentary value range analysis pass") Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

All Gen7+ platforms** had similar results. (Ice Lake shown)

total instructions in shared programs: 16278465 -> 16279492 (<.01%)
instructions in affected programs: 16765 -> 17792 (6.13%)
helped: 0
HURT: 23
HURT stats (abs) min: 7 max: 275 x̄: 44.65 x̃: 8
HURT stats (rel) min: 1.15% max: 17.51% x̄: 4.23% x̃: 1.62%
95% mean confidence interval for instructions value: 9.57 79.74
95% mean confidence interval for instructions %-change: 1.85% 6.61%
Instructions are HURT.

total cycles in shared programs: 367135159 -> 367154270 (<.01%)
cycles in affected programs: 279306 -> 298417 (6.84%)
helped: 0
HURT: 23
HURT stats (abs) min: 13 max: 6029 x̄: 830.91 x̃: 54
HURT stats (rel) min: 0.17% max: 45.67% x̄: 7.33% x̃: 0.49%
95% mean confidence interval for cycles value: 100.89 1560.94
95% mean confidence interval for cycles %-change: 0.94% 13.71%
Cycles are HURT.

total spills in shared programs: 8870 -> 8869 (-0.01%)
spills in affected programs: 19 -> 18 (-5.26%)
helped: 1
HURT: 0

total fills in shared programs: 21904 -> 21901 (-0.01%)
fills in affected programs: 81 -> 78 (-3.70%)
helped: 1
HURT: 0

LOST: 0
GAINED: 1

** On Broadwell, a shader was hurt for spills / fills instead of helped.

No changes on any earlier platforms. (cherry picked from commit ef2e2352)
-
Ian Romanick authored
Found by inspection. I tried really, really hard to make a test case that would trigger this problem, but I was unsuccessful. It's very hard to get an instruction to produce a ne_zero result without ne_zero sources. The most plausible way is using bcsel. That proves problematic because bcsel interprets its sources as integers, so it cannot currently be used to "clean" values for floating point instructions. No shader-db changes on any Intel platform. Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: 405de7cc ("nir/range-analysis: Rudimentary value range analysis pass") (cherry picked from commit 0b4782fc)
-
Ian Romanick authored
I discovered this while looking at a shader that was hurt by some other work I'm doing. When I examined the changes, I was confused that one instance of a comparison that was used in a discard_if was (incorrectly) eliminated, while another instance used by a bcsel was (correctly) not eliminated. I had to use NIR_PRINT=true to see exactly where things went wrong. A bunch of shaders in Goat Simulator, Dungeon Defenders, Sanctum 2, and Strike Suit Zero were impacted. Reviewed-by:
Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: 405de7cc ("nir/range-analysis: Rudimentary value range analysis pass")

All Intel platforms had similar results. (Ice Lake shown)

total instructions in shared programs: 16280659 -> 16281075 (<.01%)
instructions in affected programs: 21042 -> 21458 (1.98%)
helped: 0
HURT: 136
HURT stats (abs) min: 1 max: 9 x̄: 3.06 x̃: 3
HURT stats (rel) min: 1.16% max: 6.12% x̄: 2.23% x̃: 2.03%
95% mean confidence interval for instructions value: 2.93 3.19
95% mean confidence interval for instructions %-change: 2.08% 2.37%
Instructions are HURT.

total cycles in shared programs: 367168270 -> 367170313 (<.01%)
cycles in affected programs: 172020 -> 174063 (1.19%)
helped: 14
HURT: 111
helped stats (abs) min: 2 max: 80 x̄: 21.21 x̃: 9
helped stats (rel) min: 0.10% max: 4.47% x̄: 1.35% x̃: 0.79%
HURT stats (abs) min: 2 max: 584 x̄: 21.08 x̃: 5
HURT stats (rel) min: 0.12% max: 17.28% x̄: 1.55% x̃: 0.40%
95% mean confidence interval for cycles value: 5.41 27.28
95% mean confidence interval for cycles %-change: 0.64% 1.81%
Cycles are HURT.

(cherry picked from commit 7dba7df5)
-
Thong Thai authored
This reverts commit 5a2e65be. Even though CONTEXT_CONTROL is emitted by the kernel, CONTEXT_CONTROL still needs to be emitted by the UMD, or else the driver will hang. Cc: 19.2 <mesa-stable@lists.freedesktop.org> Signed-off-by:
Thong Thai <thong.thai@amd.com> Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 2a3a5604)
-
Pierre-Eric Pelloux-Prayer authored
This fixes a hang in shadertoy for radeonsi where a buffer was initialized with: value -= value with value being undefined. In this case LLVM replaces the operation with an assignment to NaN. Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111241 Reviewed-by:
Marek Olšák <marek.olsak@amd.com> (cherry picked from commit 47cc660d)
-
Dylan Baker authored