Extra tests workaround for the new series

The 32-bit workaround should probably also be changed to use DEB_BUILD_ARCH_BITS rather than a list of architectures
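To illustrate the idea of keying the workaround on word size instead of architecture names, here is a minimal Python sketch. It assumes `dpkg-architecture` (from dpkg-dev) is on the PATH; the helper names `build_arch_bits` and `needs_32bit_workaround` are hypothetical, and the real change would live in debian/rules rather than in Python.

```python
import subprocess


def build_arch_bits(query=None):
    """Return the build architecture's word size, for example 32 or 64."""
    if query is None:
        # Assumption: dpkg-architecture is installed (it ships in dpkg-dev).
        def query(variable):
            return subprocess.check_output(
                ["dpkg-architecture", "-q" + variable], text=True
            )
    return int(query("DEB_BUILD_ARCH_BITS").strip())


def needs_32bit_workaround(bits):
    """Hypothetical predicate replacing a hard-coded architecture list."""
    return bits == 32
```

The `query` parameter only exists so the logic can be exercised on a machine without dpkg.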

added 3 commits

8c22dae9 - ignore a bunch of reftests that a now failing on s390x
753dae7a - disable gsk tests on i386 and arm since they are currently buggy and
864c6fda - * debian/patches/ignore_a11ytext_i386.patch:

added 1 commit

de3911a4 - * debian/patches/ignore_a11ytext_i386.patch:

added 4 commits

7cccbc22 - * debian/ignore.keyfile:
e851f793 - mark also memorytexture as expected to fail on big endian
a410ba48 - Workaround on 32 bits architectures instead of names
8417fe08 - Include some extra ignore reftests on s390x

I've pushed extra workarounds for s390x, including bumping the debian/ignore.keyfile limit since adding some of the tests to ignore_reftests isn't enough

# Storing test result image at /<<PKGBUILDDIR>>/debian/build/deb/testsuite/reftests/output/x11/center-center-100x100-picture-in-100x200.out.png
# Storing test result image at /<<PKGBUILDDIR>>/debian/build/deb/testsuite/reftests/output/x11/center-center-100x100-picture-in-100x200.ref.png
# Maximum difference tolerated: 255 levels
# Different pixels tolerated: 4000
# 10000 (out of 20000) pixels differ from reference by up to 255 levels
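For reading the numbers in a log like this, the following Python sketch shows one way such a comparison could work. It is not the reftest harness itself: the function names are hypothetical, pixels are modelled as plain (r, g, b, a) tuples, and it assumes a test passes only when both the pixel count and the per-channel difference stay within the tolerated values.

```python
def compare_pixels(out_pixels, ref_pixels):
    """Compare two equally sized sequences of (r, g, b, a) tuples.

    Returns (number of differing pixels, largest per-channel difference).
    """
    if len(out_pixels) != len(ref_pixels):
        raise ValueError("images must have the same number of pixels")
    differing = 0
    max_diff = 0
    for out, ref in zip(out_pixels, ref_pixels):
        diff = max(abs(a - b) for a, b in zip(out, ref))
        if diff:
            differing += 1
            max_diff = max(max_diff, diff)
    return differing, max_diff


def within_tolerance(differing, max_diff, tolerated_pixels, tolerated_levels):
    # Assumption: both limits must hold for the comparison to pass.
    return differing <= tolerated_pixels and max_diff <= tolerated_levels
```

With the figures from the log, `within_tolerance(10000, 255, 4000, 255)` is `False`, because 10000 differing pixels exceeds the 4000 tolerated.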

Please get the test result images out and compare them visually. If the difference in rendering would not be enough to harm use of an application, then we can ignore them. If the difference in rendering is significant (as I suspect it is here), then we should attach them to a usertagged Debian bug that CCs the appropriate porting team, and perhaps also an upstream issue if it seems appropriate. Upstream are not going to care about anything except x86 and maybe ARM, but an upstream issue is a good place to coordinate with other distros.

https://gitlab.gnome.org/GNOME/librsvg/-/issues/972 is an example of an endian-specific upstream bug report in which I was able to extract and compare the reference/output/diff images (this is with librsvg rather than GTK but the principle is the same).

If you're building GTK interactively, you can get the images from the "Storing test result image at..." paths.

If you're building on a buildd, currently there is no way for a failing build to produce an equivalent of $AUTOPKGTEST_ARTIFACTS, so instead I wrote some horrible glue to uuencode the images and exfiltrate them via the buildd log (this is real "spite-fuelled development" stuff). You can uudecode them to get the image back for visual comparison.
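As a rough sketch of the decoding side, the Python below pulls uuencoded blocks out of a saved log. It assumes the blocks appear in the log unmodified, in the classic `begin <mode> <name>` ... `end` layout with no per-line prefix; the exact layout produced by the glue is not shown here, so treat that as an assumption. `extract_uuencoded` is a hypothetical helper name.

```python
import binascii


def extract_uuencoded(log_text):
    """Return {filename: bytes} for every uuencoded block in log_text."""
    files = {}
    name = None
    chunks = []
    for line in log_text.splitlines():
        if name is None:
            # Look for a header such as "begin 644 foo.out.png".
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "begin" and parts[1].isdigit():
                name = parts[2].strip()
                chunks = []
        elif line.strip() == "end":
            files[name] = b"".join(chunks)
            name = None
        elif line.strip() in ("", "`"):
            # Zero-length terminator line: carries no data.
            continue
        else:
            chunks.append(binascii.a2b_uu(line.encode("ascii")))
    return files
```

Each value in the returned dictionary can be written to a file opened in binary mode and then viewed like any other PNG.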

I feel strongly that we should be moving the burden of supporting s390x towards the s390x porting team, and so on, rather than simply ignoring test failures: if we don't report these bugs, then core teams will continue to believe that s390x has no ongoing maintenance cost and is 100% fine. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024391, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003348, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058740 are examples of what I'm hoping for. Unfortunately the only s390x porter who has shown up on that last bug seems to need a lot of guidance, and I do not have enough time or spoons available to be providing a personalized "beginner's guide to compiling Debian packages".

@seb128 @smcv Is there anything I can do to help? This is currently blocking some GTK Rust packages (indirectly), and I need 4.14 to fix that.

If you can get the test results out and compare them visually, as described briefly in !23 (comment 480109), then that would be an important step towards either being reassured that it is OK to ignore those particular test failures, or identifying them as a significant problem.

After we have those comparisons, it would be wonderful if someone other than me could be the one to open bugs for them. I am sorry, but I am not in a position where the team can always rely on me to be a single point of failure for handling new GTK versions.

I'm also trying to avoid getting into a situation where, whenever I open an upstream bug, the maintainers' automatic response is "oh no, another weird Debianism from smcv" and telling me to leave them alone (or, worse, silently ignoring the report).

At a high level, the blocker for 4.14.4 progressing from experimental to unstable is "it needs to not FTBFS everywhere". Anything that someone can do to either fix the failing tests, or demonstrate that they are unimportant and then mark them to be ignored, would be useful.

Thanks, I kicked off an i386 build and will try to extract the images and open an issue upstream.

Sorry, but I don't really have time to help move this forward any further. I noticed that some s390x issues got fixed upstream (https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7111), so maybe we can remove those workarounds...

Just for the record: can I get the failing images from the base64 in the log, decode that and write it out as a PNG? I will open an issue then.
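A minimal Python sketch of that decode step, assuming the base64 text for one image has already been copied out of the log as a string. `png_from_base64` is a hypothetical helper; the signature check only guards against having copied the wrong part of the log.

```python
import base64

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_from_base64(text):
    """Decode base64 text (line breaks allowed) and check it is a PNG."""
    data = base64.b64decode("".join(text.split()))
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    return data
```

The returned bytes can then be written to a file opened with mode `"wb"`.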

[Attached images: hbox-with-ellipsizing-wrapping-label (diff, out, ref) and window-border-width (diff, out, ref)]

It seems to me that the content is rendered correctly, but side by side rather than one above the other.

Are those two the only failing tests? On what architecture(s)?

It seems to me that the content is rendered correctly, but side by side rather than one above the other.

That is a difference between the reference rendering and the actual rendering. Upstream's opinion (which we do not necessarily always share) is that any difference is meant to indicate a failure.

I suspect that in these cases, there is something that is causing the reference and the test scenario to be wrapped differently. The next thing to look at would be what is in the reference (for example testsuite/reftests/window-border-width.ref.ui) and the test scenario (for example testsuite/reftests/window-border-width.ui) and think about why they might be rendered differently.

Some possibilities:

Maybe they are using different font metrics? Could one of them be using a vendored/bundled font and the other one be using a system font from Debian, or something like that?
Maybe decisions about the most-correct wrapping are made differently by the version of Pango that upstream tested with, and the version of Pango (which one?) that we are using in Debian? I see that @jbicha recently uploaded a new version of Pango to unstable - does rebuilding gtk4/experimental against that Pango version change the test results?

In hbox-with-ellipsizing-wrapping-label, it looks as though the only difference is that in the reference, the label is not set to ellipsize, and in the test scenario, it is.

In window-border-width, it looks as though the only difference is that in the reference, the GtkLabel is inside a GtkBox inside the GtkWindow, and in the test scenario, the GtkLabel is directly inside the GtkWindow. Is there anything (Debian-specific?) that might cause a GtkBox to be rendered differently?

Or, could the choice of whether to wrap "World" to a new line be non-deterministic? If you re-run these two tests repeatedly, do they give consistent results?
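One way to check for non-determinism is to run the same test many times and tally the exit codes, as in this Python sketch. The command that runs a single reftest is left to the caller, since it depends on the build tree; `run_repeatedly` is a hypothetical helper, and the demo at the bottom just runs a trivial Python command.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(command, times=10):
    """Run command `times` times and count how often each exit code occurs."""
    results = Counter()
    for _ in range(times):
        completed = subprocess.run(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results[completed.returncode] += 1
    return results


if __name__ == "__main__":
    # Placeholder command; substitute the real invocation of the reftest.
    print(run_repeatedly([sys.executable, "-c", "pass"], times=3))
```

More than one key in the result would suggest the test is flaky rather than consistently failing.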

Are those two the only failing tests? On what architecture(s)?

Those are the only failing tests on i386. I did not have time to run any other arch; might do so on the weekend.

Thanks for the input. I will debug some more on the weekend and share my findings here.

armel:

Summary of Failures:

  9/668 gtk:gdk / memorytexture                                                      TIMEOUT        300.06s   killed by signal 15 SIGTERM
 44/668 gtk:gsk+gsk-nodeparser / parser empty-fill.node                              FAIL             1.03s   exit status 1
 53/668 gtk:gsk+gsk-nodeparser / parser empty-stroke.node                            FAIL             0.98s   exit status 1
 59/668 gtk:gsk+gsk-nodeparser / parser fill.node                                    FAIL             0.99s   exit status 1
 60/668 gtk:gsk+gsk-nodeparser / parser fill2.node                                   FAIL             0.98s   exit status 1
 73/668 gtk:gsk+gsk-nodeparser / parser stroke.node                                  FAIL             1.02s   exit status 1
 98/668 gtk:gsk / path-special-cases                                                 ERROR            0.78s   killed by signal 6 SIGABRT
103/668 gtk:gsk / curve-special-cases                                                ERROR            0.74s   killed by signal 6 SIGABRT
106/668 gtk:gsk / path-private                                                       ERROR            0.74s   killed by signal 6 SIGABRT
494/668 gtk:css / parser at-invalid-18.css                                           TIMEOUT        300.06s   killed by signal 15 SIGTERM
502/668 gtk:css / parser dash-backslash-eof-is-identifier.css                        TIMEOUT        300.06s   killed by signal 15 SIGTERM
503/668 gtk:css / parser dash-dash-eof-is-identifier.css                             TIMEOUT        300.08s   killed by signal 15 SIGTERM
508/668 gtk:css / parser filter-invalid1.css                                         TIMEOUT        300.06s   killed by signal 15 SIGTERM
522/668 gtk:css / parser transition-timing-function-invalid2.css                     TIMEOUT        300.05s   killed by signal 15 SIGTERM
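To compare summaries like this across architectures, a small parser can help. The Python sketch below assumes the column layout shown above (index/total, suite, test name, result, duration, detail); the regular expression and helper names are hypothetical and may need adjusting for other Meson versions.

```python
import re
from collections import Counter

# Matches lines such as:
#   9/668 gtk:gdk / memorytexture   TIMEOUT   300.06s   killed by signal 15 SIGTERM
FAILURE_LINE = re.compile(
    r"^\s*(?P<index>\d+)/(?P<total>\d+)\s+"
    r"(?P<suite>\S+)\s+/\s+(?P<name>.+?)\s+"
    r"(?P<result>TIMEOUT|FAIL|ERROR)\s+"
    r"(?P<seconds>[\d.]+)s\s+(?P<detail>.*)$"
)


def parse_failures(summary_text):
    """Return one dict per failure line found in summary_text."""
    failures = []
    for line in summary_text.splitlines():
        match = FAILURE_LINE.match(line)
        if match:
            failures.append(match.groupdict())
    return failures


def count_by_result(failures):
    """Count failures per result kind (TIMEOUT, FAIL, ERROR)."""
    return Counter(failure["result"] for failure in failures)
```

Applied to the armel summary above, this gives 6 TIMEOUT, 5 FAIL and 3 ERROR entries.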

Running an arm64 build overnight.

@smcv the gdk memorytexture fails on arm64 too; imo we should increase the timeout and re-upload.

Bumping the general timeout to 3 makes the gdk memorytexture test pass for me on i386. This would fix arm64 and ppc64el at least, since they seem to suffer from the same issue. arm(el,hf), mips64el, riscv64 and s390x have some image diffs that I will investigate.