Extra tests workarounds for the new series

Merged Sebastien Bacher requested to merge seb128/gtk4:extra-tests-workarounds into debian/latest

I've uploaded 4.14.2 to experimental without the workarounds we currently carry in Ubuntu. I think those are suboptimal, and the 32-bit issue is probably something that should be fixed rather than ignored, but those architectures aren't considered supported upstream, aren't a desktop target for Ubuntu, and we don't have more resources to spend on them on the Ubuntu side at the moment. I'm unsure whether we want to do the same in Debian, but I'm at least pushing this to get some feedback, maybe from @smcv and @jbicha.

  • The 32-bit workaround should probably also be changed to use DEB_BUILD_ARCH_BITS rather than a list of architectures.

  • Sebastien Bacher added 3 commits

    • 8c22dae9 - ignore a bunch of reftests that a now failing on s390x
    • 753dae7a - disable gsk tests on i386 and arm since they are currently buggy and
    • 864c6fda - * debian/patches/ignore_a11ytext_i386.patch:

  • added 1 commit

    • de3911a4 - * debian/patches/ignore_a11ytext_i386.patch:

  • Sebastien Bacher added 4 commits

    • 7cccbc22 - * debian/ignore.keyfile:
    • e851f793 - mark also memorytexture as expected to fail on big endian
    • a410ba48 - Workaround on 32 bits architectures instead of names
    • 8417fe08 - Include some extra ignore reftests on s390x

    • I've pushed extra workarounds for s390x, including bumping the debian/ignore.keyfile limit since adding some of the tests to ignore_reftests isn't enough

      # Storing test result image at /<<PKGBUILDDIR>>/debian/build/deb/testsuite/reftests/output/x11/center-center-100x100-picture-in-100x200.out.png
      # Storing test result image at /<<PKGBUILDDIR>>/debian/build/deb/testsuite/reftests/output/x11/center-center-100x100-picture-in-100x200.ref.png
      # Maximum difference tolerated: 255 levels
      # Different pixels tolerated: 4000
      # 10000 (out of 20000) pixels differ from reference by up to 255 levels
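To make the numbers in that log excerpt concrete, here is a minimal Python sketch that parses the three relevant lines and checks them against the tolerances. It assumes that a reftest is accepted only when both the per-pixel level limit and the differing-pixel count limit are respected; the function names `parse_reftest_log` and `within_tolerance` are made up for illustration and are not part of GTK.

```python
import re

# Log lines copied from the comment above.
log_lines = [
    "# Maximum difference tolerated: 255 levels",
    "# Different pixels tolerated: 4000",
    "# 10000 (out of 20000) pixels differ from reference by up to 255 levels",
]


def parse_reftest_log(lines):
    """Extract the tolerances and the measured difference from reftest log lines."""
    text = "\n".join(lines)
    levels = re.search(r"Maximum difference tolerated: (\d+) levels", text)
    pixels = re.search(r"Different pixels tolerated: (\d+)", text)
    measured = re.search(
        r"(\d+) \(out of (\d+)\) pixels differ from reference by up to (\d+) levels",
        text,
    )
    return {
        "tolerated_levels": int(levels.group(1)),
        "tolerated_pixels": int(pixels.group(1)),
        "differing_pixels": int(measured.group(1)),
        "total_pixels": int(measured.group(2)),
        "max_level_diff": int(measured.group(3)),
    }


def within_tolerance(r):
    # Assumption: both limits must hold for the difference to be ignored.
    return (
        r["max_level_diff"] <= r["tolerated_levels"]
        and r["differing_pixels"] <= r["tolerated_pixels"]
    )


result = parse_reftest_log(log_lines)
print(result["differing_pixels"] / result["total_pixels"])  # fraction of pixels that differ
print(within_tolerance(result))
```

Under that assumption, half of the pixels differ here, which is well beyond the 4000-pixel limit shown in the log.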
    • Please get the test result images out and compare them visually. If the difference in rendering would not be enough to harm use of an application, then we can ignore them. If the difference in rendering is significant (as I suspect it is here), then we should attach them to a usertagged Debian bug that CCs the appropriate porting team, and perhaps also an upstream issue if it seems appropriate. Upstream are not going to care about anything except x86 and maybe ARM, but an upstream issue is a good place to coordinate with other distros.

      https://gitlab.gnome.org/GNOME/librsvg/-/issues/972 is an example of an endian-specific upstream bug report in which I was able to extract and compare the reference/output/diff images (this is with librsvg rather than GTK but the principle is the same).

      If you're building GTK interactively, you can get the images from the "Storing test result image at..." paths.

      If you're building on a buildd, currently there is no way for a failing build to produce an equivalent of $AUTOPKGTEST_ARTIFACTS, so instead I wrote some horrible glue to uuencode the images and exfiltrate them via the buildd log (this is real "spite-fuelled development" stuff). You can uudecode them to get the image back for visual comparison.

      I feel strongly that we should be moving the burden of supporting s390x towards the s390x porting team, and so on, rather than simply ignoring test failures: if we don't report these bugs, then core teams will continue to believe that s390x has no ongoing maintenance cost and is 100% fine. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024391, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1003348, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058740 are examples of what I'm hoping for. Unfortunately the only s390x porter who has shown up on that last bug seems to need a lot of guidance, and I do not have enough time or spoons available to be providing a personalized "beginner's guide to compiling Debian packages".
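      As an illustration of the uudecode step described above, here is a minimal Python sketch that pulls uuencoded files back out of a saved build log. It assumes the images appear in the log as standard `begin MODE NAME` ... `end` blocks with no extra prefix on each line; the actual glue in the packaging may format things differently, and `extract_uuencoded` is a hypothetical helper name.

```python
import binascii


def extract_uuencoded(log_text):
    """Return {filename: bytes} for each 'begin MODE NAME' ... 'end' block in a log."""
    files = {}
    name = None
    chunks = []
    for line in log_text.splitlines():
        if name is None:
            # Look for the header line that opens a uuencoded block.
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "begin" and parts[1].isdigit():
                name = parts[2]
                chunks = []
        elif line == "end":
            files[name] = b"".join(chunks)
            name = None
        elif line:
            # Each data line decodes to at most 45 bytes; the terminating
            # "`" line decodes to zero bytes.
            chunks.append(binascii.a2b_uu(line))
    return files


# Round trip with made-up data, standing in for a PNG embedded in a log.
payload = bytes(range(100))
body = "".join(
    binascii.b2a_uu(payload[i:i + 45]).decode("ascii")
    for i in range(0, len(payload), 45)
)
sample_log = "build output\nbegin 644 sample.out.png\n" + body + "`\nend\nmore output\n"
decoded = extract_uuencoded(sample_log)
```

      Each decoded value can then be written to a file with the recorded name and opened in an image viewer for the visual comparison.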

    • @seb128 @smcv Is there anything I can do to help? This is currently (indirectly) blocking some GTK Rust packages, and I need 4.14 to fix that.

    • If you can get the test results out and compare them visually, as described briefly in !23 (comment 480109), then that would be an important step towards either being reassured that it is OK to ignore those particular test failures, or identifying them as a significant problem.

    • After we have those comparisons, it would be wonderful if someone other than me could be the one to open bugs for them. I am sorry, but I am not in a position where the team can always rely on me to be a single point of failure for handling new GTK versions.

      I'm also trying to avoid getting into a situation where, whenever I open an upstream bug, the maintainers' automatic response is "oh no, another weird Debianism from smcv" and telling me to leave them alone (or, worse, silently ignoring the report).

  • At a high level, the blocker for 4.14.4 progressing from experimental to unstable is "it needs to not FTBFS everywhere". Anything that someone can do to either fix the failing tests, or demonstrate that they are unimportant and then mark them to be ignored, would be useful.

  • Thanks, I kicked off an i386 build and will try to extract the images and open an issue upstream.

  • Sorry, but I don't really have time to help move that forward any further. I noticed that some s390x issues got fixed upstream (https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/7111), so maybe we can remove those workarounds...

  • Just for the record: can I get the failing images from the base64 output, i.e. decode that and write it to a PNG? I will open an issue then.

  • hbox-with-ellipsizing-wrapping-label.diff

    hbox-with-ellipsizing-wrapping-label.out

    hbox-with-ellipsizing-wrapping-label.ref

    Edited by Matthias Geiger
  • window-border-width.diff

    window-border-width.out

    window-border-width.ref

    Edited by Matthias Geiger
  • seems to me that it's rendered correctly but rather beside each other than on top of each other

    • Are those two the only failing tests? On what architecture(s)?

      "seems to me that it's rendered correctly but rather beside each other than on top of each other"

      That is a difference between the reference rendering and the actual rendering. Upstream's opinion (which we do not necessarily always share) is that any difference is meant to indicate a failure.

      I suspect that in these cases, there is something that is causing the reference and the test scenario to be wrapped differently. The next thing to look at would be what is in the reference (for example testsuite/reftests/window-border-width.ref.ui) and the test scenario (for example testsuite/reftests/window-border-width.ui) and think about why they might be rendered differently.

      Some possibilities:

      • Maybe they are using different font metrics? Could one of them be using a vendored/bundled font and the other one be using a system font from Debian, or something like that?
      • Maybe decisions about the most-correct wrapping are made differently by the version of Pango that upstream tested with, and the version of Pango (which one?) that we are using in Debian? I see that @jbicha recently uploaded a new version of Pango to unstable - does rebuilding gtk4/experimental against that Pango version change the test results?

      In hbox-with-ellipsizing-wrapping-label, it looks as though the only difference is that in the reference, the label is not set to ellipsize, and in the test scenario, it is.

      In window-border-width, it looks as though the only difference is that in the reference, the GtkLabel is inside a GtkBox inside the GtkWindow, and in the test scenario, the GtkLabel is directly inside the GtkWindow. Is there anything (Debian-specific?) that might cause a GtkBox to be rendered differently?

      Or, could the choice of whether to wrap "World" to a new line be non-deterministic? If you re-run these two tests repeatedly, do they give consistent results?

    • "Are those two the only failing tests? On what architecture(s)?"

      Those are the only failing tests on i386. I did not have time to run any other arch; might do so on the weekend.

      Thanks for the input. I will debug some more on the weekend and share my findings here.

      Edited by Matthias Geiger
  • armel:

    Summary of Failures:
    
      9/668 gtk:gdk / memorytexture                                                      TIMEOUT        300.06s   killed by signal 15 SIGTERM
     44/668 gtk:gsk+gsk-nodeparser / parser empty-fill.node                              FAIL             1.03s   exit status 1
     53/668 gtk:gsk+gsk-nodeparser / parser empty-stroke.node                            FAIL             0.98s   exit status 1
     59/668 gtk:gsk+gsk-nodeparser / parser fill.node                                    FAIL             0.99s   exit status 1
     60/668 gtk:gsk+gsk-nodeparser / parser fill2.node                                   FAIL             0.98s   exit status 1
     73/668 gtk:gsk+gsk-nodeparser / parser stroke.node                                  FAIL             1.02s   exit status 1
     98/668 gtk:gsk / path-special-cases                                                 ERROR            0.78s   killed by signal 6 SIGABRT
    103/668 gtk:gsk / curve-special-cases                                                ERROR            0.74s   killed by signal 6 SIGABRT
    106/668 gtk:gsk / path-private                                                       ERROR            0.74s   killed by signal 6 SIGABRT
    494/668 gtk:css / parser at-invalid-18.css                                           TIMEOUT        300.06s   killed by signal 15 SIGTERM
    502/668 gtk:css / parser dash-backslash-eof-is-identifier.css                        TIMEOUT        300.06s   killed by signal 15 SIGTERM
    503/668 gtk:css / parser dash-dash-eof-is-identifier.css                             TIMEOUT        300.08s   killed by signal 15 SIGTERM
    508/668 gtk:css / parser filter-invalid1.css                                         TIMEOUT        300.06s   killed by signal 15 SIGTERM
    522/668 gtk:css / parser transition-timing-function-invalid2.css                     TIMEOUT        300.05s   killed by signal 15 SIGTERM
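    A summary like the one above is easier to triage once it is grouped by result kind, since that separates the tests that timed out from those that failed outright or aborted. The following is a minimal Python sketch of that grouping; the regular expression is an assumption based only on the lines shown here, and `group_failures` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed layout: "N/TOTAL suite / test   RESULT   SECONDSs   details".
SUMMARY_LINE = re.compile(
    r"^\s*(?P<num>\d+)/(?P<total>\d+)\s+(?P<name>.+?)\s+"
    r"(?P<result>TIMEOUT|FAIL|ERROR)\s+(?P<secs>[\d.]+)s"
)


def group_failures(summary_text):
    """Map each result kind (TIMEOUT, FAIL, ERROR) to the list of test names."""
    groups = defaultdict(list)
    for line in summary_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            groups[match.group("result")].append(match.group("name"))
    return dict(groups)


# Three lines taken from the armel summary above.
sample = """
  9/668 gtk:gdk / memorytexture                    TIMEOUT        300.06s   killed by signal 15 SIGTERM
 59/668 gtk:gsk+gsk-nodeparser / parser fill.node  FAIL             0.99s   exit status 1
106/668 gtk:gsk / path-private                     ERROR            0.74s   killed by signal 6 SIGABRT
"""
groups = group_failures(sample)
for kind, names in groups.items():
    print(kind, len(names))
```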

    Running an arm64 build overnight.

    Edited by Matthias Geiger
  • @smcv the gdk memorytexture fails on arm64 too; imo we should increase the timeout and re-upload.

  • Bumping the general timeout to 3 makes the gdk memorytexture test pass for me on i386. This would fix arm64 and ppc64el at least, since they seem to suffer from the same issue. arm(el,hf), mips64el, riscv64 and s390x have some image diffs that I will investigate.

  • On mips there is only this one test failing; the diff looks benign to me:

    label-shadows-diff

    label-shadows-ref

    label-shadows-out

    The same test fails on riscv too. On riscv two other tests error; imho those could be skipped.

    Edited by Matthias Geiger
  • Jeremy Bícha added 130 commits

    • 8417fe08...10c8811f - 122 commits from branch gnome-team:debian/latest
    • 530f7d47 - ignore a bunch of reftests that a now failing on s390x
    • c48a9167 - disable gsk tests on i386 and arm since they are currently buggy and
    • fe834f20 - * debian/patches/ignore_a11ytext_i386.patch:
    • fb97bbcf - * debian/ignore.keyfile:
    • ccc2b204 - mark also memorytexture as expected to fail on big endian
    • b5ae35b4 - Workaround on 32 bits architectures instead of names
    • 38329e90 - Include some extra ignore reftests on s390x
    • b6dbc1a7 - Update changelog
