As suggested in https://bugs.debian.org/851558 and implemented for autopkgtest in ci-team/autopkgtest!19 (merged) and ci-team/autopkgtest!20 (merged).
Flaky tests are intended to pass, but are known to be unreliable, subject to intermittent failures or otherwise unsuitable for gating CI. To determine when a flaky test has become reliable, it is useful to run it in realistic infrastructure for a while and gather data.
If a flaky test passes, it is reported in the same way as any other passing test. If it fails, for any reason, it is reported as flaky rather than as a failure.
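The reporting rule above can be sketched as a small function. This is only an illustration of the described behaviour, not the autopkgtest implementation: the function name `report_result` and the labels `"PASS"`, `"FLAKY"` and `"FAIL"` are assumptions chosen for this example.

```python
def report_result(passed: bool, flaky: bool) -> str:
    """Return a result label for one test run.

    `flaky` says whether the test has been declared flaky.
    The label strings are hypothetical, chosen for illustration.
    """
    if passed:
        # A passing flaky test is reported like any other passing test.
        return "PASS"
    # A failing test only counts as a failure if it is not declared flaky.
    return "FLAKY" if flaky else "FAIL"
```

Under this sketch, only a `"FAIL"` result would be suitable for gating CI; a `"FLAKY"` result is recorded so that data on the test's reliability can still be gathered.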
A skippable test probes for some resource that it needs, which might be something too complicated to express in Restrictions (for example "/var/tmp supports extended attributes" for flatpak and ostree). If that resource is not found, the test exits with status 77 (a convention borrowed from Automake) and the test runner treats it as though it had been skipped based on its Restrictions.
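As a rough sketch of what such a probe might look like, the following checks whether a directory accepts extended attributes in the `user.` namespace and maps the outcome to an exit status. The helper names `supports_user_xattrs` and `exit_status_for`, and the attribute name `user.probe`, are hypothetical; the real flatpak and ostree tests may probe differently. `os.setxattr` is only available on Linux, so the sketch treats its absence as "resource not found".

```python
import os
import tempfile

# Exit status meaning "skipped", following the Automake convention.
SKIP_EXIT_STATUS = 77


def supports_user_xattrs(directory: str) -> bool:
    """Probe whether files in `directory` accept user.* extended attributes."""
    if not hasattr(os, "setxattr"):
        # Platform without the xattr API: treat the resource as missing.
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            # "user.probe" is an arbitrary attribute name for this sketch.
            os.setxattr(probe.name, "user.probe", b"1")
    except OSError:
        # Missing directory, unsupported filesystem, or permission problem.
        return False
    return True


def exit_status_for(resource_available: bool, test_passed: bool) -> int:
    """Map the probe result and the test outcome to a process exit status."""
    if not resource_available:
        return SKIP_EXIT_STATUS
    return 0 if test_passed else 1
```

A test script built on this would run the probe first, for example `supports_user_xattrs("/var/tmp")`, and pass the value of `exit_status_for(...)` to `sys.exit()`, so that a missing resource yields 77 rather than an ordinary failure.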