
btrfs: excessive disk writes (amplification)

The idle writes to disk need to be reduced. (Long-running processes can be seen with iotop -a, pressing the left arrow key twice to sort.)

  • Mount with noatime (instead of just relatime), to avoid all writing when reading.

  • For SSDs, the btrfs mount option nospace_cache (minimal speed penalty on SSDs) seems useful: it reduces the "btrfs-transacti" idle-time write overhead to about 350K per minute. That is better, but still a lot for no-op idling on sdcards.

  • /tmp in RAM

    using /etc/fstab is universal and allows for more secure mount options:

    echo "tmpfs /tmp tmpfs noatime,nosuid,noexec,nodev 0 0" >> /etc/fstab

    mkdir -p /var/cache/apt/tmp

    echo "tmpfs /var/cache/apt/tmp tmpfs noatime,nosuid,nodev 0 0" >> /etc/fstab

    Then point APT at it with echo "APT::ExtractTemplates::TempDir \"/var/cache/apt/tmp\";" > /etc/apt/apt.conf.d/01-tempdir, so that APT gets an only-root writable tmpfs directory that is not noexec.

    https://askubuntu.com/questions/1232004/mounting-tmp-as-tmpfs-on-ubuntu-20-04

    cp -v /usr/share/systemd/tmp.mount /etc/systemd/system/ ; systemctl enable tmp.mount

- [ ] /var/cache in RAM? (Is /etc/fstab processed too late?) echo "tmpfs /var/cache tmpfs noatime,nosuid,noexec,nodev 0 0" >> /etc/fstab

  • !2032 (closed)

  • echo -e "[Definition]\ndbfile = :memory:\n" > /etc/fail2ban.local

  • NetworkManager seems to write frequently to some file(s) in /var/lib/NetworkManager/? Maybe some cache-like file(s) can be moved and symlinked to /var/run/NetworkManager/?

  • Changing all the "persistent" lines in /etc/nscd.conf to "no" stops nscd from constantly doing cache writes, as seen in iotop.

  • periodic slapd writes

  • periodic python3 ... .sshd writes
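
The /etc/fstab entries in the list above are appended with echo >>, which adds a duplicate line every time the command is run again. A minimal sketch of an idempotent variant follows; the helper name append_once is hypothetical and not part of any existing tool, and it only compares whole lines literally.

```shell
#!/bin/sh
# append_once LINE FILE   (hypothetical helper, for illustration only)
# Appends LINE to FILE unless an identical line is already present.
append_once() {
    # -x: match the whole line, -F: literal string, -q: no output
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example (as root), using the entries from the list above:
#   append_once "tmpfs /tmp tmpfs noatime,nosuid,noexec,nodev 0 0" /etc/fstab
#   append_once "tmpfs /var/cache/apt/tmp tmpfs noatime,nosuid,nodev 0 0" /etc/fstab
```

Because the comparison is literal, an existing entry for the same mount point with different options is not detected; that case still needs a manual look at /etc/fstab.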

Further conclusion from the discussion and experiments below:

Considering the use of sdcards (btrfs 30x write amplification plus the device-internal write amplification), I think it's advisable to have sdcard images with root on F2FS, and support for creating a separate btrfs filesystem:

  • To hold just (central) important /home user-data (external disk).
  • And optionally to also move the rootfs of the installed system from the F2FS into a btrfs subvolume to the created btrfs partition (adding nospace_cache on SSDs).
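
To see which btrfs filesystems are currently mounted without one of the options discussed here (noatime, nospace_cache), the mount table can be filtered. This is only a sketch: the function name btrfs_mounts_missing is hypothetical, and it assumes the option appears literally in the fourth field of /proc/mounts.

```shell
#!/bin/sh
# btrfs_mounts_missing OPTION [FILE]   (hypothetical helper)
# Prints the mount point of every btrfs entry whose option list
# (4th field) does not contain OPTION. FILE defaults to /proc/mounts.
btrfs_mounts_missing() {
    awk -v opt="$1" '
        # Wrap the option list in commas so only whole options match
        # (e.g. "relatime" is not mistaken for "noatime").
        $3 == "btrfs" && index("," $4 ",", "," opt ",") == 0 { print $2 }
    ' "${2:-/proc/mounts}"
}

# Example:
#   btrfs_mounts_missing noatime
#   btrfs_mounts_missing nospace_cache
```

The function only reads the mount table and changes nothing, so it can be run before and after editing /etc/fstab to confirm that a remount picked up the new options.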

Some BTRFS workarounds:

  • The process "btrfs-transacti" bursts out several MBs per minute even when idling for a whole day, as can be seen with iotop (pressing a for accumulation, and the left arrow key twice to sort).

    See also https://superuser.com/questions/1211324/btrfs-transacti-writes-to-disk-every-30-seconds

    As auto-defragmentation stays disabled by default these days, that answer only suggests it could be due to the copy-on-write /var/log directory.

    Possible fixes/improvements:

    • Enable btrfs compression to reduce write-volume in general?

    • Disabling all (see below) snapshots in plinth (and deleting existing) does reduce the hourly writes considerably

      Boot snapshots workaround: systemctl disable snapper-boot.timer (#2037)

    • Set up chattr +C on the /var/log directory or subvolume before any log files are created or copied there (a move, re-create, copy-back pivot is required (#2034 (comment 226841)) to convert preexisting log files on upgrades).

      • Note, /var/log/journal already ships with the C attribute set. Converting the entire /var/log (the classic log files) did not seem to reduce the idle overhead writes. => So, is it not worth dropping COW, compression and checksumming for the classic text log files?

    • Also move other parts of the system (like the user data) into separate subvolumes, to reduce the system snapshot size, and have these parts snapshotted separately from the system installs and rollbacks.

    • Possibly mount with option commit=600 to flush all data to the disk only every 10 minutes (/etc/fstab). (To aggregate fluctuating writes, and write larger chunks.) However, not well suited for user data like /home.
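
Adding a mount option such as commit=600 means editing the options field of the matching /etc/fstab entry. The following is a rough sketch of doing that without touching the file in place; add_mount_option is a hypothetical helper, it prints the modified table to standard output, and awk re-joins a changed line with single spaces, so column alignment of that line is lost.

```shell
#!/bin/sh
# add_mount_option FILE MOUNTPOINT OPTION   (hypothetical helper)
# Prints FILE with OPTION appended to the options field (4th column)
# of the entry for MOUNTPOINT, unless it is already present.
# Comment lines and lines with fewer than 4 fields pass through unchanged.
add_mount_option() {
    awk -v mp="$2" -v opt="$3" '
        /^[ \t]*#/ || NF < 4 { print; next }
        $2 == mp && index("," $4 ",", "," opt ",") == 0 { $4 = $4 "," opt }
        { print }
    ' "$1"
}

# Example: write the result to a new file and review it before use.
#   add_mount_option /etc/fstab / commit=600 > /tmp/fstab.new
```

Writing to a separate file first keeps a broken edit from ending up in /etc/fstab. As noted above, a long commit interval is better kept off filesystems holding user data like /home.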


Related: a similar Fedora issue: https://pagure.io/fedora-btrfs/project/issue/36

Edited by Q.-A. Nick