1. 26 Apr, 2016 1 commit
  2. 23 Mar, 2016 1 commit
  3. 20 Mar, 2016 1 commit
  4. 10 Jan, 2016 1 commit
  5. 01 Nov, 2015 2 commits
  6. 13 Aug, 2015 1 commit
  7. 08 Feb, 2014 1 commit
  8. 24 Aug, 2013 1 commit
  9. 10 Apr, 2013 1 commit
  10. 08 Apr, 2013 1 commit
  11. 13 Mar, 2013 1 commit
    • Change zfs-kmod-devel install path · 775f2d34
      Brian Behlendorf authored
      Install the common zfs kernel development headers under
      /usr/src/zfs-<version>/ rather than in a kernel specific
      directory.  The kernel specific build products such as
      zfs_config.h and Modules.symvers are left installed under
      This was done to be consistent with where dkms expects
      kernel module source to be packaged.  It also allows for
      a common zfs-kmod-devel package which includes the headers,
      and per-kernel zfs-kmod-devel-<kernel> packages.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
  12. 06 Mar, 2013 1 commit
  13. 05 Mar, 2013 1 commit
    • Add snapdev=[hidden|visible] dataset property · 0b4d1b58
      Eric Dillmann authored
      The new snapdev dataset property may be set to control the
      visibility of zvol snapshot devices.  By default this value
      is set to 'hidden' which will prevent zvol snapshots from
      appearing under /dev/zvol/ and /dev/<dataset>/.  When set to
      'visible' all zvol snapshots for the dataset will be visible.
      This functionality was largely added because when automatic
      snapshotting is enabled large numbers of read-only zvol snapshots
      will be created.  When creating these devices the kernel will
      attempt to read their partition tables, and blkid will attempt
      to identify any filesystems on those partitions.  This leads
      to a variety of issues:
      1) The zvol partition tables will be read in the context of
         the `modprobe zfs` for automatically imported pools.  This
         is undesirable and should be done asynchronously, but for
         now reducing the number of visible devices helps.
      2) Udev expects to be able to complete its work for new
         block devices fairly quickly.  When many zvol devices are
         added at the same time this is no longer true.  It can
         lead to udev timeouts and missing /dev/zvol links.
      3) Simply having lots of devices in /dev/ can be awkward from
         a management standpoint.  Hiding the devices you're unlikely
         to ever use helps with this.  Any snapshot device which is
         needed can be made visible by changing the snapdev property.
      NOTE: This patch changes the default behavior for zvols which
            was effectively 'snapdev=visible'.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1235
      Closes #945
      Issue #956
      Issue #756
  14. 04 Mar, 2013 1 commit
    • Constify structures containing function pointers · b01615d5
      Richard Yao authored
      The PaX team modified the kernel's modpost to report writeable function
      pointers as section mismatches because they are potential exploit
      targets. We could ignore the warnings, but their presence can obscure
      actual issues. Proper const correctness can also catch programming
      Building the kernel modules against a PaX/GrSecurity patched Linux 3.4.2
      kernel reports 133 section mismatches prior to this patch. This patch
      eliminates 130 of them. The quantity of writeable function pointers
      eliminated by constifying each structure is as follows:
      vdev_opts_t             52
      zil_replay_func_t       24
      zio_compress_info_t     24
      zio_checksum_info_t     9
      space_map_ops_t         7
      arc_byteswap_func_t     5
      The remaining 3 writeable function pointers cannot be addressed by this
      patch. 2 of them are in zpl_fs_type. The kernel's sget function requires
      that this be non-const. The final writeable function pointer is created
      by SPL_SHRINKER_DECLARE. The kernel's set_shrinker() and
      remove_shrinker() functions also require that this be non-const.
      Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1300
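
The constification described above can be illustrated with a small userspace sketch. The structure and function names here (demo_compress_info_t, demo_copy_compress, demo_compress_lookup) are hypothetical and only loosely modeled on the tables named in the commit; they are not the actual ZFS definitions. Declaring the table const typically lets the toolchain place it in a read-only section, so its function pointers cannot be overwritten at run time.

```c
#include <stddef.h>

/* Hypothetical ops table; not the real zio_compress_info_t layout. */
typedef struct demo_compress_info {
	size_t (*ci_compress)(const void *src, void *dst, size_t len);
	const char *ci_name;
} demo_compress_info_t;

/* Trivial "compressor" that copies its input unchanged. */
static size_t
demo_copy_compress(const void *src, void *dst, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;

	for (size_t i = 0; i < len; i++)
		d[i] = s[i];
	return (len);
}

/*
 * Because the array is const, the function pointers inside it are
 * not writeable through this object, and the compiler rejects any
 * assignment to demo_compress_table[i].ci_compress.
 */
static const demo_compress_info_t demo_compress_table[] = {
	{ demo_copy_compress, "copy" },
};

/* Callers receive a pointer-to-const, so they cannot patch the table. */
const demo_compress_info_t *
demo_compress_lookup(size_t idx)
{
	size_t n = sizeof (demo_compress_table) /
	    sizeof (demo_compress_table[0]);

	if (idx >= n)
		return (NULL);
	return (&demo_compress_table[idx]);
}
```

A caller would fetch an entry with demo_compress_lookup(0) and invoke ci->ci_compress(src, dst, len); an attempt to assign to ci->ci_compress fails at compile time.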
  15. 01 Mar, 2013 1 commit
    • Fix hot spares · 8128bd89
      Brian Behlendorf authored
      Hot spares did not work in ZoL because it opens all leaf
      vdevs exclusively (O_EXCL).  On Linux, exclusive opens cause
      subsequent exclusive opens to fail with EBUSY.
      This could be resolved by not opening any of the devices
      exclusively, which is what Illumos does, but the additional
      protection offered by exclusive opens is desirable.  It cleanly
      prevents you from accidentally adding an in-use non-ZFS device
      to your pool.
      To fix this we very slightly relaxed the usage of O_EXCL in
      the following ways.
      1) Functions which open the device but only read had the
         O_EXCL flag removed and were updated to use O_RDONLY.
      2) A common holder was added to the vdev disk code.  This
         allows the ZFS code to internally open the device multiple
         times but non-ZFS callers may not.
      3) An exception was added to make_disks() for hot spares when
         creating partition tables.  For hot spare devices which
         are already opened exclusively we skip creating the partition
         table because this must already have been done when the disk
         was originally added as a hot spare.
      Additional minor changes include fixing check_in_use() to use
      a partition instead of a slice suffix.  And is_spare() was moved
      above make_disks() to avoid adding a forward reference.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #250
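
The "common holder" idea in item 2 can be modeled in a few lines of userspace C. This is a simplified sketch of holder-based exclusive claims, not the kernel block layer or the vdev disk code; demo_bdev_t, demo_zfs_holder, demo_open_excl() and demo_close_excl() are all hypothetical names. The assumption is that an exclusive claim succeeds when the device is unclaimed or already claimed by the same holder token, and fails with EBUSY otherwise.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified model of a block device with an exclusive-claim holder. */
typedef struct demo_bdev {
	const void *bd_holder;	/* token of the current exclusive holder */
	int bd_holders;		/* number of claims made with that token */
} demo_bdev_t;

/* A single token shared by every ZFS-internal open (hypothetical). */
static const char demo_zfs_holder_tag[] = "demo_zfs_holder";
const void *const demo_zfs_holder = demo_zfs_holder_tag;

/* Claim the device exclusively on behalf of 'holder'. */
int
demo_open_excl(demo_bdev_t *bd, const void *holder)
{
	/* A different holder already owns the device: refuse. */
	if (bd->bd_holder != NULL && bd->bd_holder != holder)
		return (-EBUSY);

	bd->bd_holder = holder;
	bd->bd_holders++;
	return (0);
}

/* Drop one claim; the device becomes free when the last one is gone. */
void
demo_close_excl(demo_bdev_t *bd)
{
	if (bd->bd_holders > 0 && --bd->bd_holders == 0)
		bd->bd_holder = NULL;
}
```

With this model, two opens that both pass demo_zfs_holder succeed, while an open with any other token returns -EBUSY until both claims are released.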
  16. 29 Jan, 2013 3 commits
    • Retire zpool_id infrastructure · dbf763b3
      Brian Behlendorf authored
      In the interest of maintaining only one udev helper to give vdevs
      user friendly names, the zpool_id and zpool_layout infrastructure
      is being retired.  They are superseded by vdev_id which incorporates
      all the previous functionality.
      Documentation for the new vdev_id(8) helper and its configuration
      file, vdev_id.conf(5), can be found in their respective man pages.
      Several useful example files are installed under /etc/zfs/.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #981
    • Remove NPTL_GUARD_WITHIN_STACK · 79c6e4c4
      Brian Behlendorf authored
      Commit 4b2f65b2 increased the user
      space stack by 4x to resolve certain stack overflows.  As such it
      no longer makes sense to worry about a single extra page which
      might or might not be part of the process stack.  There is now
      ample headroom for normal usage.
      By eliminating this configure check we are also resolving the
      following segfault which intentionally occurs at configure time
      and may be logged in dmesg.
        conftest[22156]: segfault at 7fbf18a47e48 ip 00000000004007fe
        sp 00007fbf18a4be50 error 6 in conftest[400000+1000]
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    • Illumos #3035 LZ4 compression support in ZFS and GRUB · 9759c60f
      Eric Dillmann authored
      3035 LZ4 compression support in ZFS and GRUB
      Reviewed by: Matthew Ahrens <mahrens@delphix.com>
      Reviewed by: Christopher Siden <christopher.siden@delphix.com>
      Reviewed by: George Wilson <george.wilson@delphix.com>
      Approved by: Christopher Siden <csiden@delphix.com>
      This patch has been slightly modified from the upstream Illumos
      version to be compatible with Linux.  Due to the very limited
      stack space in the kernel a lz4 workspace kmem cache is used.
      Since we are using gcc we are also able to take advantage of the
      gcc optimized __builtin_ctz functions.
      Support for GRUB has been dropped from this patch.  That code
      is available but those changes will need to be made to the upstream
      GRUB package.
      Lastly, several hunks of dead code were dropped for clarity.  They
      include the functions real_LZ4_uncompress(), LZ4_compressBound()
      and the Visual Studio specific hunks wrapped in _MSC_VER.
      Ported-by: Eric Dillmann <eric@jave.fr>
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1217
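
As a rough illustration of why __builtin_ctz is useful in an LZ-style matcher: XORing two 32-bit words and counting the trailing zero bits of the result gives the number of low-order bytes that agree, which on a little-endian machine are the first bytes in memory. The sketch below shows only that general technique; the helper names are hypothetical and this is not the LZ4 source. It assumes a gcc-compatible compiler, and note that __builtin_ctz is undefined for a zero argument, so the equal case is handled separately.

```c
#include <stdint.h>

/*
 * Given diff = a ^ b (nonzero), return how many low-order bytes of
 * a and b are identical.  Dividing the trailing zero bit count by 8
 * converts bits to whole bytes.
 */
static inline unsigned
demo_common_bytes(uint32_t diff)
{
	return ((unsigned)__builtin_ctz(diff) >> 3);
}

/* Number of matching low-order bytes (0..4) between two 32-bit words. */
unsigned
demo_match_len4(uint32_t a, uint32_t b)
{
	uint32_t diff = a ^ b;

	if (diff == 0)
		return (4);	/* all four bytes match */
	return (demo_common_bytes(diff));
}
```

For example, 0x11223344 and 0x11AA3344 share their two low-order bytes (0x44 and 0x33), so the XOR has its lowest set bit at position 19 and the function returns 2.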
  17. 28 Jan, 2013 1 commit
    • Linux 2.6.26 compat, lookup_bdev() · 2b7ab9d4
      Brian Behlendorf authored
      It's doubtful many people were impacted by this but commit 6c285672
      accidentally broke ZFS builds for 2.6.26 and earlier kernels.  This
      commit depends on the lookup_bdev() function which exists in 2.6.26
      but wasn't exported until 2.6.27.
      The availability of the function isn't critical so a wrapper is
      introduced which returns ERR_PTR(-ENOTSUP) when the function isn't
      defined.  This will have the effect of causing zvol_is_zvol() to
      always fail for 2.6.26 kernels.  This in turn means vdevs will
      always get opened concurrently which is good for normal usage.
      This will only become an issue if you're using a zvol as a vdev in
      another pool.  In which case you really should be using a newer
      kernel anyway.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1205
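
The wrapper pattern described above can be sketched in userspace as follows. The demo_* helpers are stand-ins written for this example that mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention of encoding a small negative errno in a pointer value; DEMO_HAVE_LOOKUP_BDEV is a hypothetical configure-time macro, and demo_real_lookup_bdev() is a placeholder for the real function. None of these names come from the actual patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define	DEMO_MAX_ERRNO	4095

static inline void *
demo_err_ptr(long err)
{
	return ((void *)(intptr_t)err);
}

static inline long
demo_ptr_err(const void *p)
{
	return ((long)(intptr_t)p);
}

/* Error pointers occupy the top DEMO_MAX_ERRNO addresses. */
static inline int
demo_is_err(const void *p)
{
	return ((uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO));
}

struct demo_bdev;	/* opaque device handle */

#ifdef DEMO_HAVE_LOOKUP_BDEV
/* Placeholder for the real lookup when the symbol is available. */
struct demo_bdev *demo_real_lookup_bdev(const char *path);
#define	demo_lookup_bdev(path)	demo_real_lookup_bdev(path)
#else
/* Symbol unavailable: every lookup reports "not supported". */
#define	demo_lookup_bdev(path)	\
	((void)(path), (struct demo_bdev *)demo_err_ptr(-ENOTSUP))
#endif

/* Returns 1 if the lookup succeeded, 0 if it failed. */
int
demo_lookup_ok(const char *path)
{
	struct demo_bdev *bdev = demo_lookup_bdev(path);

	if (demo_is_err(bdev))
		return (0);
	/* A real caller would go on to inspect the device here. */
	return (1);
}
```

Built without DEMO_HAVE_LOOKUP_BDEV, demo_lookup_ok() returns 0 for every path, which mirrors the commit's statement that zvol_is_zvol() always fails on kernels where the function is not available.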
  18. 25 Jan, 2013 1 commit
  19. 24 Jan, 2013 2 commits
    • Add d_clear_d_op() compatibility · 876ef85d
      Brian Behlendorf authored
      Added d_clear_d_op() helper function which clears some flags and the
      registered dentry->d_op table.  This is required because d_set_d_op()
      issues a warning when the dentry operations table is already set.
      For the .zfs control directory to work properly we must be able to
      override the default operations table and register custom .d_automount
      and .d_revalidate callbacks.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Ned Bass <bass6@llnl.gov>
      Closes #1230
    • Add d_clear_d_op() compatibility · bf01b5e6
      Brian Behlendorf authored
      Added d_clear_d_op() helper function which clears some flags and the
      registered dentry->d_op table.  This is required because d_set_d_op()
      issues a warning when the dentry operations table is already set.
      For the .zfs control directory to work properly we must be able to
      override the default operations table and register custom .d_automount
      and .d_revalidate callbacks.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Ned Bass <bass6@llnl.gov>
      Closes #1230
  20. 17 Jan, 2013 2 commits
    • Fix 'zfs rollback' on mounted file systems · 7b3e34ba
      Brian Behlendorf authored
      Rolling back a mounted filesystem with open file handles and
      cached dentries+inodes never worked properly in ZoL.  The
      major issue was that Linux provides no easy mechanism for
      modules to invalidate the inode cache for a file system.
      Because of this it was possible that an inode from the previous
      filesystem would not get properly dropped from the cache during
      rolling back.  Then a new inode with the same inode number would
      be created and collide with the existing cached inode.  Ideally
      this would trigger a VERIFY() but in practice the error wasn't
      handled and it would just dereference a NULL pointer.
      Luckily, this issue can be resolved by sprucing up the existing
      Solaris zfs_rezget() functionality for the Linux VFS.
      The way it works now is that when a file system is rolled back
      all the cached inodes will be traversed and refetched from disk.
      If a version of the cached inode exists on disk the in-core
      copy will be updated accordingly.  If there is no match for that
      object on disk it will be unhashed from the inode cache and
      marked as stale.
      This will effectively make the inode unfindable for lookups
      allowing the inode number to be immediately recycled.  The inode
      will then only be accessible from the cached dentries.  Subsequent
      dentry lookups which reference a stale inode will result in the
      dentry being invalidated.  Once invalidated the dentry will drop
      its reference on the inode allowing it to be safely pruned from
      the cache.
      Special care is taken for negative dentries since they do not
      reference any inode.  These dentries will be invalidated based
      on when they were added to the dentry cache.  Entries added
      before the last rollback will be invalidated to prevent them
      from masking real files in the dataset.
      Two nice side effects of this fix are:
      * Removes the dependency on spl_invalidate_inodes(), it can now
        be safely removed from the SPL when we choose to do so.
      * zfs_znode_alloc() no longer requires a dentry to be passed.
        This effectively reverts this portion of the code to its
        upstream counterpart.  The dentry is now instantiated more
        correctly in the Linux ZPL layer.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Signed-off-by: Ned Bass <bass6@llnl.gov>
      Closes #795
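
The revalidation rules described in this commit message can be summarized with a small userspace model. This is only a sketch of the decision logic under stated assumptions: demo_dentry_t, demo_fs_t and demo_revalidate() are hypothetical names, time is an abstract tick counter, and an entry cached at exactly the rollback tick is treated conservatively as stale. It is not the ZPL implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a cached directory entry. */
typedef struct demo_dentry {
	const void *d_inode;	/* NULL for a negative dentry */
	unsigned long d_time;	/* tick at which the entry was cached */
} demo_dentry_t;

/* Minimal model of per-filesystem state. */
typedef struct demo_fs {
	unsigned long fs_rollback_time;	/* tick of the last rollback */
} demo_fs_t;

/*
 * Return true if the cached dentry may still be used, false if it
 * must be invalidated and looked up again.
 */
bool
demo_revalidate(const demo_fs_t *fs, const demo_dentry_t *de,
    bool inode_is_stale)
{
	/*
	 * Negative dentries reference no inode, so the only evidence
	 * available is their age relative to the last rollback.
	 */
	if (de->d_inode == NULL)
		return (de->d_time > fs->fs_rollback_time);

	/* Positive dentries are dropped once their inode is stale. */
	return (!inode_is_stale);
}
```

Under this model a negative entry cached at tick 5 is rejected after a rollback at tick 10, so it cannot mask a file that the rollback restored, while one cached at tick 11 remains valid.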
    • Fix false ENOENT on snapshot control dentries · f1a05fa1
      Ned Bass authored
      Lookups in the snapshot control directory for an existing snapshot
      fail with ENOENT if an earlier lookup failed before the snapshot was
      created.  This is because the earlier lookup causes a negative dentry
      to be cached which is never invalidated.
      The bug can be reproduced as follows (the second ls should succeed):
       $ ls /tank/.zfs/snapshot/s
       ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
       $ zfs snap tank@s
       $ ls /tank/.zfs/snapshot/s
       ls: cannot access /tank/.zfs/snapshot/s: No such file or directory
      To remedy this, always invalidate cached dentries in the snapshot
      control directory.  Since these entries never exist on disk there is
      no significant performance penalty for the extra lookups.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1192
  21. 14 Jan, 2013 1 commit
  22. 08 Jan, 2013 6 commits
  23. 07 Jan, 2013 1 commit
  24. 20 Dec, 2012 1 commit
  25. 19 Dec, 2012 1 commit
    • Remove TSD zfs_fsyncer_key · 31f2b5ab
      Brian Behlendorf authored
      It's my understanding that the zfs_fsyncer_key TSD was added as
      a performance optimization to reduce contention on the zl_lock
      from zil_commit().  This issue manifested itself as very long
      (100+ms) fsync() system call times for fsync() heavy workloads.
      However, under Linux I'm not seeing the same contention that
      was originally described.  Therefore, I'm removing this code
      in order to wean ourselves off any dependence on TSD.  If the
      original performance issue reappears on Linux we can revisit
      fixing it without resorting to TSD.
      This just leaves one small ZFS TSD consumer.  If it can be
      cleanly removed from the code we'll be able to shed the SPL
      TSD implementation entirely.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes zfsonlinux/spl#174
  26. 18 Dec, 2012 1 commit
    • Fix using zvol as slog device · 6c285672
      Jorgen Lundman authored
      During the original ZoL port the vdev_uses_zvols() function was
      disabled until it could be properly implemented.  This prevented
      a zpool from using a zvol for its slog device.
      This patch implements that missing functionality by adding a
      zvol_is_zvol() function to zvol.c.  Given the full path to a
      device it will look up the device and verify its major number
      against the registered zvol major number for the system.  If
      they match we know the device is a zvol.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1131
  27. 14 Dec, 2012 1 commit
    • Update SAs when an inode is dirtied · 8780c539
      Brian Behlendorf authored
      Revert the portion of commit d3aa3ea9 which always resulted in the
      SAs being updated when an mmap()'ed file was closed.  That change
      accidentally resulted in unexpected ctime updates which upset tools
      like git.  That was always a horrible hack and I'm happy it will
      never make it into a tagged release.
      The right fix is something I initially resisted doing because I
      was worried about the additional overhead.  However, in hindsight
      the overhead isn't as bad as I feared.
      This patch implements the sops->dirty_inode() callback which is
      unsurprisingly called when an inode is dirtied.  We leverage this
      callback to keep the znode SAs strictly in sync with the inode.
      However, for now we're going to go slowly to avoid introducing
      any new unexpected issues by only updating the atime, mtime, and
      ctime.  This will cover the callpath of most concern to us.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #764
      Closes #1140
  28. 12 Dec, 2012 1 commit
    • Linux 3.7 compat, schedule_delayed_work() · 2ae10319
      Brian Behlendorf authored
      Linux kernel commit d8e794d accidentally broke the delayed work
      APIs for non-GPL callers.   While the APIs to schedule a delayed
      work item are still available to all callers, it is no longer
      possible to initialize the delayed work item.
      I'm cautiously optimistic we could get the delayed_work_timer_fn
      exported for all callers in the upstream kernel.  But frankly
      the compatibility code to use this kernel interface has always
      been problematic.
      Therefore, this patch abandons direct use of the Linux
      kernel interface in favor of the new delayed taskq interface.
      It provides roughly the same functionality as delayed work queues
      but it's a stable interface under our control.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1053
  29. 03 Dec, 2012 1 commit
    • Directory xattr znodes hold a reference on their parent · e89260a1
      Brian Behlendorf authored
      Unlike normal file or directory znodes, an xattr znode is
      guaranteed to only have a single parent.  Therefore, we can
      take a reference on that parent if it is provided at create
      time and cache it.  Additionally, we take care to cache it
      on any subsequent zfs_zaccess() where the parent is provided
      as an optimization.
      This allows us to avoid needing to do a zfs_zget() when
      setting up the SELinux security xattr in the create path.
      This is critical because a hash lookup on the directory
      will deadlock since it is locked.
      The zpl_xattr_security_init() call has also been moved up
      to the zpl layer to ensure TXs to create the required
      xattrs are performed after the create TX.  Otherwise we
      run the risk of deadlocking on the open create TX.
      Ideally the security xattr should be fully constructed
      before the new inode is unlocked.  However, doing so would
      require far more extensive changes to ZFS.
      This change may also have the beneficial side effect of
      ensuring xattr directory znodes are evicted from the cache
      before normal file or directory znodes due to the extra
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #671
  30. 27 Nov, 2012 1 commit
    • Increase ZFS_OBJ_MTX_SZ to 256 · 30315d23
      Brian Behlendorf authored
      Increasing this limit costs us 6144 bytes of memory per mounted
      filesystem, but this is a small price to pay for accomplishing
      the following:
      * Allows for up to 256-way concurrency when performing lookups
        which helps performance when there are a large number of
      * Minimizes the likelihood of encountering the deadlock
        described in issue #1101.  Because vmalloc() won't strictly
        honor __GFP_FS there is still a very remote chance of a
        deadlock.  See the zfsonlinux/spl@043f9b57 commit.
      Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
      Closes #1101
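
The concurrency benefit comes from hashing each object number onto one lock in a fixed-size array, so that operations on different objects usually take different locks. The sketch below models that with pthread mutexes in userspace; all demo_* names are hypothetical and the hash (a mask of the low bits, which requires a power-of-two table size) is an assumption, not a quote of the ZFS macro. As a side note, the 6144-byte figure would be consistent with, for example, growing the array from 64 to 256 entries of a 32-byte lock, (256 - 64) × 32 = 6144, though the commit message does not state those numbers.

```c
#include <pthread.h>
#include <stdint.h>

/* Table size; must be a power of two for the mask below to work. */
#define	DEMO_OBJ_MTX_SZ		256
#define	DEMO_OBJ_HASH(obj)	((obj) & (DEMO_OBJ_MTX_SZ - 1))

/* Per-filesystem array of object hold locks (hypothetical layout). */
typedef struct demo_fs {
	pthread_mutex_t fs_hold_mtx[DEMO_OBJ_MTX_SZ];
} demo_fs_t;

/* Initialize every lock in the table. */
void
demo_fs_init(demo_fs_t *fs)
{
	for (int i = 0; i < DEMO_OBJ_MTX_SZ; i++)
		pthread_mutex_init(&fs->fs_hold_mtx[i], NULL);
}

/* Lock the slot that 'obj' hashes to and return it for later unlock. */
pthread_mutex_t *
demo_obj_hold_enter(demo_fs_t *fs, uint64_t obj)
{
	pthread_mutex_t *mtx = &fs->fs_hold_mtx[DEMO_OBJ_HASH(obj)];

	pthread_mutex_lock(mtx);
	return (mtx);
}

/* Release a slot previously returned by demo_obj_hold_enter(). */
void
demo_obj_hold_exit(pthread_mutex_t *mtx)
{
	pthread_mutex_unlock(mtx);
}
```

Objects whose numbers differ by a multiple of 256 still share a slot, so a larger table reduces but does not eliminate contention between unrelated objects.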