1. 12 Sep, 2018 1 commit
    • Update our printed text · d2a228da
      Mark Fasheh authored
      Make almost all bare printfs 'qprintf'. Exceptions:
      
      - blocksize and hash type prints are verbose instead
      
      - The 'kernel processed data' print becomes a vprintf. It's not
        typically useful.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
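      For context, a minimal sketch of how verbosity-gated print helpers like
      qprintf and vprintf are commonly defined; the global 'quiet' and 'verbose'
      flags and these exact macro bodies are assumptions, not duperemove's actual
      definitions:

        #include <stdio.h>

        extern int quiet;    /* e.g. set by a --quiet option (assumed) */
        extern int verbose;  /* e.g. set by a --verbose option (assumed) */

        /* print unless the user asked for quiet output */
        #define qprintf(fmt, ...) \
                do { if (!quiet) printf(fmt, ##__VA_ARGS__); } while (0)

        /* print only when verbose output was requested; note this
         * deliberately shadows libc's vprintf(3), as the naming in the
         * commit implies */
        #define vprintf(fmt, ...) \
                do { if (verbose) printf(fmt, ##__VA_ARGS__); } while (0)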
  2. 01 May, 2018 1 commit
    • Initialize 'end' in walk_dupe_block · 748cc9da
      Mark Fasheh authored
      Fix these warnings. There's no bug, but they're distracting:
      
      find_dupes.c: In function ‘find_dupes_worker’:
      find_dupes.c:216:21: warning: ‘end[1]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        struct file_block *end[2];
                           ^~~
      find_dupes.c:216:21: warning: ‘end[0]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
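      The usual fix for this class of warning is to give the array an initializer
      at its declaration. A one-line sketch mirroring the declaration shown in the
      warning above (hypothetical, not the exact change in walk_dupe_block):

        #include <stddef.h>

        /* before (as in the warning): struct file_block *end[2]; */
        struct file_block *end[2] = { NULL, NULL };  /* both slots initialized, so gcc stays quiet */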
  3. 26 Sep, 2016 1 commit
  4. 21 Sep, 2016 5 commits
  5. 16 Sep, 2016 11 commits
  6. 02 Sep, 2016 1 commit
  7. 02 Aug, 2016 1 commit
    • Track when we run dedupe · 6f2c873a
      Mark Fasheh authored
      We were using FILEREC_RESCANNED in dedupe to tell us whether a file could be
      skipped. This is problematic, however: FILEREC_RESCANNED only told us whether
      the file was rescanned during the current run. As a result, files that were
      added to the hashfile prior to a run with the dedupe flag set were being
      ignored as candidates for dedupe.
      
      We don't want to go about writing flags after every extent dedupe, so we use
      a global sequence (dedupe_seq) instead. dedupe_seq starts at 0 and is
      incremented only when we run with dedupe. On scan/rescan, files get a
      sequence of dedupe_seq + 1. The dupe-finding code can then simply check
      whether file->dedupe_seq <= dedupe_seq to see if a file can be skipped.
      
      This patch completely replaces the FILEREC_RESCANNED flag.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
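      As an illustration of the sequence scheme described above, a small sketch;
      the struct layout and helper names are assumptions, not the exact duperemove
      code:

        #include <stdint.h>

        struct filerec {
                uint64_t dedupe_seq;    /* sequence recorded at scan/rescan time */
                /* ... other fields ... */
        };

        /* bumped only when a run actually performs dedupe */
        static uint64_t dedupe_seq;

        /* on scan/rescan, the file is marked "newer" than the last dedupe run */
        void filerec_mark_scanned(struct filerec *file)
        {
                file->dedupe_seq = dedupe_seq + 1;
        }

        /* dupe finding: files not touched since the last dedupe run can be skipped */
        int filerec_can_skip(struct filerec *file)
        {
                return file->dedupe_seq <= dedupe_seq;
        }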
  8. 26 Jul, 2016 1 commit
  9. 22 Jul, 2016 1 commit
    • Only dedupe files whose mtime has changed · 7d6f5e67
      Mark Fasheh authored
      This involves a lot of moving parts. We make hashfiles reusable, allowing
      the user to re-run duperemove with the same hashes from a previous run.
      
      As a result, we have to manage the information in our db instead of just
      dumping data as we do now. In particular, our list of dedupe targets now
      comes from the database as well as the command line. In order to avoid extra
      stats, we handle the command line files first, then walk the files in our
      database. It's at this time that we can compare the mtime in our filerecs
      against that in the database.
      
      Lastly, at the find-dupes and dedupe stages, we avoid comparing/deduping
      against files that haven't changed, with the exception that we'll pick at
      least one target so there's always the ability to dedupe an extent.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
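      A minimal sketch of the mtime check implied above; the helper name and the
      idea of the hashfile storing mtime as a plain integer are assumptions:

        #include <stdint.h>
        #include <sys/stat.h>

        /* Compare the mtime recorded in the hashfile with the file's current
         * on-disk mtime; a mismatch means the file must be rescanned and is
         * again a candidate for dedupe. */
        int file_changed_since_db(const char *path, uint64_t db_mtime)
        {
                struct stat st;

                if (stat(path, &st))
                        return 1;       /* can't stat it; treat as changed */
                return (uint64_t)st.st_mtime != db_mtime;
        }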
  10. 12 Jul, 2016 1 commit
  11. 10 Jul, 2016 2 commits
    • Dedupe within a file · c70ea27d
      Mark Fasheh authored
      block-dedupe already does this by default, so we add code there to handle the
      opposite case, keeping it in line with the way extent-dedupe works.
      
      Beyond that, the change to extent-dedupe is small and untested.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
    • Block dedupe to skip extent finding stage · f242479a
      Mark Fasheh authored
      We have a find-extents stage which is intended to collate duplicate blocks
      into extent lists. The idea is to reduce the total number of dedupe calls,
      and hopefully cut down on fragmentation.
      
      On workloads with large hash buckets this algorithm can sometimes take a
      long time. Block dedupe skips that stage and goes directly to the dedupe
      portion. We're still deduping at pretty large blocks, and the target code in
      dedupe_extent_list() will ensure requests go to the same block. So while we
      won't get things lined up as perfectly, it should (in theory) still go ok.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
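      Conceptually, the control flow described above looks something like the
      following sketch; the types and function names here are illustrative, not
      duperemove's own:

        struct dup_blocks;      /* hashed duplicate blocks (illustrative) */
        struct extent_lists;    /* collated extent lists (illustrative) */

        void find_duplicate_extents(struct dup_blocks *b, struct extent_lists **out);
        void dedupe_extent_lists(struct extent_lists *e);
        void dedupe_duplicate_blocks(struct dup_blocks *b);

        void run_dedupe_stage(struct dup_blocks *blocks, int do_block_dedupe)
        {
                if (do_block_dedupe) {
                        /* block dedupe: skip find-extents, dedupe blocks directly */
                        dedupe_duplicate_blocks(blocks);
                } else {
                        /* default: collate duplicate blocks into extents, then
                         * dedupe whole extents */
                        struct extent_lists *extents;

                        find_duplicate_extents(blocks, &extents);
                        dedupe_extent_lists(extents);
                }
        }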
  12. 03 Jul, 2016 1 commit
    • Create a file list for dupe compare · aeacc0d8
      Mark Fasheh authored
      Change walk_dupe_hashes to walk the dupe block list only once, gathering a
      list of files to compare. We then walk the file list, doing our file-by-file
      compare. Existing checks ensure that no two files are compared to each
      other more than once.
      
      This performs far better when we have a high amount of duplication, where
      the dupe buckets can easily grow into the many millions. In those cases the
      block-by-block walk/rewalk was doing enormous amounts of work, so it makes
      sense to just build a list of files to compare directly.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
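      In outline, the change swaps the block-by-block rewalk for a single pass
      that collects files and then compares them pairwise. A rough sketch,
      assuming simplified structures; the real duperemove data structures and
      its "compared once" bookkeeping are omitted:

        struct filerec;
        struct file_block { struct filerec *owner; struct file_block *next; };

        #define MAX_FILES 1024

        /* defined elsewhere in this sketch's imaginary program */
        void compare_files(struct filerec *a, struct filerec *b);

        static int already_listed(struct filerec **list, int n, struct filerec *f)
        {
                for (int i = 0; i < n; i++)
                        if (list[i] == f)
                                return 1;
                return 0;
        }

        void walk_bucket(struct file_block *blocks)
        {
                struct filerec *files[MAX_FILES];
                int nr = 0;

                /* single walk of the dupe block list, gathering the file list */
                for (struct file_block *b = blocks; b; b = b->next) {
                        if (nr < MAX_FILES && !already_listed(files, nr, b->owner))
                                files[nr++] = b->owner;
                }

                /* file-by-file compare; each pair is visited exactly once */
                for (int i = 0; i < nr; i++)
                        for (int j = i + 1; j < nr; j++)
                                compare_files(files[i], files[j]);
        }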
  13. 06 Jul, 2015 1 commit
  14. 01 Jul, 2015 1 commit
  15. 04 Jun, 2015 1 commit
  16. 04 May, 2015 1 commit
  17. 03 Apr, 2015 1 commit
  18. 24 Feb, 2015 1 commit
  19. 22 Nov, 2014 1 commit