1. 29 Sep, 2016 1 commit
  2. 16 Sep, 2016 2 commits
  3. 05 Aug, 2016 4 commits
  4. 14 Jul, 2016 1 commit
    • block-dedupe: break up large buckets at dedupe time · 6ec940a0
      Mark Fasheh authored
      We want block dedupe because very large hash buckets can cause the
      extent-finding code to take inordinate amounts of time. Passing the same 2+
      million element bucket to one thread in the dedupe pool simply pushes the
      problem into the dedupe layer. Our thread deduping the large bucket will
      often continue well after the other threads have shut down. Solve this by
      breaking up large lists at dedupe time and spreading their elements out
      amongst all io threads.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
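
      The commit message above describes the technique in prose; below is a
      minimal, self-contained C sketch of the same idea, not duperemove's
      actual code. Every name in it (block_node, io_pool_push,
      queue_bucket_chunked) is made up for illustration: a huge bucket list
      is cut into fixed-size chunks and each chunk is queued to the io
      thread pool on its own, so no single worker is stuck with the whole
      bucket.

      ```c
      #include <stdio.h>

      /* Hypothetical block-list node; duperemove's real structures differ. */
      struct block_node {
          struct block_node *next;
          /* ... hash, file offset, filerec pointer, etc. would live here ... */
      };

      /* Stand-in for handing a chunk of work to the io thread pool. */
      static void io_pool_push(struct block_node *chunk, size_t len)
      {
          (void)chunk;
          printf("queued chunk of %zu blocks\n", len);
      }

      /*
       * Instead of passing one multi-million element bucket to a single
       * dedupe worker, cut the list into fixed-size chunks and queue each
       * chunk separately so every io thread gets a share of the bucket.
       */
      static void queue_bucket_chunked(struct block_node *head, size_t chunk_size)
      {
          while (head) {
              struct block_node *chunk = head;
              size_t n = 1;

              /* Walk at most chunk_size nodes, then cut the list here. */
              while (head->next && n < chunk_size) {
                  head = head->next;
                  n++;
              }
              struct block_node *rest = head->next;
              head->next = NULL;

              io_pool_push(chunk, n);
              head = rest;
          }
      }

      int main(void)
      {
          /* Toy list of 10 blocks split into chunks of 4: prints 4, 4, 2. */
          struct block_node nodes[10] = { { NULL } };
          for (int i = 0; i < 9; i++)
              nodes[i].next = &nodes[i + 1];

          queue_bucket_chunked(nodes, 4);
          return 0;
      }
      ```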
  5. 10 Jul, 2016 1 commit
    • Block dedupe to skip extent finding stage · f242479a
      Mark Fasheh authored
      We have a find-extents stage which is intended to collate duplicate blocks
      into extent lists. The idea is to reduce the total number of dedupe calls,
      and hopefully cut down on fragmentation.
      
      On workloads with large hash buckets this algorithm can sometimes take a
      long time. Block dedupe skips that stage and goes directly to the dedupe
      portion. We're still deduping at pretty large blocks, and the target code in
      dedupe_extent_list() will ensure requests go to the same block. So while we
      won't get things lined up as perfectly, it should (in theory) still go ok.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
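
      To make "goes directly to the dedupe portion" concrete, here is a
      minimal sketch of a single block-sized request to the kernel dedupe
      ioctl (FIDEDUPERANGE). This is not duperemove's actual submission
      path; the function name and parameters are made up, and only one
      destination range is shown, whereas the real code dedupes many
      duplicates against a shared target block, as the message above notes
      for dedupe_extent_list().

      ```c
      #include <stdint.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/ioctl.h>
      #include <linux/fs.h>   /* FIDEDUPERANGE, struct file_dedupe_range */

      /*
       * Ask the kernel to dedupe one block: the range at src_off in src_fd
       * is compared with the range at dest_off in dest_fd and, if the
       * contents match, the two ranges end up sharing storage.
       */
      int dedupe_one_block(int src_fd, uint64_t src_off,
                           int dest_fd, uint64_t dest_off, uint64_t blocksize)
      {
          struct file_dedupe_range *req;
          int ret;

          req = calloc(1, sizeof(*req) + sizeof(struct file_dedupe_range_info));
          if (!req)
              return -1;

          req->src_offset = src_off;
          req->src_length = blocksize;
          req->dest_count = 1;
          req->info[0].dest_fd = dest_fd;
          req->info[0].dest_offset = dest_off;

          ret = ioctl(src_fd, FIDEDUPERANGE, req);
          if (ret == 0 && req->info[0].status == FILE_DEDUPE_RANGE_SAME)
              printf("deduped %llu bytes\n",
                     (unsigned long long)req->info[0].bytes_deduped);

          free(req);
          return ret;
      }
      ```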
  6. 04 Jun, 2015 1 commit
  7. 04 May, 2015 1 commit
  8. 27 Feb, 2015 1 commit
  9. 24 Feb, 2015 1 commit
  10. 22 Nov, 2014 1 commit
  11. 16 Nov, 2014 1 commit
  12. 22 Oct, 2014 2 commits
    • Keep file blocks on per blocklist/filerec list · 5497f7e1
      Mark Fasheh authored
      We have a performance issue when visiting very large buckets - walking
      hundreds of thousands of hashes can take a lot of time. Previously
      this was solved by pushing large buckets to an alternative algorithm. This had
      a couple of problems, the most obvious being how to define a 'very large
      bucket'. It also didn't attempt to solve the underlying problem.
      
      A better solution is to have each dupe_blocks_list maintain a tree of
      per-filerec hash blocks. The extent finding code then can walk the
      per-filerec list of blocks, thus avoiding visiting any nodes that it
      won't care about.
      
      This is also much faster than the previous solution. The time for an
      extent search from a hashfile of my home directory (~750 gigs) dropped
      from 55 minutes to 22 minutes.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
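
      As a rough picture of the data-structure change, here is a simplified
      C sketch with made-up types. The real dupe_blocks_list, file_block and
      filerec carry much more state, and duperemove keys the per-file blocks
      with a tree rather than the flat linked list shown here; the point is
      only that a bucket keeps its blocks grouped per file, so extent
      finding can walk just the blocks of the filerec it is comparing.

      ```c
      #include <stdint.h>
      #include <stddef.h>

      struct filerec;                          /* one scanned file; opaque here */

      /* One duplicate block belonging to some file. */
      struct file_block {
          uint64_t loff;                       /* logical offset within the file */
          struct file_block *next;             /* next block of the same file */
      };

      /* All of one file's blocks that share a given hash. */
      struct filerec_blocks {
          const struct filerec *file;
          struct file_block *blocks;
          struct filerec_blocks *next;         /* next file holding this hash */
      };

      /* A hash bucket, grouped per file rather than kept as one flat list. */
      struct dupe_blocks_list {
          unsigned char digest[32];            /* example digest size */
          struct filerec_blocks *by_file;
      };

      /*
       * Extent finding for a pair of files only needs the blocks belonging
       * to one specific filerec, so it walks that file's short list instead
       * of scanning the whole (possibly huge) bucket.
       */
      struct file_block *blocks_for_file(struct dupe_blocks_list *dups,
                                         const struct filerec *file)
      {
          for (struct filerec_blocks *fb = dups->by_file; fb; fb = fb->next)
              if (fb->file == file)
                  return fb->blocks;
          return NULL;
      }
      ```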
    • Cleanup for_each_dupe · d69c775a
      Mark Fasheh authored
      No actual functionality is changed. for_each_dupe() is moved into
      duperemove.c with the rest of the extent finding code. The function callback
      pointer is removed in favor of just calling walk_dupe_block directly. As a
      result, we no longer need the dupe_walk_ctxt so that is unraveled. Naming of
      filerec and block variables in these functions is changed to be more
      consistent.
      
      This makes reading through the extent finding code far easier.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
  13. 25 Sep, 2014 3 commits
  14. 04 Sep, 2014 2 commits
  15. 21 Aug, 2014 1 commit
  16. 04 Aug, 2014 1 commit
  17. 01 Aug, 2014 1 commit
  18. 24 Apr, 2014 1 commit
  19. 11 Apr, 2014 1 commit
  20. 24 Apr, 2013 2 commits
  21. 18 Apr, 2013 1 commit
  22. 16 Apr, 2013 1 commit