.TH JDUPES 1
.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
.\" other parms are allowed: see man(7), man(1)
.SH NAME
jdupes \- finds and performs actions upon duplicate files
.SH SYNOPSIS
.B jdupes
[
.I options
]
.I FILES and/or DIRECTORIES
\|.\|.\|.

.SH "DESCRIPTION"
Searches the given path(s) for duplicate files. Such files are found by
comparing file sizes, then partial and full file hashes, followed by a
byte-by-byte comparison. The default behavior with no other "action
options" specified (delete, summarize, link, dedupe, etc.) is to print
sets of matching files.
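.PP
The staged comparison described above can be sketched as a small shell
function. This is only an illustration under stated assumptions:
same_content is a hypothetical helper, cksum stands in for whatever hash
jdupes actually uses, and the 4096-byte partial block is the typical value
mentioned in the CAVEATS section. It is not the jdupes implementation.
.PP
.nf
```shell
# Hypothetical helper, not part of jdupes: staged comparison of two files.
# Returns 0 if the files are byte-identical, non-zero otherwise.
same_content() {
  a=$1
  b=$2
  # Stage 1: sizes must match.
  [ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ] || return 1
  # Stage 2: checksum of the first 4096 bytes must match (partial hash).
  [ "$(head -c 4096 "$a" | cksum)" = "$(head -c 4096 "$b" | cksum)" ] || return 1
  # Stage 3: checksum of the whole file must match (full hash).
  [ "$(cksum < "$a")" = "$(cksum < "$b")" ] || return 1
  # Stage 4: byte-by-byte confirmation.
  cmp -s "$a" "$b"
}
```
.fi
.PP
Each stage is cheaper than the one after it, so most non-matching pairs
are rejected before any full read of the file data is needed.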

.SH OPTIONS
.TP
.B -@ --loud
output annoying low-level debug info while running
.TP
.B -0 --printnull
when printing matches, use null bytes instead of CR/LF bytes, just
like 'find -print0' does. This has no effect with any action mode other
than the default "print matches" (delete, link, etc. will still print
normal line endings in the output.)
.TP
.B -1 --one-file-system
do not match files that are on different filesystems or devices
.TP
.B -A --nohidden
exclude hidden files from consideration
.TP
.B -B --dedupe
issue the btrfs same-extents ioctl to trigger a deduplication on
disk. The program must be built with btrfs support for this option
to be available
.TP
.B -C --chunksize=\fIBYTES\fR
set the I/O chunk size manually; larger values may improve performance
on rotating media by reducing the number of head seeks required, but
they also increase memory usage and can reduce performance in some cases
.TP
.B -D --debug
if this feature is compiled in, show debugging statistics and info
at the end of program execution
.TP
.B -d --delete
prompt user for files to preserve, deleting all others (see
.B CAVEATS
below)
.TP
.B -f --omitfirst
omit the first file in each set of matches
.TP
.B -H --hardlinks
normally, when two or more files point to the same disk area they are
treated as non-duplicates; this option will change this behavior
.TP
.B -h --help
displays help
.TP
.B -i --reverse
reverse (invert) the sort order of matches
.TP
.B -I --isolate
isolate each command-line parameter from one another; only match if the
files are under different parameter specifications
.TP
.B -L --linkhard
replace all duplicate files with hardlinks to the first file in each set
of duplicates
.TP
.B -m --summarize
summarize duplicate file information
.TP
.B -M --printwithsummary
print matches and summarize the duplicate file information at the end
.TP
.B -N --noprompt
when used together with \-\-delete, preserve the first file in each set of
duplicates and delete the others without prompting the user
.TP
.B -n --noempty
exclude zero-length files from consideration; this option is the default
behavior and does nothing (also see \fB\-z/--zeromatch\fP)
.TP
.B -O --paramorder
parameter order preservation is more important than the chosen sort; this
is particularly useful with the \fB\-N\fP option to ensure that automatic
deletion behaves in a controllable way
.TP
.B -o --order\fR=\fIWORD\fR
order files according to WORD:
.br
time - sort by modification time
.br
name - sort by filename (default)
.TP
.B -p --permissions
don't consider files with different owner/group or permission bits as
duplicates
.TP
.B -P --print=type
print extra information to stdout; valid options are:
.br
early - matches that pass early size/permission/link/etc. checks
.br
partial - files whose partial hashes match
.br
fullhash - files whose full hashes match
.TP
.B -Q --quick
.B [WARNING: RISK OF DATA LOSS, SEE CAVEATS]
skip byte-for-byte verification of duplicate pairs (use hashes only)
.TP
.B -q --quiet
hide progress indicator
.TP
.B -R --recurse:
for each directory given after this option follow subdirectories
encountered within (note the ':' at the end of option; see the
Examples section below for further explanation)
.TP
.B -r --recurse
for every directory given follow subdirectories encountered within
.TP
.B -l --linksoft
replace all duplicate files with symlinks to the first file in each set
of duplicates
.TP
.B -S --size
show size of duplicate files
.TP
.B -s --symlinks
follow symlinked directories
.TP
.B -T --partial-only
.B [WARNING: EXTREME RISK OF DATA LOSS, SEE CAVEATS]
match based on hash of first block of file data, ignoring the rest
.TP
.B -v --version
display jdupes version and compilation feature flags
.TP
.B -x --xsize=[+]SIZE (NOTE: deprecated in favor of \-X)
exclude files of size less than SIZE from consideration, or if SIZE is
prefixed with a '+', e.g.
.br
jdupes -x +226 [files]
.br
then exclude files larger than SIZE. Suffixes K/M/G can be used.
.TP
.B -X --exclude=spec:info
exclude files based on specified criteria; supported specs are:
.RS
.IP `size[+-=]:number[suffix]'
Match only if size is greater than (+), less than (-), or equal to (=) the
specified number, with an optional multiplier suffix. The +/- and =
specifiers can be combined; ex: "size+=:4K" will match if size is greater
than or equal to four kilobytes (4096 bytes). Suffixes supported are
K/M/G/T/P/E with a B or iB extension (all case-insensitive); no extension
or an iB extension specifies binary multipliers while a B extension
specifies decimal multipliers (ex: 4K or 4KiB = 4096, 4KB = 4000).
.RE
.TP
.B -z --zeromatch
consider zero-length files to be duplicates; this replaces the old
default behavior when \fB\-n\fP was not specified
.TP
.B -Z --softabort
if the user aborts the program (as with CTRL-C) act on the matches that
were found before the abort was received. For example, if -L and -Z are
specified, all matches found prior to the abort will be hard linked. The
default behavior without -Z is to abort without taking any actions.

.SH NOTES
A set of arrows is used when linking files to show what action was taken on
each link candidate. These arrows are as follows:

.TP
.B ---->
This file was successfully hard linked to the first file in the duplicate
chain
.TP
.B -@@->
This file was successfully symlinked to the first file in the chain
.TP
.B -==->
This file was already a hard link to the first file in the chain
.TP
.B -//->
Linking this file failed due to an error during the linking process

.PP
Duplicate files are listed together in groups with each file displayed on a
separate line. The groups are then separated from each other by blank lines.
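.PP
Because groups are separated by blank lines, the default output can be
post-processed with standard tools. The following is a minimal sketch, not
part of jdupes: count_groups is a hypothetical helper that reads
jdupes-style output on standard input and prints the number of groups,
using awk's paragraph mode. It assumes file names do not themselves
contain newlines.
.PP
.nf
```shell
# Hypothetical helper, not part of jdupes: count blank-line-separated
# groups in jdupes-style output read from standard input.
count_groups() {
  # An empty RS puts awk in paragraph mode, so each group is one record.
  awk 'BEGIN { RS = "" } END { print NR }'
}
```
.fi
.PP
It would typically be used as the last stage of a pipeline that starts
with a jdupes invocation in the default "print matches" mode.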

.SH EXAMPLES
.TP
.B jdupes a --recurse: b
will follow subdirectories under b, but not those under a.
.TP
.B jdupes a --recurse b
will follow subdirectories under both a and b.
.TP
.B jdupes -O dir1 dir3 dir2
will always place 'dir1' results first in any match set (where relevant)
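.PP
The size suffix rule described for the
.B \-X
size specification (no extension or an iB extension is binary, a B
extension is decimal) can be illustrated with a short shell function.
This is a sketch only: size_to_bytes is a hypothetical helper and does
not reflect how jdupes parses its arguments internally.
.PP
.nf
```shell
# Hypothetical helper, not part of jdupes: convert a size such as 4K, 4KiB
# or 4KB to bytes, following the suffix rule described for -X.
size_to_bytes() {
  # Suffixes are case-insensitive, so normalise to upper case first.
  spec=$(echo "$1" | tr 'a-z' 'A-Z')
  num=${spec%%[!0-9]*}
  suffix=${spec#"$num"}
  [ -n "$num" ] || return 1
  case $suffix in
    ?IB) base=1024; unit=${suffix%IB} ;;
    ?B)  base=1000; unit=${suffix%B} ;;
    *)   base=1024; unit=$suffix ;;
  esac
  case $unit in
    "") exp=0 ;;
    K) exp=1 ;;
    M) exp=2 ;;
    G) exp=3 ;;
    T) exp=4 ;;
    P) exp=5 ;;
    E) exp=6 ;;
    *) return 1 ;;
  esac
  # Multiply by the base once per step of the unit.
  result=$num
  while [ "$exp" -gt 0 ]; do
    result=$((result * base))
    exp=$((exp - 1))
  done
  echo "$result"
}
```
.fi
.PP
For example, size_to_bytes 4K and size_to_bytes 4KiB both print 4096,
while size_to_bytes 4KB prints 4000.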

.SH CAVEATS

Using
.B \-1
or
.BR \-\-one\-file\-system
prevents matches that cross filesystems, but a more relaxed form of this
option may be added that allows cross-matching for all filesystems that
each parameter is present on.

When using
.B \-d
or
.BR \-\-delete ,
care should be taken to insure against accidental data loss.

.B \-Z
or
.BR \-\-softabort
used to be --hardabort in jdupes prior to v1.5 and had the opposite behavior.
Defaulting to taking action on abort is probably not what most users would
expect. The decision to invert rather than reassign to a different option
was made because this feature was still fairly new at the time of the change.

The
.B \-O
or
.BR \-\-paramorder
option allows the user greater control over what appears in the first
position of a match set, specifically for keeping the \fB\-N\fP option
from deleting all but one file in a set in a seemingly random way. All
directories specified on the command line will be used as the sorting
order of result sets first, followed by the sorting algorithm set by
the \fB\-o\fP or \fB\-\-order\fP option. This means that the order of
all match pairs for a single directory specification will retain the
old sorting behavior even if this option is specified.

When
.B \-\-delete
is used together with options
.B \-s
or
.BR \-\-symlinks ,
a user could accidentally preserve a symlink while deleting the file it
points to.

The
.B \-Q
or
.BR \-\-quick
option only reads each file once, hashes it, and performs comparisons
based solely on the hashes. There is a small but significant risk of a
hash collision; guarding against it is the purpose of the failsafe
byte-for-byte comparison that this option explicitly bypasses. Do not use
it on ANY data
set for which any amount of data loss is unacceptable. This option is not
included in the help text for the program due to its risky nature.
.B You have been warned!

The
.B \-T
or
.BR \-\-partial\-only
option produces results based on a hash of the first block of file data
in each file, ignoring everything else in the file. Partial hash checks
have always been an important exclusion step in the jdupes algorithm,
usually hashing the first 4096 bytes of data and allowing files that are
different at the start to be rejected early. In certain scenarios it may
be a useful heuristic for a user to see that a set of files has the same
size and the same starting data, even if the remaining data does not
match; one example of this would be comparing files with data blocks that
are damaged or missing such as an incomplete file transfer or checking a
data recovery against known-good copies to see what damaged data can be
deleted in favor of restoring the known-good copy. This option is meant
to be used with informational actions and
.B can result in EXTREME DATA LOSS
if used with options that delete files, create hard links, or perform
other destructive actions on data based on the matching output. Because
of the potential for massive data destruction,
.B this option MUST BE SPECIFIED TWICE
to take effect and will error out if it is only specified once.
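.PP
The danger can be reproduced without jdupes. The sketch below builds two
files of equal size whose first 4096 bytes are identical but whose
remaining data differs, then compares them the way a first-block-only
check would. The file names, the first_block_sum helper, and the use of
cksum as a stand-in hash are all assumptions made for illustration.
.PP
.nf
```shell
# Build two files of equal size that share their first 4096 bytes but
# differ afterwards.
tmp=$(mktemp -d)
head -c 4096 /dev/zero > "$tmp/good"
cp "$tmp/good" "$tmp/damaged"
echo "intact tail" >> "$tmp/good"
echo "broken tail" >> "$tmp/damaged"

# Hypothetical helper: checksum of the first 4096 bytes only.
first_block_sum() {
  head -c 4096 "$1" | cksum
}

if [ "$(first_block_sum "$tmp/good")" = "$(first_block_sum "$tmp/damaged")" ]; then
  partial_match=yes
else
  partial_match=no
fi
# A full byte-by-byte comparison tells a different story.
if cmp -s "$tmp/good" "$tmp/damaged"; then
  full_match=yes
else
  full_match=no
fi
echo "first block match: $partial_match, full match: $full_match"
rm -r "$tmp"
```
.fi
.PP
The two files pass the first-block check yet are not identical, which is
why acting destructively on such matches can destroy unique data.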

Using the
.B \-C
or
.BR \-\-chunksize
option to override I/O chunk size can increase performance on rotating
storage media by reducing "head thrashing," reading larger amounts of
data sequentially from each file. This tunable size can have bad side
effects; the default size maximizes algorithmic performance without
regard to the I/O characteristics of any given device and uses a modest
amount of memory, but other values may greatly increase memory usage or
incur a lot more system call overhead. Try several different values to
see how they affect performance for your hardware and data set. This
option does not affect match results in any way, so even if it slows
down the file matching process it will not hurt anything.

.SH REPORTING BUGS
Send all bug reports to jody@jodybruchon.com or use the Issue tracker at
http://github.com/jbruchon/jdupes/issues

.SH AUTHOR
jdupes is a fork of 'fdupes'; the fork is maintained by, and contains
extra code copyrighted by, Jody Bruchon <jody@jodybruchon.com>

Based on 'fdupes' created by Adrian Lopez <adrian2@caribe.net>