......@@ -20,6 +20,7 @@
/build_date.h
/jdupes
/jdupes*.exe
/jdupes-standalone
/*.pkg.tar.xz
test_temp
output.log
......
jdupes 1.12
- Small reductions in memory usage
- Add "standalone" jdupes C file which has no external requirements
- Add ability to toggle -Z with a USR1 signal (not available on Windows)
- Add -t/--no-tocttou option to disable file change safety checks
jdupes 1.11.1
- Disable build date embedding by default to make reproducible builds easier
......
......@@ -64,3 +64,20 @@ all. To use it, type:
./compare_jdupes.sh [options]
A stand-alone version of jdupes that consolidates most of the program's
functionality into a single C file is included with this source code. Major
differences include reduction or elimination of some text strings, using an
embedded 32-bit jody_hash implementation instead of relying on xxHash64,
removal of all DEBUG/LOUD and Windows support code, replacement of fancy
numeric sorting with the faster but naive strcmp() sort method, and other
minor adjustments and consolidations appropriate for single-file compilation.
This version of the program is suitable for inclusion in "Swiss army knife"
projects such as BusyBox and Toybox.
The standalone version is not meant to work on Windows; it has all of the
quirks for Windows support stripped out and there's no real advantage to
using it on Windows anyway. However, if you need added stress in your life
and you understand that this is NOT SUPPORTED and YOU'RE 100% ON YOUR OWN,
you can compile it with this make command and it'll even partially work:
make standalone NO_UNICODE=1 CFLAGS_EXTRA='-DNO_PERMS -DNO_SYMLINKS -DNO_HARDLINKS'
......@@ -45,6 +45,7 @@ MAN_EXT = 1
INSTALL = install # install : UCB/GNU Install compatible
#INSTALL = ginstall
RM = rm -f
RMDIR = rmdir -p
MKDIR = mkdir -p
#MKDIR = mkdirhier
#MKDIR = mkinstalldirs
......@@ -125,6 +126,8 @@ OBJS += act_deletefiles.o act_linkfiles.o act_printmatches.o act_summarize.o
OBJS += xxhash.o
OBJS += $(ADDITIONAL_OBJECTS)
OBJS_CLEAN += jdupes-standalone
all: $(PROGRAM_NAME)
$(PROGRAM_NAME): $(OBJS)
......@@ -133,6 +136,8 @@ $(PROGRAM_NAME): $(OBJS)
winres.o : winres.rc winres.manifest.xml
windres winres.rc winres.o
standalone: jdupes-standalone
installdirs:
test -e $(DESTDIR)$(BIN_DIR) || $(MKDIR) $(DESTDIR)$(BIN_DIR)
test -e $(DESTDIR)$(MAN_DIR) || $(MKDIR) $(DESTDIR)$(MAN_DIR)
......@@ -141,6 +146,14 @@ install: $(PROGRAM_NAME) installdirs
$(INSTALL_PROGRAM) $(PROGRAM_NAME) $(DESTDIR)$(BIN_DIR)/$(PROGRAM_NAME)
$(INSTALL_DATA) $(PROGRAM_NAME).1 $(DESTDIR)$(MAN_DIR)/$(PROGRAM_NAME).$(MAN_EXT)
uninstalldirs:
-test -e $(DESTDIR)$(BIN_DIR) && $(RMDIR) $(DESTDIR)$(BIN_DIR)
-test -e $(DESTDIR)$(MAN_DIR) && $(RMDIR) $(DESTDIR)$(MAN_DIR)
uninstall: uninstalldirs
$(RM) $(DESTDIR)$(BIN_DIR)/$(PROGRAM_NAME)
$(RM) $(DESTDIR)$(MAN_DIR)/$(PROGRAM_NAME).$(MAN_EXT)
test:
./test.sh
......
......@@ -16,7 +16,7 @@ Why use jdupes instead of the original fdupes or other duplicate finders?
The biggest reason is raw speed. In testing on various data sets, jdupes is
over 7 times faster than fdupes-1.51 on average.
jdupes provides a native Wndows port. Most duplicate scanners built on
jdupes provides a native Windows port. Most duplicate scanners built on
Linux and other UNIX-like systems do not compile for Windows out-of-the-box
and even if they do, they don't support Unicode and other Windows-specific
quirks and features.
......@@ -118,6 +118,7 @@ option is specified (delete, summarize, link, dedupe, etc.)
the end of the option, manpage for more details)
-s --symlinks follow symlinks
-S --size show size of duplicate files
-t --no-tocttou disable security check for file changes (aka TOCTTOU)
-T --partial-only match based on partial hashes only. WARNING:
EXTREMELY DANGEROUS paired with destructive actions!
-T must be specified twice to work. Read the manual!
......@@ -129,14 +130,32 @@ option is specified (delete, summarize, link, dedupe, etc.)
Exclusions are cumulative: -X dir:abc -X dir:efg
-z --zeromatch consider zero-length files to be duplicates
-Z --softabort If the user aborts (i.e. CTRL-C) act on matches so far
You can send SIGUSR1 to the program to toggle this
For sizes, K/M/G/T/P/E[B|iB] suffixes can be used (case-insensitive)
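As an illustration of the size suffix rule above, here is a minimal sketch of a suffix parser, loosely modeled on the suffix table in the standalone source. The names `suffix_entry`, `suffixes` and `parse_size` are hypothetical, the table is cut down to B/K/M/G for brevity, and overflow is not checked. Bare letters and the iB forms are treated as binary multiples and the B forms as decimal, matching the table in the source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>  /* strcasecmp() */

/* Hypothetical, shortened suffix table (binary for K/KiB, decimal for KB) */
struct suffix_entry {
  const char *suffix;
  int64_t multiplier;
};

static const struct suffix_entry suffixes[] = {
  { "b", 1 },
  { "k", 1024 }, { "kib", 1024 }, { "kb", 1000 },
  { "m", 1048576 }, { "mib", 1048576 }, { "mb", 1000000 },
  { "g", 1073741824 }, { "gib", 1073741824 }, { "gb", 1000000000 },
  { NULL, 0 }
};

/* Parse "<digits>[suffix]" into bytes; returns 0 on success, -1 on bad input */
static int parse_size(const char *text, int64_t *bytes)
{
  char *end;
  const struct suffix_entry *s = suffixes;

  assert(text != NULL && bytes != NULL);
  /* A size must start with a digit */
  if (*text < '0' || *text > '9') return -1;
  *bytes = strtoll(text, &end, 10);
  /* No suffix: the number is already a byte count */
  if (*end == '\0') return 0;
  /* Case-insensitive lookup of whatever follows the digits */
  while (s->suffix != NULL && strcasecmp(s->suffix, end) != 0) s++;
  if (s->suffix == NULL) return -1;
  *bytes *= s->multiplier;
  return 0;
}
```

With this sketch, `parse_size("10k", &n)` stores 10240 in `n`, while `parse_size("2MB", &n)` stores 2000000.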
The -t/--no-tocttou option disables checks for file changes during and after
scanning. This opens a security vulnerability that is called a TOCTTOU (time
of check to time of use) vulnerability. The program normally runs checks
immediately before scanning or taking action upon a file to see if the file
has changed in some way since it was last checked. With this option enabled,
the program will not run any of these checks, making the algorithm slightly
faster, but also increasing the risk that the program scans a file, the file
is changed after the scan, and the program still acts like the file was in
its previous state. This is particularly dangerous when considering actions
such as linking and deleting. In the most extreme case, a file could be
deleted during scanning but match other files prior to that deletion; if the
file is the first in the list of duplicates and auto-delete is used, all of
the remaining matched files will be deleted as well. This option was added
due to user reports of some filesystems (particularly network filesystems)
changing the reported file information inappropriately, rendering the entire
program unusable on such filesystems.
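The kind of check that -t disables can be sketched roughly as follows. This is a simplified illustration, not the program's actual code: `file_snapshot`, `snapshot_take`, `snapshot_changed` and `safe_to_act` are hypothetical names, and only a handful of stat() fields are compared.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record of the stat() fields worth re-checking later */
struct file_snapshot {
  dev_t device;
  ino_t inode;
  off_t size;
  time_t mtime;
  mode_t mode;
};

/* Time of check: record the file's state; 0 on success, -1 on error */
static int snapshot_take(const char *path, struct file_snapshot *snap)
{
  struct stat st;

  assert(path != NULL && snap != NULL);
  if (stat(path, &st) != 0) return -1;
  snap->device = st.st_dev;
  snap->inode = st.st_ino;
  snap->size = st.st_size;
  snap->mtime = st.st_mtime;
  snap->mode = st.st_mode;
  return 0;
}

/* Time of use: 1 if changed, 0 if unchanged, -1 if stat() fails */
static int snapshot_changed(const char *path, const struct file_snapshot *snap)
{
  struct stat st;

  assert(path != NULL && snap != NULL);
  if (stat(path, &st) != 0) return -1;
  if (snap->device != st.st_dev) return 1;
  if (snap->inode != st.st_ino) return 1;
  if (snap->size != st.st_size) return 1;
  if (snap->mtime != st.st_mtime) return 1;
  if (snap->mode != st.st_mode) return 1;
  return 0;
}

/* Gate for a destructive action: 1 only if the file still matches */
static int safe_to_act(const char *path, const struct file_snapshot *snap)
{
  if (snapshot_changed(path, snap) != 0) {
    fprintf(stderr, "warning: %s changed or vanished; skipping\n", path);
    return 0;
  }
  return 1;
}
```

A caller would take the snapshot when the file is first scanned and call `safe_to_act()` immediately before linking or deleting it. Skipping that second call is, in effect, what -t does. Note that a small window between the final check and the action always remains; the check narrows the race rather than eliminating it.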
The -n/--noempty option was removed for safety. Matching zero-length files as
duplicates now requires explicit use of the -z/--zeromatch option instead.
Duplicate files are listed together in groups with each file displayed on a
Separate line. The groups are then separated from each other by blank lines.
separate line. The groups are then separated from each other by blank lines.
The -s/--symlinks option will treat symlinked files as regular files, but
direct symlinks will be treated as if they are hard linked files and the
......@@ -162,6 +181,17 @@ most users would expect. The decision to invert rather than reassign to a
different option was made because this feature was still fairly new at the
time of the change.
On non-Windows platforms that support SIGUSR1, you can toggle the state of
the -Z option by sending a SIGUSR1 to the program. This is handy if you want
to abort jdupes, didn't specify -Z, and changed your mind and don't want to
lose all the work that was done so far. Just do 'killall -USR1 jdupes' and
you will be able to abort with -Z. This works in reverse: if you want to
prevent a -Z from happening, a SIGUSR1 will toggle it back off. That's a lot
less useful because you can just stop and kill the program to get the same
effect, but it's there if you want it for some reason. Sending the signal
twice while the program is stopped will behave as if it was only sent once,
as per normal POSIX signal behavior.
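For readers curious how such a toggle can be wired up, here is a minimal sketch using sigaction(). It is an illustration under assumptions, not the program's actual handler: `soft_abort`, `toggle_soft_abort` and `install_usr1_toggle` are hypothetical names, and a dedicated `volatile sig_atomic_t` variable is used because that is the only kind of object a handler can portably modify.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the -Z state; safe to write from a handler */
static volatile sig_atomic_t soft_abort = 0;

/* SIGUSR1 handler: flip the soft abort state and nothing else */
static void toggle_soft_abort(int signum)
{
  (void)signum;
  soft_abort = !soft_abort;
}

/* Install the handler; returns 0 on success, -1 on failure */
static int install_usr1_toggle(void)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = toggle_soft_abort;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;  /* restart interrupted system calls */
  return sigaction(SIGUSR1, &sa, NULL);
}
```

Because standard signals are not queued, two SIGUSR1 signals that arrive while the process is stopped collapse into one pending signal, so the state flips once rather than twice when the process resumes.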
The -O or --paramorder option allows the user greater control over what
appears in the first position of a match set, specifically for keeping the -N
option from deleting all but one file in a set in a seemingly random way. All
......
......@@ -24,17 +24,15 @@ static const char *readonly_msg[] = {
};
static char *dedupeerrstr(int err) {
static char buf[256];
buf[sizeof(buf)-1] = '\0';
tempname[sizeof(tempname)-1] = '\0';
if (err == BTRFS_SAME_DATA_DIFFERS) {
snprintf(buf, sizeof(buf), "BTRFS_SAME_DATA_DIFFERS (data modified in the meantime?)");
return buf;
snprintf(tempname, sizeof(tempname), "BTRFS_SAME_DATA_DIFFERS (data modified in the meantime?)");
return tempname;
} else if (err < 0) {
return strerror(-err);
} else {
snprintf(buf, sizeof(buf), "Unknown error %d", err);
return buf;
snprintf(tempname, sizeof(tempname), "Unknown error %d", err);
return tempname;
}
}
......
......@@ -10,6 +10,9 @@
#include "jody_win_unicode.h"
#include "act_deletefiles.h"
/* For interactive deletion input */
#define INPUT_SIZE 512
extern void deletefiles(file_t *files, int prompt, FILE *tty)
{
unsigned int counter, groups;
......
......@@ -31,7 +31,6 @@ extern void linkfiles(file_t *files, const int hard)
static unsigned int symsrc;
static char rel_path[PATHBUF_SIZE];
#endif
static char temp_path[PATHBUF_SIZE];
LOUD(fprintf(stderr, "Running linkfiles(%d)\n", hard);)
curfile = files;
......@@ -188,17 +187,17 @@ extern void linkfiles(file_t *files, const int hard)
name_len = strlen(dupelist[x]->d_name) + 14;
if (name_len > PATHBUF_SIZE) continue;
/* Assemble a temporary file name */
strcpy(temp_path, dupelist[x]->d_name);
strcat(temp_path, ".__jdupes__.tmp");
strcpy(tempname, dupelist[x]->d_name);
strcat(tempname, ".__jdupes__.tmp");
/* Rename the source file to the temporary name */
#ifdef UNICODE
if (!M2W(temp_path, wname2)) {
if (!M2W(tempname, wname2)) {
fprintf(stderr, "error: MultiByteToWideChar failed: "); fwprint(stderr, srcfile->d_name, 1);
continue;
}
i = MoveFileW(wname, wname2) ? 0 : 1;
#else
i = rename(dupelist[x]->d_name, temp_path);
i = rename(dupelist[x]->d_name, tempname);
#endif
if (i != 0) {
fprintf(stderr, "warning: cannot move link target to a temporary name, not linking:\n-//-> ");
......@@ -207,7 +206,7 @@ extern void linkfiles(file_t *files, const int hard)
#ifdef UNICODE
MoveFileW(wname2, wname);
#else
rename(temp_path, dupelist[x]->d_name);
rename(tempname, dupelist[x]->d_name);
#endif
continue;
}
......@@ -256,37 +255,37 @@ extern void linkfiles(file_t *files, const int hard)
fprintf(stderr, "' -> '"); fwprint(stderr, srcfile->d_name, 0);
fprintf(stderr, "': %s\n", strerror(errno));
#ifdef UNICODE
if (!M2W(temp_path, wname2)) {
fprintf(stderr, "error: MultiByteToWideChar failed: "); fwprint(stderr, temp_path, 1);
if (!M2W(tempname, wname2)) {
fprintf(stderr, "error: MultiByteToWideChar failed: "); fwprint(stderr, tempname, 1);
continue;
}
i = MoveFileW(wname2, wname) ? 0 : 1;
#else
i = rename(temp_path, dupelist[x]->d_name);
i = rename(tempname, dupelist[x]->d_name);
#endif
if (i != 0) {
fprintf(stderr, "error: cannot rename temp file back to original\n");
fprintf(stderr, "original: "); fwprint(stderr, dupelist[x]->d_name, 1);
fprintf(stderr, "current: "); fwprint(stderr, temp_path, 1);
fprintf(stderr, "current: "); fwprint(stderr, tempname, 1);
}
continue;
}
/* Remove temporary file to clean up; if we can't, reverse the linking */
#ifdef UNICODE
if (!M2W(temp_path, wname2)) {
fprintf(stderr, "error: MultiByteToWideChar failed: "); fwprint(stderr, temp_path, 1);
if (!M2W(tempname, wname2)) {
fprintf(stderr, "error: MultiByteToWideChar failed: "); fwprint(stderr, tempname, 1);
continue;
}
i = DeleteFileW(wname2) ? 0 : 1;
#else
i = remove(temp_path);
i = remove(tempname);
#endif
if (i != 0) {
/* If the temp file can't be deleted, there may be a permissions problem
* so reverse the process and warn the user */
fprintf(stderr, "\nwarning: can't delete temp file, reverting: ");
fwprint(stderr, temp_path, 1);
fwprint(stderr, tempname, 1);
#ifdef UNICODE
i = DeleteFileW(wname) ? 0 : 1;
#else
......@@ -298,12 +297,12 @@ extern void linkfiles(file_t *files, const int hard)
#ifdef UNICODE
i = MoveFileW(wname2, wname) ? 0 : 1;
#else
i = rename(temp_path, dupelist[x]->d_name);
i = rename(tempname, dupelist[x]->d_name);
#endif
if (i != 0) {
fprintf(stderr, "\nwarning: couldn't revert the file to its original name\n");
fprintf(stderr, "original: "); fwprint(stderr, dupelist[x]->d_name, 1);
fprintf(stderr, "current: "); fwprint(stderr, temp_path, 1);
fprintf(stderr, "current: "); fwprint(stderr, tempname, 1);
}
}
}
......
jdupes (1.12-1) unstable; urgency=medium
* New upstream version 1.12.
* Using new DH level format. Consequently:
- debian/compat: removed.
- debian/control: changed from 'debhelper' to 'debhelper-compat' in
Build-Depends field and bumped level to 12.
* debian/control: bumped Standards-Version to 4.3.0.
* debian/copyright: updated upstream and packaging copyright years.
-- Joao Eriberto Mota Filho <eriberto@debian.org> Tue, 26 Feb 2019 11:42:47 -0300
jdupes (1.11.1-2) unstable; urgency=medium
* debian/tests/control: added a new test.
......
......@@ -2,8 +2,8 @@ Source: jdupes
Section: utils
Priority: optional
Maintainer: Joao Eriberto Mota Filho <eriberto@debian.org>
Build-Depends: debhelper (>= 11)
Standards-Version: 4.2.1
Build-Depends: debhelper-compat (= 12)
Standards-Version: 4.3.0
Homepage: https://github.com/jbruchon/jdupes
Vcs-Browser: https://salsa.debian.org/debian/jdupes
Vcs-Git: https://salsa.debian.org/debian/jdupes.git
......
......@@ -5,7 +5,7 @@ Source: https://github.com/jbruchon/jdupes
Files: *
Copyright: 1999-2018 Adrian Lopez <adrian2@caribe.net>
2014-2018 Jody Lee Bruchon <jody@jodybruchon.com>
2014-2019 Jody Lee Bruchon <jody@jodybruchon.com>
License: MIT
Comment: jdupes is based on fdupes. Adrian is the fdupes upstream.
......@@ -15,7 +15,7 @@ Copyright: 2012-2016 Yann Collet <cyan@fb.com>
License: BSD-2-Clause
Files: debian/*
Copyright: 2016-2018 Joao Eriberto Mota Filho <eriberto@debian.org>
Copyright: 2016-2019 Joao Eriberto Mota Filho <eriberto@debian.org>
License: MIT
License: MIT
......
/* jdupes (C) 2015-2018 Jody Bruchon <jody@jodybruchon.com>
Derived from fdupes (C) 1999-2018 Adrian Lopez
Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation files
(the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software,
and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <inttypes.h>
#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
/* Optional btrfs support */
#ifdef ENABLE_BTRFS
#include <linux/btrfs.h>
#include <sys/ioctl.h>
#include <sys/utsname.h>
#endif
#define JODY_HASH_WIDTH 32
typedef uint32_t jodyhash_t;
/* Set hash type (change this if swapping in a different hash function) */
typedef jodyhash_t jdupes_hash_t;
typedef ino_t jdupes_ino_t;
typedef mode_t jdupes_mode_t;
#define ISFLAG(a,b) ((a & b) == b)
#define SETFLAG(a,b) (a |= b)
#define CLEARFLAG(a,b) (a &= (~b))
/* Behavior modification flags */
#define F_RECURSE 0x00000001U
#define F_HIDEPROGRESS 0x00000002U
#define F_SOFTABORT 0x00000004U
#define F_FOLLOWLINKS 0x00000008U
#define F_DELETEFILES 0x00000010U
#define F_INCLUDEEMPTY 0x00000020U
#define F_CONSIDERHARDLINKS 0x00000040U
#define F_SHOWSIZE 0x00000080U
#define F_OMITFIRST 0x00000100U
#define F_RECURSEAFTER 0x00000200U
#define F_NOPROMPT 0x00000400U
#define F_SUMMARIZEMATCHES 0x00000800U
#define F_EXCLUDEHIDDEN 0x00001000U
#define F_PERMISSIONS 0x00002000U
#define F_HARDLINKFILES 0x00004000U
#define F_EXCLUDESIZE 0x00008000U
#define F_QUICKCOMPARE 0x00010000U
#define F_USEPARAMORDER 0x00020000U
#define F_DEDUPEFILES 0x00040000U
#define F_REVERSESORT 0x00080000U
#define F_ISOLATE 0x00100000U
#define F_MAKESYMLINKS 0x00200000U
#define F_PRINTMATCHES 0x00400000U
#define F_ONEFS 0x00800000U
#define F_PRINTNULL 0x01000000U
#define F_PARTIALONLY 0x02000000U
#define F_NO_TOCTTOU 0x04000000U
/* Per-file true/false flags */
#define F_VALID_STAT 0x00000001U
#define F_HASH_PARTIAL 0x00000002U
#define F_HASH_FULL 0x00000004U
#define F_HAS_DUPES 0x00000008U
#define F_IS_SYMLINK 0x00000010U
/* Extra print flags */
#define P_PARTIAL 0x00000001U
#define P_EARLYMATCH 0x00000002U
#define P_FULLHASH 0x00000004U
typedef enum {
ORDER_NAME = 0,
ORDER_TIME
} ordertype_t;
/* For interactive deletion input */
#define INPUT_SIZE 512
/* Per-file information */
typedef struct _file {
struct _file *duplicates;
struct _file *next;
char *d_name;
dev_t device;
jdupes_mode_t mode;
off_t size;
jdupes_ino_t inode;
jdupes_hash_t filehash_partial;
jdupes_hash_t filehash;
time_t mtime;
uint32_t flags; /* Status flags */
#ifndef NO_USER_ORDER
unsigned int user_order; /* Order of the originating command-line parameter */
#endif
#ifndef NO_HARDLINKS
nlink_t nlink;
#endif
#ifndef NO_PERMS
uid_t uid;
gid_t gid;
#endif
} file_t;
typedef struct _filetree {
file_t *file;
struct _filetree *left;
struct _filetree *right;
} filetree_t;
/* -X exclusion parameter stack */
struct exclude {
struct exclude *next;
unsigned int flags;
int64_t size;
char param[];
};
/* Exclude parameter flags */
#define X_DIR 0x00000001U
#define X_SIZE_EQ 0x00000002U
#define X_SIZE_GT 0x00000004U
#define X_SIZE_LT 0x00000008U
/* The X-than-or-equal are combination flags */
#define X_SIZE_GTEQ 0x00000006U
#define X_SIZE_LTEQ 0x0000000aU
/* Size specifier flags */
#define XX_EXCL_SIZE 0x0000000eU
/* Flags that use numeric offset instead of a string */
#define XX_EXCL_OFFSET 0x0000000eU
/* Flags that require a data parameter */
#define XX_EXCL_DATA 0x0000000fU
/* Exclude definition array */
struct exclude_tags {
const char * const tag;
const uint32_t flags;
};
/* Suffix definitions (treat as case-insensitive) */
struct size_suffix {
const char * const suffix;
const int64_t multiplier;
};
const char *FILE_MODE_RO = "rb";
const char dir_sep = '/';
/* Behavior modification flags */
static uint_fast32_t flags = 0, p_flags = 0;
static const char *program_name;
/* This gets used in many functions */
static struct stat s;
/* Larger chunk size makes large files process faster but uses more RAM */
#ifndef CHUNK_SIZE
#define CHUNK_SIZE 32768
#endif
#define PARTIAL_HASH_SIZE 4096
/* Maximum path buffer size to use; must be large enough for a path plus
* any work that might be done to the array it's stored in. PATH_MAX is
* not always true. Read this article on the false promises of PATH_MAX:
* http://insanecoding.blogspot.com/2007/11/pathmax-simply-isnt.html
*/
#define PATHBUF_SIZE 4096
/* Size suffixes - this gets exported */
static const struct size_suffix size_suffix[] = {
/* Byte (someone may actually try to use this) */
{ "b", 1 },
{ "k", 1024 },
{ "kib", 1024 },
{ "m", 1048576 },
{ "mib", 1048576 },
{ "g", (uint64_t)1048576 * 1024 },
{ "gib", (uint64_t)1048576 * 1024 },
{ "t", (uint64_t)1048576 * 1048576 },
{ "tib", (uint64_t)1048576 * 1048576 },
{ "p", (uint64_t)1048576 * 1048576 * 1024},
{ "pib", (uint64_t)1048576 * 1048576 * 1024},
{ "e", (uint64_t)1048576 * 1048576 * 1048576},
{ "eib", (uint64_t)1048576 * 1048576 * 1048576},
/* Decimal suffixes */
{ "kb", 1000 },
{ "mb", 1000000 },
{ "gb", 1000000000 },
{ "tb", 1000000000000 },
{ "pb", 1000000000000000 },
{ "eb", 1000000000000000000 },
{ NULL, 0 },
};
/* Tree to track each directory traversed */
struct travdone {
struct travdone *left;
struct travdone *right;
jdupes_ino_t inode;
dev_t device;
};
static struct travdone *travdone_head = NULL;
/* Exclusion tree head and static tag list */
struct exclude *exclude_head = NULL;
static const struct exclude_tags exclude_tags[] = {
{ "dir", X_DIR },
{ "size+", X_SIZE_GT },
{ "size+=", X_SIZE_GTEQ },
{ "size-=", X_SIZE_LTEQ },
{ "size-", X_SIZE_LT },
{ "size=", X_SIZE_EQ },
{ NULL, 0 },
};
/* Required for progress indicator code */
static uintmax_t filecount = 0;
static uintmax_t progress = 0, item_progress = 0, dupecount = 0;
/* Number of read loops before checking progress indicator */
#define CHECK_MINIMUM 256
/* File tree head */
static filetree_t *checktree = NULL;
/* Directory/file parameter position counter */
static unsigned int user_item_count = 1;
/* registerfile() direction options */
enum tree_direction { NONE, LEFT, RIGHT };
/* Sort order reversal */
static int sort_direction = 1;
/* Signal handler */
static int interrupt = 0;
/* Progress indicator time */
struct timeval time1, time2;
/* for temporary path mangling */
static char tempname[PATHBUF_SIZE * 2];
/***** End definitions, begin code *****/
/* Catch CTRL-C and either notify or terminate */
void sighandler(const int signum)
{
(void)signum;
if (interrupt || !ISFLAG(flags, F_SOFTABORT)) {
fprintf(stderr, "\n");
exit(EXIT_FAILURE);
}
interrupt = 1;
return;
}
void sigusr1(const int signum)
{
(void)signum;
if (!ISFLAG(flags, F_SOFTABORT)) SETFLAG(flags, F_SOFTABORT);
else CLEARFLAG(flags, F_SOFTABORT);
return;
}
/* Out of memory */
static void oom(const char * const restrict msg)
{
fprintf(stderr, "\nout of memory: %s\n", msg);
exit(EXIT_FAILURE);
}
/* Null pointer failure */
static void nullptr(const char * restrict func)
{
static const char n[] = "(NULL)";
if (func == NULL) func = n;
fprintf(stderr, "\ninternal error: NULL pointer passed to %s\n", func);
exit(EXIT_FAILURE);
}
/* Jody Bruchon's fast hashing function
* Copyright (C) 2014-2017 by Jody Bruchon <jody@jodybruchon.com>
* Released under The MIT License
*/
#define JODY_HASH_SHIFT 14
#define JODY_HASH_CONSTANT 0x1f3d5b79U
static const jodyhash_t tail_mask[] = {
0x00000000,
0x000000ff,
0x0000ffff,
0x00ffffff,
0xffffffff,
};
static jodyhash_t jody_block_hash(const jodyhash_t * restrict data,
const jodyhash_t start_hash, const size_t count)
{
jodyhash_t hash = start_hash;
jodyhash_t element;
jodyhash_t partial_salt;
size_t len;
/* Don't bother trying to hash a zero-length block */
if (count == 0) return hash;
len = count / sizeof(jodyhash_t);
for (; len > 0; len--) {
element = *data;
hash += element;
hash += JODY_HASH_CONSTANT;
hash = (hash << JODY_HASH_SHIFT) | hash >> (sizeof(jodyhash_t) * 8 - JODY_HASH_SHIFT); /* bit rotate left */
hash ^= element;
hash = (hash << JODY_HASH_SHIFT) | hash >> (sizeof(jodyhash_t) * 8 - JODY_HASH_SHIFT);
hash ^= JODY_HASH_CONSTANT;
hash += element;
data++;
}
/* Handle data tail (for blocks indivisible by sizeof(jodyhash_t)) */
len = count & (sizeof(jodyhash_t) - 1);
if (len) {
partial_salt = JODY_HASH_CONSTANT & tail_mask[len];
element = *data & tail_mask[len];
hash += element;
hash += partial_salt;
hash = (hash << JODY_HASH_SHIFT) | hash >> (sizeof(jodyhash_t) * 8 - JODY_HASH_SHIFT);
hash ^= element;
hash = (hash << JODY_HASH_SHIFT) | hash >> (sizeof(jodyhash_t) * 8 - JODY_HASH_SHIFT);
hash ^= partial_salt;
hash += element;
}
return hash;
}
/* Compare two hashes like memcmp() */
#define HASH_COMPARE(a,b) ((a > b) ? 1:((a == b) ? 0:-1))
static inline char **cloneargs(const int argc, char **argv)
{
static int x;
static char **args;
args = (char **)malloc(sizeof(char *) * (unsigned int)argc);
if (args == NULL) oom("cloneargs() start");
for (x = 0; x < argc; x++) {
args[x] = (char *)malloc(strlen(argv[x]) + 1);
if (args[x] == NULL) oom("cloneargs() loop");
strcpy(args[x], argv[x]);
}
return args;
}
static int findarg(const char * const arg, const int start,
const int argc, char **argv)
{
int x;
for (x = start; x < argc; x++)
if (strcmp(argv[x], arg) == 0)
return x;
return x;
}
/* Find the first non-option argument after specified option. */
static int nonoptafter(const char *option, const int argc,
char **oldargv, char **newargv)
{
int x;
int targetind;
int testind;
int startat = 1;
targetind = findarg(option, 1, argc, oldargv);
for (x = optind; x < argc; x++) {
testind = findarg(newargv[x], startat, argc, oldargv);
if (testind > targetind) return x;
else startat = testind;
}
return x;
}
/* Update progress indicator if requested */
static void update_progress(const char * const restrict msg, const int file_percent)
{
static int did_fpct = 0;
/* The caller should be doing this anyway...but don't trust that they did */
if (ISFLAG(flags, F_HIDEPROGRESS)) return;
gettimeofday(&time2, NULL);
if (progress == 0 || time2.tv_sec > time1.tv_sec) {
fprintf(stderr, "\rProgress [%" PRIuMAX "/%" PRIuMAX ", %" PRIuMAX " pairs matched] %" PRIuMAX "%%",
progress, filecount, dupecount, (progress * 100) / filecount);
if (file_percent > -1 && msg != NULL) {
fprintf(stderr, " (%s: %d%%) ", msg, file_percent);
did_fpct = 1;
} else if (did_fpct != 0) {
fprintf(stderr, " ");
did_fpct = 0;
}
fflush(stderr);
}
time1.tv_sec = time2.tv_sec;
return;
}
/* Check file's stat() info to make sure nothing has changed
* Returns 1 if changed, 0 if not changed, negative if error */
static int file_has_changed(file_t * const restrict file)
{
/* If -t/--no-tocttou specified then completely bypass this code */
if (ISFLAG(flags, F_NO_TOCTTOU)) return 0;
if (file == NULL || file->d_name == NULL) nullptr("file_has_changed()");
if (!ISFLAG(file->flags, F_VALID_STAT)) return -66;
if (stat(file->d_name, &s) != 0) return -2;
if (file->inode != s.st_ino) return 1;
if (file->size != s.st_size) return 1;
if (file->device != s.st_dev) return 1;
if (file->mtime != s.st_mtime) return 1;
if (file->mode != s.st_mode) return 1;
#ifndef NO_PERMS
if (file->uid != s.st_uid) return 1;
if (file->gid != s.st_gid) return 1;
#endif
#ifndef NO_SYMLINKS
if (lstat(file->d_name, &s) != 0) return -3;
if ((S_ISLNK(s.st_mode) > 0) ^ ISFLAG(file->flags, F_IS_SYMLINK)) return 1;
#endif
return 0;
}
static inline int getfilestats(file_t * const restrict file)
{
if (file == NULL || file->d_name == NULL) nullptr("getfilestats()");
/* Don't stat the same file more than once */
if (ISFLAG(file->flags, F_VALID_STAT)) return 0;
SETFLAG(file->flags, F_VALID_STAT);
if (stat(file->d_name, &s) != 0) return -1;
file->inode = s.st_ino;
file->size = s.st_size;
file->device = s.st_dev;
file->mtime = s.st_mtime;
file->mode = s.st_mode;
#ifndef NO_HARDLINKS
file->nlink = s.st_nlink;
#endif
#ifndef NO_PERMS
file->uid = s.st_uid;
file->gid = s.st_gid;
#endif
#ifndef NO_SYMLINKS
if (lstat(file->d_name, &s) != 0) return -1;
if (S_ISLNK(s.st_mode) > 0) SETFLAG(file->flags, F_IS_SYMLINK);
#endif
return 0;
}
static void add_exclude(const char *option)
{
char *opt, *p;
struct exclude *excl = exclude_head;
const struct exclude_tags *tags = exclude_tags;
const struct size_suffix *ss = size_suffix;
if (option == NULL) nullptr("add_exclude()");
opt = malloc(strlen(option) + 1);
if (opt == NULL) oom("add_exclude option");
strcpy(opt, option);
p = opt;
while (*p != ':' && *p != '\0') p++;
/* Split tag string into *opt (tag) and *p (value) */
if (*p == ':') {
*p = '\0';
p++;
}
while (tags->tag != NULL && strcmp(tags->tag, opt) != 0) tags++;
if (tags->tag == NULL) goto bad_tag;
/* Check for a tag that requires a value */
if (tags->flags & XX_EXCL_DATA && *p == '\0') goto spec_missing;
/* *p is now at the value, NOT the tag string! */
if (exclude_head != NULL) {
/* Add to end of exclusion stack if head is present */
while (excl->next != NULL) excl = excl->next;
excl->next = malloc(sizeof(struct exclude) + strlen(p) + 1); /* +1 for the NUL that strcpy() writes into param[] */
if (excl->next == NULL) oom("add_exclude alloc");
excl = excl->next;
} else {
/* Allocate exclude_head if no exclusions exist yet */
exclude_head = malloc(sizeof(struct exclude) + strlen(p) + 1); /* +1 for the NUL that strcpy() writes into param[] */
if (exclude_head == NULL) oom("add_exclude alloc");
excl = exclude_head;
}
/* Set tag value from predefined tag array */
excl->flags = tags->flags;
/* Initialize the new exclude element */
excl->next = NULL;
if (excl->flags & XX_EXCL_OFFSET) {
/* Exclude uses a number; handle it with possible suffixes */
*(excl->param) = '\0';
/* Get base size */
if (*p < '0' || *p > '9') goto bad_size_suffix;
excl->size = strtoll(p, &p, 10);
/* Handle suffix, if any */
if (*p != '\0') {
while (ss->suffix != NULL && strcasecmp(ss->suffix, p) != 0) ss++;
if (ss->suffix == NULL) goto bad_size_suffix;
excl->size *= ss->multiplier;
}
} else {
/* Exclude uses string data; just copy it */
excl->size = 0;
strcpy(excl->param, p);
}
free(opt);
return;
spec_missing:
fprintf(stderr, "Exclude spec missing or invalid: -X spec:data\n");
exit(EXIT_FAILURE);
bad_tag:
fprintf(stderr, "Invalid exclusion tag was specified\n");
exit(EXIT_FAILURE);
bad_size_suffix:
fprintf(stderr, "Invalid -X size suffix specified; use B or KMGTPE[i][B]\n");
exit(EXIT_FAILURE);
}
static int getdirstats(const char * const restrict name,
jdupes_ino_t * const restrict inode, dev_t * const restrict dev,
jdupes_mode_t * const restrict mode)
{
if (name == NULL || inode == NULL || dev == NULL) nullptr("getdirstats");
if (stat(name, &s) != 0) return -1;
*inode = s.st_ino;
*dev = s.st_dev;
*mode = s.st_mode;
if (!S_ISDIR(s.st_mode)) return 1;
return 0;
}
/* Check a pair of files for match exclusion conditions
* Returns:
* 0 if all condition checks pass
* -1 or 1 on compare result less/more
* -2 on an absolute exclusion condition met
* 2 on an absolute match condition met */
static int check_conditions(const file_t * const restrict file1, const file_t * const restrict file2)
{
if (file1 == NULL || file2 == NULL || file1->d_name == NULL || file2->d_name == NULL) nullptr("check_conditions()");
#ifndef NO_USER_ORDER
/* Exclude based on -I/--isolate */
if (ISFLAG(flags, F_ISOLATE) && (file1->user_order == file2->user_order)) return -1;
#endif /* NO_USER_ORDER */
/* Exclude based on -1/--one-file-system */
if (ISFLAG(flags, F_ONEFS) && (file1->device != file2->device)) return -1;
/* Exclude files by permissions if requested */
if (ISFLAG(flags, F_PERMISSIONS) &&
(file1->mode != file2->mode
#ifndef NO_PERMS
|| file1->uid != file2->uid
|| file1->gid != file2->gid
#endif
)) {
return -1;
}
/* Hard link and symlink + '-s' check */
#ifndef NO_HARDLINKS
if ((file1->inode == file2->inode) && (file1->device == file2->device)) {
if (ISFLAG(flags, F_CONSIDERHARDLINKS)) return 2;
else return -2;
}
#endif
/* Exclude files that are not the same size */
if (file1->size > file2->size) return -1;
if (file1->size < file2->size) return 1;
/* Fall through: all checks passed */
return 0;
}
/* Check for exclusion conditions for a single file (1 = fail) */
static int check_singlefile(file_t * const restrict newfile)
{
char * restrict tp = tempname;
int excluded;
if (newfile == NULL) nullptr("check_singlefile()");
/* Exclude hidden files if requested */
if (ISFLAG(flags, F_EXCLUDEHIDDEN)) {
if (newfile->d_name == NULL) nullptr("check_singlefile newfile->d_name");
strcpy(tp, newfile->d_name);
tp = basename(tp);
if (tp[0] == '.' && strcmp(tp, ".") && strcmp(tp, "..")) return 1;
}
/* Get file information and check for validity */
const int i = getfilestats(newfile);
if (i || newfile->size == -1) return 1;
if (!S_ISDIR(newfile->mode)) {
/* Exclude zero-length files if requested */
if (newfile->size == 0 && !ISFLAG(flags, F_INCLUDEEMPTY)) return 1;
/* Exclude files based on exclusion stack size specs */
excluded = 0;
for (struct exclude *excl = exclude_head; excl != NULL; excl = excl->next) {
uint32_t sflag = excl->flags & XX_EXCL_SIZE;
if (
((sflag == X_SIZE_EQ) && (newfile->size != excl->size)) ||
((sflag == X_SIZE_LTEQ) && (newfile->size <= excl->size)) ||
((sflag == X_SIZE_GTEQ) && (newfile->size >= excl->size)) ||
((sflag == X_SIZE_GT) && (newfile->size > excl->size)) ||
((sflag == X_SIZE_LT) && (newfile->size < excl->size))
) excluded = 1;
}
if (excluded) return 1;
}
return 0;
}
static file_t *init_newfile(const size_t len, file_t * restrict * const restrict filelistp)
{
file_t * const restrict newfile = (file_t *)malloc(sizeof(file_t));
if (!newfile) oom("init_newfile() file structure");
if (!filelistp) nullptr("init_newfile() filelistp");
memset(newfile, 0, sizeof(file_t));
newfile->d_name = (char *)malloc(len);
if (!newfile->d_name) oom("init_newfile() filename");
newfile->next = *filelistp;
#ifndef NO_USER_ORDER
newfile->user_order = user_item_count;
#endif
newfile->size = -1;
newfile->duplicates = NULL;
return newfile;
}
/* Create a new traversal check object and initialize its values */
static struct travdone *travdone_alloc(const jdupes_ino_t inode, const dev_t device)
{
struct travdone *trav;
trav = (struct travdone *)malloc(sizeof(struct travdone));
if (trav == NULL) return NULL;
trav->left = NULL;
trav->right = NULL;
trav->inode = inode;
trav->device = device;
return trav;
}
/* Add a single file to the file tree */
static inline file_t *grokfile(const char * const restrict name, file_t * restrict * const restrict filelistp)
{
file_t * restrict newfile;
if (!name || !filelistp) nullptr("grokfile()");
/* Allocate the file_t and the d_name entries */
newfile = init_newfile(strlen(name) + 2, filelistp);
strcpy(newfile->d_name, name);
/* Single-file [l]stat() and exclusion condition check */
if (check_singlefile(newfile) != 0) {
free(newfile->d_name);
free(newfile);
return NULL;
}
return newfile;
}
/* Count the following statistics:
- Maximum number of files in a duplicate set (length of longest dupe chain)
- Number of non-zero-length files that have duplicates (if n_files != NULL)
- Total number of duplicate file sets (groups) */
static unsigned int get_max_dupes(const file_t *files, unsigned int * const restrict max,
unsigned int * const restrict n_files) {
unsigned int groups = 0;
if (files == NULL || max == NULL) nullptr("get_max_dupes()");
*max = 0;
if (n_files) *n_files = 0;
while (files) {
unsigned int n_dupes;
if (ISFLAG(files->flags, F_HAS_DUPES)) {
groups++;
if (n_files && files->size) (*n_files)++;
n_dupes = 1;
for (file_t *curdupe = files->duplicates; curdupe; curdupe = curdupe->duplicates) n_dupes++;
if (n_dupes > *max) *max = n_dupes;
}
files = files->next;
}
return groups;
}
/* BTRFS deduplication of file blocks */
#ifdef ENABLE_BTRFS
/* Message to append to BTRFS warnings based on write permissions */
static const char *readonly_msg[] = {
"",
" (no write permission)"
};
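/* Convert a per-file dedupe status to a printable string; the result may
 * point at the shared tempname buffer, so use it before the next call */
static char *dedupeerrstr(int err) {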
tempname[sizeof(tempname)-1] = '\0';
if (err == BTRFS_SAME_DATA_DIFFERS) {
snprintf(tempname, sizeof(tempname), "BTRFS_SAME_DATA_DIFFERS (data modified in the meantime?)");
return tempname;
} else if (err < 0) {
return strerror(-err);
} else {
snprintf(tempname, sizeof(tempname), "Unknown error %d", err);
return tempname;
}
}
static void dedupefiles(file_t * restrict files)
{
struct utsname utsname;
struct btrfs_ioctl_same_args *same;
char **dupe_filenames; /* maps to same->info indices */
file_t *curfile;
unsigned int n_dupes, max_dupes, cur_info;
unsigned int cur_file = 0, max_files, total_files = 0;
int fd;
int ret, status, readonly;
/* Refuse to dedupe on 2.x kernels; they could damage user data */
if (uname(&utsname)) {
fprintf(stderr, "Failed to get kernel version! Aborting.\n");
exit(EXIT_FAILURE);
}
if (*(utsname.release) == '2' && *(utsname.release + 1) == '.') {
fprintf(stderr, "Refusing to dedupe on a 2.x kernel; data loss could occur. Aborting.\n");
exit(EXIT_FAILURE);
}
/* Find the largest dupe set, alloc space to hold structs for it */
get_max_dupes(files, &max_dupes, &max_files);
/* Kernel dupe count is a uint16_t so exit if the type's limit is exceeded */
if (max_dupes > 65535) {
fprintf(stderr, "Largest duplicate set (%u) exceeds the 65535-file dedupe limit.\n", max_dupes);
fprintf(stderr, "Ask the program author to add this feature if you really need it. Exiting!\n");
exit(EXIT_FAILURE);
}
same = calloc(sizeof(struct btrfs_ioctl_same_args) +
sizeof(struct btrfs_ioctl_same_extent_info) * max_dupes, 1);
dupe_filenames = malloc(max_dupes * sizeof(char *));
if (!same || !dupe_filenames) oom("dedupefiles() structures");
/* Main dedupe loop */
while (files) {
if (ISFLAG(files->flags, F_HAS_DUPES) && files->size) {
cur_file++;
if (!ISFLAG(flags, F_HIDEPROGRESS)) {
fprintf(stderr, "Dedupe [%u/%u] %u%% \r", cur_file, max_files,
cur_file * 100 / max_files);
}
/* Open each file to be deduplicated */
cur_info = 0;
for (curfile = files->duplicates; curfile; curfile = curfile->duplicates) {
int errno2;
/* Never allow hard links to be passed to dedupe */
if (curfile->device == files->device && curfile->inode == files->inode) continue;
dupe_filenames[cur_info] = curfile->d_name;
readonly = 0;
if (access(curfile->d_name, W_OK) != 0) readonly = 1;
fd = open(curfile->d_name, O_RDWR);
/* If read-write open fails, privileged users can dedupe in read-only mode */
if (fd == -1) {
/* Preserve errno in case read-only fallback fails */
errno2 = errno;
fd = open(curfile->d_name, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "Unable to open '%s': %s%s\n", curfile->d_name,
strerror(errno2), readonly_msg[readonly]);
continue;
}
}
same->info[cur_info].fd = fd;
same->info[cur_info].logical_offset = 0;
cur_info++;
total_files++;
}
n_dupes = cur_info;
same->logical_offset = 0;
same->length = (uint64_t)files->size;
same->dest_count = (uint16_t)n_dupes; /* kernel type is __u16 */
fd = open(files->d_name, O_RDONLY);
if (fd == -1) {
fprintf(stderr, "unable to open(\"%s\", O_RDONLY): %s\n", files->d_name, strerror(errno));
goto cleanup;
}
/* Call dedupe ioctl to pass the files to the kernel */
ret = ioctl(fd, BTRFS_IOC_FILE_EXTENT_SAME, same);
/* Report an ioctl failure before close() has a chance to overwrite errno */
if (ret < 0) fprintf(stderr, "dedupe failed against file '%s' (%u matches): %s\n", files->d_name, n_dupes, strerror(errno));
if (close(fd) == -1) fprintf(stderr, "Unable to close(\"%s\"): %s\n", files->d_name, strerror(errno));
if (ret < 0) goto cleanup;
for (cur_info = 0; cur_info < n_dupes; cur_info++) {
status = same->info[cur_info].status;
if (status != 0) {
if (same->info[cur_info].bytes_deduped == 0) {
fprintf(stderr, "warning: dedupe failed: %s => %s: %s [%d]%s\n",
files->d_name, dupe_filenames[cur_info], dedupeerrstr(status),
status, readonly_msg[readonly]);
} else {
fprintf(stderr, "warning: dedupe only did %" PRIdMAX " bytes: %s => %s: %s [%d]%s\n",
(intmax_t)same->info[cur_info].bytes_deduped, files->d_name,
dupe_filenames[cur_info], dedupeerrstr(status), status, readonly_msg[readonly]);
}
}
}
cleanup:
for (cur_info = 0; cur_info < n_dupes; cur_info++) {
if (close((int)same->info[cur_info].fd) == -1) {
fprintf(stderr, "unable to close(\"%s\"): %s\n", dupe_filenames[cur_info],
strerror(errno));
}
}
} /* has dupes */
files = files->next;
}
if (!ISFLAG(flags, F_HIDEPROGRESS)) fprintf(stderr, "Deduplication done (%u files processed)\n", total_files);
free(same);
free(dupe_filenames);
return;
}
#endif /* ENABLE_BTRFS */
/* Delete duplicate files automatically or interactively */
static void deletefiles(file_t *files, int prompt, FILE *tty)
{
unsigned int counter, groups;
unsigned int curgroup = 0;
file_t *tmpfile;
file_t **dupelist;
unsigned int *preserve;
char *preservestr;
char *token;
char *tstr;
unsigned int number, sum, max, x;
size_t i;
if (!files) return;
groups = get_max_dupes(files, &max, NULL);
max++;
dupelist = (file_t **) malloc(sizeof(file_t*) * max);
preserve = (unsigned int *) malloc(sizeof(int) * max);
preservestr = (char *) malloc(INPUT_SIZE);
if (!dupelist || !preserve || !preservestr) oom("deletefiles() structures");
for (; files; files = files->next) {
if (ISFLAG(files->flags, F_HAS_DUPES)) {
curgroup++;
counter = 1;
dupelist[counter] = files;
if (prompt) {
printf("[%u] %s\n", counter, files->d_name);
}
tmpfile = files->duplicates;
while (tmpfile) {
dupelist[++counter] = tmpfile;
if (prompt) {
printf("[%u] %s\n", counter, tmpfile->d_name);
}
tmpfile = tmpfile->duplicates;
}
if (prompt) printf("\n");
/* preserve only the first file */
if (!prompt) {
preserve[1] = 1;
for (x = 2; x <= counter; x++) preserve[x] = 0;
} else do {
/* prompt for files to preserve */
printf("Set %u of %u: keep which files? (1 - %u, [a]ll, [n]one)",
curgroup, groups, counter);
if (ISFLAG(flags, F_SHOWSIZE)) printf(" (%" PRIuMAX " byte%c each)", (uintmax_t)files->size,
(files->size != 1) ? 's' : ' ');
printf(": ");
fflush(stdout);
/* treat fgets() failure as if nothing was entered */
if (!fgets(preservestr, INPUT_SIZE, tty)) preservestr[0] = '\n';
i = strlen(preservestr) - 1;
/* tail of buffer must be a newline */
while (preservestr[i] != '\n') {
tstr = (char *)realloc(preservestr, strlen(preservestr) + 1 + INPUT_SIZE);
if (!tstr) oom("deletefiles() prompt string");
preservestr = tstr;
if (!fgets(preservestr + i + 1, INPUT_SIZE, tty))
{
preservestr[0] = '\n'; /* treat fgets() failure as if nothing was entered */
break;
}
i = strlen(preservestr) - 1;
}
for (x = 1; x <= counter; x++) preserve[x] = 0;
token = strtok(preservestr, " ,\n");
if (token != NULL && (*token == 'n' || *token == 'N')) goto preserve_none;
while (token != NULL) {
if (*token == 'a' || *token == 'A')
for (x = 1; x <= counter; x++) preserve[x] = 1;
number = 0;
sscanf(token, "%u", &number);
if (number > 0 && number <= counter) preserve[number] = 1;
token = strtok(NULL, " ,\n");
}
for (sum = 0, x = 1; x <= counter; x++) sum += preserve[x];
} while (sum < 1); /* save at least one file */
preserve_none:
printf("\n");
for (x = 1; x <= counter; x++) {
if (preserve[x]) {
printf(" [+] %s\n", dupelist[x]->d_name);
} else {
if (file_has_changed(dupelist[x])) {
printf(" [!] %s ", dupelist[x]->d_name);
printf("-- file changed since being scanned\n");
} else if (remove(dupelist[x]->d_name) == 0) {
printf(" [-] %s\n", dupelist[x]->d_name);
} else {
printf(" [!] %s ", dupelist[x]->d_name);
printf("-- unable to delete file\n");
}
}
}
printf("\n");
}
}
free(dupelist);
free(preserve);
free(preservestr);
return;
}
/* Hard link or symlink files */
/* Compile out link code if no linking support is built in */
#if !(defined NO_HARDLINKS && defined NO_SYMLINKS)
static void linkfiles(file_t *files, const int hard)
{
static file_t *tmpfile;
static file_t *srcfile;
static file_t *curfile;
static file_t ** restrict dupelist;
static unsigned int counter;
static unsigned int max = 0;
static unsigned int x = 0;
static size_t name_len = 0;
static int i, success;
#ifndef NO_SYMLINKS
static unsigned int symsrc;
#endif
curfile = files;
while (curfile) {
if (ISFLAG(curfile->flags, F_HAS_DUPES)) {
counter = 1;
tmpfile = curfile->duplicates;
while (tmpfile) {
counter++;
tmpfile = tmpfile->duplicates;
}
if (counter > max) max = counter;
}
curfile = curfile->next;
}
max++;
dupelist = (file_t**) malloc(sizeof(file_t*) * max);
if (!dupelist) oom("linkfiles() dupelist");
while (files) {
if (ISFLAG(files->flags, F_HAS_DUPES)) {
counter = 1;
dupelist[counter] = files;
tmpfile = files->duplicates;
while (tmpfile) {
counter++;
dupelist[counter] = tmpfile;
tmpfile = tmpfile->duplicates;
}
/* Link every file to the first file */
if (hard) {
#ifndef NO_HARDLINKS
x = 2;
srcfile = dupelist[1];
#endif
} else {
#ifndef NO_SYMLINKS
x = 1;
/* Symlinks should target a normal file if one exists */
srcfile = NULL;
for (symsrc = 1; symsrc <= counter; symsrc++) {
if (!ISFLAG(dupelist[symsrc]->flags, F_IS_SYMLINK)) {
srcfile = dupelist[symsrc];
break;
}
}
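/* If no normal file exists, skip this set; advance to the next entry
 * first, because a bare 'continue' would retry the same entry forever */
if (srcfile == NULL) { files = files->next; continue; }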
#endif
}
if (!ISFLAG(flags, F_HIDEPROGRESS)) {
printf("[SRC] %s\n", srcfile->d_name);
}
for (; x <= counter; x++) {
if (hard == 1) {
/* Can't hard link files on different devices */
if (srcfile->device != dupelist[x]->device) {
fprintf(stderr, "warning: hard link target on different device, not linking:\n-//-> %s\n", dupelist[x]->d_name);
continue;
} else {
/* The devices for the files are the same, but we still need to skip
* anything that is already hard linked (-L and -H both set) */
if (srcfile->inode == dupelist[x]->inode) {
/* Don't show == arrows when not matching against other hard links */
if (ISFLAG(flags, F_CONSIDERHARDLINKS))
if (!ISFLAG(flags, F_HIDEPROGRESS)) {
printf("-==-> %s\n", dupelist[x]->d_name);
}
continue;
}
}
} else {
/* Symlink prerequisite check code can go here */
/* Do not attempt to symlink a file to itself or to another symlink */
#ifndef NO_SYMLINKS
if (ISFLAG(dupelist[x]->flags, F_IS_SYMLINK) &&
ISFLAG(dupelist[symsrc]->flags, F_IS_SYMLINK)) continue;
if (x == symsrc) continue;
#endif
}
/* Do not attempt to hard link files for which we don't have write access */
if (access(dupelist[x]->d_name, W_OK) != 0)
{
fprintf(stderr, "warning: link target is a read-only file, not linking:\n-//-> %s\n", dupelist[x]->d_name);
continue;
}
/* Check file pairs for modification before linking */
/* Safe linking: don't actually delete until the link succeeds */
i = file_has_changed(srcfile);
if (i) {
fprintf(stderr, "warning: source file modified since scanned; changing source file:\n[SRC] %s\n", dupelist[x]->d_name);
srcfile = dupelist[x];
continue;
}
if (file_has_changed(dupelist[x])) {
fprintf(stderr, "warning: target file modified since scanned, not linking:\n-//-> %s\n", dupelist[x]->d_name);
continue;
}
/* Make sure the name plus the 15-character suffix and its NUL will fit */
name_len = strlen(dupelist[x]->d_name) + 16;
if (name_len > PATHBUF_SIZE) continue;
/* Assemble a temporary file name */
strcpy(tempname, dupelist[x]->d_name);
strcat(tempname, ".__jdupes__.tmp");
/* Rename the link target (not the source) to the temporary name */
i = rename(dupelist[x]->d_name, tempname);