Easily Manage Duplicate Files and Save Storage Space
May 20, 2009 | By: UbuntuLinuxHelp
Posted in How to...
Have you ever had several copies of the same file on your PC? I tend to duplicate files before I edit them, especially graphics and video files, and even config files, for that matter. That way, when I make a serious mistake, I can always go back to square one.
The only problem is that all those revisions are wasting drive space: files I've had for years and never cleaned up. Time for some spring cleaning.
I could, of course, manually search for, compare and delete the duplicate copies, but that would take me a few weeks. Let's face it: if you were as messy with file copies as I was, you'd probably look for an automated tool too. A Google search turned up "fdupes", which led me to several others listed on Wikipedia. Before I jump into fdupes, I want to mention some of the other duplicate file finders I found, as each of them has one unique benefit.
dupmerge: From their site "...Dupmerge reads a list of files from standard input (eg., as produced by "find . -print") and looks for identical files. When it finds two or more identical files, all but one are unlinked to reclaim the disk space and recreated as hard links to the remaining copy..."
dupmerge is not in the Ubuntu repositories (at least not for Hardy), but you can get the source tarball (.tar.gz) from Freshmeat.
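To show what dupmerge's approach boils down to (find identical files, then replace all but one with hard links to the survivor), here is a rough sketch using standard GNU tools. It is not dupmerge's code. `hardlink_dupes` is a hypothetical bash function, and it assumes GNU coreutils, that every file lives on one filesystem (hard links cannot cross devices), and that no file name contains a newline or backslash. Try it on a copy of your data first.

```shell
# hardlink_dupes DIR  (hypothetical helper, NOT part of dupmerge)
# Replaces each duplicate regular file under DIR with a hard link to the
# first copy (in sorted order) that has the same SHA-256 digest.
# Assumes bash, GNU coreutils, a single filesystem, and plain file names.
hardlink_dupes() {
    local prev_hash="" keep="" hash path
    # "digest  path" lines; byte-order sort puts identical digests together.
    find "$1" -type f -exec sha256sum {} + | LC_ALL=C sort |
    while read -r hash path; do
        if [ "$hash" != "$prev_hash" ]; then
            # First file with this content: keep it as the surviving copy.
            prev_hash=$hash
            keep=$path
        elif ! [ "$path" -ef "$keep" ]; then
            # Same content, different inode: swap the file for a hard link.
            ln -f -- "$keep" "$path"
        fi
    done
}
```

For example, `hardlink_dupes ~/photos` leaves every path in place but stores each distinct photo only once. Because the linked names now share one inode, a program that edits one of them in place changes all of them, which matters if you rely on the edit-a-copy habit described above.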
Rdfind: From their site "... finds duplicate files. It is useful for compressing backup directories or just finding duplicate files. It compares files based on content, NOT on name. When I want to change some file, I am often nervous to break something and therefore copy all the old files to some directory named app_2006xxxx or whatever. The same when I switch computer system and am afraid to lose my old stuff. This makes all my files exist in numerous places, and I never feel like cleaning up. This is where rdfind comes in handy. It will find those files and report them to you. Optionally, erase them or replace them with links (hard or symbolic). Rdfind is a command line tool – that means no GUI..."
Again, this is not in the Ubuntu (Hardy 8.04) repositories, but you can get an Ubuntu .deb package (or install from source) from the author's site.
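rdfind's key idea, comparing files by content rather than by name, is easy to illustrate: hash every file and report the digests that appear more than once. `list_dupes` below is a hypothetical helper, not rdfind itself, and it assumes GNU coreutils (`uniq -w` and `--all-repeated` are GNU extensions).

```shell
# list_dupes DIR  (hypothetical helper, NOT rdfind)
# Prints "digest  path" for every file under DIR whose content also
# occurs elsewhere, one blank-line-separated group per set of copies.
# File names play no part: only the SHA-256 digest is compared.
list_dupes() {
    find "$1" -type f -exec sha256sum {} + |
        LC_ALL=C sort |
        uniq -w64 --all-repeated=separate   # compare only the 64-char digest
}
```

Running `list_dupes ~/backups` only reports. Nothing is deleted or linked, so it is a safe way to see how many app_2006xxxx-style copies are lying around.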
Freedup: From their site: "...walks through the file trees (directories) you specify. When it finds two identical files on the same device, it hard links them together. In this case two or more files still exist in their respective directories, but only one copy of the data is stored on disk; both directory entries point to the same data blocks..."
Freedup is not in the Ubuntu (Hardy) repositories either, so you can install from source (or use alien to convert the RPM into a .deb).
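Freedup's description hinges on two facts about hard links. They only work within one device (filesystem), and linked names share a single inode, so the data is stored once. The hypothetical helpers below check both with GNU `stat`. They are only an illustration and are not part of Freedup.

```shell
# same_device A B  (hypothetical): succeeds when A and B are on the same
# filesystem, the precondition for hard-linking them together.
same_device() {
    [ "$(stat -c %d -- "$1")" = "$(stat -c %d -- "$2")" ]
}

# show_links FILE...  (hypothetical): prints inode number, link count and
# name. Hard-linked names share the inode, so the data exists only once.
show_links() {
    stat -c '%i %h %n' -- "$@"
}
```

For example, after `ln original copy`, running `show_links original copy` prints the same inode number and a link count of 2 on both lines.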
Fslint: From their site "...find and clean various forms of lint on a filesystem. I.E. unwanted or problematic cruft in your files or file names. For example, one form of lint it finds is duplicate files. It has both GUI and command line modes..."
You can grab a copy off the site or click this link: apt:fslint
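fslint looks for more than duplicates. Two other common kinds of lint, empty files and broken symbolic links, can be listed with GNU `find` alone. `find_lint` is a hypothetical helper for illustration, not fslint's own command-line tool. It only lists files and deletes nothing.

```shell
# find_lint DIR  (hypothetical helper, NOT part of fslint)
# Lists two simple kinds of filesystem lint under DIR without changing anything.
find_lint() {
    echo "Empty files:"
    find "$1" -type f -empty
    echo "Broken symlinks:"
    # -xtype l matches a symlink whose target cannot be followed (GNU find).
    find "$1" -xtype l
}
```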
One special mention: I have a lot of Thumbs.db files scattered throughout my system, left over from Windows. It would be nice to have a fast, simple, scripted way to delete them. The tuxero blog has a great post, "How to delete useless Windows Files in Ubuntu Linux", with the following command to do this:
find /my_path -type f -name "Thumbs.db" -exec rm -f {} \;
Please read tuxero's post that contains more information and caveats.
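Two small refinements to that command may be worth considering. First, preview the matches before deleting anything. Second, use GNU find's case-insensitive `-iname`, since the cache can show up as `Thumbs.db` or `thumbs.db`. The helper names below are hypothetical. `-delete` is a GNU find action that stands in for the `-exec rm -f {} \;` call.

```shell
# list_thumbs DIR  (hypothetical): show every Thumbs.db, any capitalization.
list_thumbs() {
    find "$1" -type f -iname 'thumbs.db' -print
}

# delete_thumbs DIR  (hypothetical): remove the same files list_thumbs shows.
delete_thumbs() {
    find "$1" -type f -iname 'thumbs.db' -delete
}
```

Run `list_thumbs /my_path` first, check the output, and only then run `delete_thumbs /my_path`.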
As for fdupes, it is in the Ubuntu repositories: install it via apt:fdupes, download a copy, or simply use this terminal command:
sudo aptitude install fdupes

One thing I noticed: there did not appear to be much documentation of the options (switches) for fdupes. I eventually found a README with the documentation inside the tar.gz. For ease of reference:
Usage: fdupes [options] DIRECTORY...
-r --recurse
Include files residing in subdirectories
-s --symlinks
Follow symlinks
-H --hardlinks
Normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior
-n --noempty
Exclude zero-length files from consideration
-f --omitfirst
Omit the first file in each set of matches
-1 --sameline
List each set of matches on a single line
-S --size
Show size of duplicate files
-q --quiet
Hide progress indicator
-d --delete
Prompt user for files to preserve and delete all others; important: under particular circumstances, data may be lost when using this option together with -s or --symlinks, or when specifying a particular directory more than once; refer to the fdupes documentation for additional information.
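fdupes' documentation describes its method as comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. The hypothetical function below applies that same three-stage test to a single pair of files. It also treats two names for the same inode as non-duplicates, mirroring the default behavior that -H changes. It illustrates the idea only and is not fdupes' actual code. It assumes GNU coreutils.

```shell
# is_duplicate A B  (hypothetical): succeeds only if A and B are separate
# files with identical content. The cheap size check runs first, so files
# of different sizes are rejected without being read at all.
is_duplicate() {
    local a=$1 b=$2
    # Two names for one inode: not a duplicate (fdupes' behavior without -H).
    if [ "$a" -ef "$b" ]; then return 1; fi
    # Stage 1: different sizes can never match.
    [ "$(stat -c %s -- "$a")" = "$(stat -c %s -- "$b")" ] || return 1
    # Stage 2: compare MD5 digests.
    [ "$(md5sum < "$a" | cut -d' ' -f1)" = "$(md5sum < "$b" | cut -d' ' -f1)" ] || return 1
    # Stage 3: byte-by-byte confirmation guards against a digest collision.
    cmp -s "$a" "$b"
}
```

For a single pair, the MD5 stage saves nothing, because cmp reads both files anyway. It pays off when one file's digest is compared against many others, which is the situation fdupes is in when it scans a whole directory tree.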
So... to recursively find duplicates in your home folder, list the file size of duplicates and be prompted to preserve one of the copies, the terminal command would look like:
fdupes -r -S -d /home/your_account_name
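Before letting -d loose, it can help to run the same search as a report first. The wrapper below is hypothetical and uses only the -r and -S options listed above. Because -d is left off, fdupes deletes nothing. You can add -n if you also want to skip empty files, which all count as identical to one another.

```shell
# report_dupes DIR  (hypothetical wrapper): recurse (-r) and show sizes (-S).
# Without -d nothing is deleted, so the output is a safe preview.
report_dupes() {
    fdupes -r -S "$1"
}
```

For example, `report_dupes /home/your_account_name > ~/duplicates.txt` saves the list for review. Once it looks right, run the -d command above.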
Above all, if you are not sure of something, ALWAYS have a BACKUP on CD, DVD or external storage BEFORE removing files. That way, if something goes wrong, you can always recover.
Pretty neat huh? And it's fast too!