Sunday, 10 March 2019

An easy way to remove duplicate files

After years of backing up files from various hard drives and folders onto new hard drives and yet more folders, we inevitably end up with duplicate music, video and photo files. Here are two tools that can help clean up: fdupes and exiftool.

Installing tools

# Fedora
$ dnf -y install fdupes perl-Image-ExifTool

# Debian
$ apt install libimage-exiftool-perl fdupes

Renaming based on metadata

The trick is to use exiftool to make the files legible by renaming them based on their metadata, and then to use fdupes to remove the duplicates, comparing first by file size and then by file hash. Assuming all the files have been moved into a directory called backup:
$ exiftool \
    -d '%Y-%m-%d_%H%M%S' \
    '-filename<${filemodifydate;$_=undef if $self->GetValue("DateTimeOriginal")}-%f.%le' \
    -r \
    -ext MOV -ext mov \
    -ext MP4 -ext mp4 \
    -ext JPG -ext jpg \
    -ext JPEG -ext jpeg \
    -ext HEIC -ext heic \
    -ext PNG -ext png \
    -ext NEF -ext nef \
    -ext DNG -ext dng \
    ./backup

# use the following to create a 'year' directory automatically:
#   -d '%Y/%Y-%m-%d_%H%M%S'
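To get a feel for the names this pattern produces before touching any files, the format can be imitated in plain shell. This is only a rough sketch: preview_name is a hypothetical helper, not part of exiftool, and it assumes GNU date (for the -d option). It mimics the '%Y-%m-%d_%H%M%S' date format followed by the original base name (%f) and the lower-cased extension (%le).

```shell
# Hypothetical helper (not part of exiftool): print the name a file would get
# under the pattern '<date>-%f.%le' with the date format '%Y-%m-%d_%H%M%S'.
# Assumes GNU date; the timestamp is treated as UTC to keep the output stable.
preview_name() (
    when="$1"            # timestamp, e.g. "2019-03-10 14:05:09"
    file="$2"            # original file name, e.g. "IMG_0001.JPG"
    base="${file%.*}"    # %f  -> name without its extension
    # %le -> extension converted to lower case
    ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
    stamp=$(date -u -d "$when" '+%Y-%m-%d_%H%M%S')
    printf '%s-%s.%s\n' "$stamp" "$base" "$ext"
)

# Example: preview_name "2019-03-10 14:05:09" "IMG_0001.JPG"
```

With those example arguments the helper prints 2019-03-10_140509-IMG_0001.jpg. The function body runs in a subshell, so its variables do not leak into the calling shell.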

Additional Control for renaming

$ exiftool \
    '-filename<CreateDate' \
    -d %Y-%m-%d-%H%M%%-c.%%le \
    -r \
    -ext MOV -ext mov \
    -ext MP4 -ext mp4 \
    -ext JPG -ext jpg \
    ./backup

# update images to include the date and camera model
$ exiftool \
    -d "%Y-%m-%d %H%M%S" \
    '-filename<${datetimeoriginal}-${model}-${filename}' \
    *.NEF *.DNG

# and for the ones with no metadata
$ exiftool \
    '-filename<filemodifydate' \
    -d %Y-%m-%d-%H%M%%-c.%%le \
    -r \
    -ext MOV -ext mov \
    -ext MP4 -ext mp4 \
    -ext JPG -ext jpg \
    ./backup
Similarly for audio files:
$ exiftool \
    '-Directory<$Artist/$Album' \
    -r \
    -ext MP3 -ext mp3 \
    -ext FLAC -ext flac \
    -ext M4A -ext m4a \
    ./backup/

$ exiftool \
    '-filename<$Track - $Title.%le' \
    -r \
    -ext MP3 -ext mp3 \
    -ext FLAC -ext flac \
    -ext M4A -ext m4a \
    ./backup/

Removing based on checksums

And finally remove duplicates based on file checksums:
$ fdupes -rdNsI ./backup/
$ find ./backup -type d -empty -delete
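Since the fdupes invocation above deletes without prompting, it can be reassuring to look at the duplicates first. The sketch below lists files with identical content using only standard tools. It is an illustration of the checksum idea, not how fdupes is implemented: list_dupes is a hypothetical helper, it assumes sha256sum is available (GNU coreutils), and it hashes every file rather than pre-filtering by size.

```shell
# Hypothetical helper: print "<hash>  <path>" for every file under a directory
# whose SHA-256 checksum is shared with at least one other file.
# Nothing is deleted. Assumes sha256sum (GNU coreutils) and awk.
list_dupes() (
    dir="$1"
    find "$dir" -type f -exec sha256sum {} + |
        sort |
        awk '{
            hash = $1
            if (hash == prev) {
                # Second or later member of a group: emit the first
                # member once, then the current line.
                if (!shown) { print prev_line; shown = 1 }
                print $0
            } else {
                shown = 0
            }
            prev = hash
            prev_line = $0
        }'
)

# Example: list_dupes ./backup
```

Sorting brings identical hashes next to each other, so a single pass in awk is enough to pick out the groups. File names containing newlines or backslashes are printed in sha256sum's escaped form.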
Audio files can be deduplicated more thoroughly by comparing checksums of the audio data alone, which requires a patched fdupes:
# audio-only checksum: identical for files with the same audio but different metadata
$ ffmpeg -hide_banner -i foo.m4a -c:a copy -bsf:a null -f hash -

# using the patched fdupes and its '-a' flag
$ fdupes -rdNsIa ./backup
