Linux and Open Source Blog

How to compare the content of two folders automatically

Posted on February 6, 2013 by Linewbie.com. Posted in how to.

Many of us end up, inevitably, with so many files and folders that it is impossible to keep them under control without some specialized help. Luckily, as I’ll show you in a moment, Linux offers several very efficient solutions to this problem.

Multiple copies of many files, scattered all over the computer, waste space, create confusion, and slow down desktop indexers like DocFetcher. I have already explained elsewhere how to find and remove the unwanted extra copies.

When it comes time to clean up your folders and files, a common problem crops up: how can I find where duplicate files and folders exist between multiple directories? The problem is both more complex and much more common than it may appear at first sight. A directory may contain many, many levels of sub-directories, each with thousands of files of all sorts. Trying to figure out manually the differences between two directory trees like those could take days.

One reason why you need to know the differences between directories is so you can ensure that all your backups are working as expected! What if the automated backup procedure you run every day has a bug? What if a sector of the external drive(s), DVDs, or remote computer to which you continuously copy all your precious folders suddenly (and silently) broke? Would you notice it before actually needing those backups? This is the main reason to be able to quickly find out if the contents of two folders differ. Let’s see how to make this easy.

Automatic comparison

It is important to be able to run certain checks automatically from a shell script, especially if all you want is a quick yes-or-no answer and automatic notifications. Here are a few command line utilities that you may use as a basis for scripts that perform such checks. You may then run those scripts either as automatic cron jobs, or whenever you feel like checking whether that DVD or external drive is still free from errors.

find

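This pipe of commands:

(cd "$FOLDER" && find . -type f | cut -d/ -f2- | sort) > "/tmp/file_list_$(basename "$FOLDER")"

will save in /tmp, in a file called file_list_ followed by the last component of $FOLDER, an alphabetically ordered list of all the files inside $FOLDER, complete with the corresponding sub-folders. Running find from inside the folder keeps the paths relative to it, even when $FOLDER is an absolute or nested path. The result looks something like this: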

family/health_insurance.pdf
family/holiday_quote.pdf
pictures/2012/graduation.jpg
work/linux-review.odt

Running the pipe on more directories and comparing the corresponding file lists will not find all the differences between them. You will only spot missing files, or folders containing sets of files with different names. Files with the same names and in the same subfolders, but with different content, will not show in the lists. Still, this may be a very quick way to spot certain mismatches.
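
The comparison of two such lists can itself be scripted. What follows is only a minimal sketch, assuming file names without embedded newlines; the function names list_files and compare_lists are made up for this example. It builds the two lists in temporary files and lets comm print the names that appear in only one tree:

```shell
#!/bin/sh
# list_files DIR (hypothetical helper): print a sorted list of the regular
# files below DIR, with paths relative to DIR.
list_files() {
    (cd "$1" && find . -type f | cut -d/ -f2- | LC_ALL=C sort)
}

# compare_lists DIR_A DIR_B (hypothetical helper): print the names found
# in only one of the two trees. File contents are NOT compared.
compare_lists() {
    list_a=$(mktemp) || return 2
    list_b=$(mktemp) || return 2
    list_files "$1" > "$list_a"
    list_files "$2" > "$list_b"
    # comm -3 hides the lines common to both lists; names that exist only
    # in the second tree are printed with a leading tab.
    LC_ALL=C comm -3 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}
```

Calling compare_lists todo_orig todo_backup would print nothing when both trees hold the same file names. Both sort and comm run with LC_ALL=C so that they agree on the ordering of the lines.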

diff

Diff is normally used to compare two files, but can do much more than that. The options “r” and “q” make it work recursively and quietly, that is, only mentioning differences, which is just what we are looking for:

marco #> diff -rq todo_orig/ todo_backup/
Only in todo_orig/essays: Digital-Citizenship-tech4engage-summit-report.pdf
Files todo_orig/copyright/copyright_licensing.t2t and todo_backup/copyright/copyright_licensing.t2t differ
diff: todo_orig/embedded_linux/init.d/led_driver: No such file or directory
diff: todo_backup/embedded_linux/init.d/led_driver: No such file or directory
Files todo_orig/strider/food/backpacking_food.t2t and todo_backup/strider/food/backpacking_food.t2t differ
…

As you can see, all the differences between the two directory trees appear, whether they are files present in only one of them or files whose content differs. Even files that, like “led_driver”, are present in both folders but don’t really exist, because they are links to other files that were deleted, are listed. Counting the lines generated by such an invocation of diff shows immediately whether the two trees differ, as in this Bash sketch:

# 2>&1 also counts error messages, such as those about the broken links above
DIFF_NUM=$(diff -rq "$DIR_1" "$DIR_2" 2>&1 | wc -l)
if [ "$DIFF_NUM" -gt 0 ]
then
    # send me an email listing all the differences
    :
fi
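
The same check can be wrapped in a small function that a cron job may call. This is a sketch under stated assumptions, not a finished tool: the name check_trees is invented here, and the mail command and address in the commented usage line are placeholders that depend on how your system sends mail.

```shell
#!/bin/sh
# check_trees DIR_1 DIR_2 (hypothetical name): print diff's report and
# return 0 if the two trees match, 1 otherwise.
check_trees() {
    # 2>&1 keeps messages such as "No such file or directory" in the report.
    report=$(LC_ALL=C diff -rq "$1" "$2" 2>&1)
    if [ -n "$report" ]; then
        printf '%s\n' "$report"
        return 1
    fi
    return 0
}

# Possible use from a script (mail command and address are assumptions):
#   check_trees "$DIR_1" "$DIR_2" > /tmp/diff_report.txt ||
#       mail -s "Backup mismatch" me@example.com < /tmp/diff_report.txt
```

Testing for an empty report instead of counting lines gives the same yes-or-no answer, and the report is already available if a notification has to be sent.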

rsync

Rsync can produce a difference report that you may parse and use in the same way as the one from diff:

marco #> rsync -rvnc --delete todo_sync/ todo_orig/
sending incremental file list
deleting essays/Digital-Citizenship-tech4engage-summit-report.pdf
copyright/copyright_licensing.t2t
skipping non-regular file "embedded_linux/init.d/led_driver"
strider/food/backpacking_food.t2t

sent 148763 bytes received 473 bytes 27133.82 bytes/sec
total size is 854518613 speedup is 5725.95 (DRY RUN)

The four command line switches r, v, n and c tell rsync (check the man page for details) to perform a verbose, recursive, checksum-based synchronization of the two directories, but only for show: -n, in fact, displays what rsync would do if you let it make the second folder a perfect copy of the first one, and --delete adds to that report the files that exist only in the second folder. The huge advantage of rsync over diff is that the former can compare local directories with remote ones.
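
To use such a report in a script, the header and summary lines have to be dropped first. The filter below is a rough sketch that assumes the report looks like the sample shown in this section; the function name rsync_diff_lines is invented for this example, and other rsync versions or options may print extra lines that it does not know about.

```shell
#!/bin/sh
# rsync_diff_lines (hypothetical name): read an "rsync -rvnc --delete"
# report on standard input and print only the lines that describe
# differences. Assumes the report layout shown in this article.
rsync_diff_lines() {
    grep -v -e '^sending incremental file list' \
            -e '^sent .* bytes' \
            -e '^total size is ' \
            -e '^$'
}

# Possible use:
#   rsync -rvnc --delete todo_sync/ todo_orig/ | rsync_diff_lines
```

Because grep returns a non-zero exit status when it prints nothing, the function's exit status also tells the calling script whether any difference was reported.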

Author info:

Marco Fioretti is a freelance writer and teacher whose work focuses on open digital technologies.




Est. 2002

linewbie.com serving the linux and open source community since April 09, 2002
