Using MD5 to check equality between files


28th of October 2005

To some Python users this is old news, but since I had never used it before I found it worth mentioning.

I have a script that scans a rather large tree of folders filled with files. None of the folders have the same name, but they can mistakenly contain the same files, e.g.:

 folder XYZ-2005-11-27/
    email1.bin
    email2.bin
 folder CBA-2005-07-10/
    email1.bin
    email2.bin

Sometimes two different folders contain exactly the same file names. Sometimes the file sizes are equal too. But in some of those cases, even though the names and sizes are the same, the files are different. If, on the other hand, they really are the same files just in different locations, I want to find them. How to do that?

The trick is to use the md5 module in Python, like this:

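 import md5, os
 # open the first file of each folder in binary mode
 f1 = file(os.path.join(path_1, os.listdir(path_1)[0]), 'rb')
 f2 = file(os.path.join(path_2, os.listdir(path_2)[0]), 'rb')
 # equal digests mean equal content (barring a hash collision)
 print md5.new(f1.read()).digest() == md5.new(f2.read()).digest()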

UPDATE: As "cableguy" pointed out, the files should be opened in binary mode.
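
For readers on a newer Python, here is a rough sketch of the same idea applied to a whole folder tree. It assumes Python 3, where the md5 module is gone and hashlib takes its place. The names file_md5, find_duplicates and chunk_size are made up for this illustration and are not from my actual script. Files are read in chunks so big ones don't have to fit in memory, and only files that share a size are hashed at all.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:  # binary mode, as noted above
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and MD5 digest both match."""
    # First pass: group every file by its size, which is cheap to get.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_digest[(size, file_md5(path))].append(path)

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling find_duplicates on the top folder returns one list per set of identical files. MD5 is fine for spotting accidental copies like these, but it is not safe against someone deliberately crafting two different files with the same digest.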


