Whenever I am stuck deciding what to write for my next article, I can usually depend on my friends to give me ideas, even if they don’t know they are doing it! A lot of the time I do get ideas for topics but decide to skip them because I keep thinking that it’s simple stuff most people already know. So, when a friend emailed me asking how he could verify that a file he received hasn’t been tampered with in transit, I immediately found my next write-up. My friend is a semi-technical individual, so when he asked for this favor, I was a bit shocked that he didn’t know what an MD5 or SHA1 hash is. If you need to verify that what you’ve downloaded is an exact match of the original, running the file through a hashing algorithm can do just that. Most users will never do this throughout their time on the Internet, as many of them simply assume that what they downloaded has the same bits and pieces as the original file. This may or may not be the case, and many users really don’t care. However, for more important documents and whatnot, you definitely want to check whether the file you have on your computer now is the same as the original you downloaded it from.
So, why in the world would you actually need to verify a file that you’ve downloaded? If something is wrong, can’t you just redownload it? The answer is yes, you can. However, how would you know that the file you have right now has the same 1’s and 0’s as the original file? That is a bit more challenging, because if the file is not an exact match, no popup box or warning message tells you so. The file or program may still open and function as expected even if a couple of bits are corrupted, and that is why most users don’t care about this kind of stuff: if it works, that’s all they need to know. In other situations, however, you need to verify that everything, and I mean everything, is the same as the original. Not one single bit is corrupted, missing or altered.
For example, what if you had to sign a very important document that was sent to you by a lawyer? Wouldn’t you want peace of mind that the document you received via email is the exact same document your lawyer sent, and that nothing was altered between the time he/she sent that email and the time you signed it? Or, if you downloaded a program and it’s not working as expected, you’d normally be asked to verify whether the download was “corrupted”. You’d check that by running the file through a hashing algorithm and comparing the result to a known good hash of the same program.
MD5 and SHA1
In order to perform the actual verification, we need to run our files through a hashing algorithm. The two most popular ones today are MD5 and SHA1. What’s the good news? The good news is that you do not necessarily have to be concerned with how these algorithms actually work behind the scenes. Just place your trust in the mathematical geniuses who came up with them. SHA1 was actually created by the National Security Agency of the United States government. Don’t trust our government? Not a problem. I have no doubt that SHA1 has been closely inspected and dissected by many advanced security researchers around the world, and I’m sure they approve of it as well.
Think of these hashing algorithms as a formula, or a machine that has both an input and an output end. Whenever you want to verify a file, you throw the file into the input end of the “machine”. The machine then performs its calculations on the file. The larger your file, the longer it will take to compute. The faster your CPU, the less time it will take. But one thing is certain, and that is the outcome. When the machine has finished its calculations based on the hashing algorithm or “formula”, it will spit out what looks like a whole bunch of random letters and numbers. However, these numbers and letters are anything but random. In fact, they are how verification takes place.
Let’s take a look at the SHA1 hashing algorithm. The beauty of the algorithm is that any time you run a file through it, the output will always be a string of 40 characters. No more and no less. 40 characters. This is fixed by the algorithm: a SHA1 hash is 160 bits long, which is written out as 40 hexadecimal characters (the digits 0-9 and the letters a-f), and it doesn’t matter if your file is 1KB in size or 1GB. The second, more important part to understand is that any time you run a file of any kind through this algorithm, the output characters will always be the same, provided that nothing has been altered. If you run the same file again under SHA1 on a different computer or system, the result should always be the same 40 characters. If even a tiny change has been made to the file, running it again under the SHA1 hashing algorithm will produce a totally different output. The only similarity is that the output still consists of 40 characters. That never changes.
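To make the “always 40 characters” point concrete, here is a minimal sketch using Python’s standard hashlib module. The helper name sha1_hex and the sample inputs are my own choices for illustration, not something from HashTab or any other tool mentioned in this article.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 hash of data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


# Two illustrative inputs of very different sizes.
small = b"a"                    # 1 byte
large = b"a" * (1024 * 1024)    # 1 MiB

# The output length does not depend on the input size.
print(len(sha1_hex(small)))     # 40
print(len(sha1_hex(large)))     # 40

# Hashing the same input twice gives the same result.
print(sha1_hex(small) == sha1_hex(small))   # True
```

The same input hashed on any other machine with SHA1 should give the same 40 characters, which is what makes comparison between two people possible.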
As you can see, an individual can easily verify a file or document with the original owner by simply asking him or her to run the file under the same hashing algorithm. If both 40 character outputs are an identical match, then you can have complete confidence that both files are exact duplicates and nothing has been altered on either side.
Hashing is a one way function. This means that it is very hard to reconstruct the original document from the hash result alone. This is different from encryption, because encryption is usually a two way function. When you encrypt a document, you or someone else is able to reverse the process and decrypt it back to its original form.
If you are only worried about accidental corruption, it’s usually not necessary to compare all 40 characters. In most cases, you can just compare the last 5-6 characters. With these algorithms, a single change alters the resulting hash in a big way, so if the last 6 characters or so are identical, there’s a very good chance that the other characters are identical as well. Other utilities also allow you to simply paste in a hash and compare it with a file. The utility will then tell you whether the two hashes are identical without you having to “eyeball” it.
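If you would rather not eyeball the characters at all, the comparison can be scripted. The following is a rough sketch in Python, assuming the sender has given you the expected SHA1 hash as text; the function names file_sha1 and matches_expected are hypothetical and chosen only for this example.

```python
import hashlib
import sys


def file_sha1(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare all 40 characters, ignoring case and stray whitespace."""
    return file_sha1(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python verify.py <file> <expected-sha1>
    if len(sys.argv) == 3:
        ok = matches_expected(sys.argv[1], sys.argv[2])
        print("MATCH" if ok else "MISMATCH")
```

Since the script compares the full hash, there is no need to settle for the last few characters.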
So how do we go about verifying our files? Simple. All we need is a utility that allows us to use the SHA1 hashing algorithm! There’s no doubt in my mind that many of them exist, but one of my personal favorites is a utility called HashTab by Implbits. This nifty utility, once installed, seamlessly integrates itself with Windows Explorer. Any time you right-click a file and head over to its Properties menu, you’ll find an extra tab called File Hashes. As soon as you click on this tab, the utility immediately begins to run the file you’ve right-clicked through the SHA1, MD5 and CRC32 hashing algorithms by default. Once finished (again, the time this takes depends on the size of the file and your processor’s speed), the results are displayed right in that same tab! No messing around with the command line and no typing of any kind.
You can download HashTab from here. Personal use of the software is completely free, but use of the software in a professional environment is not.
Below are two screenshots. The first is a sample text file and the other is the file’s SHA1 output using HashTab:
I mentioned earlier that even a single minor change to the document will produce a totally different SHA1 output. Below, I make a tiny change to the same document: I simply deleted the first comma in the first line of text, the one after the word “amet”. As you can see for yourself, the SHA1 output, not to mention the outputs of all the other algorithms, is very different from the first:
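The same effect is easy to reproduce without HashTab. Here is a small Python sketch; the sample sentence is a stand-in I made up, not the actual text file from the screenshots, so the hashes it prints will not match the ones shown there.

```python
import hashlib


def sha1_of_text(text: str) -> str:
    """Return the SHA1 hash of a string encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical sample text; only the first comma is removed in the edit.
original = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
edited = original.replace(",", "", 1)

hash_original = sha1_of_text(original)
hash_edited = sha1_of_text(edited)

# Count how many of the 40 positions hold a different character.
differing = sum(1 for a, b in zip(hash_original, hash_edited) if a != b)

print(hash_original)
print(hash_edited)
print(f"{differing} of 40 characters differ")
```

A one-character edit typically changes most of the 40 positions, not just one or two.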
If you want HashTab to calculate hash results for other hashing algorithms, simply select the Settings link and you’ll be able to add or remove algorithms. SHA1 and MD5 are usually the most popular, and that is what most users will go with.
Please folks, do not be gullible enough to upload your documents and whatnot to an online service that “promises” to calculate the hash for you! Once you upload your important documents, you have no idea whose hands they will end up in. When it comes to these important functions, it’s imperative that you perform them locally, right on your own personal computer.
So What Now?
Well, now that you know how to verify whether two files are an exact match, it’s up to you how to use it. For example, if you’d like to distribute some kind of file to the public, you can simply run the file through an algorithm and post the hash on your website. If a person downloading your file chooses to run the same hashing algorithm on the file for verification, they’ll know whether or not the copy they have is an exact duplicate.
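If you are the one publishing a file, you may want both the MD5 and SHA1 hashes ready to paste onto your site. Below is a minimal Python sketch that computes several hashes in a single pass over the file. The function name checksum_lines and the “NAME: hash” output layout are my own assumptions for illustration, not a standard format.

```python
import hashlib
import sys


def checksum_lines(path, algorithms=("md5", "sha1"), chunk_size=65536):
    """Read the file once and return one 'NAME: hash' line per algorithm."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed the same chunk to every algorithm.
            for hasher in hashers.values():
                hasher.update(chunk)
    return [f"{name.upper()}: {hasher.hexdigest()}"
            for name, hasher in hashers.items()]


if __name__ == "__main__":
    # Usage: python checksums.py <file-to-publish>
    if len(sys.argv) == 2:
        for line in checksum_lines(sys.argv[1]):
            print(line)
```

Reading the file once and feeding each chunk to every algorithm avoids re-reading a large file for each hash you want to post.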
There are also a couple of other things you should be aware of:
Hashing does not equal encryption = Hashing a file results in an output of letters and numbers that allows you to verify its authenticity by comparing it with the hash of a known good copy. There is no confidentiality built in, because the original file does not get altered in any way, shape or form. What hashing allows you to do is figure out whether or not the file is authentic and whether or not it has been changed since it left the original owner. If encryption is what you fancy, then I have written two previous articles on just that. One teaches you how to encrypt your emails with OpenPGP. By doing so, not only does the email get transferred in encrypted form, but a hash is also run on your email message to prove to the recipient that the message has not been altered in any way, similar to what I showed you here with regular files and documents. The other article shows you how to encrypt individual files and documents so that only the recipient you chose has the ability to decrypt and open them.
Hashing does not mean a file is malware free = Just because an anonymous uploader on the Internet gives you a hash result of the original file he uploaded doesn’t mean that the file is safe! The hashing algorithm does not care what gets thrown in or who uses it. Its main job is to calculate the hash result and give that to you. Therefore, the file you download can still be malicious in nature even though its hash result is known. What hashing can protect you from is dubious copies of software. For example, a lot of users download the Microsoft operating system ISO images for testing purposes via Torrent. If the original image created by Microsoft has a hash result of, say, 1234abcd, and an ISO image uploaded by another individual claiming to be the same as Microsoft’s has a different hash result of, say, abcd1234, this immediately lets you know that the second copy is bogus and not to be trusted.
With this information at hand, it’s up to you whether and when to use it. Hashing can serve a very important purpose, and if used at the right times, it can actually help prevent a lot of headaches down the road.