What is a SHA1 Hash?

From the class:  Inside Git

We're going to take a quick detour from Git so that I can explain a core technical concept that you'll need to understand to understand how Git is storing objects. And that is, what is a SHA-1 hash, and what does it give us? If you've worked with Git at all, you've probably seen those commit hashes. And those are examples of SHA-1 hashes. And in the last video, you saw the Objects folder, which contains a bunch of files. And those file names are also SHA-1 hashes.

To show you what a SHA-1 hash is, I'm going to use the Ruby language. And the reason is that it comes installed on most Mac and Linux machines. And it also has a really nice library for working with this stuff really simply. So we could do this in another language or another framework, but I'm just going to use Ruby because it's easy.

So let's just start playing with some examples. And to do that, I need to require a library, so I'll require Ruby's SHA-1 library, digest/sha1. And let's calculate a SHA-1 hash. To do that, I'll use the Digest::SHA1 class, and then I'll call its hexdigest method.

Now, what we need to do with a SHA-1 function is pass it some content. So this could be anything. It could be hello world, or it could be any string we want. It could be the contents of a file. And when I do that, what we get back from this function is a hash key. And notice that it's a 40-character string of seemingly random hexadecimal characters.
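
As a rough sketch, the session described so far might look like the following. The digest/sha1 library and the Digest::SHA1.hexdigest call are part of Ruby's standard library; the variable name key is only for illustration.

```ruby
require 'digest/sha1'  # Ruby's standard SHA-1 library

# Pass any string content to the hash function.
key = Digest::SHA1.hexdigest("hello world")

puts key         # 40 hexadecimal characters that look random
puts key.length  # prints 40
```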

But the characters aren't actually random. These characters identify this piece of content. So if I give this content to the hexdigest function again, notice that the value that we get back is identical. So these two keys here are identical. This is called a hash function. And a hash function in computing means that if we pass something into it, what we get back is an identifier for that content, and the same content always gives us back the same identifier.
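
To make that repeatability concrete, here is a minimal sketch that hashes the same string twice and compares the results. The variable names first and second are illustrative only.

```ruby
require 'digest/sha1'

# Hash the same content two separate times.
first  = Digest::SHA1.hexdigest("hello world")
second = Digest::SHA1.hexdigest("hello world")

puts first == second  # true: the same content always produces the same key
```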

And in this case, with SHA-1, it's virtually impossible that passing in different content by accident would give us back the same key. So in other words, these keys are, for all practical purposes, unique. They uniquely identify some piece of content.

Let's try a different string as the input. Maybe it'll just be hello world 2. The key that I get back from that is going to be different. It's going to uniquely identify hello world 2. And it'll be different from the previous hash values that we got up here. Let's do it again. Let's try hello world 2. And notice, this time we get back the same key that we got before. And every time I change this value here, I get back a different key.
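
A small sketch of the same experiment: hash a few slightly different strings and confirm that every key is distinct. The third input, hello world 3, is an extra value added here for illustration.

```ruby
require 'digest/sha1'

inputs = ["hello world", "hello world 2", "hello world 3"]

# Compute one key per input string.
keys = inputs.map { |text| Digest::SHA1.hexdigest(text) }

inputs.zip(keys).each { |text, key| puts "#{text} => #{key}" }
puts keys.uniq.length == inputs.length  # true: every input has its own key
```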

So far, we've been passing in a string directly to the SHA-1 function, or the hash function. But you could imagine that this string could actually be the contents of a file. We could read the file, and then just pass that in to the hashing function directly. You see these examples of SHA-1 hashes all over the place. For instance, if you use a web framework, you might see that the CSS or JavaScript files are compiled, and they have this big long hash string at the end of them.
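
Hashing a file's contents could be sketched like this. The helper name sha1_of_file, the file name styles.css, and the temporary directory are all assumptions made for the example, not something from the demo.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hypothetical helper: read a file's raw bytes and hash them.
def sha1_of_file(path)
  Digest::SHA1.hexdigest(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "styles.css")  # illustrative file name
  File.write(path, "body { color: black; }\n")

  puts sha1_of_file(path)  # the key for whatever the file currently contains
end
```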

One of the reasons that hashes are so useful is that we can use them to quickly decide whether a file has changed, because we can say, are the two hash keys equal? If they're equal, then for all practical purposes the content is the same. And if they're different, then the content must be different. So SHA-1 hashes give us kind of a quick and dirty way to determine whether or not a file has changed, or a piece of content has changed, simply by whether or not the key is the same or different.
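
That change check can be sketched in a few lines. The helper changed? is hypothetical, written only to illustrate comparing a saved key against the key of the current content.

```ruby
require 'digest/sha1'

# Hypothetical helper: true when the content no longer matches a saved key.
def changed?(saved_key, content)
  Digest::SHA1.hexdigest(content) != saved_key
end

# Remember the key for the original content.
saved_key = Digest::SHA1.hexdigest("hello world")

puts changed?(saved_key, "hello world")    # false: keys are equal
puts changed?(saved_key, "hello world 2")  # true: keys differ
```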

Let's actually write one of these values out to a file, so we're all using the same vocabulary. I'll create a piece of content here, and I'll call it value. And we'll just say that this is some piece of code. It could be any code. Obviously, it's not valid code, just some string.

And the key for this will be the hash value, so I'll say Digest::SHA1.hexdigest and pass in this value as a parameter. And we get back this key. Now, what I want to do is write out a file with this key as the file name and the value as the contents. So I can use the File.write method, with the key as the file name and the value as the contents.
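
Put together, the write step might look like the sketch below. The string some code stands in for the value typed in the demo, and the scratch directory from Dir.mktmpdir is an assumption made here so the example doesn't clutter the current directory, which is where the demo writes its file.

```ruby
require 'digest/sha1'
require 'tmpdir'

dir = Dir.mktmpdir  # scratch directory (assumption; the demo uses the current directory)

value = "some code"                    # the content to store
key   = Digest::SHA1.hexdigest(value)  # its SHA-1 key

# Use the key as the file name and the value as the file contents.
path = File.join(dir, key)
File.write(path, value)

puts path
```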

OK, so if I exit out of this and take a look at the contents of this directory, notice we have one file in here. And the name of the file is the hash key that was computed from the SHA-1 hash. And this key serves as both the file name and as a unique identifier for the content. I can use the cat command to take a look at the contents of the file. And notice, we get back some code. This is the value that we stored up here.
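
The same inspection can be sketched from Ruby instead of the shell: list the directory, read the file back, and confirm that re-hashing the contents reproduces the file name. The scratch directory and the variable names are assumptions for the example.

```ruby
require 'digest/sha1'
require 'tmpdir'
require 'fileutils'

dir = Dir.mktmpdir  # scratch directory (assumption)

# Store the value under its SHA-1 key, as in the previous step.
value = "some code"
key   = Digest::SHA1.hexdigest(value)
File.write(File.join(dir, key), value)

names   = Dir.children(dir)                    # roughly what ls shows
content = File.read(File.join(dir, names.first))  # roughly what cat shows

puts names.first
puts content
puts Digest::SHA1.hexdigest(content) == names.first  # true: the name is the content's key

FileUtils.remove_entry(dir)  # clean up the scratch directory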