It’s been a while since I created this blog, but until now I haven’t really talked about one of my favorite topics where networking and computers in general are concerned: the Domain Name System, better known as DNS. I remember being asked recently just how the Internet works. Many users know that the Internet is very big, but they don’t have a clue how a computer on one side of the planet can talk to a computer on the other side. To put it more broadly, do you have any idea what the heck happens between the time you type a URL into your favorite browser and press the Enter key, and the time the webpage actually appears on your screen? Have you ever wondered how your computer, sitting somewhere in the world, is able to communicate with a web server belonging to Facebook or Twitter, which could also be located anywhere in the world?
If so, then this article series is definitely for you! DNS is a huge topic, and this three-article series is not meant to turn you into a DNS administrator or expert by the time you’re finished with it. What I do promise is that you’ll come away with a much clearer picture of how computers manage to find each other on the biggest computer network in existence today, the big ol’ Internet!
Humans Are Stupid
One of the first things you should understand about DNS is how computers actually communicate with each other. You see, computer systems believe that we humans are dumb. Why? Because computers locate each other via numbers, which they find much more efficient than names made up of letters, symbols and numbers. However, we humans are very bad at remembering large quantities of numbers. Heck, I’m the type of guy who has a hard time remembering a single phone number! For us, remembering a name such as Yahoo.com is so much easier than trying to remember 126.96.36.199! That long string of numbers is called an IP address and it actually maps to the Yahoo.com domain name. So, when a user types Yahoo.com into their browser, what the computer actually needs to do is translate, or map, that domain name to an IP address. This process is called name resolution. That was just one example. There are literally millions and millions of web pages out there, and each and every time you enter a URL, name resolution is performed.
So now you’re probably thinking that solving this problem is very simple. Just create a master database with all the name and IP address mappings listed within and call it a day. Anytime a computer needs to resolve a name, just have it consult this master database file. Surely that will work, right? The answer is yes, it definitely will. In fact, that’s how it originally worked, as I explain when I talk about the HOST file in the next section. The next question you then have to ask yourself is just who in the world will manage this database?! Due to the sheer size of the Internet, it would be next to impossible to keep this master database file updated. I’m sure that every single second there is some sort of change within the thousands and thousands of individual computer networks that have a presence on the Internet. Good luck finding someone to interview for that job!
The Dreaded HOST File
To put things in perspective and give you an idea of just why a system such as DNS was so badly needed, you have to go back to the late ’60s and early ’70s, when the military successfully created one of the first computer networks. This network was called the ARPANET. What this network did is of no great importance where DNS is concerned. What is important in our discussion is to understand that at this time there weren’t many computers for the administrators to keep track of, because the network was considered private. The Internet as we know it today obviously didn’t exist at that time. If there were only 15 computers on the network at any given time, communicating with them was as simple as using a file to map each computer’s host name to its known IP address. In fact, that’s exactly what they did! This simple database, or file, was called the HOST file, and its main job was to allow a computer to find the IP address of another computer on the network via its host name. For example, if my computer needed to find the IP address for a host computer called COMPUTER01, it would look inside the HOST file. Each entry in this file holds two pieces of information: the IP address of a given computer and its host name. So, once it found the host name COMPUTER01 within the HOST file, the computer would also learn its IP address! Therefore, if you had 15 computers on the network, you would have 15 entries within the HOST file. Simple but efficient. Most importantly, it actually worked great... at first.
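To make the idea concrete, here is a minimal sketch of a HOST-style lookup in Python. It is not the actual ARPANET format or any operating system’s parser. The sample entries, the addresses and the helper names `parse_hosts` and `forward_lookup` are all made up for illustration, assuming a simple layout of one IP address followed by one or more host names per line.

```python
# Hypothetical HOST-style table: one "IP address  host name" pair per line.
# The names and addresses below are invented for illustration.
SAMPLE_HOSTS = """\
# IP address    host name
10.0.0.1        COMPUTER01
10.0.0.2        COMPUTER02
"""


def parse_hosts(text):
    """Return a dict mapping lower-cased host names to IP addresses."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table[name.lower()] = ip
    return table


def forward_lookup(table, name):
    """Host name -> IP address, or None when the name is not listed."""
    return table.get(name.lower())


hosts = parse_hosts(SAMPLE_HOSTS)
print(forward_lookup(hosts, "COMPUTER01"))  # 10.0.0.1
print(forward_lookup(hosts, "COMPUTER99"))  # None
```

Two computers mean two entries, just as 15 computers would mean 15 entries. A name that is missing from the file simply cannot be resolved.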
For the curious, the HOST file on a Windows computer is usually located at:
You can use Notepad to open the file. Below is a picture of what an unaltered HOST file looks like:
Ever wonder what happens when a computer is given only the IP address and needs to look up the host name, instead of the other way around? Well, the same thing happens: a name resolution is required. This process, however, is labeled a “reverse lookup” rather than a “forward lookup”. The focus of this article and the next is solely on forward lookups, because that is the type of lookup DNS servers all over the globe have to perform the majority of the time.
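The difference between the two directions can be sketched with a small table. This is only an illustration of the idea, with invented host names and addresses. Real DNS handles reverse lookups with its own mechanism, which this toy example does not attempt to model.

```python
# Hypothetical forward table (host name -> IP address), invented for illustration.
forward = {
    "computer01": "10.0.0.1",
    "computer02": "10.0.0.2",
}

# Invert it once to get the reverse table (IP address -> host name).
reverse = {ip: name for name, ip in forward.items()}


def forward_lookup(name):
    """Forward lookup: start from a name, get an address (or None)."""
    return forward.get(name.lower())


def reverse_lookup(ip):
    """Reverse lookup: start from an address, get a name (or None)."""
    return reverse.get(ip)


print(forward_lookup("COMPUTER01"))  # 10.0.0.1
print(reverse_lookup("10.0.0.1"))    # computer01
```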
Where Did it All Go Wrong?!
I guess the main theme with the ARPANET was simplicity. Everything needed to be as simple as possible. Why go through the trouble of creating a complex communication system when only a few computers were joined to the network? Now that you know exactly what the HOST file is and what it looks like, your next question should be: just how did they manage it? The answer, as you might have guessed already, is by hand! A person responsible for the HOST file made sure that any computers that joined the network, were removed from it, or had their host name and/or IP address changed were reflected in the HOST file. The file would then either be placed on a central server or distributed manually to all the other computers on the network. If COMPUTER01 changed its IP address, the administrator of the master HOST file had to make this change as quickly as possible, otherwise other computers on the network wouldn’t be able to communicate with it! Even if the administrator made the change quickly, there could still be problems, because a computer might not have updated its copy of the HOST file to the newest version! As you can see, even a small network such as the ARPANET needed a much more efficient method for name resolution. As the ARPANET grew in size (again, it’s not imperative to know why or how it grew, just that it grew much bigger), so did the need for a more efficient system for computers to map host names to IP addresses!
Make it Go Away!
By now, you should realize that having a person manually update the HOST file for hundreds of millions of computer hosts on a network is literally asking that person to commit career suicide. It’s just not feasible, and it’s probably not even possible! Luckily, in 1983, a computer scientist named Paul Mockapetris created the Domain Name System, and you can thank whichever deity you choose that he did! It’s a brilliant system and it works extremely well. The fact that we still use it today is a testament to the system’s reliability and, more importantly, its scalability. At this point, you’d expect me to completely drop the subject of the dreaded HOST file, but that is where you are wrong, my dear readers. You see, the HOST file is actually still in use today, even with DNS succeeding it! For backward compatibility and for very specific scenarios, the HOST file still exists within our systems. In fact, this may come as a shock to some, but as I’ll explain in the next article, a computer resolving a host name typically looks at the HOST file for the answer first, before making an attempt at using DNS!
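That “HOST file first, DNS second” order can be sketched as follows. This is a simplified illustration, not how any particular operating system implements it, and real systems add more steps. The function names, the table contents and the `fake_dns` stand-in are all assumptions made up for the example.

```python
def resolve(name, hosts_table, dns_lookup):
    """Check the local HOST table first; ask DNS only on a miss."""
    key = name.lower().rstrip(".")
    if key in hosts_table:
        return hosts_table[key], "hosts"
    return dns_lookup(key), "dns"


# Hypothetical data, invented for illustration.
hosts_table = {"computer01": "10.0.0.1"}


def fake_dns(name):
    """Stand-in for a real DNS query; returns a made-up answer or None."""
    return {"www.example.com": "192.0.2.10"}.get(name)


print(resolve("COMPUTER01", hosts_table, fake_dns))       # ('10.0.0.1', 'hosts')
print(resolve("www.example.com", hosts_table, fake_dns))  # ('192.0.2.10', 'dns')
```

Passing the DNS step in as a function keeps the sketch free of real network traffic while still showing which source answers first.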
The Domain Name System
Alright, enough talk about the HOST file. No talk about DNS is complete without mentioning it, because you need to understand how name resolution worked in the past to really understand how DNS saves the day. You can think of DNS as a hierarchical system with many different levels and branches. Think of it like this. If you have a really big task to accomplish, wouldn’t it make sense to break that task up into smaller portions and delegate those portions to different groups of individuals? Well, this is the building block of DNS. As the network grew, it simply was not possible to have one governing body to rule them all. Instead, DNS breaks the namespace into more manageable chunks so that different organizations each manage a specific portion of the namespace. Well, OK, what I just said wasn’t entirely true. There is actually a governing body that rules over the Internet namespace, sort of.
Immediately below the top level domains, we have our second level domains, and here is where things get more interesting. Second level domains are where mere mortals like us actually get to own a piece of the Internet, so to speak. Each level of the DNS pyramid, or domain level, is maintained by different organizations. If this wasn’t the case, I’m sure mass chaos would ensue! The root domain is maintained by a very special group of people. They in turn delegate authority over .com, .net, .info, .mil and all the other top level domains to other organizations. These organizations in turn delegate authority over second level domains to normal businesses and companies that want to have a public presence on the Internet. In most cases, this level is also where Internet Service Providers (ISPs) reside.
What are some of the second level domains, you ask? Here are some examples: CNN, Facebook, Twitter, Yahoo, Microsoft, ESPN and a host of others. I’m sure you get the idea. If that still doesn’t ring a bell, how about looking at it from this angle: cnn.com, facebook.com, twitter.com, yahoo.com and microsoft.com. Looks more familiar, right? Well, of course it does! This is how we get to websites within our browsers every day! What this means is that those companies actually took the time to register their company names within the .com top level domain. They pay a yearly fee either to the organization that manages the .com top level domain or to some other third party organization. This allows them to have a public presence on the Internet, because whenever a client wants to reach a server located within the Microsoft.com domain, the .com DNS servers have the necessary information to point the user to the right location. This will be much clearer in my next article.
Although these companies have registered their domain names publicly on the Internet, there is nothing stopping me from creating a test network or lab using the same name! For example, I could easily create my own local network with a domain name of microsoft.com, and I know for sure I won’t be receiving any letters in the mail from Microsoft asking to see me in court. The problem with this approach appears the moment I need a public presence on the Internet. As you might have suspected, Internet registrars will not let me register the Microsoft.com domain name, because it has already been registered by Microsoft themselves.
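The delegation chain described above can be pictured as a tree in which every level only knows about the level directly beneath it. The following toy sketch walks such a tree from the root down. It is not how real name servers store or exchange data, and the `ROOT` structure, the `walk` function and the address are all invented for illustration.

```python
# Hypothetical delegation data: each level knows only about its direct children.
# The address below is made up for illustration.
ROOT = {
    "com": {
        "example": {
            "www": "192.0.2.10",
        },
    },
}


def walk(name):
    """Follow delegations from the root down, recording each step."""
    labels = name.rstrip(".").lower().split(".")
    node = ROOT
    zone = "."
    steps = []
    for label in reversed(labels):  # com, then example, then www
        if not isinstance(node, dict) or label not in node:
            return steps, None  # nobody at this level knows the label
        steps.append(f"{zone} knows about {label}")
        node = node[label]
        zone = label + ("." if zone == "." else "." + zone)
    return steps, node if isinstance(node, str) else None


steps, address = walk("www.example.com")
for step in steps:
    print(step)
print(address)
```

Each step hands the question to the next level down, which is the same division of labor the different organizations share.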
Sub-domains and FQDNs
By now, you should have a better understanding of DNS, if only a bit. Continuing on, your next question would probably be: just where the heck does the WWW portion come into play? So far, I’ve talked about the root, top level and second level domains. So is WWW another domain level on the DNS pyramid? To answer that, we now focus our attention on subdomains and fully qualified domain names. Let’s use Microsoft as the example here. If Microsoft registered the Microsoft domain name within the .com top level domain, wouldn’t it make sense for Microsoft to be in charge of any other domains they want to create under the Microsoft.com parent domain name? Of course it would! When Microsoft wants to create a new domain under Microsoft.com, what they are doing is creating a sub, or child, domain. For example, Microsoft could decide to create a new domain within their company for their sales department and name it Sales. The Sales child domain would now fall under its parent domain of Microsoft.com. Together, the entire domain would be sales.microsoft.com. Microsoft doesn’t really need permission to create this child domain. They just need to make sure that users can reach it. If users can connect to Microsoft.com, which is the “root” domain at Microsoft headquarters, then it is the responsibility of Microsoft themselves to make sure that users can also reach computers within the sales.microsoft.com domain. The .com domain is just responsible for directing users to Microsoft.com, in most cases.
The last piece of the DNS puzzle is the computer hosts themselves and how they fit into DNS. This part is very important to understand because it forms the basis of name resolution. Continuing the Microsoft example, they can have a number of hosts within the Microsoft domain and likewise within their Sales domain. If a physical computer in the parent domain (Microsoft) is labeled Alice, what do you think this computer’s label within the DNS hierarchy would look like? Simple. Once again, we just add another dot after the label to separate the different levels of the DNS pyramid. So, the complete computer name for a computer named Alice within the Microsoft.com domain would be: Alice.Microsoft.com. When labeled this way, this can also be considered the fully qualified domain name (FQDN) of the computer. An FQDN is basically a computer’s name from the very bottom of the DNS pyramid all the way up to the root domain of DNS. In other words, a simple look at an FQDN tells you where that specific computer host sits within the DNS pyramid. One look at the FQDN I’ve just given immediately lets me know that there is a computer named Alice within the Microsoft domain, which is registered under the .com domain, which in turn is under the root domain.
Going with our child domain example, what would the FQDN of a computer host named Bob look like within the Sales domain? Simple. Once again, we just tack on the extra information. So, the FQDN would look like: Bob.Sales.Microsoft.com. Once again, given this information, we can easily see how this specific computer fits in the DNS hierarchy, from way down at the bottom all the way back up to the root.
You should remember that the topmost domain in the DNS pyramid, the root, is an actual domain and it’s not there just to look pretty! When talking about FQDNs, the root domain actually gets appended to the label as well. Because the root domain is represented as a single dot, an FQDN should always end with a dot as well. Microsoft.com is incomplete. Microsoft.com. is the actual FQDN. However, in practice your browser and operating system take care of this special “.” for you when you enter a URL, because while many users know about top level domains such as .com and .net, they most likely have no clue about the root domain, which sits above the top level domains! You most likely don’t belong in this category anymore after reading this article! Hey, you’re now considered smarter than the average Joe where name resolution is concerned!
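Building an FQDN and reading the hierarchy back out of it are both simple string operations. Here is a short sketch. The helper names `to_fqdn` and `hierarchy` are my own, and the code makes no attempt to validate labels the way real DNS software would.

```python
def to_fqdn(*labels):
    """Join labels from the host up to the top level and add the root dot."""
    return ".".join(label.strip(".") for label in labels) + "."


def hierarchy(fqdn):
    """List every level of an FQDN, from the root down to the host."""
    labels = fqdn.rstrip(".").split(".")
    levels = ["."]  # the root domain is the single dot
    for i in range(len(labels) - 1, -1, -1):
        levels.append(".".join(labels[i:]) + ".")
    return levels


name = to_fqdn("Bob", "Sales", "Microsoft", "com")
print(name)  # Bob.Sales.Microsoft.com.
for level in hierarchy(name):
    print(level)
```

The loop prints the root dot first, then com., Microsoft.com., Sales.Microsoft.com. and finally the full name of the host, which mirrors reading the FQDN from right to left.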
Hold Up, Wait a Minute…
By now you may have noticed a simple pattern when looking at an FQDN. The leftmost portion (or the beginning) of the FQDN represents an actual computer host name. In other words, it represents an actual computer on a network. Something should now strike you as very odd. If what I just said is true, am I actually telling you that when you type in a URL of www.cnn.com, the www part is a real computer behind the scenes? Well, yes, that is exactly what I’m saying! When you enter a URL such as www.cnn.com, what your computer actually does is request the IP address of the computer named www within the cnn.com domain. In almost all cases, the address returned is the IP address of the computer named www, which in all likelihood is a web server of some sort. This isn’t always the case, as companies deploy many security solutions to protect their resources, but for the purposes of this discussion you can go ahead and believe just that to simplify things. In the next article, I will go into more detail on the name resolution process so you can see exactly what happens.
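To see the pattern in code, here is a small sketch that pulls the host name out of a URL and splits off the leftmost label. It uses `urlsplit` from Python’s standard library. The function name `host_and_domain` is made up for this example, and it assumes the URL includes a scheme such as http://, since without one `urlsplit` does not find a host name.

```python
from urllib.parse import urlsplit


def host_and_domain(url):
    """Split the host name in a URL into its leftmost label and the rest."""
    hostname = urlsplit(url).hostname  # e.g. "www.cnn.com"
    if hostname is None:
        raise ValueError(f"no host name found in {url!r}")
    host, _, domain = hostname.rstrip(".").partition(".")
    return host, domain


print(host_and_domain("http://www.cnn.com/world"))  # ('www', 'cnn.com')
```

The first element is the computer being asked for and the second is the domain it lives in. Nothing is looked up over the network here; the sketch only takes the name apart.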
Coming Up Next…
In the next article, I’ll explain what name servers are and the data that is stored within them. In this article, I’ve laid down the very basics of the DNS structure and namespace. This was obviously not meant to be a technical article, and I’ve tried my best to make things as simple as possible without overloading you with terminology. Here are some of the key pieces of information you need to understand from this article before continuing on to the next:
- Understand how computers communicate at a very high level. The key takeaway is that humans use names such as www.cnn.com while computers use IP addresses, or numbers, such as 192.168.1.1, to represent the same piece of information. This ultimately leads to a need for name resolution.
- Understand how the HOST file works. Although this file is now used only in very specific scenarios and circumstances, it gives you a good understanding of why a system such as DNS was sorely needed.
- Understand what the DNS pyramid, or hierarchy to be more precise, looks like. You should understand that the system is broken down into different levels, which can be managed by different organizations.
- Understand what an FQDN looks like and how it is used to map a specific host from the very bottom of the DNS hierarchy all the way back up to the root domain and vice versa.
Once you are confident in your knowledge, you can safely move on to the next article where things get a bit more technical!