TCH-Ryan Posted August 21, 2008

We take all security very seriously here at TCH, and we recently took another look at some of the core infrastructure we manage, including DNS. That review highlighted our still excessive use of DNS recursion. In the capacity that TCH uses it, this is not that big a deal, but remote reporting services such as DNSreport and intoDNS feel differently on the matter, and that gives clients a conflicting sense of security.

The first thing you should understand is that Recursive DNS was created to speed up the performance of DNS on the Internet by allowing subordinate DNS servers to cache results from neighboring DNS servers on domains they host, which reduces traffic and response time for DNS requests. The TCH network generates over 60 million DNS queries every single day, and simply abandoning something that has kept our DNS performance very efficient over the years was not a simple task.

What makes recursive DNS queries dangerous is that, under certain conditions, the cache on the DNS server can be poisoned with forged records, and any domains hosted on that DNS server would then begin reporting those forged records. This, however, is a symptom of broader mismanagement of DNS servers; recursive queries alone do not make for an insecure DNS server. We feel strongly that security is part of a larger, layered approach, and that depending on the status of a single feature or resource (i.e. recursive DNS) as a measure of security is really very misleading.
For further reading on Recursive DNS and the recent DNS protocol vulnerability, please see: http://www.totalchoicehosting.com/forums/index.php?showtopic=32238&st=0&p=226849&&do=findComment&comment=22684

In any case, we felt that the time had come to evolve our DNS infrastructure to the next level, one that would provide public assurance of the integrity of our DNS servers, of which we presently manage 6 public ones. This meant removing any doubt about the security setup, which in turn translated into disabling recursive DNS queries across the board.

The major goals we undertook in this process were working out what to do with the 60 million+ DNS queries our networks generate every day and how to mitigate the performance impact of removing Recursive DNS. In addition, we proceeded with a full audit of the records on all our DNS servers, rebuilt fresh configuration files (these are massive 125k+ line files) and ensured the proper operation of the synchronization mechanisms between DNS servers.

The first task we went ahead with was the actual audit of the DNS servers, as we felt this was the building block that would allow us to conduct later tasks with confidence that the DNS servers were operating properly.

- This involved a complete review of all the domain names we presently host and the consistency of their records versus those stored on the actual web servers that generate them.
- We then made sure that the ownership permissions on all DNS related data were set properly, so that all components of our DNS systems can properly access the data.
- This was then followed by rebuilding the full configuration files that load the tens of thousands of domain configuration files for the DNS servers.
  The newly generated configuration files came in at 125,000 lines, versus roughly 175,000 lines for the bloated old ones; the difference was mostly due to lots of erroneous spacing and old references to records that no longer existed.
- Finally, we made sure that the records and configuration files matched up between our 3 pairs of DNS servers, indicating proper synchronization of DNS changes.

With the basics of the auditing phase behind us, we then began to change how all our servers perform DNS resolution requests. We altered the setup so that all DNS queries for domains we do not manage are forwarded to a special set of DNS caches. These are high performance DNS systems that host no domain names directly and as such do not fall victim to the issues of DNS cache poisoning with respect to Recursive DNS, similar to how openDNS.com operates.

This is a different approach than most web hosts take. They typically "filter" recursive queries from the world so that only internal web servers and other related systems may perform recursive lookups, which presents the very real risk that any compromised system within the network can poison the DNS servers.

These new DNS cache systems now absorb all our external DNS load, and in the process we have increased the performance of DNS resolution on our network nearly two fold. We did not come to this conclusion with a simple test of a couple of DNS requests; we came to it by resolving over 75,000 DNS requests, for both valid and invalid domain records, against our old and new setups. Our old DNS setup averaged roughly 381 requests per second and our new setup averages 736 requests per second, which translates to handling the load of almost 3 million requests per hour during peak weekday hours without a single hiccup.
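To put the benchmark numbers above in perspective, here is a small Python sketch of the arithmetic. The constants are the figures quoted in this post; the function names are made up for illustration, and the daily average assumes the 60 million queries are spread evenly over 24 hours, which real traffic is not.

```python
# Back-of-the-envelope check of the figures quoted in the post.
OLD_RPS = 381                # old setup, average requests per second
NEW_RPS = 736                # new setup, average requests per second
DAILY_QUERIES = 60_000_000   # "over 60 million" queries per day, network wide

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


def speedup(old_rps, new_rps):
    """Ratio of new throughput to old throughput."""
    return new_rps / old_rps


def per_hour(rps):
    """Sustained requests per hour at a given requests-per-second rate."""
    return rps * SECONDS_PER_HOUR


def average_rps(daily_queries):
    """Average rate if the daily volume were spread evenly (an assumption)."""
    return daily_queries / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"speedup:            {speedup(OLD_RPS, NEW_RPS):.2f}x")
    print(f"old setup per hour: {per_hour(OLD_RPS):,}")
    print(f"new setup per hour: {per_hour(NEW_RPS):,}")
    print(f"daily average:      {average_rps(DAILY_QUERIES):.0f} req/s")
```

By this arithmetic 736 requests per second is about 1.93 times the old rate and roughly 2.65 million requests per hour.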
These figures are for a single DNS server, so across multiple DNS servers the performance gain also increases the scalability of our DNS infrastructure for the future.

Having this system in place and performance tested, we then proceeded to disable recursive lookups on all our DNS servers, which was a quick and painless task that involved no direct downtime of any resources.

In summary, we have increased the consistency and ease of management of our DNS servers, in addition to the end goal of retaining and even increasing performance while removing our dependency on recursive DNS.

You may use http://www.intodns.com to check and validate that recursive lookups for your domain are in fact disabled on our DNS servers. The only exception is resellers using vanity (custom) DNS. Those DNS servers are separate from our core infrastructure (26 smaller scale DNS servers) and are being updated over the next 24 hours, so please check back if reseller controlled domains still report recursion enabled.
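If you would rather check from your own machine, the following is a minimal Python sketch of one way to test a name server. It sends a query with the "recursion desired" bit set and looks at whether the reply has the "recursion available" bit set in the DNS header. This is not necessarily how intoDNS performs its check, and the function names, the default query name and the IP address in the usage note are placeholders, not TCH servers.

```python
import socket
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the DNS header flags
RA_FLAG = 0x0080  # "recursion available" bit in the DNS header flags


def build_query(name, txid=0x1234):
    """Build a minimal DNS query for an A record with the RD bit set."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, RD_FLAG, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def recursion_available(response):
    """Return True if the RA bit is set in a raw DNS response."""
    if len(response) < 12:
        raise ValueError("DNS response shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & RA_FLAG)


def check_server(server_ip, name="example.com", timeout=3.0):
    """Query server_ip over UDP port 53 and report whether it offers recursion."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server_ip, 53))
        response, _ = sock.recvfrom(512)
    return recursion_available(response)
```

Calling `check_server("192.0.2.53")` with the address of the name server you want to test (192.0.2.53 is only a documentation placeholder) should return False once recursion is disabled for outside clients. A timeout raises `socket.timeout` rather than returning a result.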