Negative Cache and Negative TTL




In computer programming, a negative cache is a cache that also stores “negative” responses, that is, failures. This means that the program remembers the result indicating a failure even after the cause is eliminated. Usually a negative cache is a design choice, but it can also be a software bug.

Examples

Consider a web browser that tries to load a page while the network is unavailable. The browser receives an error code indicating the problem and may display an error message to the user instead of the requested page. However, it would be incorrect for the browser to place the error message in the page cache, because the error would then reappear when the user tries to load the same page, even after the network is restored. The error message must not be cached under the URL of the page; until the browser has loaded the page successfully, it must make a new attempt every time the user requests it.
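The browser example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: PageCache, fake_fetch and the network_up flag are hypothetical names that simulate a network outage, not part of any real browser. The point it shows is that a failed load raises before anything is stored, so the next attempt goes back to the network.

```python
from typing import Callable, Dict


class PageCache:
    """Caches only successful loads; failures are never stored."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        # 'fetch' is a hypothetical loader that raises OSError on network failure.
        self._fetch = fetch
        self._pages: Dict[str, str] = {}

    def load(self, url: str) -> str:
        if url in self._pages:
            return self._pages[url]
        # If this raises, the exception propagates and nothing is cached.
        body = self._fetch(url)
        self._pages[url] = body
        return body


# Simulated network: down at first, restored later.
network_up = False


def fake_fetch(url: str) -> str:
    if not network_up:
        raise OSError("network unreachable")
    return f"<html>content of {url}</html>"


cache = PageCache(fake_fetch)

try:
    cache.load("http://example.com/")
    first_result = "loaded"
except OSError as exc:
    first_result = f"error: {exc}"

network_up = True  # the user fixes the network
second_result = cache.load("http://example.com/")  # a fresh attempt is made
```

Because the error was never stored under the URL, the second call succeeds as soon as the network is back.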

The frustrating aspect of a negative cache is that the user can put a lot of effort into troubleshooting the problem and then, even after identifying and fixing the root cause, still see the error.

There are cases where failure-like states do need to be cached. For example, DNS requires caching name servers to remember negative responses as well as positive ones. If the authoritative name server returns a negative response, indicating that the name does not exist, that response is cached. A negative response may be perceived as a failure at the application level; to the caching name server, however, it is not a failure. The caching times for negative and positive responses can be adjusted independently.
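A minimal sketch of a cache with independently adjustable positive and negative caching times might look like the following. It is not how any particular name server implements this; TtlCache, its parameters and the sample names and address are assumptions chosen for illustration. A stored value of None stands for "known not to exist".

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Stores positive and negative answers with separate lifetimes."""

    def __init__(self, positive_ttl: float, negative_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        self._clock = clock
        # name -> (value or None for a negative answer, expiry timestamp)
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def put(self, name: str, value: Optional[str]) -> None:
        ttl = self.negative_ttl if value is None else self.positive_ttl
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str) -> Tuple[bool, Optional[str]]:
        """Return (hit, value); a hit with value None is a cached negative answer."""
        entry = self._entries.get(name)
        if entry is None:
            return (False, None)
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired entries are dropped on access
            return (False, None)
        return (True, value)


# Usage with a fake clock so the example does not have to wait.
now = [0.0]
cache = TtlCache(positive_ttl=3600, negative_ttl=900, clock=lambda: now[0])
cache.put("www.example.ch", "192.0.2.10")   # positive answer
cache.put("missing.example.ch", None)        # negative answer

now[0] = 1000.0  # 1000 seconds later
positive_lookup = cache.get("www.example.ch")
negative_lookup = cache.get("missing.example.ch")
```

After 1000 seconds the positive entry is still served from the cache, while the negative entry has expired and would be looked up again.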

Negative TTL for a domain: the user can specify a value from 60 to 28800 seconds (default: 3600, that is, 1 hour).

When a resolver receives a response to a query, it caches the response for the TTL specified in the record. A positive response carries a TTL on the answer record, but a negative response (response code NXDOMAIN) contains no answer to the question. In that case, the response carries the SOA record of the zone in the authority section. RFC 2308 specifies the negative caching time as the minimum of the SOA record's TTL and the SOA MINIMUM field. For example, the original SOA record of the .ch zone looked like this:

dig +nocmd +noall +answer @a.nic.ch ch. soa
ch. 3600 IN SOA a.nic.ch. helpdesk.nic.ch. 2016041421 900 600 1123200 3600

The SOA TTL is 3600, and the SOA MINIMUM field is also set to 3600. The minimum of these two values is, of course, 3600. This means that the negative caching time for any lookup of a .ch domain name is one hour.
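The calculation can be sketched as a small Python function that takes a single-line SOA record in the format printed by dig above. The function name negative_ttl_from_soa is hypothetical, and the parser assumes the simple one-line layout shown here rather than every presentation format dig can produce.

```python
def negative_ttl_from_soa(line: str) -> int:
    """Return min(SOA record TTL, SOA MINIMUM field) for a one-line SOA record."""
    fields = line.split()
    # Expected layout:
    # name ttl class type mname rname serial refresh retry expire minimum
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError("not a single-line SOA record")
    record_ttl = int(fields[1])
    soa_minimum = int(fields[10])
    return min(record_ttl, soa_minimum)


soa = "ch. 3600 IN SOA a.nic.ch. helpdesk.nic.ch. 2016041421 900 600 1123200 3600"
negative_ttl = negative_ttl_from_soa(soa)  # 3600 seconds, i.e. one hour
```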

Lower negative caching time is more convenient
People who are about to register a new domain name often look the name up in DNS first. However, this means that they have just cached the non-existence of the name in the resolver they used. A domain can be registered within a few minutes, and the cached negative answer can prevent them from using the domain name on their network for the duration of the negative caching time.

Another example is the domain abuse process for .ch / .li. We notify the owners of domains whose hacked websites are being abused for drive-by infections or to host phishing pages. If a domain owner does not take action to remove the malicious content within a specified period of time, we temporarily suspend the domain name, which removes the delegation of the domain name from the .ch / .li zone. A low negative caching time helps the domain come back faster once the delegation is returned to the zone.

These are just two examples showing that, simply put, a lower negative caching time is more convenient. On the other hand, a lower negative caching time also means a higher load on the name servers.

Negative caching time on other TLDs
I was interested to know what negative caching time other TLDs have chosen, so I tested all TLDs that are currently delegated in the root zone. The table below shows the ranges of negative caching times and the percentage of TLDs in each range.


Distribution of negative caching time among all TLDs

I was surprised that approximately 40% of all TLDs use a high value of one day (86400 seconds). The old value of one hour for .ch / .li is actually not that bad compared with what other TLDs still use. Fortunately for end users, recursive DNS resolvers enforce limits on the TTL. For negative caching times, the default maximum values of well-known resolvers are as follows:

  • BIND: 10800 (3 hours)
  • Unbound: 3600 (1 hour)
  • PowerDNS: 3600 (1 hour)
  • Windows DNS: 900 (15 minutes)

This means that there is little benefit in setting a value higher than 3 hours, since resolvers will not honor it anyway. At the DNS-OARC workshop in spring 2015, Microsoft gave a great presentation on the topic "Caching negative DNS records", which also explored this behavior.
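The effect of these resolver limits can be worked through in a short Python sketch. The caps are the default values from the list above; the function name effective_negative_ttl is hypothetical, and the sketch assumes each resolver runs with its default configuration and simply takes the smaller of the zone's value and its own maximum.

```python
from typing import Dict

# Default maximum negative caching times in seconds, taken from the list above.
RESOLVER_MAX_NEGATIVE_TTL: Dict[str, int] = {
    "BIND": 10800,
    "Unbound": 3600,
    "PowerDNS": 3600,
    "Windows DNS": 900,
}


def effective_negative_ttl(zone_negative_ttl: int) -> Dict[str, int]:
    """Cap the zone's negative caching time at each resolver's default maximum."""
    return {
        resolver: min(zone_negative_ttl, cap)
        for resolver, cap in RESOLVER_MAX_NEGATIVE_TTL.items()
    }


# A TLD that publishes one day (86400 seconds) is capped by every resolver.
one_day = effective_negative_ttl(86400)

# A zone that publishes 900 seconds is honored as-is by all four.
quarter_hour = effective_negative_ttl(900)
```

Under these assumptions, a published value of 86400 is never cached for longer than 10800 seconds by any of the four resolvers.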

Impact of loading queries on our name servers
In the past two weeks, we have lowered the negative caching time from its original value of 3600 to 1800 and, finally, to 900 seconds. For comparison, I plotted the number of responses with the response code NXDOMAIN. As you can see, the rate for the selected name server increased slightly. There are some fluctuations in the graph; the problem is that the measurement period is too short to compensate for other noise in the traffic.


NXDOMAIN response rate on the selected server for three time periods with different negative caching times.

Aside from a slightly larger number of queries for non-existent domain names, I also expected an increase in queries such as DS queries from validating resolvers. There is an interesting presentation by JPRS from the DNS-OARC workshop in spring 2013 that explored this issue in more detail. The plot in our case suggests that this is not a big problem for .ch / .li at the moment either (the line for a negative caching time of 900 behaves erratically; again, I attribute this to the measurement period being too short).


Query rate by query type on the selected server for three time periods with different negative caching times.

The small increase in queries per second is insignificant. We observe a much greater growth in query load from the "natural" growth of the zone itself (the number of delegations). Each node has spare capacity for many thousands of queries per second, so this change does not cause any concern. That said, it would probably be reasonable for many other TLDs to lower their negative caching time as well.

Description

A negative cache is usually desirable only if a failure is very expensive and the error condition arises automatically, without any action by the user. It creates a situation where the user cannot isolate the cause of the failure: despite fixing everything they can think of, the program still refuses to work. When a failure is cached, the program should clearly indicate what must be done to clear the cache, in addition to describing the cause of the error. Under such conditions, a negative cache is an example of a design anti-pattern.

A negative cache can still recover on its own if the cached entries expire.

