Privacy-friendly EDNS Client Subnet

January 21, 2024. 7 min read

This article explains how AdGuard DNS uses EDNS Client Subnet (ECS), a mechanism that DNS servers employ to deliver location-specific results to clients. For those who prefer a visual explanation, my talk at DNS OARC 41 covers similar content and can be viewed on YouTube.

ECS, crucial for providing tailored DNS responses, comes with privacy and performance concerns. AdGuard DNS currently serves over 100 million users globally (and this number grows rapidly), with 16 points of presence worldwide. The challenge we face is to develop a solution that would enable us to deliver accurate responses, regardless of server location, while addressing these concerns.

What is EDNS Client Subnet

Traditionally, CDN providers have steered users to the content servers closest to them, based on the location of the recursive DNS server. This method was effective when most people used the recursive DNS servers provided by their ISPs. Over time, however, as more people began using public resolvers like AdGuard DNS, the DNS server location became a less reliable indicator of a user’s real whereabouts.

Consider this scenario: an AdGuard DNS user from India is likely routed to our Singapore server. With traditional DNS, nameservers can only rely on the DNS server’s location and would therefore serve a response best suited for a Singapore user. This means there’s no guarantee that the IP address provided would be optimal, or even functional, for a user in India.

Explaining what could go wrong without taking user location into consideration.

Fortunately, there’s a solution to this problem: EDNS Client Subnet, or ECS. This feature’s design is straightforward: it includes the client’s subnet information in DNS queries sent to authoritative nameservers. With ECS, the nameserver can ignore the query’s originating IP address and instead use the ECS to determine the most appropriate response.

What could go wrong
How ECS works.
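To make the mechanism concrete, here is a minimal Python sketch of how a resolver could encode the ECS option before attaching it to a query. The layout (option code 8, address family, source and scope prefix lengths, truncated address) follows RFC 7871; the function name `encode_ecs_option` and the sample address are illustrative assumptions, not AdGuard DNS code.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet (RFC 7871)


def encode_ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Encode an ECS option: code, length, family, prefix lengths, address."""
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    family = 1 if network.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # Only the bytes needed to cover the prefix are sent; the rest is dropped.
    addr_bytes = network.network_address.packed[: (prefix_len + 7) // 8]
    # The scope prefix length is 0 in queries; the nameserver sets it in responses.
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


# Sample address from a documentation range, truncated to /24.
option = encode_ecs_option("203.0.113.57", 24)
print(option.hex())
```

Note that the last octet of the client address never appears in the encoded option: truncation happens in the resolver, before anything is sent to the nameserver.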

How is this useful? The most common application of ECS is Geo-steering, which is used by virtually all CDN providers. Compared with Anycast, DNS-based Geo-steering is more reliable and manageable.

ECS also has other applications, such as in online advertising and access restrictions. For example, I’ve observed instances where a nameserver would only respond to queries if their source IP address indicated that they originated from a specific country, or if they included an ECS from that country.

The most popular case — using ECS for Geo-steering.

Interestingly, EDNS Client Subnet is extremely popular. Our observations show that about 67% of DNS queries involve domain names whose nameservers support ECS.

ECS is very popular
Comparing queries to ECS-enabled domains with other queries.

ECS issues

The concept of EDNS Client Subnet (ECS) is, at its core, both reasonable and understandable. However, while ECS presents several improvements in terms of providing more tailored responses to users, it is not without its challenges.

Privacy issues

A primary privacy issue with ECS is the level of user data exposure. Although ECS transmits only the subnet rather than the user’s full IP address, this is still a significant amount of information, more than nameservers typically require or should have access to.

Another significant concern revolves around user consent. The complexities of DNS operations and the workings of ECS are not common knowledge among the general public. Many users are not aware that making DNS requests could result in sharing a part of their IP address with authoritative DNS servers, a process they have not explicitly agreed to.

In fact, these privacy and consent issues are not just theoretical; they are acknowledged within the very framework of ECS. RFC 7871, which specifies ECS, acknowledges these challenges and recommends that the feature be turned off by default and enabled only where it provides a clear benefit to clients.

What part of IP address leaks via ECS
Demonstration of what part of your IP will be received by nameservers.
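The exposure can be checked with Python’s standard `ipaddress` module. This is a small sketch, assuming the common /24 prefix for IPv4; the helper name `ecs_view` and the sample address are made up for illustration.

```python
import ipaddress


def ecs_view(client_ip: str, prefix_len: int = 24):
    """Return the subnet a nameserver would see and how many addresses share it."""
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return network, network.num_addresses


# Sample client address from a documentation range.
subnet, crowd = ecs_view("203.0.113.57")
print(f"nameserver sees {subnet}, shared by {crowd} addresses")
```

With a /24, the user is hidden among only 256 addresses, which is why the subnet still counts as a meaningful amount of information.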

Negative impact on caching

Privacy concerns are not the only deterrent to implementing ECS in recursive resolvers. Another issue, perhaps even more critical, is the detrimental effect that ECS has on caching efficiency: real-life data shows that DNS cache efficiency suffers when ECS is enabled. While improvements are possible, the overall efficiency largely depends on the granularity of the nameservers’ responses.

Key challenges include:

Reduced effectiveness of caching: ECS can significantly diminish the efficiency of DNS caching mechanisms.
Potential for cache pollution: with ECS, there’s an increased risk of contaminating the cache with irrelevant data.
Incorrect usage by nameservers: nameservers often misuse ECS, further complicating the issue.

Our tests have provided concrete evidence of this impact. We observed that the cache hit rate plummeted from approximately 80-85% to a mere 35-40% when ECS was enabled. While these figures can vary based on factors like the DNS server load, the downward trend is clear: ECS substantially reduces the efficiency of DNS caching.

Low DNS cache efficiency when ECS is enabled
If you’re still not bothered by privacy issues, maybe these numbers will convince you.
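The reason for the drop is easy to reproduce with a toy model: once the client subnet becomes part of the cache key, the same domain has to be resolved again for every new subnet. The sketch below replays synthetic, uniformly random queries against an unbounded cache with no TTLs, so its numbers are not comparable to our production measurements; it only illustrates the direction of the effect. All names and sizes are assumptions.

```python
import random


def hit_rate(queries, key_fn):
    """Replay queries against an unbounded cache and return the share of hits."""
    cache, hits = set(), 0
    for query in queries:
        key = key_fn(query)
        if key in cache:
            hits += 1
        else:
            cache.add(key)
    return hits / len(queries)


rng = random.Random(42)  # fixed seed so the run is repeatable
domains = [f"site{i}.example" for i in range(200)]
subnets = [f"10.0.{i}.0/24" for i in range(100)]  # synthetic client subnets
queries = [(rng.choice(domains), rng.choice(subnets)) for _ in range(10_000)]

plain = hit_rate(queries, key_fn=lambda q: q[0])  # key: domain only
with_ecs = hit_rate(queries, key_fn=lambda q: q)  # key: domain + client subnet
print(f"without ECS: {plain:.0%}, with ECS: {with_ecs:.0%}")
```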

Solving ECS issues

Can these issues be solved while keeping ECS reasonably effective? We think so. Let’s look at how.

Do we actually need ECS?

Considering these serious issues with ECS, one might wonder if it’s truly necessary. Some DNS recursor operators argue that they can manage without it. For instance, Cloudflare contends that their cache is adequately localized due to their extensive network of hundreds of PoPs worldwide.

While they might have a point, there are counterarguments to consider:

What about DNS recursors operating with fewer PoPs?
Large content providers, like Netflix, often have servers within ISP networks. No matter how many PoPs a service has, it’s unlikely to have a server in every ISP network.

To summarize, if you’re operating a public DNS resolver, it’s probable that you’ll need to rely on ECS to enhance the accuracy of your responses. This is especially pertinent to services with less extensive infrastructure compared to giants like Cloudflare.

Replace subnets with AS numbers

The solution first proposed by NextDNS in 2019 presents an interesting approach. The concept is straightforward: what if, instead of using subnets, we use Autonomous System numbers (AS numbers, or ASNs)? Could this be the key to solving our problems?

Let’s consider this idea:

There are far fewer AS numbers than /24 subnets.
In theory, this approach should greatly enhance caching efficiency.
It also represents a major privacy improvement, especially when compared to sending a user’s /24 subnet.

However, it’s not as simple as directly replacing a subnet with an AS number in the EDNS Client Subnet extension; that’s not its intended function. So, what’s the alternative?

First, create a map from each AS number to a random subnet announced by that AS.

ASN to subnet map
A map where the key is AS number and value is a random /24 subnet that belongs to that AS.

When a DNS query is received, determine the AS number of the user’s IP address. This can be done using various methods, such as consulting a MaxMind database or an IP2Location database.
Once the AS number is identified, refer to the previously constructed map to find the corresponding subnet.
Use this subnet in the ECS extension of the query sent to nameservers.

Replacing subnets with ASN
Simple pseudo-code representing the algorithm.
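The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than our production code: the `ANNOUNCED` table, the reserved AS numbers, the private address ranges, and the `lookup_asn` stub (standing in for a real GeoIP database lookup) are all hypothetical.

```python
import ipaddress
import random

# Hypothetical routing data: AS number -> prefixes announced by that AS.
# Reserved AS numbers and private address ranges stand in for real ones.
ANNOUNCED = {
    64500: ["10.0.0.0/22", "10.8.0.0/24"],
    64501: ["10.16.0.0/23"],
}


def build_asn_subnet_map(announced, rng):
    """Map each AS number to one random /24 taken from its announced prefixes."""
    asn_to_subnet = {}
    for asn, prefixes in announced.items():
        prefix = ipaddress.ip_network(rng.choice(prefixes))
        # Assumes announced prefixes are /24 or shorter.
        asn_to_subnet[asn] = rng.choice(list(prefix.subnets(new_prefix=24)))
    return asn_to_subnet


def lookup_asn(client_ip, announced):
    """Stand-in for a GeoIP database lookup: find the AS announcing this IP."""
    address = ipaddress.ip_address(client_ip)
    for asn, prefixes in announced.items():
        if any(address in ipaddress.ip_network(p) for p in prefixes):
            return asn
    return None


def ecs_subnet_for(client_ip, asn_to_subnet, announced):
    """Return the subnet to place in the ECS option, or None to omit ECS."""
    asn = lookup_asn(client_ip, announced)
    return asn_to_subnet.get(asn)


asn_map = build_asn_subnet_map(ANNOUNCED, random.Random(7))
print(ecs_subnet_for("10.0.2.15", asn_map, ANNOUNCED))
print(ecs_subnet_for("10.0.3.200", asn_map, ANNOUNCED))  # same AS, same subnet
```

Every client in the same AS is represented by the same subnet, which may not contain the client’s own address at all. That is where both the privacy gain and the smaller cache key space come from.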

Now that it’s operational, let’s evaluate its effectiveness. We need to assess two key factors: caching efficiency and accuracy. We’ll delve into accuracy later, but for now, let’s focus on caching efficiency. How much can we expect it to improve?

Cache efficiency improvement when using the new algorithm.

Here’s what our observations show:

The cache hit rate increased to 75-80%, varying by location.
It’s important to note that the ECS cache is five times larger than the regular cache (used for domains without indicated ECS support in the response).
The hit rate is approximately 15% lower than that of the regular cache.

While these results are promising, the question remains: can we further optimize this solution?

Improving cache efficiency

There are several methods to enhance caching efficiency at a reasonable cost. Our initial strategy involved reducing the number of ASNs used as keys.

How did we achieve this?

For each country, we selected up to X most popular AS numbers based on our internal metrics.
If a query comes from an AS not on our list, we use the most popular AS from the user’s country.

Improving caching efficiency algorithm
Pseudo-code that demonstrates the algorithm.
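Here is one way this selection could look in Python. The two constants reflect the limits we ended up with (50 ASNs per country and a 0.1% minimum share of a country’s queries); the function names and the sample query counts are hypothetical.

```python
from collections import Counter

MAX_ASNS_PER_COUNTRY = 50
MIN_SHARE = 0.001  # 0.1% of all queries received from the country


def select_asns(query_counts):
    """Return {country: [ASNs]}, most popular first, after applying both limits."""
    selected = {}
    for country, counts in query_counts.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no traffic observed, nothing to select
        top = Counter(counts).most_common(MAX_ASNS_PER_COUNTRY)
        selected[country] = [asn for asn, n in top if n / total >= MIN_SHARE]
    return selected


def effective_asn(country, asn, selected):
    """Keep a listed ASN; otherwise fall back to the country's most popular one."""
    listed = selected.get(country, [])
    if asn in listed:
        return asn
    return listed[0] if listed else None


# Hypothetical per-country query counts keyed by (reserved) AS numbers.
counts = {"US": {64500: 9000, 64501: 995, 64502: 5}}
selected = select_asns(counts)
print(selected)
print(effective_asn("US", 64502, selected))  # rare AS falls back to the top one
```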

After experimenting with various configurations, we settled on the following setup:

Limit the number of ASNs to 50 per country
Exclude any ASN that accounts for less than 0.1% of all queries received from that country

The resulting efficiency was approximately 8% lower than for the regular cache. Overall, the caching efficiency with this approach seems quite impressive, especially when compared to the standard ECS implementation.

Further improvement of the caching efficiency
Caching efficiency went up from 76% to 80%.

Dealing with large ISPs

Unfortunately, there’s a significant challenge with this approach: large internet providers. Some ISPs announce prefixes from many different locations, and relying solely on AS numbers fails to provide the necessary precision. Take Comcast, for instance: an analysis of the prefixes they announce shows that their networks span approximately 1600 different cities and 45 distinct regions (or subdivisions, according to MaxMind’s classification).

Comcast prefixes list
According to MaxMind GeoIP2 ISP database, Comcast prefixes are attributed to ~1600 different cities or 45 subdivisions (regions).

To illustrate this issue, consider the following examples. In the first image, a query using a Comcast subnet results in a response IP located on the US West Coast.

Comcast west coast
ECS from AS7922, response IP location is US West Coast.

In the second image, a different Comcast subnet is used, and the response IP is located on the US East Coast.

Comcast east coast
ECS from AS7922, response IP location is US East Coast.

So, what can be done to address this? One solution is to enhance the granularity of the cache by incorporating additional parameters. Here’s what we implemented in AdGuard DNS:

We integrated both country and subdivision into our subnet selection algorithm.
Recognizing that this issue predominantly affects ISPs in larger countries, we applied this logic selectively to just a few countries.

Pseudo-code for dealing with large ISPs
Using country and subdivision in addition to ASN.
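A minimal Python sketch of this refinement is shown below. The set of countries, the key layout, and the subnet table are assumptions made for illustration; "US" is used only as an example of a large country, and the AS numbers and subnets are placeholders.

```python
# Hypothetical set of countries where the subdivision is added to the key.
SUBDIVISION_COUNTRIES = {"US"}

# Hypothetical table: (ASN, country, subdivision) -> /24 used in the ECS option.
SUBNETS = {
    (64500, "US", "CA"): "10.1.0.0/24",
    (64500, "US", "NY"): "10.2.0.0/24",
    (64500, "US", None): "10.3.0.0/24",
    (64501, "NL", None): "10.4.0.0/24",
}


def subnet_key(asn, country, subdivision):
    """Include the subdivision only for countries that need the extra precision."""
    if country in SUBDIVISION_COUNTRIES and subdivision:
        return (asn, country, subdivision)
    return (asn, country, None)


def select_subnet(asn, country, subdivision, subnets=SUBNETS):
    """Pick the ECS subnet, falling back to the country-level entry if needed."""
    key = subnet_key(asn, country, subdivision)
    return subnets.get(key) or subnets.get((asn, country, None))


print(select_subnet(64500, "US", "CA"))
print(select_subnet(64500, "US", "NY"))
print(select_subnet(64500, "US", "TX"))  # no entry for TX: country-level subnet
print(select_subnet(64501, "NL", "NH"))  # subdivision ignored outside the set
```

Because the subdivision is only added for a handful of countries, the number of distinct keys grows modestly, which is consistent with cache efficiency staying close to what it was.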

With this approach, the responses become more precise, and caching efficiency surprisingly remains largely unchanged.

Summary

In the end, AdGuard DNS has adopted a less granular approach to ECS. This strategy enables us to provide accurate responses to the majority of our customers while protecting their subnet information from being exposed to nameservers. Additionally, this method maintains high DNS cache efficiency.

Is this the ideal solution that could eventually replace EDNS Client Subnet? Probably not yet; at least, it’s not the final version. We still occasionally face inaccuracies and need to make manual adjustments. However, we believe we’re heading in the right direction.