Gabriel Steinberg

Senior Backend Engineer

How DNS Works in Firezone

user@host:~ % nslookup github.com
Server:		100.100.111.1
Address:	100.100.111.1#53

Non-authoritative answer:
Name:	github.com
Address: 100.96.0.13

Firezone's approach to DNS works a bit differently than one might expect. One question we get a lot is, "Why do my DNS Resources resolve to different IPs with Firezone enabled?" Great question. Let's explain that now.

What follows is a quick recap of how DNS works, a couple of the security issues it faces, and how Firezone's DNS-based traffic routing was designed to address them.

Quick recap: DNS

Before we dive into all the fun details, let's briefly cover how DNS works.

At a high level, DNS is a hierarchical system that distributes the responsibility of resolving a fully-qualified domain name (FQDN) to a series of nameservers, each one responsible for resolving a different part.

Figure 1: How DNS works

Here's an abbreviated summary of how it works:

  1. An application makes a query. The first stop is the stub resolver, a small piece of software on the host that's responsible for resolving all DNS queries on the system.
  2. The stub resolver forwards the query to an upstream resolver. This is typically a caching resolver run by your ISP (or more recently, a public DNS service like NextDNS or Cloudflare).
  3. If the query misses the cache, the upstream resolver begins the full process of resolution. It forwards the query to a series of successive nameservers -- first the root nameserver, then the TLD nameserver, and finally the authoritative nameserver, each one responsible for resolving a different part of the FQDN.
  4. The authoritative nameserver responds with the IP address of the host in question, and the upstream resolver returns the final answer to the stub resolver on the host that originally made the query.
  5. The application on the host can now connect to the IP address returned by the stub resolver.

On today's internet, the whole process of resolving a query typically takes a few hundred milliseconds. Caching resolvers help speed this up by storing the results of queries for a certain amount of time, known as the record's time-to-live (TTL). So if a host makes the same query multiple times, the upstream resolver can return the result immediately (assuming the TTL hasn't expired) without having to query the hierarchy of root, TLD, and authoritative nameservers again. This can speed up query times by orders of magnitude, to the point where a cached answer from an upstream resolver is nearly instantaneous.
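
To make the caching step concrete, here's a minimal sketch of a TTL-based cache in Python. It leans on the operating system's stub resolver (socket.gethostbyname) for the actual lookup and assumes a fixed TTL, since that call doesn't expose the record's real TTL -- a real caching resolver would perform the full walk through root, TLD, and authoritative nameservers and honor each record's own TTL.

import socket
import time

class CachingResolver:
    """Toy caching resolver: serves answers from cache until the TTL expires."""

    def __init__(self, default_ttl: int = 300):
        self.default_ttl = default_ttl                   # assumed TTL in seconds
        self._cache: dict[str, tuple[str, float]] = {}   # name -> (ip, expires_at)

    def resolve(self, name: str) -> str:
        entry = self._cache.get(name)
        if entry and entry[1] > time.monotonic():
            return entry[0]                              # cache hit: near-instant answer
        # Cache miss: defer to the OS stub resolver here. A real upstream
        # resolver would perform the full recursive resolution instead.
        ip = socket.gethostbyname(name)
        self._cache[name] = (ip, time.monotonic() + self.default_ttl)
        return ip

resolver = CachingResolver()
print(resolver.resolve("github.com"))  # full lookup
print(resolver.resolve("github.com"))  # answered from cache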

DNS works today almost exactly as it did when it was first introduced to the ARPANET in the early 1980s. But the internet has changed a lot since then, and security issues have emerged that the original design didn't account for.

Security issues with DNS

The thing is, DNS was designed when the ARPANET was a small, trusted network of research institutions and government agencies. The system was designed with the assumption that all entities on the network were known, and that the network itself was secure.

As the ARPANET grew to become the internet, however, this assumption no longer held. Two security issues in particular have become popular tools in the attacker's arsenal since DNS was first introduced: DNS spoofing and DNS enumeration.

DNS spoofing

One of the security issues immediately apparent with DNS is you have to trust the nameservers you're querying. If a malicious actor manages to compromise any part of the path between you and a nameserver, they can return false responses to your queries. Since your machine has no other way to verify the authoritative answer, it will happily connect to whatever IP address the malicious nameserver returned. This is known as a DNS spoofing attack.

In recent years, various solutions have been created that, when used properly, render this mostly a solved problem. DNSSEC ensures the integrity of responses to your query, and DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH) take this a step further and prevent eavesdroppers from seeing which queries you perform.
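
As a quick illustration of DoH, the sketch below (Python, standard library only) queries Cloudflare's public DNS-over-HTTPS endpoint using its JSON API. The endpoint and response shape are Cloudflare's, used here purely as an example -- any DoH provider with a JSON interface would work similarly. Because the query travels inside TLS, an on-path attacker can neither read nor tamper with it.

import json
import urllib.request

def doh_query(name: str, record_type: str = "A") -> list[str]:
    """Resolve a name over HTTPS so the query can't be read or spoofed in transit."""
    url = f"https://cloudflare-dns.com/dns-query?name={name}&type={record_type}"
    req = urllib.request.Request(url, headers={"Accept": "application/dns-json"})
    with urllib.request.urlopen(req) as resp:
        answer = json.load(resp).get("Answer", [])
    return [record["data"] for record in answer]

print(doh_query("github.com"))  # answers vary by region and over time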

DNS enumeration

The above solutions work well for protecting the answers to your queries, but what about protecting your nameservers from the queries themselves?

If an organization defines records for its internal services, for example, anyone can query its authoritative nameservers to map out a list of all the organization's internal services. How convenient!

This is known as DNS enumeration, and is a common first step taken by a malicious actor looking for potential entrypoints into an organization. Because of this issue, organizations often resort to running their own nameservers, configured to return different results (or sometimes none at all) depending on the source of the request. This technique is known as Split DNS.

How Split DNS works

Split DNS is a technique wherein an organization maintains two (or more) separate nameservers (or a single one configured with two "zones") -- one for internal resources and one for external resources. The internal server is only accessible to users who are connected to an organization's internal network (such as a VPN or branch office network), while the external server is accessible to the outside world.

Figure 2: How Split DNS works

As an example, let's say an organization has an internal service called gitlab.company.com which lives at the internal address 10.10.10.10. The organization's internal nameserver would be configured to answer queries for this service from VPN or branch office workers, while the external nameserver would simply return NXDOMAIN, DNS-speak for "not found". This allows the organization to publish some records publicly, like www.company.com, so that its website is accessible to the public, while keeping the addresses of its private resources a secret.
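
Here's a toy illustration of that split-horizon decision in Python. The zone data and network ranges are made up for this example; real deployments typically configure this in the nameserver itself (for instance, with BIND views) rather than in application code.

import ipaddress

INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]   # VPN / branch office ranges
INTERNAL_ZONE = {"gitlab.company.com": "10.10.10.10"}      # internal-only records
PUBLIC_ZONE = {"www.company.com": "203.0.113.10"}          # published to everyone

def answer(name: str, source_ip: str) -> str | None:
    """Return the record for this client, or None to signal NXDOMAIN."""
    src = ipaddress.ip_address(source_ip)
    if any(src in net for net in INTERNAL_NETWORKS) and name in INTERNAL_ZONE:
        return INTERNAL_ZONE[name]
    return PUBLIC_ZONE.get(name)

print(answer("gitlab.company.com", "10.1.2.3"))      # '10.10.10.10' for internal clients
print(answer("gitlab.company.com", "198.51.100.7"))  # None (NXDOMAIN) for everyone else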

All that's left is to sprinkle a little bit of DNS configuration onto your workforce's machines to make sure the right server is being queried for the right domains, and you're set.

Split DNS is a great building block for organizations looking to secure access to their own internal applications, and continues to be a popular way to mitigate enumeration attacks today.

Limitations of Split DNS

Split DNS works great when you have a clear distinction between external and internal resources. It allows you to publish public addresses for your public resources so anyone can access them, and publish private addresses for your workforce so they can connect to the private resources that you manage.

Increasingly, however, cloud-delivered services are replacing their on-premise equivalents across many organizations. The upside here is generally lower operational cost -- pay the businesses that make the software to host it for you instead of hosting and managing it yourself, and reap the efficiency benefits.

But the downside is resources that were once internal are now publicly exposed. Anyone with the right credentials can access your organization's code repositories, CRM data, or CI/CD secrets from anywhere in the world. Since these services are now available publicly, they no longer have internal addresses, and without internal addresses to resolve to, Split DNS isn't as helpful anymore.

Is there another way to secure access to these services?

A naive solution

As it turns out, there's a solution to this problem that's becoming more common these days: IP allowlists. Many third-party SaaS apps like GitHub, Slack, and Hubspot allow you to configure a list of source IP addresses that are allowed to access the service.

Figure 3: IP allowlist protecting access to www.github.com

Now, some readers will have already recognized the solution to our public exposure problem: route your workforce's traffic for these services through a VPN jumphost or office router, egress the traffic through a static IP added to your SaaS provider's allowlist, and problem solved, right?

Well, kind of. There's just one issue with the above approach: virtual hosting.

Virtual hosting is a technique used to host multiple services at a single IP address. It's become an essential tool in the arsenal to fight IPv4 address exhaustion, and is often used in IPv6 networks as well.
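
A minimal sketch of virtual hosting in Python: one listening address and port, several "sites", with the HTTP Host header deciding which content is served. The hostnames and responses are placeholders for illustration only.

from http.server import BaseHTTPRequestHandler, HTTPServer

# One IP:port, many services -- the Host header selects which one responds.
SITES = {
    "www.example.com": b"marketing site",
    "app.example.com": b"web application",
}

class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "").split(":")[0]
        body = SITES.get(host, b"unknown site")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("0.0.0.0", 8080), VirtualHostHandler).serve_forever()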

If we simply resolve github.com and then configure our VPN to route traffic for that IP address through the jumphost, we might inadvertently route traffic for gist.github.com, api.github.com, or raw.githubusercontent.com too!

Figure 4: Collateral damage may occur if you naively route resolved IPs

This creates a problem: any service that shares its IP address with other services can't be secured using a naive IP allowlist approach.
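
You can see the shared-address problem for yourself with a few lines of Python: if any of these names resolve to overlapping IPs, a route keyed on the IP alone captures traffic for all of them. (Actual answers vary by region and over time.)

import socket

names = ["github.com", "api.github.com", "gist.github.com"]
for name in names:
    # Collect the unique addresses each name currently resolves to.
    ips = sorted({info[4][0] for info in socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)})
    print(name, ips)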

NAT to the rescue

So we can't simply resolve service IP addresses and route them as-is. We need to translate them somehow to make sure they don't conflict with the resolved addresses for services we don't wish to route. Enter NAT: Network Address Translation.

We can solve the problem above by intercepting the DNS query to the service, generating a unique IP address for it instead of returning the actual one, and then adding a bit of NAT after the jumphost to convert the generated IP address back to the actual IP address of the service.

This solves the collateral damage problem of routing traffic for the wrong service, but it introduces a new problem: we need a way to intercept DNS queries for the services we're trying to secure, generate a unique IP for them on-the-fly, and then somehow route the associated traffic through a properly configured NAT gateway to the service in question.

Figure 5: DNS interception + NAT gateway: problem solved?
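
To make the NAT step concrete, here's a deliberately simplified sketch in Python. Packets are modeled as dicts and the addresses are example values; a real gateway would do this rewriting in the kernel or a packet filter, not in application code.

# Generated (dummy) destination -> real service address. Example values only.
nat_table = {"100.96.0.13": "192.0.2.10"}
reverse_table = {real: dummy for dummy, real in nat_table.items()}

def egress(packet: dict) -> dict:
    """On the way out, rewrite the dummy destination to the service's real IP."""
    packet["dst"] = nat_table.get(packet["dst"], packet["dst"])
    return packet

def ingress(packet: dict) -> dict:
    """On the way back, restore the dummy address so the client recognizes the reply."""
    packet["src"] = reverse_table.get(packet["src"], packet["src"])
    return packet

print(egress({"src": "100.64.0.2", "dst": "100.96.0.13"}))    # dst becomes 192.0.2.10
print(ingress({"src": "192.0.2.10", "dst": "100.64.0.2"}))    # src restored to 100.96.0.13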

Seems complicated, doesn't it? Keeping all of this in sync and up-to-date would be a configuration nightmare. We've taken what was originally a DNS problem and translated it into a configuration problem. But lucky for us, configuration problems tend to be more solvable.

DNS-based traffic routing

And now we've finally arrived at the part where Firezone comes in.

Firezone's approach to DNS was designed to combine the benefits of Split DNS for internal services with the routing benefits for IP-allowlisted public services. Let's see how.

How it works

Remember the stub resolver we introduced earlier? Recall that it's a small piece of software on the host that's responsible for resolving all DNS queries on the system. Well, each Firezone Client embeds a tiny, lightweight stub resolver that works just like the one your operating system provides, but with a special twist.

For DNS queries that don't match any of your Firezone-defined Resources, it operates like any other stub resolver, forwarding the query to your system's default nameservers as if Firezone didn't exist. For DNS queries that do match a defined Resource, however, it behaves a bit differently.

Instead of forwarding the query to your default nameservers, our special stub resolver generates a special, internal IP for the Resource and responds immediately with that, storing the IP in a lookup table.

Upon seeing packets through the tunnel matching the IP in the lookup table, the Client generates a request to the Policy Engine to authorize the traffic. If approved, the Policy Engine forwards the IP and DNS name corresponding to the Resource to a Gateway that's available to serve the Resource.

The Gateway then resolves the actual IP address for the Resource (using its stub resolver) and stores a mapping between it and the special, internal IP we generated earlier.

When the Gateway sees traffic for the special, internal IP, it translates it back to the actual IP of the Resource and forwards the traffic along to the Resource.

If you're new to Firezone, read more about Gateways and the Policy Engine in our architecture docs.

Figure 6: Firezone's DNS-based traffic routing

Now, as the application sends packets to the dummy IP, they're routed through the newly established Firezone tunnel to the Gateway that resolved the query. The Gateway then forwards this traffic on to the public service, setting the source address to the static IP we've configured in the service's allowlist (achieving the NAT function mentioned earlier), and we've now routed traffic to the service through Firezone without affecting any other services that share its IP.

All of this happens in about the same time it would take for a query to be resolved without Firezone, so the application (and end user) is none the wiser.

The query is resolved over a secure WebSocket transport via Firezone's control plane, protecting against the spoofing attack mentioned earlier. And since the actual resolution takes place on the Gateway running in your protected environment, enumeration attacks are also mitigated.

All that's left is to add the Gateway's IP address to the service's allowlist, and you've now routed your traffic for the service through Firezone without the collateral damage problem we covered above.

How it's implemented

We glossed over lots of details above. The section below gets a bit more technical, so if you're not interested in the nitty-gritty details, feel free to skip ahead to the conclusion. If you are, well, let's dive a little deeper.

Query interception

The process described above actually starts when you sign in to the Firezone Client. When this happens, the Client reads which nameservers are available on the host (from /etc/resolv.conf for example), generates corresponding sentinel addresses for each one, and then configures the host's operating system to use these as the host's default nameservers instead.

For each IPv4 and IPv6 nameserver it finds on the host, the Client generates a matching sentinel address in the 100.100.111.0/24 and fd00:2021:1111:8000:100:100:111:0/120 ranges for IPv4 and IPv6 nameservers, respectively. This is why you'll often see nameserver 100.100.111.1 as one of your upstream resolvers in /etc/resolv.conf while the Client is connected.

A nice side effect of this one-to-one mapping approach is that it won't affect the selection algorithm your operating system uses to pick healthy nameservers -- if one is down, the corresponding sentinel address will be unresponsive, and the operating system will pick another, responsive sentinel to use instead.
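
A rough sketch of that one-to-one mapping in Python, assuming the sentinel ranges quoted above. This isn't the Client's actual code, just the idea: each upstream nameserver found on the host gets its own sentinel address, so health and failover behavior carry over unchanged.

import ipaddress

SENTINEL_V4 = ipaddress.ip_network("100.100.111.0/24")
SENTINEL_V6 = ipaddress.ip_network("fd00:2021:1111:8000:100:100:111:0/120")

def sentinel_map(host_nameservers: list[str]) -> dict[str, str]:
    """Assign one sentinel address per upstream nameserver found on the host."""
    v4_pool, v6_pool = SENTINEL_V4.hosts(), SENTINEL_V6.hosts()
    mapping = {}
    for ns in host_nameservers:
        pool = v4_pool if ipaddress.ip_address(ns).version == 4 else v6_pool
        mapping[str(next(pool))] = ns    # sentinel -> real upstream nameserver
    return mapping

# e.g. {'100.100.111.1': '192.168.1.1', '100.100.111.2': '1.1.1.1'}
print(sentinel_map(["192.168.1.1", "1.1.1.1"]))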

Custom nameservers

As we were building all this, we thought of another feature that might be useful to organizations. Instead of forwarding non-matching queries to the host's default nameservers, we allow the admin to specify upstream nameservers in the Firezone admin portal. The Client will then generate sentinel addresses for these nameservers and use them instead of the host's for all other queries on the system.

This is useful for protecting queries that don't go through Firezone. For example, you can configure a DNS filtering provider to block malicious DNS queries across your workforce. Or you could point it to your organization's internal nameservers to resolve internal services like a more traditional Split DNS configuration.

Generating the mappings

Ok, so that covers how queries are intercepted, but how does the stub resolver generate the dummy IP addresses? Let's step through an example to illustrate.

In this example, the admin wants to secure access to Slack, but the process works the same for any third-party SaaS service.

  1. An admin defines a DNS Resource with address *.slack.com in the Firezone admin portal. Notice the wildcard -- this will route all subdomains for Slack through Firezone as well, which helps ensure all relevant Slack traffic is routed.
  2. The admin then defines a corresponding Policy with the Groups that should have access.
  3. All connected Clients affected by the Policy will immediately receive the new Resource definition.
  4. Upon receiving the Resource definition, the Client configures the stub resolver to begin intercepting queries for *.slack.com.
  5. When it sees a match, the stub resolver forwards the query to the Policy Engine, which reauthorizes the query and finds a healthy Gateway to resolve it.
  6. The Gateway resolves the query, taking note of which Client asked it, and then returns all of the resolved IP addresses for the query to the stub resolver in the Client.
  7. The stub resolver then generates a unique, mapped IP address for each resolved IP address, and the Client adds these addresses to the host's routing table.
  8. The stub resolver returns the mapped IP addresses to the application on the host that made the query.
  9. The application then begins sending packets to the dummy IP address, where they're routed through a newly-established WireGuard tunnel to the Gateway we just resolved the query with.

Similar to the way the sentinel addresses work above, the stub resolver generates a single IPv4 or IPv6 address for each resolved IP address returned by the Gateway, picking a sequential address from the 100.96.0.0/11 and fd00:2021:1111:8000::/107 ranges to map A and AAAA records respectively. This ensures that things like timeout behavior and round-robin DNS continue to function with Firezone enabled just as they did before, without affecting applications.
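
Here's an illustrative version of that allocation in Python, using the ranges quoted above: one mapped address per resolved address, preserving the A/AAAA split, with the reverse mapping kept so traffic can be translated back to the real addresses later. The Client's real allocator is more involved; this just shows the shape of it.

import ipaddress

V4_POOL = ipaddress.ip_network("100.96.0.0/11").hosts()
V6_POOL = ipaddress.ip_network("fd00:2021:1111:8000::/107").hosts()
mapped_to_real: dict[str, str] = {}   # consulted when translating tunnel traffic

def map_answers(resolved_ips: list[str]) -> list[str]:
    """Replace each resolved IP with a sequentially allocated mapped IP."""
    mapped = []
    for real in resolved_ips:
        pool = V4_POOL if ipaddress.ip_address(real).version == 4 else V6_POOL
        dummy = str(next(pool))
        mapped_to_real[dummy] = real
        mapped.append(dummy)
    return mapped

# Two A records come back for a Resource (example inputs); the application
# sees two addresses from 100.96.0.0/11 instead.
print(map_answers(["198.51.100.7", "198.51.100.8"]))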

Conclusion

So now you know where those strange IPs are coming from -- they're generated within the Client itself. The next time you dig a service and get a response that looks like 100.96.X.X, you can be sure Firezone is working to secure access to it.

We could go on for some time about all the fun edge cases that arise from doing this sort of thing, but we'll stop here. If you really want a peek under the hood at how all this works, it's all open source -- take a look for yourself!
