Why we needed something else
To deploy our apps, we instantiate a Helm chart with the appropriate values in our Kubernetes cluster. Our ingress is managed by an alb-ingress-controller, which creates an Application Load Balancer (ALB) in our AWS account with all the ingresses we need. But once a load balancer is online, it needs a DNS name, and an ALB cannot create DNS records for itself in Route53. Until recently, we used a Lambda function triggered by a CloudWatch event whenever an ALB was created. It read a custom tag attached to the newly created ALB, containing the ALB's DNS name, and created a DNS record if it did not already exist. Further complicating matters, those DNS records are merely intermediary: the actual DNS records our clients use to access our applications are not managed in AWS Route53, and that will not change until we finish our migration to AWS. We can't abandon our good old datacenters and the associated DNS infrastructure: a lot of applications and clients still rely on them, and legacy yak-shaving is never a quick journey.
This was not workable, and we needed another solution.
What external-dns does
Basically, external-dns is a pod that runs in your Kubernetes cluster, reads your ingresses and services, and creates DNS records in a DNS zone manager reachable via API. We use AWS Route53, but it works with all the major cloud providers, PowerDNS, and so on! It even supports BIND9 and Active Directory through DNS UPDATE messages (see RFC 2136 for more info). This all sounds very simple, but setting things up correctly for our use case took quite a bit of documentation reading.
Our external-dns use case
Requirements
I can't make breaking changes to the infrastructure just because I'm adding a DNS stack. Full compatibility with my colleagues' work is a requirement: they did great work and have already deployed a lot of things! For instance, we have an alb-ingress-controller in our Kubernetes cluster to create load balancers: I can't change the variables this component uses just because I need other values in my DNS stack. I also need a solution that allows piece-by-piece migration: we want to migrate our DNS management deployment by deployment, as per the SRE/DevOps motto "atomic changes, often". And I would prefer not to redesign half of our infrastructure stack just to add my DNS brick. Therefore, I need:
- to deploy external-dns with the Helm Terraform provider within the same Terraform module that deploys my EKS cluster
- a well-maintained Helm chart to deploy external-dns
- to be able to integrate with the alb-ingress-controller
- to let external-dns create DNS records in Route53 but only for specific zones
IAM and Kubernetes permissions
Given the above requirements, I have to let external-dns create and delete Route53 records.
After digging into external-dns documentation and AWS documentation, I came up with a clean solution:
- create an IAM role that the external-dns Kubernetes service account can assume (sts:AssumeRoleWithWebIdentity) through our EKS cluster's OIDC provider
- create a service account for external-dns in Kubernetes that uses the above IAM Role
- create an IAM policy that allows modifying Route53 records and attach it to the above IAM Role
- create a Kubernetes cluster role (following external-dns docs) allowing external-dns to read the necessary information about the ingresses and services, and attach it to the service account with a cluster role binding.
- deploy external-dns with the Helm chart bitnami/external-dns, using the correct parameters.
Terraform files
We deploy our Kubernetes cluster with the EKS Terraform module like this:
module "eks" { source = "terraform-aws-modules/eks/aws" version = "12.0.0" (... truncated ...) }
We use the module outputs in our resources, and we strip the scheme from the OIDC issuer URL so it has the format the providers expect. You may not have the same naming or tagging policy; our purpose here is not to convince you to adopt ours, just to show a possible solution. We have also decided to deploy external-dns in the kube-system namespace. This part is totally up to you, as are the naming and the variables.
OIDC URL
locals {
  oidc_url = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
}
AWS IAM Role
resource "aws_iam_role" "external_dns" { name = "${module.eks.cluster_id}-external-dns" assume_role_policy = <<EOF { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${var.aws_account_id}:oidc-provider/${local.oidc_url}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${local.oidc_url}:sub": "system:serviceaccount:kube-system:external-dns" } } } ] } EOF }
AWS IAM Role Policy
resource "aws_iam_role_policy" "external_dns" { name_prefix = "${module.eks.cluster_id}-external-dns" role = aws_iam_role.external_dns.name policy = file("${path.module}/files/external-dns-iam-policy.json") }
Kubernetes service account
resource "kubernetes_service_account" "external_dns" { metadata { name = "external-dns" namespace = "kube-system" annotations = { "eks.amazonaws.com/role-arn" = aws_iam_role.external_dns.arn } } automount_service_account_token = true }
Kubernetes cluster role
resource "kubernetes_cluster_role" "external_dns" { metadata { name = "external-dns" } rule { api_groups = [""] resources = ["services", "pods", "nodes"] verbs = ["get", "list", "watch"] } rule { api_groups = ["extensions", "networking.k8s.io"] resources = ["ingresses"] verbs = ["get", "list", "watch"] } rule { api_groups = ["networking.istio.io"] resources = ["gateways"] verbs = ["get", "list", "watch"] } }
Kubernetes cluster role binding
resource "kubernetes_cluster_role_binding" "external_dns" { metadata { name = "external-dns" } role_ref { api_group = "rbac.authorization.k8s.io" kind = "ClusterRole" name = kubernetes_cluster_role.external_dns.metadata.0.name } subject { kind = "ServiceAccount" name = kubernetes_service_account.external_dns.metadata.0.name namespace = kubernetes_service_account.external_dns.metadata.0.namespace } }
Kubernetes Helm release
resource "helm_release" "external_dns" { name = "external-dns" namespace = kubernetes_service_account.external_dns.metadata.0.namespace wait = true repository = data.helm_repository.bitnami.metadata.0.name chart = "external-dns" version = var.external_dns_chart_version set { name = "rbac.create" value = false } set { name = "serviceAccount.create" value = false } set { name = "serviceAccount.name" value = kubernetes_service_account.external_dns.metadata.0.name } set { name = "rbac.pspEnabled" value = false } set { name = "name" value = "${var.cluster_name}-external-dns" } set { name = "provider" value = "aws" } set_string { name = "policy" value = "sync" } set_string { name = "logLevel" value = var.external_dns_chart_log_level } set { name = "sources" value = "{ingress,service}" } set { name = "domainFilters" value = "{${join(",", var.external_dns_domain_filters)}}" } set_string { name = "aws.zoneType" value = var.external_dns_zoneType } set_string { name = "aws.region" value = var.aws_region } }
IAM policy file
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets"
      ],
      "Resource": [
        "arn:aws:route53:::hostedzone/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "route53:ListHostedZones",
        "route53:ListResourceRecordSets"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Variables
variable "aws_account_id" { description = "account id number" } variable "aws_region" { description = "The AWS region to deploy to (e.g. us-east-1)" } variable "external_dns_chart_version" { description = "External-dns Helm chart version to deploy. 3.0.0 is the minimum version for this function" type = string default = "3.0.0" } variable "external_dns_chart_log_level" { description = "External-dns Helm chart log leve. Possible values are: panic, debug, info, warn, error, fatal" type = string default = "warn" } variable "external_dns_zoneType" { description = "External-dns Helm chart AWS DNS zone type (public, private or empty for both)" type = string default = "" } variable "external_dns_domain_filters" { description = "External-dns Domain filters." type = list(string) }
Using external-dns in our Helm charts and releases
Quick reminder: not all of our client-facing DNS records are managed in AWS. That's why we want to use a technical domain name as an intermediary. In our current deployments, we use the ingress template to configure both our ALB and our DNS, so in our Helm chart we make sure two kinds of lines are present in the ingress template:
Kubernetes ingress manifest
apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    annotations:
      external-dns.alpha.kubernetes.io/hostname: incredibleapp.technicalname.technicaldomain
      (...)
  spec:
    rules:
    - host: fantasticmodule.api.commercialName.commercialDomain
      (...)
    - host: incrediblemodule.api.commercialName.commercialDomain
      (...)
Following the above example, the host lines ensure that the ALB will be configured as expected, with the correct TLS certificates, and will respond to the expected HTTP requests.
If the commercialDomain is delegated to and managed in AWS, and the external-dns domain filters allow commercialDomain, there is no reason to add the annotation, because external-dns will be able to add all the necessary records.
If, like us, your commercialDomain zone is managed manually, you can use the annotation. External-dns will not create commercialDomain records, but it will create an ALIAS record, incredibleapp.technicalname.technicaldomain, pointing to the ALB. You then only have to create a CNAME record for *.api.commercialName.commercialDomain pointing to incredibleapp.technicalname.technicaldomain and you are all set!
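For instance, if that zone lives on a BIND-style DNS server, the record could be a single wildcard CNAME like this sketch (reusing the placeholder names from the manifest above):
; Illustrative wildcard CNAME in the manually managed commercialDomain zone
*.api.commercialName.commercialDomain.  IN  CNAME  incredibleapp.technicalname.technicaldomain.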
If the ALB is deleted and recreated with another endpoint, external-dns replaces the DNS ALIAS record and this does not require any intervention on your application.
What we learned
Debug hints
- Use DEBUG log level for external-dns if you're deploying for the first time.
- Read the alb-ingress-controller logs, or the logs of whatever ingress controller you run in your cluster: external-dns is not always the culprit.
Automate or die
In our cloud-native environment, our systems are only going to keep growing, so we've stuck to a simple motto.
Our cloud motto
In the cloud, it's either automated or it does not exist.
I strongly advise you to automate your deployments with Terraform, so that your infrastructure is both reproducible and quick to deploy.
It's not perfect
Of course, there is room for improvement. For example, you could restrict, in the IAM policy, the hosted zones that external-dns is allowed to modify. That would add a nice extra layer of security and enforce a least-privilege access policy.
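As a sketch of what that could look like, assuming the technical zone is already known to Terraform, the aws_iam_role_policy could be written inline with jsonencode and scoped to a single hosted zone. The data source and its zone name below are illustrative, and this version would replace both the earlier aws_iam_role_policy and the JSON policy file:
# Illustrative only: scope ChangeResourceRecordSets to one known hosted zone
# instead of arn:aws:route53:::hostedzone/*
data "aws_route53_zone" "technical" {
  name = "technicaldomain." # hypothetical zone name, matching the placeholders above
}

resource "aws_iam_role_policy" "external_dns" {
  name_prefix = "${module.eks.cluster_id}-external-dns"
  role        = aws_iam_role.external_dns.name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["route53:ChangeResourceRecordSets"]
        Resource = ["arn:aws:route53:::hostedzone/${data.aws_route53_zone.technical.zone_id}"]
      },
      {
        Effect   = "Allow"
        Action   = ["route53:ListHostedZones", "route53:ListResourceRecordSets"]
        Resource = ["*"]
      }
    ]
  })
}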