September 23, 2019
Recently we switched to an AWS NLB so that we could use gRPC and HTTP/2 all the way through to the microservices that answer the requests. The switch itself was easy enough, and everything seemed to be working fine. However, within a few minutes of the new NLB going live, other teams approached us because the X-Real-IP header was missing. What was going on?
For those who do not know: when an HTTP request is forwarded to a service through a load balancer (LB), the LB adds a few headers to the downstream request to indicate that the request has been forwarded and where it came from. These headers include X-Real-IP and X-Forwarded-For, among others. They are added so that the service at the end can work out where the request originated.
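As a rough illustration of how a service at the end of the chain might use these headers, here is a minimal Python sketch. The function name `client_ip` and the fallback order (X-Real-IP first, then the left-most X-Forwarded-For entry, then the socket peer address) are assumptions made for this example, not something taken from our services.

```python
from typing import Mapping


def client_ip(headers: Mapping[str, str], peer_addr: str) -> str:
    """Best-effort guess at the originating client address (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Prefer X-Real-IP when the proxy in front of us has set it.
    real_ip = lowered.get("x-real-ip", "").strip()
    if real_ip:
        return real_ip

    # X-Forwarded-For is a comma-separated list; by convention the
    # left-most entry is the original client.
    first_hop = lowered.get("x-forwarded-for", "").split(",")[0].strip()
    if first_hop:
        return first_hop

    # No forwarding headers at all: all we know is who connected to us.
    return peer_addr
```

When the headers are missing, as they were for us, a helper like this can only fall back to the peer address, which is the address of whatever sits directly in front of the service. Note that these headers can be set by any client, so they should only be trusted when they come from a proxy you control.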
The reason we had to move from an ALB to an NLB was that the AWS ALB does not support HTTP/2 or gRPC end to end. The ALB will accept HTTP/2 connections, but it forwards them to the targets using HTTP/1.1. With a simple HTTP request this is not a big problem, but gRPC requires HTTP/2 all the way through, so it does not work and we had to change to an NLB.
Our first attempt with the NLB was to use it as the TLS terminator, meaning the TLS certificate would be installed on the NLB and the requests would be forwarded to the targets over unencrypted connections. Again, this works fine for simple HTTP requests, but it caused negotiation errors with gRPC and HTTP/2, because these protocols need Application-Layer Protocol Negotiation (ALPN). So we had to remove TLS from the NLB and move it to the targets. Fortunately, we use Traefik as a router in our swarm to forward requests to the correct service, so we installed the TLS certificates in Traefik and changed the NLB to use TCP only. This made the gRPC and HTTP/2 connections work!
At this point we had the HTTP/1.1, HTTP/2 and gRPC connections all working correctly, but we were missing the X-Real-IP header. After rooting through several pages of documentation about NLBs and Traefik, we discovered that the issue was in fact in neither. The problem lay in the Docker swarm ingress network.
There has been an open issue in this space for a while (https://github.com/moby/moby/issues/39465). The problem is that the ingress network does not support Proxy Protocol, which is needed to carry the original client address along with a connection that is forwarded as plain TCP rather than as HTTP. This meant that once a request hit the ingress network between the NLB and Traefik, the information needed for X-Real-IP was lost.
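To make it clearer what Proxy Protocol actually carries, here is a small Python sketch that parses the human-readable version 1 header, a single text line sent at the very start of the TCP connection. This is only an illustration: the names `ProxyInfo` and `parse_proxy_v1` are made up for this example, the addresses are documentation placeholders, and the binary version 2 format (which carries the same information) is not handled.

```python
from typing import NamedTuple


class ProxyInfo(NamedTuple):
    protocol: str   # "TCP4" or "TCP6"
    src_ip: str     # original client address
    dst_ip: str     # address the client connected to
    src_port: int
    dst_port: int


def parse_proxy_v1(line: bytes) -> ProxyInfo:
    """Parse a PROXY protocol v1 header line (hypothetical helper)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("PROXY v1 header must end with CRLF")

    parts = line[:-2].decode("ascii").split(" ")
    # This sketch only accepts the TCP4/TCP6 forms, which have six fields.
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a supported PROXY v1 header")

    _, protocol, src_ip, dst_ip, src_port, dst_port = parts
    return ProxyInfo(protocol, src_ip, dst_ip, int(src_port), int(dst_port))


info = parse_proxy_v1(b"PROXY TCP4 203.0.113.7 10.0.0.2 56324 443\r\n")
print(info.src_ip)  # the client address a router could copy into X-Real-IP
```

If a hop in the middle neither passes this header along nor preserves the source address of the connection, the router behind it has nothing to build X-Real-IP from.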
Now that we had tracked the issue down to the Docker ingress network, we were able to resolve the problem by moving the Traefik service out of the ingress network. We followed Docker's recommendations for configuring an external load balancer (https://docs.docker.com/engine/swarm/ingress/#configure-an-external-load-balancer) and simply replaced HAProxy with Traefik. Once we did this, the X-Real-IP and other headers came back again.
So in conclusion, it seems that the Docker ingress network is not a good choice for the initial entry point into a Docker swarm cluster if you want to retain the real location that requests originate from. It is better to place a service such as Traefik or HAProxy at the front and then route to the services from there.
These services generally have much better configuration options for the way requests are balanced, and they offer many features for deploying services as microservices.
Experienced developer in various languages, currently a product owner of nerd.vision leading the back end architecture.