Cheap high availability website

Because all hosters, even the largest, sometimes have outages, I wanted to find a way to make a website really reliable without spending much money. It should use multiple hosters and multiple DNS services so that it keeps working even if a whole hoster fails.

This first attempt covers a static website, i.e. DNS and a website with HTTPS (using Let’s Encrypt), built with Docker, Traefik and Nginx.

DNS

First we need a reliable DNS server with an API that Traefik can use for the Let’s Encrypt DNS-01 challenge. PowerDNS is free and supported by Traefik’s ACME integration. It will act as the primary for two secondary name server clusters, one at each hoster:

“Hoster 1 DNS” and “Hoster 2 DNS” will be secondary name servers that receive all changes from our PowerDNS server. We will keep the PowerDNS server hidden, so that its IP address is not public and it is much harder to attack. Only the name servers of “Hoster 1” and “Hoster 2” are published for the domain.

PowerDNS

For setting up PowerDNS we will use Docker Compose with Traefik, MariaDB and PowerDNS. This is a basic configuration:

version: "3.7"
services:
  reverse-proxy:
    image: traefik:v3.1
    restart: always
    command:
      - "--providers.docker"
      - "--providers.docker.exposedByDefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.mydnsresolver-powerdns.acme.dnschallenge=true"
      - "--certificatesresolvers.mydnsresolver-powerdns.acme.dnschallenge.provider=pdns"
      - "--certificatesresolvers.mydnsresolver-powerdns.acme.email=yourletsencryptemail@mail.com"
      - "--certificatesresolvers.mydnsresolver-powerdns.acme.caserver=https://acme-v02.api.letsencrypt.org/directory"
      - "--certificatesresolvers.mydnsresolver-powerdns.acme.storage=/letsencrypt/acme_pdns.json"
    labels:
      - "traefik.enable=true"
      - "traefik.http.middlewares.ipwhitelist.ipwhitelist.sourcerange=otherwebserver/32,172.22.0.0/16"
    environment:
      - "PDNS_API_KEY=yourkey"
      - "PDNS_API_URL=http://powerdns:8081"
      - "PDNS_PROPAGATION_TIMEOUT=300"
    ports:
      - "80:80"
      - "443:443"
    logging:
      options:
        max-size: "100M"
        max-file: "10"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - letsencrypt_prod:/letsencrypt

  pdns-db:
    build:
      context: ./pdns-db
    environment:
      MYSQL_ROOT_PASSWORD: yourdbrootpassword  # Replace with your root password
      MYSQL_DATABASE: powerdns
      MYSQL_USER: powerdns
      MYSQL_PASSWORD: yourdbuserpassword  # Replace with your database user password
    volumes:
      - pdns-db-data:/var/lib/mysql
    restart: always

  powerdns:
    build:
      context: ./powerdns
      dockerfile: Dockerfile
    volumes:
      - pdns-data:/data
    ports:
      - "53:53/tcp"
      - "53:53/udp"
      - "8081/tcp"
    restart: always
    labels:
      - "traefik.enable=true"
      - "traefik.http.services.powerdns.loadbalancer.server.port=8081"
      - "traefik.http.routers.powerdns-https.rule=Host(`nameserver.your-domain.com`)"
      - "traefik.http.routers.powerdns-https.entrypoints=websecure"
      - "traefik.http.routers.powerdns-https.tls=true"
      - "traefik.http.routers.powerdns-https.tls.certresolver=mydnsresolver-powerdns"
      - "traefik.http.routers.powerdns-https.middlewares=ipwhitelist@docker"

volumes:
    letsencrypt_prod:
    pdns-data:
    pdns-db-data:

This docker-compose.yaml file now contains Traefik with the Let’s Encrypt DNS-01 challenge, MariaDB and PowerDNS.

MariaDB

To use MariaDB with PowerDNS, a database and some tables have to be created:

Create a directory pdns-db. Create a “Dockerfile” in it:

FROM mariadb:10.5.19
COPY script/database /docker-entrypoint-initdb.d

Create a directory pdns-db/script/database and copy the file from https://raw.githubusercontent.com/PowerDNS/pdns/master/modules/gmysqlbackend/schema.mysql.sql into it. It will create the necessary tables for PowerDNS when the MariaDB database is started for the first time.

PowerDNS

Create a directory powerdns and save this Dockerfile into it:

FROM powerdns/pdns-auth-49:latest
WORKDIR /etc/powerdns
COPY ./pdns.conf /etc/powerdns/pdns.conf
EXPOSE 53/tcp 53/udp

Create another file called pdns.conf in the same directory:

launch=gmysql

gmysql-host=pdns-db
gmysql-port=3306
gmysql-dbname=powerdns
gmysql-user=powerdns
gmysql-password=yourdbuserpassword

primary=yes

# Here we have to enter the IP addresses of our secondary name servers
also-notify=1.2.3.4
allow-axfr-ips=1.2.3.4

receiver-threads=2

# Activate the API to let Traefik perform the DNS-01 challenge with Let's Encrypt
api=yes
# The PDNS_API_KEY from the docker-compose.yaml
api-key=yourkey
webserver=yes
webserver-address=0.0.0.0
webserver-port=8081
webserver-allow-from=127.0.0.1,::1,172.22.0.0/16

# Logging settings (optional)
loglevel=9

Now you can start your containers with “docker-compose up -d”. Then open a shell in the PowerDNS container, e.g. with “docker exec -it your-powerdns-container-name /bin/bash”.

In the container you will find the tool pdnsutil. It is used to manage the DNS entries. You can now use it to create your DNS zone or import a zone file:

pdnsutil create-zone your-domain.com
(or pdnsutil load-zone your-domain.com /etc/powerdns/zones/your-domain.com.zone)

pdnsutil delete-rrset your-domain.com @ SOA
pdnsutil add-record your-domain.com @ SOA 300 "nameserver.yourhoster.com hostmaster.your-server.com 2024090203 3600 600 604800 1440"
pdnsutil add-record your-domain.com @ A 300 10.1.2.3
pdnsutil add-record your-domain.com nameserver A 300 10.1.2.3
...
pdnsutil rectify-zone your-domain.com
pdnsutil list-zone your-domain.com

Now your domain has been created in PowerDNS and is stored in the MariaDB database. Your name server should already work. But we want it to be hidden and synced with one or more secondary name servers.
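You can sanity-check the server from outside with dig, for example with a small helper like this (the IP and domain in the example are placeholders; use your own):

```shell
# Query one specific name server directly for a zone's SOA record
check_soa() {
  dig "@$1" "$2" SOA +short
}

# Example with placeholders (replace with your server IP and domain):
# check_soa 10.1.2.3 your-domain.com
```

If the SOA record you created above comes back, the server is answering correctly.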

For this purpose you now have to configure your secondary name servers to receive changes from your PowerDNS name server. How this works is described in your hoster’s DNS documentation.

After making changes to your domain you can send them to your secondary name servers using these commands:

pdnsutil increase-serial your-domain.com
pdnsutil rectify-zone your-domain.com
pdns_control notify your-domain.com

Later we want to make changes via the API (for the Let’s Encrypt DNS-01 challenge). In that case the serial number should be increased automatically. To make this work, run these commands once:

pdnsutil set-meta your-domain.com SOA-EDIT-API DEFAULT
pdnsutil set-kind your-domain.com master

Security

In the docker-compose.yaml we used a whitelist to limit access to the API to the local containers and one remote server (which we will need later in the Webserver section). Additionally, you can restrict access to port 53 to the same IPs as in “allow-axfr-ips” (which you get from the documentation of your secondary name servers). Then the only ports open to the public are 80 and 443 for the webserver (described below).
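As a sketch, the port restriction could look like this with ufw (assumptions: an Ubuntu host with ufw, and 1.2.3.4 as the secondary name server’s IP; note that Docker publishes ports past ufw unless you also filter in the DOCKER-USER iptables chain):

```shell
# Allow web traffic from everywhere
ufw allow 80/tcp
ufw allow 443/tcp
# Allow DNS (zone transfers and notifies) only from the secondary name server
ufw allow from 1.2.3.4 to any port 53
# Drop everything else that comes in
ufw default deny incoming
ufw enable
```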

That’s all for configuring the name server. Now you just have to enter the names of the secondary name servers into the name server form for your domain at your registrar. When you use two hosters as secondary name servers, your DNS will keep working even if one hoster fails.

Webserver

To have something useful, we will now create two webservers with the same content but different IPs. In the DNS we will publish both under the same name. Browsers will then automatically try both and use whichever is faster, or whichever responds if one fails.

This is actually quite simple; the difficult part is getting the Let’s Encrypt certificate. We can add this to the docker-compose.yaml:

  web:
    build:
      context: ./web
    restart: always
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.web-http.rule=Host(`www.your-domain.com`)"
      - "traefik.http.routers.web-http.entrypoints=web"
      - "traefik.http.routers.web-http.tls=false"
      - "traefik.http.routers.web-https.rule=Host(`www.your-domain.com`)"
      - "traefik.http.routers.web-https.entrypoints=websecure"
      - "traefik.http.routers.web-https.tls=true"
      - "traefik.http.routers.web-https.tls.certresolver=mydnsresolver-powerdns"

Then we need to create the “web” directory and place this Dockerfile into it:

FROM nginx:alpine
COPY ./html /usr/share/nginx/html

And in the html subdirectory you can place your content, e.g. an index.html.
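For a first test, a minimal placeholder page is enough, e.g.:

```shell
# Create the html directory with a placeholder page
mkdir -p web/html
cat > web/html/index.html <<'EOF'
<!doctype html>
<html>
  <head><title>High availability test</title></head>
  <body><h1>It works</h1></body>
</html>
EOF
```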

Add the IP address in your PowerDNS docker container and send it to the secondary name servers:

pdnsutil add-record your-domain.com www A 300 10.1.2.3

pdnsutil increase-serial your-domain.com
pdnsutil rectify-zone your-domain.com
pdns_control notify your-domain.com

Afterwards you can open https://www.your-domain.com and it should work. Traefik should be able to retrieve a certificate; you can check this in the Docker logs of the Traefik container.

Second webserver

To make the web server highly available even if one server fails, we will start another webserver. Use the same docker-compose.yaml, but without the pdns-db and powerdns services, i.e. just Traefik and the web service. One modification is needed: because PowerDNS does not run on the local server but on a remote one, you have to change the URL:

- "PDNS_API_URL=https://nameserver.your-domain.com"

And we have to add the IP address of the second server to our PowerDNS:

pdnsutil add-record your-domain.com www A 300 10.2.3.4

pdnsutil increase-serial your-domain.com
pdnsutil rectify-zone your-domain.com
pdns_control notify your-domain.com

And you have to whitelist the IP to let it access the PowerDNS API:

- "traefik.http.middlewares.ipwhitelist.ipwhitelist.sourcerange=10.2.3.4/32,172.22.0.0/16"

That’s all. Now you can access https://www.your-domain.com and it will use one of the two servers. If you stop Traefik on one of the servers (to simulate a server failure), you won’t notice a difference when opening the URL. Only when the Traefik instances on both servers are stopped does the website become unavailable.

So for the website to become unavailable, the servers at both hosters and all name servers of both hosters would have to fail at the same time, i.e. in most cases you should now see an uptime of 100%. One remaining risk is routing problems, so it can be a good idea to place the second server in a different location, e.g. on another continent, or to run even more than two servers. Further servers are added in the same way as the second one.

Dynamic content

Next it would be good to support dynamic content. First we will use a filesystem that automatically synchronizes changes between the servers. If one of the servers fails and comes back up later, it will automatically sync the changes that happened in the meantime.

We will use Syncthing for this purpose. It is easy to set up and encrypts the data in transit. We can add it to our docker-compose.yaml like this:

  syncthing:
    image: syncthing/syncthing
    hostname: my-syncthing
    volumes:
      - syncthing:/var/syncthing
    ports:
      - 22000:22000/tcp # TCP file transfers
      - 22000:22000/udp # QUIC file transfers
    restart: unless-stopped

If you want to see the web interface, e.g. for debugging purposes, you can also publish that port using “- 8384:8384 # Web UI”. Be careful, though: it uses unencrypted HTTP and has no password by default. Still, it can be very useful for examining problems with the sync. You could also put it behind Traefik to use HTTPS, but then you need a different domain name for each Syncthing instance so you know which node you are currently working with.

On the first start, the service generates a device ID and an API key that we will need. Use “docker exec” to open a shell in the container and then run these commands:

# API key:
sed -n 's/.*<apikey>\(.*\)<\/apikey>.*/\1/p' /var/syncthing/config/config.xml

# Device ID:
sed -n 's/.*<device id="\([^"]*\)".*/\1/p' /var/syncthing/config/config.xml | head -n 1

We have to do this on both devices to get the API key and device ID of each. We will call them APIKEY1, DEVICEID1, APIKEY2 and DEVICEID2 in the following commands.
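If you want to check what the two sed expressions extract, you can try them on a minimal stand-in for the config file (the real file is /var/syncthing/config/config.xml inside the container; the values below are made up):

```shell
# Minimal sample imitating the relevant parts of Syncthing's config.xml
cat > /tmp/sample_config.xml <<'EOF'
<configuration>
  <device id="SAMPLE-DEVICE-ID" name="my-syncthing"></device>
  <gui>
    <apikey>sampleapikey</apikey>
  </gui>
</configuration>
EOF

# Same extraction commands as above
sed -n 's/.*<apikey>\(.*\)<\/apikey>.*/\1/p' /tmp/sample_config.xml
# prints: sampleapikey
sed -n 's/.*<device id="\([^"]*\)".*/\1/p' /tmp/sample_config.xml | head -n 1
# prints: SAMPLE-DEVICE-ID
```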

Then we have to connect them with each other and configure a few things. Perform this inside the container:

# To ensure that they are just connecting directly without any relays
curl -X PATCH -H "X-API-Key: APIKEY1" -H "Content-Type: application/json" -d '
  {
  "globalAnnounceEnabled": false,
  "localAnnounceEnabled": false,
  "relaysEnabled": false,
  "natEnabled": false
  }
' http://localhost:8384/rest/config/options

# To let the node know of the other node
curl -X PUT -H "X-API-Key: APIKEY1" -H "Content-Type: application/json" -d '  
  {  
    "deviceID": "DEVICEID2",  
    "addresses": ["tcp://10.2.3.4:22000"],  
    "name": "my-syncthing2",  
    "compression": "metadata",  
    "introducer": false,  
    "paused": false,  
    "allowedNetworks": [],  
    "autoAcceptFolders": false,  
    "maxSendKbps": 0,  
    "maxRecvKbps": 0,  
    "ignoredFolders": []  
  }  
' http://localhost:8384/rest/config/devices/DEVICEID2

# To sync the default folder with the other node
apk add jq

curl -X GET -H "X-API-Key: APIKEY1" http://localhost:8384/rest/config/folders/default | \
jq '.devices += [{"deviceID": "DEVICEID2", "introducedBy": "", "encryptionPassword": ""}]' > /tmp/updated_config.json

curl -X PATCH -H "X-API-Key: APIKEY1" -H "Content-Type: application/json" \
-d @/tmp/updated_config.json http://localhost:8384/rest/config/folders/default

# To reduce the sync interval to 1 second, which means each change will appear within a second on the other device.
curl -X PATCH -H "X-API-Key: APIKEY1" -H "Content-Type: application/json" -d '  
  {  
  "fsWatcherEnabled": true,  
  "fsWatcherDelayS": 1  
  }  
' http://localhost:8384/rest/config/folders/default

You have to do this on both nodes so that they know each other. It will sync the folder /var/syncthing/Sync between the two servers. When this is done, we can use it for our nginx server.

We add the volume to the “web” service on both nodes:

  web:
    build:
      context: ./web
    restart: always
[...]
    volumes:
      - syncthing:/var/syncthing

And we create a default.conf file for nginx:

server {
    listen 80;

    server_name localhost;

    root /var/syncthing/Sync/html;

    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}

Then we change the Dockerfile of the web service:

FROM nginx:alpine
COPY ./default.conf /etc/nginx/conf.d/

Finally you have to rebuild the web container, e.g. like this:

docker-compose up -d --build --force-recreate web

When this is done, you can copy files into /var/syncthing/Sync/html in the Syncthing container on one node and they will appear on the other. Make sure the owner of the files is 1000:1000 (the UID/GID Syncthing uses inside the container).

So when you now make changes to your web server files, they will automatically be available on both servers after about a second. Both nodes serve the same files, and you can change them without re-deploying the container. The files could even be modified by server-side scripts, e.g. PHP (note that the nginx image used here would need PHP support added first).

Next Steps

The final step would be a database that can sync between the servers (maybe a MariaDB Galera Cluster).