K8s: Caddy error when generating certs

Hello.

We are setting up Open edX using Tutor in a self-hosted Kubernetes cluster on AWS EC2 machines.
When we start the Caddy pod first with tutor k8s start caddy, it is unable to generate several certificates due to various errors. The pod logs are listed below.

{"level":"info","ts":1725531770.4546814,"logger":"tls.obtain","msg":"acquiring lock","identifier":"files.openedx.enchatted.com"}
{"level":"info","ts":1725531770.4614372,"logger":"tls.obtain","msg":"acquiring lock","identifier":"preview.openedx.enchatted.com"}
{"level":"info","ts":1725531770.462866,"logger":"tls.obtain","msg":"lock acquired","identifier":"openedx.enchatted.com"}
{"level":"info","ts":1725531770.4632387,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"openedx.enchatted.com"}
{"level":"info","ts":1725531770.4674988,"logger":"tls.obtain","msg":"lock acquired","identifier":"studio.openedx.enchatted.com"}
{"level":"info","ts":1725531770.467891,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"studio.openedx.enchatted.com"}
{"level":"info","ts":1725531770.4752223,"logger":"tls.obtain","msg":"lock acquired","identifier":"files.openedx.enchatted.com"}
{"level":"info","ts":1725531770.4756598,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"files.openedx.enchatted.com"}
{"level":"info","ts":1725531770.4813,"logger":"tls.obtain","msg":"lock acquired","identifier":"preview.openedx.enchatted.com"}
{"level":"info","ts":1725531770.4815571,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"preview.openedx.enchatted.com"}
{"level":"warn","ts":1725531800.4577954,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725531830.7099082,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725531860.9610581,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725531860.9613905,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"minio.openedx.enchatted.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"registering account [] with server: provisioning client: performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725531860.962882,"logger":"http","msg":"missing email address for ZeroSSL; it is strongly recommended to set one for next time"}
{"level":"warn","ts":1725531890.9621415,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725531890.963515,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"minio.openedx.enchatted.com","issuer":"acme.zerossl.com-v2-DV90","error":"account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout"}
{"level":"error","ts":1725531890.9637241,"logger":"tls.obtain","msg":"will retry","error":"[minio.openedx.enchatted.com] Obtain: account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout","attempt":1,"retrying_in":60,"elapsed":120.525664397,"max_duration":2592000}
{"level":"warn","ts":1725531921.2136161,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"info","ts":1725531950.9650018,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"minio.openedx.enchatted.com"}
{"level":"warn","ts":1725531951.4654934,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725531951.4655674,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"apps.openedx.enchatted.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"registering account [] with server: provisioning client: performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725531951.4658325,"logger":"http","msg":"missing email address for ZeroSSL; it is strongly recommended to set one for next time"}
{"level":"warn","ts":1725531981.4663975,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725531981.4664683,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"apps.openedx.enchatted.com","issuer":"acme.zerossl.com-v2-DV90","error":"account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout"}
{"level":"error","ts":1725531981.4665303,"logger":"tls.obtain","msg":"will retry","error":"[apps.openedx.enchatted.com] Obtain: account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout","attempt":1,"retrying_in":60,"elapsed":211.022745085,"max_duration":2592000}
{"level":"warn","ts":1725532011.7182052,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"info","ts":1725532041.466975,"logger":"tls.obtain","msg":"obtaining certificate","identifier":"apps.openedx.enchatted.com"}
{"level":"warn","ts":1725532041.9692352,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725532041.9693525,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"openedx.enchatted.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"registering account [] with server: provisioning client: performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725532041.9695723,"logger":"http","msg":"missing email address for ZeroSSL; it is strongly recommended to set one for next time"}
{"level":"error","ts":1725532071.9697645,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"openedx.enchatted.com","issuer":"acme.zerossl.com-v2-DV90","error":"account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout"}
{"level":"warn","ts":1725532071.9697728,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725532071.96989,"logger":"tls.obtain","msg":"will retry","error":"[openedx.enchatted.com] Obtain: account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout","attempt":1,"retrying_in":60,"elapsed":301.506977629,"max_duration":2592000}
{"level":"warn","ts":1725532102.2220774,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
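The log mixes two kinds of failure. Requests to Let's Encrypt hit the HTTP client timeout before any response headers arrive, whereas requests to ZeroSSL are reported as failing at the DNS lookup step (lookup api.zerossl.com: i/o timeout). Both are consistent with the pod lacking working outbound DNS or internet access. The minimal Python sketch below tallies the error entries by that distinction. It assumes the pod logs have been saved to a file (for example with kubectl logs), and the category names are invented for this example.

```python
import json
import re
from collections import Counter

# Go's resolver reports DNS timeouts as "lookup <host>: i/o timeout".
DNS_TIMEOUT = re.compile(r"lookup (\S+): i/o timeout")
# Go's http.Client reports its overall request timeout with this suffix.
HEADER_TIMEOUT = "Client.Timeout exceeded while awaiting headers"


def classify(error: str) -> str:
    """Map one Caddy error string to a coarse, made-up category name."""
    match = DNS_TIMEOUT.search(error)
    if match:
        return f"dns-timeout:{match.group(1)}"
    if HEADER_TIMEOUT in error:
        return "http-timeout"
    return "other"


def summarize(lines) -> Counter:
    """Count error categories over Caddy's JSON log lines, skipping anything else."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "error" in entry:  # warnings such as "missing email address" have no error key
            counts[classify(entry["error"])] += 1
    return counts
```

For example, with the logs saved to a hypothetical caddy.log: with open("caddy.log") as fh: print(summarize(fh).most_common()).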

As a result, all subsequent pods and jobs (such as minio) fail, since there are no secure endpoints for them.
I also came across another thread about this issue (Deploying in Kubernetes env using tutor in Oracle Cloud), which was never resolved and was closed due to inactivity.
What could be causing this? Any help would be greatly appreciated.

Thanks.

Hi @Retr0,
How are you starting the platform? You should run tutor k8s start or tutor k8s launch -I so that all containers are started. If you only start Caddy without the other pods, it may fail to validate the internal subdomains.

Hello, thanks for the quick reply!

I am starting Caddy first because the Tutor documentation says to do so here, but regardless, I have also tried launching it normally with tutor k8s start, multiple times, to no avail.

All pods, services, and everything else seem to be running OK, but the Caddy service still fails to get an external IP, and the pod logs show the same errors I listed above.

OK. By default, tutor k8s creates a load balancer for each namespace. Check that all load balancers are up and running, then point your DNS to the public address of the load balancer. This should make it work.

As I said in my previous replies, the Caddy pod, which ultimately provides the service with what it needs to function, is failing due to the errors I listed above.

Yes, Tutor does create the load balancer, but no external IP is assigned since Caddy is not running properly.
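One way to check whether the load balancer has actually been given a public address is to inspect the Services in the namespace. The Python sketch below parses the JSON from kubectl get services -o json and lists LoadBalancer Services whose status has no ingress IP or hostname. It assumes kubectl is configured for the cluster and that the namespace is openedx (Tutor's K8S_NAMESPACE setting; adjust if you changed it). Note that this status field is filled in by the cluster's load-balancer controller (a cloud provider integration, or an add-on such as MetalLB), so on a self-managed cluster without such a controller it may stay pending regardless of the Caddy pod's state.

```python
import json
import subprocess


def pending_load_balancers(services: dict) -> list:
    """Return names of LoadBalancer Services that have no external address yet.

    `services` is the parsed output of `kubectl get services -o json`.
    """
    pending = []
    for svc in services.get("items", []):
        if svc.get("spec", {}).get("type") != "LoadBalancer":
            continue
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        # AWS load balancers usually report a hostname; others report an IP.
        if not any(entry.get("ip") or entry.get("hostname") for entry in ingress):
            pending.append(svc["metadata"]["name"])
    return pending


def fetch_services(namespace: str = "openedx") -> dict:
    """Run kubectl (assumed on PATH and pointed at the cluster) and parse its JSON."""
    result = subprocess.run(
        ["kubectl", "get", "services", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For example: print(pending_load_balancers(fetch_services())). An empty list means every LoadBalancer Service has an address you can point DNS at.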

I tried fully resetting and re-creating our cluster from a clean slate.
No luck: I still get similar errors while generating certs:

{"level":"warn","ts":1725963625.9582021,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725963656.2085097,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725963686.4600103,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725963686.460208,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"minio.openedx.enchatted.com","issuer":"acme-v02.api.letsencrypt.org-directory","error":"registering account [] with server: provisioning client: performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"warn","ts":1725963686.4604642,"logger":"http","msg":"missing email address for ZeroSSL; it is strongly recommended to set one for next time"}
{"level":"warn","ts":1725963716.4607253,"logger":"http.acme_client","msg":"HTTP request failed; retrying","url":"https://acme-v02.api.letsencrypt.org/directory","error":"performing request: Get \"https://acme-v02.api.letsencrypt.org/directory\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
{"level":"error","ts":1725963716.4607913,"logger":"tls.obtain","msg":"could not get certificate from issuer","identifier":"minio.openedx.enchatted.com","issuer":"acme.zerossl.com-v2-DV90","error":"account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout"}
{"level":"error","ts":1725963716.460858,"logger":"tls.obtain","msg":"will retry","error":"[minio.openedx.enchatted.com] Obtain: account pre-registration callback: performing EAB credentials request: Post \"https://api.zerossl.com/acme/eab-credentials-email\": dial tcp: lookup api.zerossl.com: i/o timeout","attempt":1,"retrying_in":60,"elapsed":120.510339967,"max_duration":2592000}

I installed without K8s, but I have exactly the same issue when enabling “Activate SSL/TLS certificates for HTTPS access”.

I added proxy settings to the Docker daemon, but that doesn’t solve the issue.

By the way, I can’t connect over HTTPS (ERR_SSL_PROTOCOL_ERROR). I guess it comes from this error, but I’m not sure. If I don’t activate HTTPS, it works fine over HTTP.

I think Caddy can’t automatically obtain a certificate for HTTPS connections because it can’t reach the certificate servers.

I tried curling the URL from the lms container, and it doesn’t work. When I do the same on the host, it works.
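Since the Caddy logs report lookup ... i/o timeout, a quick way to separate a DNS problem from other network problems is to resolve the ACME hostnames both inside the container and on the host and compare. The sketch below uses only the Python standard library; the hostnames come from the Caddy error messages above, and the helper names are made up for this example.

```python
import socket

# Hostnames taken from the Caddy error messages above.
ACME_HOSTS = ("acme-v02.api.letsencrypt.org", "api.zerossl.com")


def resolves(host: str, port: int = 443) -> bool:
    """Return True if the system resolver can resolve host, False on a lookup error.

    getaddrinfo has no timeout argument, so a broken resolver may block for the
    resolver's own timeout before raising.
    """
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True


def unresolvable(hosts=ACME_HOSTS) -> list:
    """Return the subset of hosts that fail to resolve."""
    return [host for host in hosts if not resolves(host)]
```

Calling unresolvable() from a Python shell inside the lms container and then on the host should show whether the failure is specific to the container's DNS setup.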

Hello,

I solved the issue on my side.

First, verify that you can reach the remote sites used to generate certificates:

curl -I https://api.zerossl.com

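It should return something like this (the first line, HTTP/1.1 200 Connection established, only appears when the request goes through an HTTP proxy):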

HTTP/1.1 200 Connection established

HTTP/2 401
date: Fri, 27 Sep 2024 09:07:03 GMT
content-type: application/json; Charset=UTF-8
server: nginx
access-control-allow-origin: *
access-control-allow-methods: GET, HEAD, POST, PUT, PATCH, DELETE, OPTIONS
x-apilayer-transaction-id: ed25814a-65c3-4a86-ad0c-1bc06b811407
cache-control: no-cache, private
content-security-policy: default-src 'none';
strict-transport-security: max-age=180; includeSubDomains

Then run:
curl -I https://acme-v02.api.letsencrypt.org/directory

It should return:

HTTP/1.1 200 Connection established

HTTP/2 200
server: nginx
date: Fri, 27 Sep 2024 09:08:00 GMT
content-type: application/json
content-length: 746
cache-control: public, max-age=0, no-cache
replay-nonce: mdHBMg8Kjemj97NOqSTVopblJtX5d91naEhKWUlR2Y6o0E9sTAk
x-frame-options: DENY
strict-transport-security: max-age=604800
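The same two checks can be scripted without curl. The sketch below sends a HEAD request (which is what curl -I does) and treats any HTTP status, including ZeroSSL's 401, as proof that the endpoint is reachable; only connection-level failures (DNS, refused connection, TLS errors, timeouts) count as unreachable. Like curl, urllib.request picks up proxy settings from the https_proxy/HTTPS_PROXY environment variables. The function name and the 10-second timeout are arbitrary choices for this example.

```python
import urllib.error
import urllib.request

# The two endpoints checked with curl above.
ACME_URLS = (
    "https://api.zerossl.com",
    "https://acme-v02.api.letsencrypt.org/directory",
)


def https_reachable(url: str, timeout: float = 10.0) -> tuple:
    """Send a HEAD request (like curl -I) and return (reachable, detail)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with a non-2xx status (ZeroSSL answers 401 here),
        # so the network path itself works. Must be caught before OSError.
        return True, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError (DNS failure, refused connection, TLS error) and socket
        # timeouts are all OSError subclasses.
        return False, str(getattr(exc, "reason", exc))
```

For example: for url in ACME_URLS: print(url, *https_reachable(url)).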

If both commands work, you then need to add the certificate configuration to Caddy.

First, create a free account on the ZeroSSL website:

  • Go to https://zerossl.com/
  • Pick “get Free SSL”
  • Create a “Free account”
  • Once the account is created, log in, go to the “Developer” menu and, under “EAB Credentials for ACME Clients”, click “Generate”
  • Back up the information provided in a secure place: EAB KID, EAB HMAC Key, and the email used to create the account

Then, add the configuration to Caddy.
To find the Caddy configuration file, first locate your Tutor root directory:
tutor config printroot

In my case:
/root/.local/share/tutor

Then go to the Caddy configuration file, named Caddyfile, located under env/apps/caddy/ in that directory:

lms-user@prod-lms:~$ tutor config printroot
/home/lms-user/.local/share/tutor
lms-user@prod-lms:~$ cd /home/lms-user/.local/share/tutor
lms-user@prod-lms:~/.local/share/tutor$ ls
config.yml  data  env
lms-user@prod-lms:~/.local/share/tutor$ cd env/
lms-user@prod-lms:~/.local/share/tutor/env$ ls
apps  build  dev  k8s  kustomization.yml  local  plugins  version  webui
lms-user@prod-lms:~/.local/share/tutor/env$ cd apps/
lms-user@prod-lms:~/.local/share/tutor/env/apps$ ls
caddy  openedx  permissions  redis
lms-user@prod-lms:~/.local/share/tutor/env/apps$ cd caddy/
lms-user@prod-lms:~/.local/share/tutor/env/apps/caddy$ ls
Caddyfile

Make a copy of your Caddyfile so you can roll back if necessary:
cp Caddyfile Caddyfile.copy

Add your credentials to the Caddyfile by editing it (for example: nano Caddyfile).

In the “Global configuration” section, add an acme_eab block containing the credentials you generated on the ZeroSSL website, replacing the xxxx placeholders in the example below:
key_id → EAB KID
mac_key → EAB HMAC Key

# Global configuration
{


    # Enable proxying from all servers by default. Otherwise, X-Forwarded-* headers will
    # be overwritten.
    # https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#defaults
    servers {
        trusted_proxies static 0.0.0.0/0 ::/0
    }

    acme_eab {
        key_id xxxxxxxxxxxxxxxxxxxx
        mac_key xxxxxxxxxxxxxxxxx
    }

}

# proxy directive snippet (with logging) to be used as follows:

Save your file.

Restart Tutor: tutor local restart

And that’s all. I hope I didn’t forget anything.

Troubleshooting: I also did a few other things, but I’m not sure they are important.
If the steps described above do not work, try:

  • Add the email you used to register on the ZeroSSL website:
    tutor config save --set CERTBOT_EMAIL=your-email@example.com
  • Restart Docker
  • Stop Tutor, then start it again:
    tutor local stop
    tutor local start -d

By the way, this solved the certificate issue, but I still couldn’t load the HTTPS web page; I was now getting an “ERR_CONNECTION_REFUSED” error.

Next, I got a buffer size error message in Caddy, which was easily fixed with
sysctl -w net.core.rmem_max=7500000
(taken from https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes)
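To confirm the current value before or after running sysctl, Linux exposes it at /proc/sys/net/core/rmem_max. The sketch below reads that file and compares it with the 7500000 figure used above; the helper names are made up for this example. Keep in mind that sysctl -w only changes the running kernel, so to keep the setting across reboots it usually goes in a file under /etc/sysctl.d/.

```python
from pathlib import Path

# Value recommended on the quic-go "UDP Buffer Sizes" wiki page cited above.
RECOMMENDED_RMEM_MAX = 7_500_000


def read_rmem_max(path: str = "/proc/sys/net/core/rmem_max") -> int:
    """Read the kernel's maximum socket receive buffer size, in bytes (Linux only)."""
    return int(Path(path).read_text().strip())


def rmem_max_ok(current: int, minimum: int = RECOMMENDED_RMEM_MAX) -> bool:
    """Return True if the current limit is at least the recommended minimum."""
    return current >= minimum
```

For example, on the Docker host: print(rmem_max_ok(read_rmem_max())).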

I wasted a lot of time searching for information on how to solve the issue, but in the end it was ChatGPT that gave me the main answer.


Thank you for your solution, but our use case is quite different, so I am not sure it will work.
We are now trying to set up Tutor on AWS EKS, and we are getting different Caddy errors from the ones in my original post.

But in any case, if I find a solution to this issue, I will post it.

This week, we inexplicably ran into a similar SSL certificate generation problem when running Tutor in local mode. Since everything in the env/ folder is regenerated whenever tutor config save runs, edits made directly to the Caddyfile do not last. Instead, the solution that worked was to create a caddyfile.yml plugin in the plugins folder (the path printed by tutor plugins printroot) and populate it like this, using the credentials generated on the ZeroSSL site as described above:

name: caddyfile
version: 0.0.1
patches:
  caddyfile-global: |
    acme_eab {
        key_id xxxxxxxxxxxxxxxxx
        mac_key xxxxxxxxxxxxxxxxx
    }
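If you prefer to generate the plugin file rather than write it by hand, the sketch below renders the same YAML and writes it to the plugins folder. It assumes you pass the directory printed by tutor plugins printroot together with the EAB KID and EAB HMAC Key from ZeroSSL; the function names are hypothetical helpers, not part of Tutor. The file is restricted to its owner because it contains the EAB credentials.

```python
from pathlib import Path

# Mirrors the caddyfile.yml plugin from the post above.
# Doubled braces are literal braces for str.format().
PLUGIN_TEMPLATE = """\
name: caddyfile
version: 0.0.1
patches:
  caddyfile-global: |
    acme_eab {{
        key_id {key_id}
        mac_key {mac_key}
    }}
"""


def render_caddyfile_plugin(key_id: str, mac_key: str) -> str:
    """Return the YAML text of the plugin with the EAB credentials filled in."""
    return PLUGIN_TEMPLATE.format(key_id=key_id, mac_key=mac_key)


def write_caddyfile_plugin(plugins_root: str, key_id: str, mac_key: str) -> Path:
    """Write caddyfile.yml into plugins_root (the output of `tutor plugins printroot`)."""
    root = Path(plugins_root)
    root.mkdir(parents=True, exist_ok=True)
    path = root / "caddyfile.yml"
    path.write_text(render_caddyfile_plugin(key_id, mac_key))
    # The file holds secrets, so restrict it to the current user.
    path.chmod(0o600)
    return path
```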

Then:

  1. tutor plugins enable caddyfile
  2. tutor config save --set CERTBOT_EMAIL=your-email@example.com
  3. tutor local launch (or tutor local launch -I to skip the interactive prompts if you’ve already run the interactive setup before)