Greetings, everyone!
I am currently developing another OpenTofu module that can deploy an OpenEdX-ready, unmanaged k8s cluster (without OpenEdX instances) on Hetzner. It is based on the Harmony k8s plugin, with a Hetzner flavor.
Currently, my biggest challenge is making my OpenTofu module compatible with the k8s Harmony plugin, since the plugin was developed with a focus on managed Kubernetes offerings such as EKS, DigitalOcean, GKE, etc. I have already made my module compatible with it, at least at first glance. However, my module is still in the experimental phase and not yet production-ready.
While working on that, the K8S_STORAGE_CLASS setting in the Tutor config caught my eye.
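For context, this is the setting in Tutor's config.yml that tells tutor-k8s which StorageClass to request for its PersistentVolumeClaims (a minimal sketch; the value `longhorn` assumes Longhorn's default StorageClass name, so adjust it to whatever your cluster actually provides):

```yaml
# Excerpt from Tutor's config.yml:
# StorageClass used by tutor-k8s for PersistentVolumeClaims.
# "longhorn" is Longhorn's default StorageClass name (an assumption here).
K8S_STORAGE_CLASS: "longhorn"
```

The same can be set from the CLI with `tutor config save --set K8S_STORAGE_CLASS=longhorn`.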
My question is this: does anyone have experience with Longhorn (distributed storage) + tutor-k8s/harmony? I'm curious to hear any feedback on this combination in close-to-production workloads.
I tried to find information about Longhorn + OpenEdX in the OpenEdX discussion forum, with a quick search in the OpenEdX Slack workspace, on Google, and even through LLMs. Nothing.
I would greatly appreciate it if you could point me to such a discussion or documentation.
Thanks in advance!
Hi! I have experience using Rook-Ceph rather than Longhorn. Personally, I think it's a better choice for production, though it depends on your specific requirements. I recommend Rook because it provides unified storage (block, S3-compatible object, and CephFS), which allows you to replace Minio entirely. From a functional standpoint, the underlying storage engine doesn't change much for the application, but the stability of Ceph is a big plus. You just need to configure the block storage (whether it's Longhorn or Rook-Ceph doesn't matter) as the default StorageClass and then deploy the Tutor installation.
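To illustrate the "default storage" part: Kubernetes picks the default StorageClass via a standard annotation. Here is a trimmed sketch of a Rook-Ceph block StorageClass based on the upstream Rook examples (the names `rook-ceph-block` and `replicapool` are the conventional example names, not something specific to this setup, and the full upstream manifest also wires in the CSI secret parameters that are omitted here):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
  annotations:
    # This annotation is what makes PVCs without an explicit
    # storageClassName land on this class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph            # namespace where the Rook operator runs
  pool: replicapool               # CephBlockPool backing the RBD images
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
```

The same annotation works for a Longhorn StorageClass; only one class in the cluster should carry it.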
Thanks for the input!
Yeah, one instead of three sounds interesting.
I will benchmark Rook against Longhorn, using tutor k8s init with a stateful DB in k8s, and I will share my results.
With Longhorn, a misconfiguration (the default configuration, I would say) can significantly reduce performance; one of the major bottlenecks for me was the on-prem network bandwidth limit. With non-tuned Longhorn v1.11.0, tutor k8s init takes me ~59 minutes; tuned, ~28 minutes. But the tuning sacrificed some features.
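For anyone hitting the same bottleneck: most of the Longhorn knobs that trade features for speed live in StorageClass parameters. A hedged sketch (the parameter names come from the Longhorn documentation; the values are only illustrative, and a single replica with strict-local data locality is exactly the kind of feature sacrifice mentioned above, since it gives up replication and rescheduling flexibility):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "1"         # fewer replicas = less network write amplification,
                                # but also no redundancy for the volume
  dataLocality: "strict-local"  # keep the replica on the same node as the workload
                                # (requires numberOfReplicas: "1")
  staleReplicaTimeout: "30"     # minutes before a stale replica is cleaned up
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Pointing K8S_STORAGE_CLASS at a class like this keeps the default class untouched for other workloads.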
I will add more details on how we used Rook. We had 6 nodes, each with 1 GB/s bandwidth and 3 × Samsung SSD 850 EVO 1TB drives. One SSD was the system disk, and the other two were raw disks for Ceph OSDs. I believe this was the optimal configuration: it could withstand the failure of 3 OSDs at the same time. However, compared to AWS EBS, the deployment (migration) speed was 2 times slower because the disks were slow and old. So if you use fast disks, you can achieve results no worse than EBS.
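To make that disk layout concrete: with Rook, dedicating the two raw (non-system) disks on each node to OSDs is expressed in the CephCluster spec. A sketch under assumptions (the device names `sdb`/`sdc` are hypothetical; how the disks actually enumerate depends on the hardware):

```yaml
# Excerpt of a Rook CephCluster spec: use only the two raw disks
# on each node as OSDs, leaving the OS/system disk alone.
storage:
  useAllNodes: true
  useAllDevices: false
  devices:
    - name: "sdb"   # hypothetical device name for raw disk 1
    - name: "sdc"   # hypothetical device name for raw disk 2
```

With one such pair per node across 6 nodes, you get the 12 OSDs mentioned below.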
Wow!
Extremely helpful information!
Huge thanks!
Cool results with 1 GB/s bandwidth!
HA for storage was achieved by keeping those two raw disks + 1 system disk per node, I guess?
Did you keep those three disks in one zone, let's say in one server rack, or were the disks distributed between zones/regions?
In total, we had 12 RAW disks (12 Ceph OSDs).
All cluster nodes were in the same zone. High availability was guaranteed by Rook-Ceph. E.g., when one node went down, i.e. when 2 OSDs were lost, the cluster continued to work and began the rebalancing procedure. Then, after adding a new node, Rook automatically added the new disks to the cluster.