Longhorn use cases

Greetings, everyone!
I am currently developing one more OpenTofu module that deploys an OpenEdX-ready, self-managed k8s cluster (without OpenEdX instances) on Hetzner. It is based on the Harmony k8s plugin, with a Hetzner flavor.
Currently, my biggest challenge is making my OpenTofu module compatible with the k8s Harmony plugin, as the plugin was developed with a focus on providers with managed KaaS (EKS, DO, GKE, etc.). I have already made my module compatible, at least at first glance. However, the module is still in the experimental phase and not yet production-ready.
While working on this, the K8S_STORAGE_CLASS setting in the Tutor config caught my eye.
My question is: does anyone have experience with Longhorn (distributed storage) + tutor-k8s/Harmony? I’m curious to hear any feedback on this combination for close-to-production workloads.
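For reference, this is the knob I mean. Pointing Tutor at a Longhorn-provisioned class would look roughly like this (a sketch, assuming Longhorn is already installed and exposes its default `longhorn` StorageClass):

```shell
# Tell Tutor to request its PVCs from the Longhorn StorageClass
tutor config save --set K8S_STORAGE_CLASS=longhorn

# Then deploy as usual
tutor k8s start
tutor k8s init
```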
I tried to find information about Longhorn + OpenEdX in the OpenEdX discussion forum, did a quick search in the OpenEdX Slack workspace and on Google, and even asked an LLM. Nothing.
I would greatly appreciate it if you could point me to such a discussion or documentation.
Thanks in advance!

Hi! I have experience with Rook-Ceph rather than Longhorn. Personally, I think it’s a better choice for production, though it depends on your specific requirements. I recommend Rook because it provides unified storage (block, S3-compatible object, and CephFS), which allows you to replace MinIO entirely. From a functional standpoint, the underlying storage engine doesn’t change much for the application, but the stability of Ceph is a big plus. You just need to configure block storage (Longhorn or Rook-Ceph, it doesn’t matter) as the default storage class and deploy the Tutor installation.
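To make that last step concrete, this is roughly how a block StorageClass is marked as the cluster default so Tutor’s PVCs land on it (a sketch; `rook-ceph-block` is the example class name from the Rook docs, so substitute `longhorn` or whatever your class is actually called):

```shell
# Mark the Ceph RBD StorageClass as the cluster-wide default.
kubectl patch storageclass rook-ceph-block \
  -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'

# Verify: the chosen class should now be listed with "(default)".
kubectl get storageclass
```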


Thanks for the input!
Yeah, one storage system instead of three sounds interesting.
I will benchmark Rook against Longhorn by running tutor k8s init with the stateful DBs in k8s, and I will share my results.
With Longhorn, misconfiguration (the default configuration, I would say) can significantly reduce performance; one of the major bottlenecks for me was the on-prem network bandwidth limit. With non-tuned Longhorn v1.11.0, tutor k8s init takes me ~59 minutes; tuned, ~28 minutes. But the tuning sacrificed some features.
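For anyone curious, the kind of tuning I mean looks roughly like this (a sketch: `numberOfReplicas`, `dataLocality`, and `staleReplicaTimeout` are real Longhorn StorageClass parameters, but the specific values and trade-offs are my own — fewer replicas and best-effort locality cut cross-node replication traffic at the cost of redundancy):

```shell
# A tuned Longhorn StorageClass (sketch): fewer replicas and best-effort
# data locality reduce cross-node writes on a bandwidth-limited network.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-tuned
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"        # default is 3; fewer replicas = less replication traffic
  dataLocality: "best-effort"  # try to keep a replica on the node running the workload
  staleReplicaTimeout: "30"
EOF
```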

I will add more details on how we used Rook. We had 6 nodes, each with 1 GB/s bandwidth and 3 × Samsung 850 EVO 1 TB SSDs. One SSD was the system disk, and the other two were raw disks for Ceph OSDs. I believe this was the optimal configuration: it could withstand the failure of 3 OSDs at the same time. However, compared to AWS EBS, the deployment (migration) speed was 2× slower because the disks were slow and old. So if you use fast disks, you can achieve results no worse than EBS.
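As a sketch of that layout in Rook terms, the block pool definition would look something like this (names are illustrative; `failureDomain: host` is what makes Ceph place the replicas on different nodes rather than just different OSDs, so losing one node only costs one copy of each object):

```shell
# Replicated RBD pool (sketch): 3 copies, spread across hosts.
kubectl apply -f - <<'EOF'
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # replicas land on different nodes, not just different OSDs
  replicated:
    size: 3             # each object is stored on 3 nodes
EOF
```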


Wow!
Extremely helpful information!
Huge thanks!

Cool results with 1 GB/s bandwidth!
HA for storage was achieved by keeping those two raw disks + 1 system disk per node, I guess?
Did you keep those three disks in one zone, say in one server rack, or were the disks distributed across zones/regions?

In total, we had 12 raw disks (12 Ceph OSDs).
All cluster nodes were in the same zone. High availability was guaranteed by Rook-Ceph. E.g., when one node went down, i.e. when 2 OSDs were lost, the cluster continued to work and began the rebalancing procedure. Then, after a new node was added, Rook automatically added its disks to the cluster.
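For readers following along, that recovery/rebalancing can be watched from the Rook toolbox pod (assuming the optional `rook-ceph-tools` deployment from the Rook docs is installed):

```shell
# Overall cluster health and recovery progress after a node loss
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status

# Which OSDs are up/down and how they map to hosts
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
```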