Building Resilient Storage: Why We Switched from MinIO to Garage for LurraData Lab
At LurraData Lab, our goal has always been to build a robust, scalable data ecosystem from open-source tools. The object storage layer is a critical component of this architecture, especially when processing heavy geospatial datasets (GeoTIFFs, Earth Observation data) with Apache Spark and Sedona.
Initially, we deployed MinIO as our S3-compatible storage.
The Problem with MinIO at the Edge
MinIO is a fantastic, high-performance tool. However, as our infrastructure requirements evolved, its design assumptions stopped matching our deployment constraints:
- Resource Footprint: MinIO is increasingly optimized for massive, uniform enterprise data centers (Kubernetes, NVMe drives, high-speed networking). For a leaner, more agile lab environment running on varied hardware, its resource demands felt overly aggressive.
- Rigid Cluster Topologies: Expanding a MinIO cluster or handling node failures required a strict erasure-coding setup that was somewhat brittle when dealing with heterogeneous edge nodes. We needed something that could tolerate a “messier” network topology.
The Solution: Enter Garage (GarageHQ)
After evaluating alternatives, we migrated our object storage layer to Garage (developed by the Deuxfleurs team). Garage is specifically designed for geo-distributed, self-hosted, and heterogeneous clusters.
Here is why it became the perfect fit for us:
- True Decentralization: Garage follows a Dynamo-style design (using CRDTs and a distributed hash table). Nodes can go offline while the cluster keeps serving requests, and it repairs itself once they return. It treats network failure as a normal operating condition, not an exceptional one.
- Lightweight: Written in Rust, it consumes a fraction of the RAM and CPU compared to MinIO, making it viable to run storage nodes on lower-power devices near the data source.
- Seamless S3 Compatibility: Our Spark/Sedona pipelines, which read and write through s3a:// URIs, required zero code changes. We simply pointed the S3 endpoint at the Garage gateway.
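To illustrate the endpoint switch, here is a minimal sketch of the Spark options involved. It is not our exact configuration: the endpoint URL, the key values, and the helper names `garage_s3a_options` and `apply_options` are hypothetical, and the region value assumes Garage's `s3_region` setting was left at `garage`. The `fs.s3a.*` keys are standard Hadoop S3A connector settings.

```python
def garage_s3a_options(endpoint, access_key, secret_key, region="garage"):
    """Build Spark options that point the Hadoop S3A connector at an
    S3-compatible endpoint such as a Garage gateway.

    All argument values are placeholders supplied by the caller.
    """
    return {
        # Where S3A sends requests instead of AWS.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        # Assumption: must match `s3_region` in the Garage configuration.
        # This key needs a reasonably recent Hadoop 3.3.x release.
        "spark.hadoop.fs.s3a.endpoint.region": region,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Address buckets as http://host/bucket/key rather than
        # http://bucket.host/key, which needs no wildcard DNS.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def apply_options(builder, options):
    """Apply each option to a SparkSession builder and return the builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder


# Usage (requires pyspark; endpoint and keys are made up):
#   from pyspark.sql import SparkSession
#   opts = garage_s3a_options("http://garage.example.internal:3900",
#                             "GK_EXAMPLE_KEY_ID", "EXAMPLE_SECRET")
#   spark = apply_options(SparkSession.builder.appName("demo"), opts).getOrCreate()
#   df = spark.read.format("binaryFile").load("s3a://some-bucket/tiles/")
```

Because only these options change, job code that reads and writes s3a:// paths stays the same.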
The Migration Process
The transition was surprisingly smooth. We set up a 3-node Garage cluster spanning different physical locations.
The data transfer was handled via a standard rclone sync from the legacy MinIO buckets to the new Garage buckets. Once the sync was complete and verified, we updated the credentials in our Airflow orchestration layer and Spark configurations.
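One way to do the "verified" step is to compare object listings from both sides. The sketch below is an illustration rather than the script we ran: it assumes each listing was exported with `rclone lsjson -R <remote>:<bucket>`, and the function names are hypothetical. It compares paths and sizes only; `rclone check` can additionally compare hashes where both backends expose them.

```python
import json


def load_listing(lsjson_text):
    """Map object path -> size from `rclone lsjson -R` output, skipping directories."""
    return {
        entry["Path"]: entry["Size"]
        for entry in json.loads(lsjson_text)
        if not entry.get("IsDir", False)
    }


def compare_listings(source, target):
    """Report objects that are missing, unexpected, or differently sized on the target."""
    shared = source.keys() & target.keys()
    return {
        "missing": sorted(source.keys() - target.keys()),
        "extra": sorted(target.keys() - source.keys()),
        "size_mismatch": sorted(k for k in shared if source[k] != target[k]),
    }


def is_clean(report):
    """True when the target holds exactly the source's objects with matching sizes."""
    return not any(report.values())


# Usage with listings saved to files (file names are made up):
#   with open("minio.json") as f:
#       old = load_listing(f.read())
#   with open("garage.json") as f:
#       new = load_listing(f.read())
#   report = compare_listings(old, new)
#   print(report if not is_clean(report) else "buckets match")
```

An empty report only shows that names and sizes agree, so a hash-based check remains worthwhile for data that cannot easily be regenerated.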
Verdict
For enterprise-grade, localized throughput, MinIO remains a giant. But for Lurra Nova’s vision of a resilient, distributed infrastructure that can survive the realities of edge deployments (and unpredictable network conditions in the Balkans), Garage is the clear winner. It aligns perfectly with our ethos of building robust, decentralized technology for real-world applications.