February 2, 2026 | By Captain James Dowell, Systems Commander


Captain Dowell stood at the helm console, studying the resource metrics with the practised eye of a man who had weathered many an infrastructure storm. The HMS Surfstation was a fine vessel—a capable control-plane node with substantial processing power and two NVMe drives amidships that could store the full provisions of a small fleet. But there was something wrong with the way she handled.

The Rook-Ceph storage system, he had come to realise, was the source of the trouble.

It had seemed a sound decision when first commissioned—a distributed storage platform capable of replicating data across multiple nodes, surviving drive failures, scaling to petabytes. The sort of system one might deploy for a squadron of a hundred ships. But Surfstation sailed alone, or nearly so, with only the Thinkpad serving as an auxiliary worker node. And Ceph, magnificent Ceph, required a crew of its own: monitor daemons, manager daemons, OSD processes, CSI provisioners. Eight to ten pods consuming nearly four gigabytes of memory, all to manage storage that sat entirely on one machine.

It was, Dowell reflected, rather like maintaining a full gun crew for a ship that never saw action. The men ate their rations, drew their pay, and occupied valuable berths—but the guns remained silent.

The Decision

The orders came down on a grey February morning: simplify the storage, reduce the overhead, make the vessel lighter and more responsive.

Dowell had considered the options carefully. OpenEBS LocalPV, TopoLVM, the Rancher local-path-provisioner. Each had its merits. But the last of these was already aboard, bundled with the k3s distribution that powered the cluster—merely disabled by a configuration flag, like a cannon with its tompion firmly in place.

The plan was straightforward in conception if demanding in execution: enable the local-path-provisioner, migrate each PersistentVolumeClaim from Ceph to local storage, pin all storage-dependent workloads to Surfstation using node selectors, and finally strike down the Ceph apparatus entirely.
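Pinning a workload to the node that holds its volumes is the simplest part of the plan; a minimal sketch, with deployment name and image purely illustrative, using the standard kubernetes.io/hostname label:

```yaml
# Sketch only: pin a storage-dependent Deployment to Surfstation so its
# pod always lands beside the local volume. Name and image are
# hypothetical; the nodeSelector key is the standard hostname label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      nodeSelector:
        kubernetes.io/hostname: surfstation
      containers:
        - name: app
          image: example/image:latest
```

With local-path storage there is no replication, so a pod scheduled anywhere else would find its data simply absent; the nodeSelector removes that possibility.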

Fourteen PVCs in total. Prometheus, Grafana, and Alertmanager. Loki and Tempo. The AI systems—Ollama with its thirty gigabytes of model weights, Open WebUI. Three PostgreSQL databases serving various applications. The mail server and its webmail interface. Two registry mirrors. Each would need to be brought alongside, its cargo carefully transferred, and sent back into service under the new regime.

The crew set to work preparing the manifests.

The Calm Before

The first obstacle revealed itself with the particular malice that technical systems reserve for the unwary. The k3s service file, it transpired, had been installed with the --disable local-storage flag hardcoded into its ExecStart directive. The configuration file they had prepared was being summarily ignored.

Dowell studied the systemd unit file as one might study an enemy's battle plan:

ExecStart=/usr/local/bin/k3s \
    server \
        '--disable' \
        'local-storage' \
        '--write-kubeconfig-mode' \
        '644' \

The offending lines would need to be struck. A simple edit, a daemon-reload, a restart of the k3s service. Routine maintenance.
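The strike itself can be scripted; a sketch, rehearsed here against a sample copy of the unit rather than the live file (on the real host the target would be /etc/systemd/system/k3s.service, and GNU sed is assumed):

```shell
# Rehearse the edit on a sample copy of the unit file.
cat > k3s.service.sample <<'EOF'
ExecStart=/usr/local/bin/k3s \
    server \
        '--disable' \
        'local-storage' \
        '--write-kubeconfig-mode' \
        '644' \
EOF

# Delete the '--disable' line together with the 'local-storage' argument
# on the line that follows it (GNU sed's addr,+N range).
sed -i "/'--disable'/,+1d" k3s.service.sample
cat k3s.service.sample

# On the real host, follow with:
#   sudo systemctl daemon-reload && sudo systemctl restart k3s
```

The same two-line deletion, applied to the real unit file, leaves the remaining ExecStart arguments intact.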

What could possibly go wrong?

Into the Storm

The restart command was issued at approximately sixteen hundred hours.

The service stopped. The pods began their graceful termination. And then—nothing.

Dowell waited. The seconds stretched into minutes. The connection to Surfstation remained open but the prompt did not return. The ship, if he might indulge the metaphor, had lost steerage way.

"How long does a restart typically require?" he found himself asking, though he knew the answer well enough. Thirty seconds. A minute at most.

Five minutes passed. Then ten.

The Ceph daemons, he realised with a chill, were refusing to die. They clung to their sockets and file handles with the desperate tenacity of men who know they are being decommissioned. The OSD process in particular seemed determined to complete some final synchronisation, though what precisely it was synchronising to—itself, presumably—remained a mystery.

There was no choice but to use force.

sudo pkill -9 ceph
sudo pkill -9 k3s

The commands were brutal, the digital equivalent of cutting away tangled rigging with a boarding axe. But needs must when the devil drives, and the devil was driving hard that afternoon.

The service was restarted. The error came immediately:

Error: failed to create listener: failed to listen on 127.0.0.1:6444:
listen tcp 127.0.0.1:6444: bind: address already in use

An orphaned process, still clutching its port as a drowning man clutches flotsam. Dowell issued the commands to identify and terminate it. The ss utility showed the port in use; lsof found nothing. A ghost in the machine.
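The hunt itself went through the standard tooling; a sketch of the commands, assuming iproute2's ss, lsof, and psmisc's fuser are aboard:

```shell
# Who is holding the k3s apiserver tunnel port?
ss -tlnp 'sport = :6444'
sudo lsof -i :6444

# If a PID surfaces, fuser can find and kill it in one stroke.
sudo fuser -k 6444/tcp
```

When, as here, ss reports the port busy but lsof and fuser come up empty, the holder is usually a kernel-side remnant that only a reboot will clear.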

He ordered a reboot.

The Dark Hours

The reboot command was acknowledged. The connection closed. And Surfstation went dark.

Dowell waited. He pinged the IP address—no response. He tried the local network address—nothing. He checked with the sister node, the Thinkpad, which shared the same local network segment. The ARP table told the story plainly:

surfstation.lan (192.168.8.224) at <incomplete> on wlp3s0

<incomplete>. The machine was not responding to ARP requests. It was not, in any meaningful sense, present on the network.

A cold dread settled in Dowell's chest—the particular horror of the remote administrator who has lost contact with a machine located, in this case, in a garage in another city. The owner of that garage had departed for Switzerland that very morning. There would be no physical access, no serial console, no rescue boot from USB. Whatever had happened to Surfstation, she would have to recover on her own or not at all.

The minutes passed with agonising slowness. Dowell considered the possibilities. A kernel panic, perhaps, triggered by the violent termination of the Ceph daemons. Filesystem corruption requiring manual intervention. A hardware failure, catastrophic and final.

He thought of the data aboard. The monitoring history. The AI models. The databases. The email archives. All of it potentially lost because he had issued a reboot command at the wrong moment.

"Nah," he concluded grimly, "it's totally down."

The Miracle

And then, against all probability and expectation, the machine came back.

Dowell never did learn precisely what had delayed the boot sequence. Perhaps the filesystem check had taken longer than expected. Perhaps the BIOS had paused for some inscrutable hardware initialisation. Perhaps Surfstation had simply needed a moment to collect herself before returning to duty.

Whatever the cause, the ping eventually returned:

PING 192.168.8.224 (192.168.8.224) 56(84) bytes of data.
64 bytes from 192.168.8.224: icmp_seq=1 ttl=64 time=3.42 ms

Dowell permitted himself a moment of profound relief—the kind a captain feels when his ship emerges from a squall with all masts standing. Then he set immediately to work.

The Migration

The local-path-provisioner was operational. It had deployed automatically when the disable flag was removed, a single pod in the kube-system namespace consuming mere megabytes of memory compared to Ceph's gigabytes.

The ConfigMap was applied to direct storage to the NVMe mount:

nodePathMap:
  - node: "surfstation"
    paths: ["/mnt/nvme/local-path-provisioner"]
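That fragment lives, in the upstream provisioner, as JSON under the config.json key of its ConfigMap; a fuller sketch of the applied manifest follows, with the name local-path-config in kube-system being where k3s ships it, worth verifying against the running cluster:

```yaml
# Sketch of the full ConfigMap; verify name and namespace against the
# running cluster. The nodePathMap is JSON under the config.json key.
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-path-config
  namespace: kube-system
data:
  config.json: |
    {
      "nodePathMap": [
        {
          "node": "surfstation",
          "paths": ["/mnt/nvme/local-path-provisioner"]
        }
      ]
    }
```

The provisioner pod must be restarted after the edit for the new path to take effect.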

The storage class was set as default. And then the transfers began.
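Setting the default is a standard annotation patch; a sketch, assuming the class carries k3s's usual name of local-path:

```shell
# Mark local-path as the cluster's default StorageClass. If the old Ceph
# class also carried this annotation, it would need removing first.
kubectl patch storageclass local-path -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```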

The registry mirrors went first—low-risk targets whose contents were merely cached images, easily rebuilt. Scale down the deployment, delete the old PVC, apply the Helm chart, watch the new PVC bind to local-path storage. A smooth operation, completed in minutes.
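In command form, the drill for one mirror looked roughly like this; release, namespace, chart path, and claim names are all hypothetical stand-ins:

```shell
# Sketch of one migration round; names are illustrative.
kubectl scale deploy/registry-mirror -n registry --replicas=0
kubectl delete pvc registry-mirror-cache -n registry
helm upgrade --install registry-mirror ./charts/registry-mirror \
  -n registry --set persistence.storageClass=local-path
kubectl get pvc -n registry   # confirm the new claim binds to local-path
```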

The AI stack followed. Ollama and Open WebUI, their substantial storage requirements now provisioned from the fast NVMe rather than the Ceph abstraction layer. The models would need to be re-downloaded, but that was acceptable.

The monitoring suite proved more challenging. Prometheus, Grafana, Loki, Tempo, Alertmanager—five PVCs with StatefulSets that objected strenuously to having their volume claim templates modified. The StatefulSets had to be deleted and recreated, their pods force-terminated when they refused to die gracefully. But in the end, they too submitted to the new order.
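The objection is structural: volumeClaimTemplates are immutable once a StatefulSet exists, so each one had to be deleted and re-applied. A sketch of the manoeuvre, with names illustrative, using --cascade=orphan so the pods are not torn down along with their controller:

```shell
# volumeClaimTemplates cannot be edited in place: delete the controller
# while orphaning its pods, then re-apply with the new storage class.
kubectl delete statefulset prometheus -n monitoring --cascade=orphan
kubectl apply -f prometheus-statefulset.yaml

# Stragglers that refuse to terminate gracefully can be forced:
kubectl delete pod prometheus-0 -n monitoring --grace-period=0 --force
```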

The databases required the most care. PostgreSQL does not take kindly to having its data directory yanked away mid-transaction. Each database was backed up using pg_dump before the migration:

kubectl exec -n job-automation deploy/postgres -- pg_dump -U jobuser -d jobautomation > backup.sql
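Restoring after the claim was rebuilt is the mirror image, feeding the dump back through psql; the names here are taken from the backup command above:

```shell
# Feed the dump back into the freshly provisioned database.
kubectl exec -i -n job-automation deploy/postgres -- \
  psql -U jobuser -d jobautomation < backup.sql
```

The -i flag matters: without it, kubectl exec does not forward stdin and psql receives an empty script.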

The translation-service database, suffering from severe I/O degradation due to Ceph's struggles, could not even complete a backup—its autovacuum workers timing out continuously, a ship taking on water faster than the pumps could clear it. That data was lost in the transfer. The schema would be recreated from migrations; the content, ephemeral anyway, would accumulate afresh.

The email server went last. Stalwart and Roundcube, carrying correspondence of actual importance. The backup jobs had been running on schedule, providing some insurance, but Dowell still felt the weight of responsibility as he deleted those PVCs and watched them recreate on the new storage backend.

Striking the Colours

With all fourteen PVCs successfully migrated, only one task remained: the decommissioning of Rook-Ceph itself.

The StorageClass was deleted first:

kubectl delete storageclass ceph-block

Then the namespace. This proved unexpectedly difficult—the finalizers on the CephCluster and CephBlockPool resources prevented their deletion, which in turn prevented the namespace from terminating. One by one, Dowell patched away the finalizers:

kubectl patch cephcluster rook-ceph -n rook-ceph -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl patch cephblockpool replicapool -n rook-ceph -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl patch configmap rook-ceph-mon-endpoints -n rook-ceph -p '{"metadata":{"finalizers":null}}' --type=merge
kubectl patch secret rook-ceph-mon -n rook-ceph -p '{"metadata":{"finalizers":null}}' --type=merge

And finally, with a soft finality, the namespace vanished:

Error from server (NotFound): namespaces "rook-ceph" not found

It was done.

A Lighter Ship

Dowell surveyed the results of the day's work. Where once Ceph had consumed eight to ten pods and four gigabytes of memory, now a single local-path-provisioner pod used perhaps fifty megabytes. The storage, all 119 gigabytes of it, sat directly on the NVMe drive—no replication overhead, no monitor quorum, no OSD heartbeats.
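The before-and-after is easy to verify from the bridge; the label selector below matches the upstream provisioner deployment but is worth confirming, and kubectl top requires metrics-server:

```shell
# Confirm every claim is Bound to the new class, then see what the
# provisioner actually costs to run.
kubectl get pvc -A
kubectl -n kube-system top pod -l app=local-path-provisioner
```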

The cluster responded more crisply. Pods scheduled faster. The metrics showed memory pressure relieved across the board.

It had been, he reflected, rather like refitting a ship of the line for service as a fast frigate—stripping away the triple gun decks and their attendant crews, reducing the complement to what was actually needed for the mission at hand. Surfstation was no longer prepared to fight a fleet action that would never come. She was lean, responsive, fit for purpose.

The sun was setting over the Pacific—or so Dowell imagined, having no actual window on the weather. Somewhere in Switzerland, the garage owner was enjoying fondue, blissfully unaware of the drama that had unfolded on her property.

And in the cluster, fourteen PVCs hummed contentedly on their new storage backend, their workloads serving requests, their data safe on NVMe, their futures secured by the simplest possible provisioner.

Tomorrow would bring new challenges. There were always new challenges—certificates expiring, pods crashlooping, upgrades demanding attention. But that was tomorrow's watch.

Tonight, the ship was sound.


Ship's log concluded by Captain J. Dowell. Technical operations executed via k3s, local-path-provisioner, Helm, and kubectl. The Ceph was adequate while it lasted.