Recently, I almost lost a bunch of ripped blu-rays, DVDs and downloaded movies and TV series. I thought a RAID5 would preserve them reasonably well, but I didn’t think carefully enough about recovery scenarios in case the whole NAS died. I learned how a NAS can be expensive and paint me into a corner, but I also learned more about logical volumes, which could greatly improve my Linux installation in the future.
A NAS: a great idea at start
I used to store movies and TV series on separate drives. After I almost filled up two 3 TB drives with ripped blu-rays, I wanted to do better than just adding another drive. Otherwise, I would end up checking all three drives to find a given movie or TV series. I thus needed a solution to combine the storage space into a single pool.
One way of doing that is a Redundant Array of Inexpensive Disks (RAID). It can use several drives to improve reliability (mirroring), performance (striping) or both. It also combines the space of multiple drives, resulting in a virtual device with more space. I wanted to go with a RAID5, because it gives more space than mirroring while letting one drive die without losing access to the data. Of course, if a drive of a RAID5 dies, you need to replace it as soon as possible so the array can rebuild and be reliable again; otherwise, if another drive fails, data loss occurs. The larger the storage pool is, the more catastrophic the data loss!
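To make the "one drive can die" property more concrete, here is a minimal Python sketch of the XOR parity idea behind RAID5. It is only an illustration: the block contents and the `xor_blocks` helper are made up for the example, and a real array works on much larger blocks and rotates the parity block across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe, as they would sit on three drives
# (hypothetical contents, four bytes each).
data = [b"MOVI", b"ES&T", b"VSHO"]

# The parity block is what the fourth drive would hold for this stripe.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR-ing the survivors with the parity
# gives back the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

If a second drive is lost, there are two unknown blocks per stripe and a single parity block is no longer enough, which is why a degraded RAID5 should be rebuilt quickly.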
Unfortunately, at the time I investigated that, I knew of no user interface in Ubuntu to set up a RAID (that may have changed). One needed to copy/paste obscure commands from web pages. I got tired of such processes and wanted an easier way, other than trying with Windows, because I don’t have a Windows license for the HTPC that holds the storage.
I thus wanted to experiment with a Network-Attached Storage (NAS) device. I picked the TS-453Be from QNAP, because I wanted a 4-bay device that is not too expensive. I found a lot of 2-bay devices, which would have prevented me from building a RAID5, and forum posts suggesting that RAID5 is not great because all data is lost if more than one drive dies, that a mirrored RAID is better, etc. Because I didn’t have four drives yet, I started with two 3 TB drives, planning to expand to four.
One good side of these devices is their ability to be configured remotely. Instead of hooking the device to a screen through its HDMI port (you can do so if you want to), you can point a web browser to its interface and configure everything from there.
The downside is the quantity of settings that made little sense to me; some still remain obscure, like JBOD and iSCSI. It was relatively easy to get a RAID set up, but I quickly noticed that only a portion of the space available on the drives was usable by the volume containing the data. This is because the NAS can use the physical space in several ways, not just for storing data. One can create snapshots or split the space into multiple logical volumes. Volumes can also host virtual machines. I had a hard time finding how to expand the logical volume holding my contents because the option was hidden away in the UI, and it was called « Étendre » (extend) in French, which I confused with « Éteindre », meaning power off. Everything was there to expand the logical volume, but not to shrink it afterwards.
A couple of months later, after I added two new drives, I found out it was possible to convert the mirrored RAID into a RAID5. That gave me 9 TB of storage space, enough for the moment to store my stuff.
Getting more would however be painful, time consuming and expensive, requiring me to swap out each 3 TB drive for a larger one, one drive at a time. Each time a drive swap occurred, the RAID would have to rebuild from the three remaining drives, and only after the rebuild was complete would I be able to swap another drive. After all drives are swapped, the storage pool would expand based on the size of the smallest drive.
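The capacity arithmetic can be sketched in a few lines of Python. This is a simplified model, assuming the usual rule that a RAID5 keeps one drive's worth of space for parity and treats every member as if it had the size of the smallest one; it ignores file system overhead and the space QTS reserves for itself. The 10 TB replacement size is a hypothetical figure chosen for the example.

```python
def raid5_usable_tb(drive_sizes_tb):
    """Usable space of a RAID5: (number of drives - 1) x smallest drive."""
    if len(drive_sizes_tb) < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (len(drive_sizes_tb) - 1) * min(drive_sizes_tb)


# The array described above: four 3 TB drives.
print(raid5_usable_tb([3, 3, 3, 3]))

# Three drives already swapped for hypothetical 10 TB ones: the remaining
# 3 TB drive still limits the whole pool.
print(raid5_usable_tb([10, 10, 10, 3]))

# Only once the last drive is swapped does the pool actually grow.
print(raid5_usable_tb([10, 10, 10, 10]))
```

This shows why the drive-by-drive upgrade pays off only at the very end: every intermediate step costs a full rebuild but adds no space.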
Then what’s the point? Why not copy all the data elsewhere, rebuild a new array and copy the data back? The longer process has the benefit of leaving the storage pool online throughout the whole migration. Data can still be accessed and even, at least as far as I know, modified! A dying drive also doesn’t bring the pool offline. The degraded RAID can still work, until a new drive is added. The rebuild occurs online, while data can still be accessed. I realized that this high availability was overkill for me, but for a business, that would be critical.
A hybrid backup strategy
I didn’t have enough storage space to back up the whole RAID5 array. Instead of solving that, I went with a hybrid strategy.
- My personal data is already backed up on Dropbox and present on two machines at home. My idea was to make my NAS the second Dropbox client, but I quickly noticed that Dropbox doesn’t run on QTS, the Linux variant installed by QNAP on the NAS. I thus needed to keep my HTPC for that. Dropbox gives me an offsite backup, which would be handy if a disaster such as a fire destroyed my home. But thinking about it more, after such a disaster, getting back my photos, videos, Live sets, etc., would be a rather minor concern.
- I have hundreds of gigabytes of Minecraft videos I authored and uploaded to YouTube, and I wanted to keep an archive of them. I ended up uploading them to an Amazon S3 bucket, which has since been moved to Glacier. This stores the data for a low price but requires more time and some fees to get the data back. I thought the NAS could itself synchronize the MinecraftVideos folder with S3. No, it cannot sync with S3, not anymore! QNAP switched gears, giving up on S3 in favor of inter-NAS synchronization! That means if I want an automated backup, I would need to buy a second NAS with the same (or higher) storage space, and set it up to be synchronized with the first! For a small business, I can imagine this being possible, but for a home user, that looks like overkill.
- I thought that not backing up the ripped DVDs and blu-rays would not be a big problem since I have the originals. This was a big mistake.
- More and more recorded videos added up, with no backup. Using my HD PVR from Hauppauge, I recorded several movies from my Videotron set top box.
My backup plan was thus outdated and needed some revisions and improvements.
Noisy box
The NAS, in addition to its lack of integration with Dropbox and S3, was quite noisy. Sometimes it was quiet, but regularly it started to make a humming noise that wouldn’t stop unless I powered it off or tapped on it a few times. I searched a long time for a fix. I tried screwing the drives in instead of just attaching them with the brackets, but the real problem was that one of the drives I put in the NAS (a new drive) was bad and noisy! Swapping it for another 3 TB drive I had fixed the issue.
I moved the noisy drive into my main computer, where it was relatively quiet at first. But it ended up making an annoying humming sound that my mic was picking up, reducing the quality of many of my Minecraft videos. At some point, I got fed up and decommissioned that drive in favor of the last 3 TB drive remaining in my HTPC. My NAS combined with an NVIDIA Shield was pretty much replacing my HTPC, which I was thinking more and more about decommissioning.
The disastrous scenario
During the summer of 2020, my QNAP NAS suddenly turned off and never powered on again. When I tried to turn the unit on, I got a blinking status light and nothing else. At first, I was pissed off, because I wasn’t sure I would be able to fix this and was anticipating delays to get it repaired because of the COVID-19 pandemic. My best hope was to recover the data using a Linux box, then I would decide whether to get this NAS repaired or replace it with a standard Linux PC. Recovery ended up being harder than I thought, which made me angry.
When I lost access to all my files, I realized how time consuming re-ripping the blu-rays and DVDs would be. I would need to insert each disc into the blu-ray drive, start MakeMKV, click the button to analyze the disc, wait, wait, wait, then enter the path to store the ripped files to. Even though there is only a single field where text can be entered, MakeMKV doesn’t put it in focus, forcing me to locate the super small mouse pointer (at least on my HTPC where I had the blu-ray drive to rip from), click, enter the path, check it (and the font was super small), click again and then wait, wait, wait. For one disc, that’s OK. For 30…
I also lost a bunch of movies recorded using my HD PVR. The quantity of recordings increased over time and I didn’t realize none of this was backed up! Backup plans need to evolve and be revised over time.
Recovery attempts
The first problem was to move the four drives into a computer that would be able to host them. I didn’t have enough free bays in my main computer, not without removing another drive. I was worried Windows would get screwed up by this and wouldn’t re-establish broken links to my Documents, Music, Videos and Pictures folders. I thus chose to use my old HTPC as a host, but fitting and powering the four drives was a painful process. I had to use pretty much all the SATA cables I had, and one broke during the process. I had to unplug the hard drive in a really old PC to get its SATA cable, unplug the DVD drive in my main PC to get another, dig one more out of a drawer, etc. I also needed a Molex to SATA converter cable because the PSU only had four SATA power connectors. I needed to power five drives: the four NAS drives to rebuild the RAID array, and the SSD containing Ubuntu! Because of the pandemic, I wasn’t sure whether the computer stores in my area were open, so my best bet to get new hardware was online, with delays of several days, even for something as simple and stupid as a SATA cable. Using what I had was my best option.
All these efforts were pretty much worthless because I wasn’t able to access the data. We’ll see why later on. All I could do was trigger a SMART long self-test, to at least verify the drives were good. All four drives passed the test. Had more than one drive failed it, there would have been no point in getting the NAS fixed or continuing recovery attempts.
I couldn’t go further without ordering hardware. I started with a four-bay USB to SATA docking station. Finding one was tricky at first (one bay, just for Mac, etc.), but I got one and it ended up working like a charm. It did cause me issues at first, though: I plugged the power cable in the wrong direction and tested with a defective drive (yes, the noisy 3 TB drive I removed from my system just doesn’t power up anymore!!!).
I was hoping to put the four NAS drives in there and have Windows see the drives, then I would try using ReclaiMe to get the files back. I also needed to get a large enough drive to hold all the data. I got a 10 TB one; that would do, I thought.
Since I was able to reassemble the block device using mdadm, I explored the idea of dumping that block device to a partition on the new 10 TB drive. For this, I had to plug the 10 TB drive into my HTPC, which was already short on connectors for five drives! I used the USB docking station for that. Bad idea: the HTPC is too old, having just USB2, and copying up to 10 TB through USB2 is a good way to test your patience and get annoyed. It would have taken more than 2 days just for that step to finish! There was no solution, unless I got a PCI Express card with a USB3 port or, probably better, an eSATA port, and then I would need to get an eSATA drive enclosure to put the 10 TB drive in! I didn’t like this at all, because that was a lot of waiting for an intermediate result only.
After failing to dump the block device because of USB2 slowness, I got super fed up and decided to proceed with the repair of my NAS. I contacted QNAP’s technical support and we figured out that a repair was needed and that my warranty had expired a year earlier. I thus needed to pay $300, plus shipping of the NAS. I felt at that time it was my last hope of recovering the data.
But I tried anyway to get the drives out of my HTPC and plug them into my docking station, so ReclaiMe could analyze them. Well, ReclaiMe completely failed. It detected the logical volumes on the drives, which looked promising, but instead of making use of the Ext4 file system, it just scanned the drive for data looking like files. It was thus only able to extract files with no name but some contents, and I couldn’t even be sure the files were complete! That would be unusable garbage; I would be better off just re-ripping all the discs. R-Studio, another tool I threw at this, also failed miserably, not even able to reassemble the RAID5. I got so fed up that at some point I considered contacting a data recovery company, but I was concerned that it would be too expensive, and that I would just get chunks of data named with hash codes like I was about to get with ReclaiMe.
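A quick back-of-the-envelope estimate shows why USB2 was hopeless for this. The sketch below assumes the whole 9 TB block device has to be copied, and the throughput figures are assumptions rather than measurements: USB2 tops out at 60 MB/s on paper (480 Mbit/s) and usually sustains noticeably less in practice, while a hard drive on USB3 or eSATA is assumed here to sustain about 150 MB/s.

```python
def transfer_hours(size_tb, throughput_mb_s):
    """Hours needed to copy size_tb terabytes at a sustained rate in MB/s."""
    size_bytes = size_tb * 1000**4
    seconds = size_bytes / (throughput_mb_s * 1000**2)
    return seconds / 3600


ARRAY_SIZE_TB = 9  # size of the assembled RAID5 block device

# Assumed sustained rates, not measurements.
scenarios = {
    "USB2, realistic (35 MB/s)": 35,
    "USB2, theoretical maximum (60 MB/s)": 60,
    "USB3 or eSATA, drive-limited (150 MB/s)": 150,
}

for label, rate in scenarios.items():
    hours = transfer_hours(ARRAY_SIZE_TB, rate)
    print(f"{label}: {hours:.0f} h ({hours / 24:.1f} days)")
```

Even with the optimistic USB2 figure, the copy takes well over a day, and that is before copying the recovered files anywhere else.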
Lasagna file system
QTS makes use of multiple layers when configuring a file system. Each layer provides some features, but all of this is adding complexity at recovery time.
The diagram below summarizes the structure.
First, the physical drives are combined into a RAID5 using the md (multiple devices) driver of the Linux kernel. This allows creating a RAID array in software, without any specialized hardware. The mdadm tool can be used to configure or activate a RAID. I was able to activate the RAID and get a block device out of it.
The block device could be used to host a filesystem directly. Instead, it is formatted by QNAP as a DRBD storage pool, at least according to the forums I searched. Some people attempted to mount the DRBD device without success, because QTS reportedly uses a forked version of DRBD that prevents anything other than QTS from reading it! Because of that, only a QNAP NAS can reassemble the storage pool and get the data back!
The DRBD volume is formatted as a physical volume for the LVM system. Logical Volume Manager (LVM) allows splitting a storage device into multiple logical partitions. Partitions can be resized at runtime and don’t have to use contiguous space on the physical volumes. They can span multiple physical volumes as well. This is something any ordinary Linux distribution supports, as the underlying support is part of the mainline Linux kernel! The only caveat is the absence (at the time I am writing) of user interfaces exposing these features. One needs to use command lines such as pvcreate, vgcreate and lvcreate to manipulate the logical volumes, but the commands are not as complex as I thought.
I read that QNAP also forked LVM, so I was worried that even if I got past the DRBD layer, I would not cross the LVM one.
Note that when I partitioned my 10 TB drive, I found an LVM partition on it! The assembled RAID apparently was an LVM physical volume, so maybe I would have been able to vgimport it and get access to the logical partitions! However, any attempt to do so could have failed or changed the machine id in the volume group, reducing the chances that my fixed NAS would mount the array and filesystems. My new 10 TB drive was already formatted at the time, with some data on it, so I couldn’t use it to back up the full RAID and test, unless I got another drive. I thus decided to stop my attempts there, since my NAS had already been shipped to QNAP at the time of that discovery.
Inside the LVM layer, there are logical partitions, at least one large one with the files I wanted to recover. These are formatted with Ext4, the native Linux file system, which organizes the space into files and folders. Recovering all the data requires handling the Ext4 filesystem to get the full file contents back, not just portions of files with no names.
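To give an idea of what those commands look like on an ordinary Linux machine, here is a small Python sketch that only builds and prints the command lines, without running anything. The device `/dev/sdX1`, the volume group name `media` and the volume names and sizes are all hypothetical placeholders; the real commands need root privileges and would destroy whatever is on the target partition.

```python
import shlex


def lvm_setup_commands(device, vg_name, volumes):
    """Build the commands creating one volume group with several logical volumes.

    Nothing is executed here: the function only returns argument lists.
    """
    commands = [
        ["pvcreate", device],           # mark the partition as an LVM physical volume
        ["vgcreate", vg_name, device],  # create a volume group on top of it
    ]
    for lv_name, size in volumes.items():
        # Carve a logical volume of the requested size out of the volume group.
        commands.append(["lvcreate", "--size", size, "--name", lv_name, vg_name])
        # Format it with Ext4 so it can be mounted like any other partition.
        commands.append(["mkfs.ext4", f"/dev/{vg_name}/{lv_name}"])
    return commands


# Hypothetical layout: one partition split into two logical volumes.
plan = lvm_setup_commands("/dev/sdX1", "media", {"movies": "4T", "tvseries": "2T"})
for command in plan:
    print(shlex.join(command))
```

Keeping the plan as data makes it easy to review the commands before typing them, which is a reasonable habit with tools that can wipe a drive.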
Full recovery
I got the fixed NAS back and inserted the drives in. The NAS powered up and recognized the drives as if it never got broken. I was thus able to get my files back, everything was there. Because the recovery process was so painful and expensive, I didn’t feel any victory, just a bit of relief that this was over.
While waiting for the fixed NAS, I formatted my 10 TB drive. I experimented with logical volumes, creating several partitions on the drive: one for Minecraft videos, one for movies, one for TV series, one for a full copy of my Dropbox folder, one for my music files, etc. Using the Ubuntu wiki, it was simple to create the logical volumes, and then I started copying files onto them. I was ready to transfer the contents of my NAS to the new disk when I got the NAS back.
Even though I recovered my data, I will probably have to re-rip some DVDs that are unreadable by Kodi. The VIDEO_TS structure is causing a lot of headaches for pretty much all Linux-based players. VLC seems to be the most versatile one, able to read most DVDs, but sometimes I needed to use MPlayer, Kaffeine, etc. I remember that almost destroyed my dream of having all my DVDs and blu-rays on hard drives. Of course, Windows with PowerDVD or a similar DVD player would work better, but I don’t want Windows on my HTPC; I would rather go back to the standalone DVD/blu-ray player and spend countless minutes searching for discs. MakeMKV should help solve that, because Kodi can read MKVs without issues. I may be able to convert previously ripped VIDEO_TS folders into MKV, saving me the trouble of re-ripping the discs.
After that bad experience, I came up with the plan of keeping the NAS as long as it would work, but back up all data on another drive. If the NAS dies a second time, then I would not need to recover any data and would just repurpose the drives, probably in a standard Linux PC.
Lesson learned: a NAS is for cases where you have more drives (more than four, if not more than eight) than a usual PC can accommodate. It is relatively straightforward to get a standard ATX case that will host six drives, including SSDs, and getting a power supply unit with six SATA connectors is no problem. If I had to do it, I would probably explore the route of modular power supplies to reduce cable clutter, but even that is optional.
By trying to save myself some copy/pasting, I ended up with more pain and problems. Had I spent at least half a day exploring logical volumes, at worst experimenting in a virtual machine, I would have figured out that my existing HTPC could have combined my existing drives into a storage pool that can expand over time. If the HTPC dies, another Linux PC can import the volume group and things go on. Unfortunately, nothing can prevent disaster caused by failed drives other than backups.
Other benefits of logical volumes
After exploring logical volumes, I am pretty sure I need them for my next Linux installation, because they will solve a bunch of fundamental issues I keep running into again and again.
- Each time I perform an upgrade, I am running the risk of a catastrophic issue making the whole system unusable. Ubuntu offers no downgrade path. If an upgrade fails, just reinstall from scratch. This is why several people suggest not to dist-upgrade but to reinstall clean, every time. Logical volumes alleviate that through snapshots. Before a dist-upgrade, I could just create a snapshot of the volume holding Linux, upgrade and, in case of an issue, just restore the snapshot in a few minutes: no reinstall, no reconfiguration.
- Supporting upgrades, downgrades, multiple versions, multiple Linux distributions, all of this requires the home directory to be separate from the root file system. But each time I over-partition my drive, one partition gets full and I have to either move data around, or restart the computer and repartition, which is time consuming and a risk for data loss (e.g., power outage while GParted moves data!). Logical volumes solve that by allowing resizing at runtime. If I need more than 50 GB for my home drive, no problem: just claim some extents from the physical volumes, with no need for contiguous space, and the resize occurs at runtime, without any unmount or reboot. I can keep working while the resize occurs. That’s really neat and powerful.
- Even the classical problem of expanding a drive is easier with logical volumes. LVM can move all data from one physical volume to another, transparently, at runtime, while I can work on the machine. Replacing a too small or end-of-life SSD is thus easier. Of course, sharing an SSD between Windows and Linux is always a painful and problematic process, although perfectly possible. Dual booting Windows and Linux is itself a painful and problematic process anyhow.
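The three scenarios above map to a handful of LVM commands. The Python sketch below only assembles and prints them as a dry run; the volume group `system`, the volumes `root` and `home`, the snapshot name, the sizes and the devices `/dev/sdX1` and `/dev/sdY1` are hypothetical names picked for the example. One caveat I am aware of: merging a snapshot back into a volume that is in use, such as the root file system, is normally deferred until the volume is next activated, so a reboot is likely needed for the rollback.

```python
import shlex


def snapshot_before_upgrade(vg, lv, snap_name, cow_size):
    """Create a snapshot; cow_size bounds how many changes it can absorb."""
    return [["lvcreate", "--snapshot", "--size", cow_size,
             "--name", snap_name, f"/dev/{vg}/{lv}"]]


def rollback_to_snapshot(vg, snap_name):
    """Merge the snapshot back into its origin, undoing later changes."""
    return [["lvconvert", "--merge", f"/dev/{vg}/{snap_name}"]]


def grow_volume(vg, lv, extra):
    """Add space to a logical volume and resize its file system with it."""
    return [["lvextend", "--resizefs", "--size", f"+{extra}", f"/dev/{vg}/{lv}"]]


def replace_drive(vg, old_pv, new_pv):
    """Move every extent from an old physical volume to a new one."""
    return [
        ["pvcreate", new_pv],        # prepare the new drive
        ["vgextend", vg, new_pv],    # add it to the volume group
        ["pvmove", old_pv, new_pv],  # migrate the data while the system runs
        ["vgreduce", vg, old_pv],    # drop the old drive from the group
    ]


# Dry run with hypothetical names: print the commands instead of running them.
steps = (
    snapshot_before_upgrade("system", "root", "pre_upgrade", "10G")
    + rollback_to_snapshot("system", "pre_upgrade")
    + grow_volume("system", "home", "20G")
    + replace_drive("system", "/dev/sdX1", "/dev/sdY1")
)
for step in steps:
    print(shlex.join(step))
```

In practice the snapshot and the rollback would of course not be run back to back; they are listed together here only to show both halves of the safety net.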