ZFS / FreeNAS users - advice on an ongoing resilver

I have a FreeNAS box with a RAIDZ1 pool. This machine holds the VMs backing a personal hypervisor, as well as the torrent seedbox for a full InfoCon mirror. Its original pool is 7x 3TB Hitachi drives, and it reached capacity last week.

I procured 7x 8TB drives and planned on doing a one-by-one drive swap and resilver. I’ve done this before, but never on a system this full.

Well, I think I’m regretting it. I started the first drive around 11:30pm on July 24th, and it’s still in “scanning” status, with the time-remaining counter only climbing. Due to the active nature of the VMs, I’d rather not scrap the array and rebuild if I can avoid it. If I do end up with data loss, I have everything backed up through Backblaze, so it’s not critical.

Since the InfoCon mirror can be reseeded from distributed sources, I’m wondering if I can erase its contents (which are about 80% of the existing pool size) and whether that would have any effect on the resilver operation. However, given it’s still calculating parity, I’m not sure this would be effective; I don’t know if it will recalculate everything regardless, or if removing the index for that data in ZFS would actually halt the massive IO operation.

Thoughts?

-Jim

EDIT: So apparently one of the FreeNAS tools doesn’t do estimates very well; zpool status shows much more reasonable output.

I’m still curious whether, once this drive completes, it would be better to delete the 10TB volume that InfoCon currently sits on, since that’s an easy re-pull for the most part (resyncing that archive from the upstream source takes about 3-4 days for all of it, sans the DEF CON torrent, which takes about a week). I’m the only documented full mirror for InfoCon outside of the official source, so I get pretty heavy pull, but since I’m far from the only copy it’s fine to resync it.

root@sybil:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:24 with 0 errors on Wed Jul 22 03:45:24 2020
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da6p2       ONLINE       0     0     0

errors: No known data errors

  pool: sybil
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jul 24 23:29:38 2020
        7.23T scanned at 151M/s, 6.09T issued at 127M/s, 11.7T total
        884G resilvered, 51.81% done, 0 days 12:58:39 to go

If the pool is full then it’s also likely heavily fragmented, which will make the initial stage of the resilver slower. Assuming you’re using a recent version of FreeNAS, it’ll scan the whole pool and create a map of where everything lives, and then it’ll issue more or less sequential commands to resilver. Once it gets to that second part of the resilver it’ll speed up big time. Since ZFS is both a file system and a volume manager, it knows which blocks are in use and will only resilver those (unlike traditional RAID, which rebuilds every block no matter what), so deleting some data, along with the snapshots that hold that data, would make the resilver faster.
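If it helps to see how much of the pool a dataset and its snapshots are actually pinning before deleting anything, something along these lines should show it (the dataset name sybil/infocon is just a guess at your layout):

# Per-dataset space accounting; the USEDSNAP column is what's held only by snapshots
zfs list -o space -r sybil

# List the snapshots under the InfoCon dataset so you know what would need deleting (name is a guess)
zfs list -t snapshot -r sybil/infocon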

If you have enough SATA/SAS ports or another machine (and a high-speed network), then I’d use zfs send and receive to replicate the data to a new pool rather than replacing one drive at a time. It would be safer and probably faster. You could keep the pool running while the replication happens, then take a snapshot at the end and replicate only the differences. The downtime could be minimal.
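As a rough sketch of what that replication could look like (pool and snapshot names here are made up; adjust to your layout, and pipe through ssh if the target pool is on another box):

# Initial full replication while the pool stays in use
zfs snapshot -r sybil@migrate1
zfs send -R sybil@migrate1 | zfs receive -Fdu newpool

# Later: quiesce the VMs, take a final snapshot, and send only the differences
zfs snapshot -r sybil@migrate2
zfs send -R -i sybil@migrate1 sybil@migrate2 | zfs receive -Fdu newpool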

Food for thought on ZFS in general: RAID Z isn’t really used with hard drives anymore. RAID Z2 has become the de facto standard because drive capacities are becoming impractically large for single-parity RAID Z. Single parity might make a comeback when declustered RAID (dRAID) makes it into ZFS; it has been in testing for a few years or more.
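For comparison, a RAID Z2 layout with those seven 8TB drives would look something like this (device and pool names are placeholders; FreeNAS would normally build this from the GUI and reference gptid labels instead):

# 7x 8TB in a single raidz2 vdev: two drives' worth of parity, roughly five drives of usable space
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6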

The “scanned” part is the mapping pass it does before actually starting the resilver. The “issued” part is the actual resilver progress, done in more or less sequential fashion.
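A quick way to watch both numbers from the shell, if the GUI estimate acts up again:

# Print the scanned/issued and resilvered lines once a minute
while true; do
    zpool status sybil | grep -E 'scanned|resilvered'
    sleep 60
done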

Surprisingly, about 10TB of that 13TB pool was written in the last week and is fairly contiguous. Scrubs take about 30 seconds for the whole volume.

This is my go-to in most situations, but it’s not feasible at this particular moment.

Yeah, I know; in this case it’s purely economics for a non-business use where read operations are king. The data is all backed up in various places (3-2-1), and it’s not costing money or anything if the pool goes down. There’s also the aforementioned problem of not having another unit to just do a send/receive with the current pools.

Yeah, this output is much more like what I was expecting; I was worried because I was getting scan-only results with absurd timers, without any issuing status. It was just a quirk of the interface.

If you haven’t already, enable the attribute that automatically expands the pool. Otherwise it’ll still be the same capacity once all of the drives have been swapped. I know this has to be done before the last drive is swapped; I don’t know if it has to be done before the first drive is swapped…
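For reference, the property in question is autoexpand; checking and setting it from the shell is harmless to do now (the disk name in the last command is a placeholder):

# See whether automatic expansion is already on
zpool get autoexpand sybil

# Turn it on if it isn't
zpool set autoexpand=on sybil

# If a drive was already swapped while it was off, this grows that vdev member after the fact
zpool online -e sybil <replaced-disk>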

FreeNAS enables this attribute automatically.

Do you by chance know if taking out that 10TB written in the last week would make a notable difference? It’s easy enough to resync that chunk; the sudden growth happened because my old InfoCon box was being moved to this unit, and I started mirroring secondary content that I hadn’t before, which added an unexpected 5TB.

Yes, reducing the amount of data would decrease the resilver time proportionally. If the data is in snapshots, then those would have to be deleted too.
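Assuming the mirror lives in its own dataset, the whole thing plus its snapshots goes in one command (the name is hypothetical, and this is not undoable, so double-check with zfs list first):

# Recursively destroy the dataset, its children, and all of their snapshots
zfs destroy -r sybil/infocon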


Welp, I’m going to nuke some hardware that’s backed up and use it to go the other route and eat the downtime on that hardware: I found the pool is operating on these big drives with a 512B block size rather than the native 4K.

Even with the economics of this operation, that’s too disgusting to tolerate. So I killed the mdadm array on a different machine that has a cloud backup, am going to transfer the VM data to it, and then nuke this pool from orbit. It’s the only way to be sure.
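(For anyone curious how to check this on their own pool: the sector size is the vdev’s ashift, chosen at pool creation and baked in for good, so the only real fix is rebuilding. Something like the following should show it; the cache file path is what I remember FreeNAS using, so verify it on your system.)

# ashift=9 means 512-byte sectors, ashift=12 means 4K
zdb -U /data/zfs/zpool.cache -C sybil | grep ashift

# On FreeBSD-based FreeNAS, force the replacement pool to be created with 4K sectors
sysctl vfs.zfs.min_auto_ashift=12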

Thanks Luke for chiming in.
