Then you didn’t plan out your hardware properly. Of course software RAID is going to use CPU to process disk operations, and depending on the filesystem it may use RAM too. If you load your CPU to 100%, there is nothing left for disk operations. Hardware RAID controllers offload these compute tasks to specialized chips on the card (some don’t; those are fake RAID). Yes, I’ve put very heavy load on ZFS systems and didn’t see iowait on my SCSI targets. I’ve also used mdadm, Windows RAID, and Storage Spaces with great results. I don’t know everything and sometimes have to consult the Google machine for settings I’m not familiar with, but any issues are easily solvable.
I am aware there are ways to get more than 8 drives on a RAID controller, but you reach the limit of the PCI Express lanes really quickly, especially when you are running all-flash arrays (at least according to my research last year). In that case you need to run multiple controllers in multiple PCIe slots, and the costs just keep adding up.
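To put rough numbers on the PCIe limit, here is a back-of-the-envelope sketch. The per-lane figure follows from PCIe 3.0 signalling (8 GT/s with 128b/130b encoding); the per-drive throughputs are assumptions for illustration, not measurements, and the function names are made up for this example.

```python
import math

# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding.
# 8000 Mbit/s * 128/130 / 8 bits per byte ~= 984.6 MB/s per lane,
# before protocol overhead (real-world numbers are somewhat lower).
PCIE3_LANE_MBPS = 8000 * 128 / 130 / 8


def link_bandwidth_mbps(lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe 3.0 link in MB/s."""
    return lanes * PCIE3_LANE_MBPS


def drives_to_saturate(lanes: int, per_drive_mbps: float) -> int:
    """Smallest number of drives whose combined throughput exceeds the link."""
    return math.floor(link_bandwidth_mbps(lanes) / per_drive_mbps) + 1


# Assumed sequential throughput per drive (illustrative values only).
ASSUMED_SATA_SSD_MBPS = 550
ASSUMED_SAS_SSD_MBPS = 1100

print(round(link_bandwidth_mbps(8)), "MB/s on an x8 link")
print(drives_to_saturate(8, ASSUMED_SATA_SSD_MBPS), "SATA SSDs to saturate it")
print(drives_to_saturate(8, ASSUMED_SAS_SSD_MBPS), "SAS SSDs to saturate it")
```

With those assumed figures, eight SAS-class SSDs already exceed what a single x8 controller can move, which is roughly where a second controller in a second slot comes in.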
When I talk about autoconfig of smartmon, I mean things like this:
Instead of things like:
Personally, I have come to rather enjoy my software RAID. It works really well. Speed is a bit faster on my older hardware RAID.
Aye. Had no idea what I was getting into with the first MXroute node.
Production servers do not work out of the box; you need to configure them to your requirements.
You can use MegaCli to monitor your drives if you are using an LSI-based RAID card. We have a cron job that runs once a day and emails us the drive status.
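A daily check along those lines could look roughly like the sketch below. This is not our actual script. The binary path, the email addresses, the local SMTP relay, and the exact output labels are all assumptions; MegaCli output differs between versions, so check what `-PDList -aALL` prints on your card before relying on it.

```python
import smtplib
import subprocess
from email.message import EmailMessage

# Assumptions: install path, addresses, and a mail relay on localhost.
MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"
MAIL_FROM = "raid@example.com"
MAIL_TO = "admin@example.com"


def parse_pdlist(text):
    """Turn `MegaCli -PDList -aALL` style output into one dict per drive."""
    drives = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Label: value" line
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            # Each physical drive section is assumed to carry a slot number.
            current = {"slot": value}
            drives.append(current)
        elif current is not None:
            if key == "Firmware state":
                current["state"] = value
            elif key in ("Media Error Count", "Predictive Failure Count"):
                current[key] = int(value)
    return drives


def failing_drives(drives):
    """Return drives that are not online/hotspare or that report errors."""
    bad = []
    for d in drives:
        healthy = d.get("state", "").startswith(("Online", "Hotspare"))
        errors = d.get("Media Error Count", 0) + d.get("Predictive Failure Count", 0)
        if not healthy or errors > 0:
            bad.append(d)
    return bad


def main():
    """Query the controller and email a one-line-per-drive status report."""
    out = subprocess.run(
        [MEGACLI, "-PDList", "-aALL"],
        capture_output=True, text=True, check=True,
    ).stdout
    bad = failing_drives(parse_pdlist(out))
    msg = EmailMessage()
    msg["Subject"] = "RAID drives: %s" % ("OK" if not bad else "%d PROBLEM(S)" % len(bad))
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content("\n".join(str(d) for d in bad) or "All drives report healthy.")
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

# main()  # uncomment when the script is run from cron
```

Drives in any other firmware state (for example unconfigured or rebuilding) are flagged by this sketch, so adjust the healthy list to match your setup. Schedule it with a normal daily crontab entry once `main()` is enabled.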
We actually only offer hardware RAID. I think the point you meant to attribute to HV is that SSDs are very reliable, and the failure rate of some RAID cards might actually be higher than that of the SSD, so a single SSD plus a backup drive might be worth considering over RAID1.
If there’s one thing I’ve learned about steps taken to increase redundancy to reduce the impact of failure:
It means nothing in the end. I’ve never witnessed solid numbers showing that the intended reality played out any better statistically. From my limited perspective of the things I’ve seen, everything fails, especially SAN.