‘Mistakes Were Made’
Six months after starting this project, I received an email from Ryan Ellis asking me if I had any tips regarding this NAS build.
Just read your article on the NAS. Amazing job. I would like to replicate what you did. Any tips?
Well, I most definitely do have a list of things I’d have done differently! Nothing too dramatic, but some decisions were overkill and in some places I went a little too low end.
- Using Intel and Asus motherboards: The Intel DP55KG with the P55 Express chipset does not get along with Ubuntu 10.04 LTS, or apparently any Linux distro for that matter. Specifically, the NAS box with the Intel motherboard is unable to do a soft reboot. That means every reboot requires my physical presence in the data center. This has been a known problem for a while, but it did not turn up during my mobo research. Many folks have tried various kernel options to change the rebooting behavior, with mixed success. I’ve not been able to resolve the issue. When building up the NAS box I told myself that the Linux community would eventually resolve the issue. Maybe it has, but now that we are in production I can’t really experiment with the server.
Lesson Learned: If the mobo is not working perfectly for you, then find another. It’s too painful to revisit once in production.
- Not using “server grade” motherboards: Linux is unable to monitor things on the Asus and Intel motherboards, like fan speed and temperature, that I’d like to be graphing in Cacti. This is apparently possible with the “server grade” budget motherboards from the likes of SuperMicro.
Lesson Learned: It only saved us SFr.400-800 to use these performance desktop motherboards, but our ability to proactively monitor fans is lost. I wish I’d gone for a SuperMicro motherboard.
- The network load is much lower than I had realized, so the Intel Quad-port NIC is overkill: not even 100 Mbps at peak usage! This is apparently due to the client-side file cache on our client server machines. This was difficult to predict on our old system because we were running with direct attached storage. In hindsight I wish I’d done more research. The two Intel PRO/1000 PT Quad Port Server Adapters could have been single-port NICs, saving us SFr.800 total.
Lesson Learned: Try to accurately measure and predict how much network traffic you’ll see. Did I really need four-port NIC bonding? Not even close.
- I didn’t pay enough attention to adapter-to-drive cabling. The LSI 3ware 9650SE-ML16 card came with 1-to-4, Multilane-to-SATA breakout cables, but the SuperMicro SuperChassis 836A-R1200B came with a backplane with four Multilane ports. That meant sourcing four CBL-SFF8087-05M Multilane-to-Multilane cables, an extra cost. And when I did get them, two were ~10cm shorter than I would have preferred: the cables are currently a bit tight and cannot be moved without loosening the connection. We probably spent another SFr.100 on extra cabling.
Lesson Learned: At least think about device-to-device cabling beforehand, and don’t leave it until the build.
- RAID 1+0 may have been overkill; RAID 6 performance would probably have sufficed. Our production metrics seem to indicate that we run at no more than 33-40% of capacity at peak, conservatively, and the vast bulk of our NAS activity is reads. RAID 6 probably would have been a safe choice, and it would have reduced the number of hard drives by 6 total (3 on each server), which would also have allowed us to use a smaller chassis. Total savings would have been SFr.1700-2000, a non-trivial amount.
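One rough way to measure this on a Linux box is to sample the per-interface byte counters in /proc/net/dev twice and turn the difference into Mbps. The sketch below is illustrative only: the function names and the sample counter values are made up, and it ignores counter wrap-around.

```python
# Sketch: estimate NIC throughput from two samples of /proc/net/dev.
# Function names and sample numbers are hypothetical, not from the build.

HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast|"
    "bytes    packets errs drop fifo colls carrier compressed\n"
)


def make_sample(rx_bytes, tx_bytes):
    """Build fake /proc/net/dev contents for one interface (for the demo)."""
    return HEADER + f"  eth0: {rx_bytes} 0 0 0 0 0 0 0 {tx_bytes} 0 0 0 0 0 0 0\n"


def parse_proc_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines have no "name:" prefix
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 9:
            continue  # not a counter line we understand
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def throughput_mbps(before, after, seconds):
    """Convert two (rx_bytes, tx_bytes) samples into (rx, tx) in Mbps."""
    rx = (after[0] - before[0]) * 8 / seconds / 1e6
    tx = (after[1] - before[1]) * 8 / seconds / 1e6
    return rx, tx


if __name__ == "__main__":
    t0 = parse_proc_net_dev(make_sample(1_000_000, 2_000_000))["eth0"]
    t1 = parse_proc_net_dev(make_sample(101_000_000, 14_500_000))["eth0"]
    rx, tx = throughput_mbps(t0, t1, seconds=10)
    print(f"rx {rx:.1f} Mbps, tx {tx:.1f} Mbps")
```

On a real machine you would read the file twice with a sleep in between, ideally during peak hours, and compare the result against what a single gigabit port can carry.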
My wife, Robyn, helping me build up one of the NAS servers
That said, we would be reducing our margin for error and our room for future growth (there are currently two empty drive bays on each server), and we would not be allowing for changes in application behavior that result in more writes. (RAID 6 is great for read-heavy applications like ours, but has much weaker write performance characteristics.)
- I did not appreciate how little I understood DRBD, or block-level replication for that matter. This resulted in my taking poorly understood actions on production data. In hindsight, it would have been wise to set up a test environment on the side (Amazon EC2, some old kit, whatever) and experiment there. If I had made a mistake, we would have had to implement our disaster recovery procedures, which are time consuming and would have resulted in non-trivial downtime.
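The trade-off can be put into rough numbers. The sketch below uses the textbook rules of thumb (RAID 10 mirrors every drive; RAID 6 spends two drives on parity; a small random write costs about 2 back-end I/Os on RAID 10 and about 6 on RAID 6). The figure of five drives' worth of usable capacity per server and the 90% read mix are assumptions for illustration, not numbers from this build, and hot spares are ignored.

```python
# Sketch: drive counts and write penalty for RAID 10 vs RAID 6.
# All inputs are illustrative assumptions; hot spares are not counted.

RAID10_WRITE_PENALTY = 2  # each write goes to both mirror halves
RAID6_WRITE_PENALTY = 6   # read data+P+Q, then write data+P+Q


def raid10_drives(usable_drives):
    """Drives needed for RAID 10: every data drive has a mirror."""
    return 2 * usable_drives


def raid6_drives(usable_drives):
    """Drives needed for RAID 6: data drives plus two parity drives."""
    return usable_drives + 2


def frontend_iops(raw_iops, read_fraction, write_penalty):
    """Host-visible IOPS given back-end IOPS and a read/write mix."""
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * write_penalty)


if __name__ == "__main__":
    usable = 5  # assumed usable capacity per server, in drives
    saved = raid10_drives(usable) - raid6_drives(usable)
    print(f"RAID 10: {raid10_drives(usable)} drives, "
          f"RAID 6: {raid6_drives(usable)} drives, saved: {saved}")

    for name, penalty in (("RAID 10", RAID10_WRITE_PENALTY),
                          ("RAID 6", RAID6_WRITE_PENALTY)):
        iops = frontend_iops(raw_iops=1000, read_fraction=0.9,
                             write_penalty=penalty)
        print(f"{name}: ~{iops:.0f} front-end IOPS at 90% reads")
```

The point of the second half is that the RAID 6 disadvantage grows quickly as the write share rises, which is exactly the margin we would have been giving up.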
Lesson Learned: If it works like magic, then you don’t have a clue how it works. For something as fundamental as DRBD is to a redundant NAS system, one should make decisions ad novum, ‘with intent’.
- Setting up the monitoring was significantly more work than I had predicted. While our Cacti + SNMP setup is very powerful, it is not easy to get going for anything but very common metrics. Specifically, configuring important alerts for things like drive failures, or graphs of NFSv4 metrics, has been a considerable amount of work. In fact, I’ve had to come up with my own NFSv4 Cacti template which, to my surprise, did not exist.
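To give a feel for what collecting NFSv4 metrics involves, here is a minimal sketch that pulls per-operation counters out of the `proc4ops` line the Linux NFS server exposes in /proc/net/rpc/nfsd. It is not the Cacti template mentioned above. It assumes the counters are indexed by NFSv4 operation number (READ is 25 and WRITE is 38 in RFC 3530), and the sample line is synthetic.

```python
# Sketch: extract a few NFSv4 server operation counters on Linux.
# Assumes the "proc4ops" line layout: "proc4ops <count> <op0> <op1> ...".
# The sample data below is synthetic, not taken from a real server.

# Subset of NFSv4 operation numbers from RFC 3530.
NFS4_OPS = {
    3: "access",
    9: "getattr",
    15: "lookup",
    18: "open",
    25: "read",
    26: "readdir",
    38: "write",
}


def parse_proc4ops(text):
    """Return {op_name: count} for the operations listed in NFS4_OPS."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "proc4ops":
            counts = [int(v) for v in fields[2:]]  # fields[1] is the count
            return {name: counts[op]
                    for op, name in NFS4_OPS.items() if op < len(counts)}
    return {}


def read_fraction(ops):
    """Share of READ among READ+WRITE operations, or None if no traffic."""
    total = ops.get("read", 0) + ops.get("write", 0)
    return ops.get("read", 0) / total if total else None


if __name__ == "__main__":
    values = [0] * 40
    values[25], values[38] = 900, 100  # made-up READ and WRITE counts
    sample = "proc4ops 40 " + " ".join(str(v) for v in values)
    ops = parse_proc4ops(sample)
    print(ops["read"], ops["write"], read_fraction(ops))
```

A script like this could be wired into an SNMP extend or a Cacti data input method, but the counters are cumulative, so the graphing side needs to treat them as such.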
- These boxes are heavy. Like in the 30kg region. Installing them into the rack alone, even with the assistance of a foot-actuated hydraulic lift, was difficult and borderline dangerous. Managing to get the rails aligned correctly was very challenging.
60kg of Network Attached Storage
Lesson Learned: Don’t install anything other than a switch alone.
- WD Green versus WD RE4 drives: We could probably have used cheaper WD Green drives instead of the RE4 series “Enterprise Hard Drives”. Ryan Shrout and Patrick Norton talk about the apparent fallacy that WD Green drives are not suitable for a NAS in Episode #95 of This Week in Computer Hardware. The cost savings are huge. Currently at Digitec.ch, where we bought our drives, a WD Caviar RE4 2TB runs for SFr.255 and a WD Caviar Green is SFr.109, a saving of SFr.146 per drive. With the 22 data hard drives in our build, that works out to SFr.3,212! And we could have saved an additional ~SFr.168 on the operating system drives too.
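The drive arithmetic above, spelled out as a small sketch. The prices and the drive count are the ones quoted in the text; the variable names are just for illustration, and the operating system drives are left out because their breakdown isn't given.

```python
# Sketch: data-drive savings from choosing WD Green over WD RE4,
# using the Digitec.ch prices and drive count quoted in the post.

RE4_PRICE_SFR = 255    # WD Caviar RE4 2TB
GREEN_PRICE_SFR = 109  # WD Caviar Green
DATA_DRIVES = 22       # data drives across both servers

saving_per_drive = RE4_PRICE_SFR - GREEN_PRICE_SFR
total_saving = saving_per_drive * DATA_DRIVES

print(f"Saving per drive: SFr.{saving_per_drive}")
print(f"Saving on {DATA_DRIVES} data drives: SFr.{total_saving:,}")
```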
All of that said, we are in production and everything works. More dramatically, this project after a mere six months has already resulted in a positive return on investment, when accounting for hardware costs alone. Factor in the time I spent on this project, 60-80 hours, and we will be in the black some time in Q1 2011. Not bad. (This self-built approach was taken in favor of outsourcing our storage to our hosting company’s shared NetApp NAS at a TB/month rate.) It also has been a wildly educational experience and forced me to understand my application even more than before.