SSD Migration in the Enterprise Environment

SSD technology has hit the enterprise like a hurricane, leaving organizations in awe of its potential yet leery of the high cost of a storage medium still in its infancy.

Today's businesses are reluctant to implement SSD storage for reasons that run the gamut from reliability and durability to concerns about long-term performance gains.

The inherent cost of SLC drives is self-defeating for a small business, and time has yet to put its stamp of approval on MLC designs.

In hopes of alleviating those concerns through our own success with SSD migration, I will address specific worries, survey today's enterprise SSD options, and share my experience with SSD use in a real enterprise production environment.

ENTERPRISE SSD TYPES

There are three major types of enterprise SSD today: DDR, SLC NAND, and MLC NAND. Each holds a particular forte in an enterprise environment. For data that requires the highest levels of performance where cost is of no concern, DDR-based arrays are preferred, as no other storage medium can read and write faster.

Typical DDR Array

Texas Memory Systems and Violin Memory manufacture the most commonly used arrays today. These arrays are typically seen in a shared-storage (SAN) scenario where clustered databases, applications, virtualization platforms, and digital media that require the highest levels of performance and redundancy are stored. Remember, cost is of no concern in this scenario.

These arrays also see common use in HPC-type environments, where speed is the most critical selection factor and nanoseconds can be critical to the end result.

SLC NAND is available in several form factors, including rack-mount arrays (like the DDR arrays described above), PCI-Express cards, and 2.5" SATA drives. The most common form factor for SLC in the enterprise is the PCI-Express card. There are many manufacturers of PCI-Express-based SLC SSDs today; Fusion-io, Texas Memory Systems, Virident, and OCZ produce the most commonly used SLC-based PCI-Express SSDs.

Typical SLC Card

SLC NAND is beneficial for companies that want the speed of DDR-based arrays without the complexity of shared storage or the high cost. Typically, servers have one or more SLC NAND PCI-Express cards installed for data where high levels of performance are needed. In most scenarios, that data consists of databases, virtualization platforms, read/write cache for applications, and digital media. SLC NAND is cheaper than DDR because it costs less to manufacture. The downside to SLC is a shorter lifecycle than DDR: if cells are written to extremely heavily over time, they can become unusable, meaning data can no longer be stored in those locations.

MLC Drives

MLC NAND is available in PCI-Express card and 2.5" SATA drive form factors from many different manufacturers. Companies using MLC NAND today typically treat it as a shorter-term storage solution offering high performance at lower cost than SLC NAND, in the same usage scenarios as SLC. Its downside is an even lower write lifecycle than SLC NAND. MLC can be useful if it is implemented correctly and expectations are set accordingly.

MLC SSD MIGRATION

Our company utilizes MLC NAND SSDs in our systems today simply because of their high performance, lower cost, and simplicity of use. When our traditional SAN came off lease, we decided to move forward with local MLC SSD storage in each server needing high-performance storage. A SAN with the performance characteristics we desired was far too expensive; it made no business sense to spend hundreds of thousands of dollars on a SAN when we could get equal or better performance for far less money and far less complexity. Further savings came from no longer needing a trained SAN administrator, many 42u cabinets, Fibre Channel switches with HBAs and fiber interconnects, a billion amps of power, and tons of cooling.

For our Oracle and SQL Server databases, Microsoft Hyper-V, and VMware ESX platforms, we chose the Dell PowerEdge R910 as the server platform. It was preferred because it offered 4 CPU sockets, up to 1TB of RAM, up to 16 2.5" drive bays, multiple 10 Gigabit networking options, 4 power supplies, and a plethora of PCI-Express slots. This allowed us to have 32 CPU cores in each 4u server with generous memory and storage options.

For storage in those servers, we evaluated many vendors and types of storage. We chose the OWC Mercury Extreme Pro RAID Edition SSD for its relatively low cost, very high performance, and 5-year replacement guarantee. If cost were no object, we would have chosen the Fusion-io ioDrive Duo SLC with several cards per server to meet the storage and redundancy requirements; however, the 5-year warranty along with the much lower cost made the OWC drives the far more economical choice.

With 16 drives in each server in a RAID10 configuration, the likelihood of over half of the drives failing simultaneously is extremely remote. We have a tried and tested disaster recovery plan, which is a must for any company in today's world. We also keep several identical SSDs on site should a failure occur, but in 6 months of use thus far, we have experienced only a single drive failure. OWC delivered the replacement within a reasonable timeframe.
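For readers who want a rough, back-of-the-envelope feel for that kind of risk, here is a minimal Python sketch. The per-drive failure probability and the assumed layout of 8 two-drive mirrors are illustrative assumptions only, not figures from our environment or our controller's configuration:

```python
import random

# Illustrative sketch (not our actual tooling): estimate how likely a
# 16-drive RAID 10 -- assumed here to be 8 two-drive mirrors -- is to
# lose data, i.e. both drives in the same mirror fail within the same
# window before a failed drive can be replaced. P_FAIL is a made-up
# per-drive failure probability for that window, not measured data.
DRIVES = 16
MIRRORS = DRIVES // 2
P_FAIL = 0.01            # assumed per-drive failure chance in the window
TRIALS = 200_000

def array_lost() -> bool:
    """One trial: does any mirror lose both of its drives?"""
    return any(
        random.random() < P_FAIL and random.random() < P_FAIL
        for _ in range(MIRRORS)
    )

losses = sum(array_lost() for _ in range(TRIALS))
print(f"simulated data-loss probability: {losses / TRIALS:.6f}")
# Closed form under the same assumption, for comparison:
print(f"closed form:                     {1 - (1 - P_FAIL ** 2) ** MIRRORS:.6f}")
```

The exact number depends entirely on the failure rate you assume and how quickly a failed drive is replaced; the takeaway is simply that two failures landing in the same mirror is the scenario worth planning for, which is one reason we keep cold spares on hand.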

Pegged CPUs from SSD Usage

We have seen massive productivity gains from our Development and Quality Assurance (QA) teams as a result of the reduced time needed to test new features and to run regression and functionality testing. They are no longer waiting on the database as long as they previously did and can complete their jobs much faster. This allows far more development and testing within the same time span, which in turn lets us evaluate our code, including new features, more thoroughly, faster, and more easily.

We struggled to reproduce the issues reported by our clients as our storage was slower and not capable of equal throughput and volume. OWC SSD storage changed that paradigm for us.

At times, we have to test on lower-performance storage to mirror the performance seen at our client sites. Our day-to-day development and regression/functionality testing, however, is done on our servers with OWC SSDs for the sole reason that the jobs complete in a fraction of the time required on our legacy systems. It is surprising that we now classify SATA/SCSI/FC SAN storage as legacy storage, and we have yet to see a spindle disk-based SAN outperform SSD storage. Those SANs still exist here, but they serve mainly to occupy space in 42u cabinets and consume 60+ amps of 208v/480v power at great expense.

Our company has no regrets about our migration to MLC NAND-based SSDs.

CLOSING THOUGHTS

In conclusion, companies interested in high-performance SSDs today would be well advised to start a pilot project or trial in the areas they think would benefit most. Set up a development/testing environment and fully test your scenarios. Be sure to check out and test the support of your chosen vendors; if they can't replace a faulty drive or product in a timeframe acceptable to your needs, there are many who can.

Oracle IO Calibration Test
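If you want a quick sanity check alongside heavier tools such as the Oracle I/O calibration run shown above, a simple throughput probe can be enough to compare candidate drives during a pilot. The Python sketch below is illustrative only; the test path and sizes are assumptions to adjust for your environment, and the read figure may mostly reflect the operating system's page cache unless caches are dropped first:

```python
import os
import time

# Rough, illustrative throughput probe -- not a substitute for vendor
# tools, fio, or Oracle's I/O calibration. TEST_PATH and SIZE_MB are
# assumptions: point TEST_PATH at a filesystem on the drive under test.
TEST_PATH = "/mnt/ssd_under_test/io_probe.bin"
SIZE_MB = 1024
CHUNK = b"\0" * (1024 * 1024)            # 1 MiB per write

start = time.time()
with open(TEST_PATH, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())                 # ensure data actually reaches the drive
write_mbps = SIZE_MB / (time.time() - start)

# Read it back. This may largely measure the OS page cache unless caches
# are dropped first or the file is much larger than RAM.
start = time.time()
with open(TEST_PATH, "rb") as f:
    while f.read(1024 * 1024):
        pass
read_mbps = SIZE_MB / (time.time() - start)

os.remove(TEST_PATH)
print(f"sequential write: {write_mbps:.0f} MB/s, read: {read_mbps:.0f} MB/s")
```

For a real evaluation, purpose-built tools and, above all, your actual workload matter far more than a synthetic probe like this.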

Evaluate as many vendors as possible and choose the one that best suits your needs in performance, reliability, and support. As we did, instill confidence by keeping cold spares on hand, and you may well come to a similar view: that SSDs are an extremely viable and cost-effective choice and quite possibly the future of enterprise storage. With costs falling as they are, it is only a matter of time before SSDs are widely adopted in large enterprises.

ABOUT THE AUTHOR

Noel Lucas has been managing IT Operations for a medium-sized independent software vendor catering to the financial sector for 12 years. He holds a BS in Network Design & Management from WGU, as well as many certifications from Microsoft and Oracle. Performance tuning and tweaking systems is his forte. He resides in Georgia, USA, with his wife and young son.

DISCUSSION

8 comments

  1. Peter

    Great review. We too are in the testing phase of rolling out servers with 16 x 2.5" MLC SSDs for the high-volume media sites we host. It actually reduces our costs by 50% and reduces the number of servers required.

  2. Noel

    Thank you for the kind words, Peter. I am glad you enjoyed it and that more companies are adopting SSD.

  3. Andreas

    “With 16 drives in each server in a RAID10 configuration, the likelihood of over half of the drives failing simultaneously is extremely remote.”

    With any RAID 10 array you will lose all your data if two simultaneous drive failures occur in the same RAID 1.

    • Noel

      Actually, Andreas, RAID10 is simply striped groups of RAID 1 (mirrors). In RAID10, all but one drive from each RAID 1 group could fail without damaging the data.

      • Andreas

        Noel, I’m fully aware of the way a RAID 10 works. In your 16-drive scenario you have 8 mirrors that are then striped. I don’t see the relevance of pointing out that the likelihood of over half of the drives failing at the same time is extremely remote without mentioning that, in a worst-case scenario, it’s enough for two drives in the same mirror to fail for you to lose all the data in your 16-drive array.

      • Noel

        Andreas, I see what you mean now. I should have worded it differently for clarity. It will be edited shortly. I appreciate your feedback.

  4. Noel

    Andreas, I run 8 disks in each disk group, with 2 disk groups. Two disk failures in a single DG would not cause me to lose data; even four or eight wouldn’t. How do I know this? I’ve tested it extensively before rolling it out into production. Most enterprise-grade RAID controllers allow you to choose how many disks you want in each disk group when building your RAID10 disk groups. Most controllers default to eight 2-disk groups, but I felt that wasn’t enough protection, which is why I chose two eight-disk groups.

    • Andreas

      Interesting, what controller are you using?

      If eight 2-disk groups is a RAID 10, then two 8-disk groups is a RAID 01. If you can handle losing 8 drives in a disk group, that means the mirroring must be between the disk groups (thus being RAID 01), or that you have eight 2-disk mirrors with each mirror having one disk in each of the two disk groups.
