SAN Virtualization Guidelines
Now that SAN "plumbing" has matured with an ample collection of Fibre Channel products, it's time to turn our attention to fully harnessing the storage assets at the other end of the light beams. That takes us into the realm of SAN virtualization.
While SAN connections widen the pipes and stretch the distance between disks and hosts, the new plumbing alone does little to reconcile the conflicts among servers competing for scarce disk space. You can look at SAN virtualization products as capacity brokers in this chaotic environment. In their simplest form, they collect all or portions of the SAN's physical disks into a pool, and hand out logical slices to needy application servers without having to re-cable or rezone the SAN.
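To make the capacity-broker idea concrete, here is a minimal sketch of a pool that collects physical disks and hands out logical slices. The class name `StoragePool`, its methods, and the first-fit allocation policy are our own illustrative assumptions, not any product's interface.

```python
from dataclasses import dataclass, field


@dataclass
class StoragePool:
    """Toy capacity broker: pools physical disks and hands out logical slices."""
    # disk name -> unallocated capacity in GB (hypothetical bookkeeping)
    free_gb: dict[str, int] = field(default_factory=dict)
    # logical volume name -> list of (disk name, GB taken from that disk)
    volumes: dict[str, list[tuple[str, int]]] = field(default_factory=dict)

    def add_disk(self, name: str, size_gb: int) -> None:
        """Contribute a physical disk (or a portion of one) to the pool."""
        self.free_gb[name] = size_gb

    def allocate(self, volume: str, size_gb: int) -> list[tuple[str, int]]:
        """Carve a logical volume out of the pool, spanning disks if needed."""
        if size_gb > sum(self.free_gb.values()):
            raise ValueError("pool exhausted")
        extents, remaining = [], size_gb
        for disk, free in self.free_gb.items():
            if remaining == 0:
                break
            take = min(free, remaining)
            if take:
                self.free_gb[disk] -= take
                extents.append((disk, take))
                remaining -= take
        self.volumes[volume] = extents
        return extents
```

In this sketch, a 120 GB request against a 100 GB disk and a 50 GB disk is satisfied by spanning both. The host sees one volume, and nothing is re-cabled or rezoned.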
Properly architected, virtualization provides many benefits, such as the ability to allocate storage resources on demand, integrate storage products from multiple vendors, configure selectively for high availability and reduce the total cost of ownership. Choosing a virtualization product is the challenge. We'll give you some guidelines that our customers use and that, consequently, influence our solutions.
At last count, five divergent approaches to sharing virtual disk capacity have emerged in the SAN market, spanning about 10 discrete implementations. Ranging broadly in price, performance, and utility, these virtualization solutions can be categorized by the methods they use to translate the physical reality to the host's logical view. The effectiveness of each technique is essentially determined by where in the SAN the mapping takes place and what platform is used to deliver the services. Our taxonomy lumps the offerings into:
- Multi-host storage arrays
- Host-based LUN masking filters
- File system redirectors via outboard metadata controllers
- Specialized in-band virtualization engines
- Dedicated storage domain servers
Note: Some virtualization engines are packaged versions of storage domain servers.
While virtualization suppliers' claims are often indistinguishable, there are seven criteria that determine the success and viability of each approach:
- The degree of independence that these products provide from a host's operating system and file system.
- The broadness of support for a mixture of storage hardware.
- The ability to protect investments in legacy storage assets.
- The ability of the security policy to share virtual resources while adequately excluding uninvited guests.
- The effectiveness of the technology at minimizing losses due to planned and unplanned downtime.
- The breadth of devices consolidated into a centralized management view.
- The ability to leverage commodity hardware and storage devices for improved performance and functionality at reasonable cost.
Ultimately, the best choice for virtualized SANs must provide unprecedented levels of reliability, availability and scalability, while serving as the basis for advanced storage services and management.
Host Independence
This is a critical point. Several suppliers have elected to place virtualization software on the hosts - each and every host, that is. These vendors' engineering teams are spending a lot of time just on porting and qualifying software to every operating system. This process compromises focus and energy. History has shown that this strategy is difficult to maintain given all the version changes across many host environments. And the fact remains that this approach requires IT staff to install intrusive, processor-consuming software on each host or risk problems. Host-based solutions can mean only one thing for the system administrators: more headaches every time a system is added or updated.
Mixed Storage Support
The usual answer from many vendors is "Don't mix. It's hard. They don't interoperate." Translation: vendor lock-in. While choosing products from a single vendor can provide a certain level of near-term comfort, in the long run you are compromising your ability to respond to change. Fortunately, a few suppliers without allegiances to specific storage hardware are far more liberal, willing, and most importantly, able to put nearly anyone's hardware in their storage pool as long as it talks Fibre Channel.
Legacy Investment Protection
How much of your current disk population is Fibre Channel (FC)-ready? If your mix includes SCSI, EIDE or SSA drives, the SAN virtualization choices get slim. Of course there are Fibre Channel routers and bridges that could be worked in for additional cost and complexity. Better instead to look for storage pooling products that have built-in support for your existing interfaces. Properly done, the hosts won't know the difference between virtual devices coming from an FC drive and a native SCSI spindle (performance of the hardware aside).
Security
Security and host independence are somewhat intertwined. Depending on host-based software or hardware to implement the security layer for shared access control over a SAN is misplacing the authority. A rogue host doesn't play fair - it can read and write to any disk in the pool, unintentionally corrupting a neighbor's data. Steer towards outboard security implementations that centralize access control and you'll sleep nights. There's another benefit: with the growing importance of personal privacy in the e-Commerce world, an outboard security implementation simplifies the auditing of data trails.
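As a rough illustration of what centralizing access control means, the sketch below keeps the grant table and the audit trail in one outboard component that every request must pass through. The class `OutboardAccessController` and its method names are hypothetical and chosen only for this example.

```python
class OutboardAccessController:
    """Central access table consulted for every request, outside any host."""

    def __init__(self):
        self.grants = {}     # virtual volume -> set of host identifiers allowed to use it
        self.audit_log = []  # one (host, volume, operation, allowed) tuple per request

    def grant(self, host, volume):
        """Give a host access to a virtual volume."""
        self.grants.setdefault(volume, set()).add(host)

    def authorize(self, host, volume, operation):
        """Decide a request centrally and record it, whether allowed or refused."""
        allowed = host in self.grants.get(volume, set())
        self.audit_log.append((host, volume, operation, allowed))
        return allowed
```

Because the decision is made outside the hosts, a host that ignores the rules is simply refused, and the refusal itself lands in the same audit trail as legitimate traffic.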
Resiliency to Outages
Buying devices in pairs to protect against failure is simply not the best way to spend the IT budget, even though it may be a common practice. The more practical (and effective) way is to amortize redundancy across many resources in an N+1 fashion. In other words, when you need five units, buy six, not 10, and you'll have a great combination of availability and cost-savings. Make sure your virtualization solution supports this capability - not all of them do, and its absence can cause an unanticipated increase in cost of ownership.
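The arithmetic behind that advice is easy to check. The helper below is a small sketch; the function names and the scheme labels `"paired"` and `"n_plus_1"` are ours, introduced only for illustration.

```python
def units_to_buy(needed: int, scheme: str) -> int:
    """Return how many units to purchase to protect `needed` working units."""
    if scheme == "paired":
        return 2 * needed   # every unit gets its own dedicated spare
    if scheme == "n_plus_1":
        return needed + 1   # one spare is shared across all units
    raise ValueError(f"unknown scheme: {scheme}")


def redundancy_overhead(needed: int, scheme: str) -> float:
    """Extra units bought for protection, as a fraction of the units needed."""
    return (units_to_buy(needed, scheme) - needed) / needed


# Five working units: pairing buys 10 (100% overhead), N+1 buys 6 (20% overhead).
paired_total = units_to_buy(5, "paired")
n_plus_1_total = units_to_buy(5, "n_plus_1")
```

Note that the N+1 overhead shrinks as the pool grows, while the paired overhead stays at 100 percent regardless of size.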
Centralized Management
Some define centralized storage pools and storage management as limited to disks within one box, or one vendor's line of products. What is your definition? We believe you should look for centralized administration that includes pooling all the disks across a network, regardless of where, how many, or what make or model of storage is attached.
Price-Performance Leverage of the Virtualization Platform
For reasons already discussed, we feel that the virtualization engine should be outboard, and not a burden of the hosts. In this case, the platform for the virtualization becomes extremely important. Some vendors' products use proprietary or custom hardware and software to provide virtualization and other services. Naturally, this increases the development and testing costs, which the end user must ultimately fund. In addition, the performance and reliability of the system are more of a gamble, for which the end user bears a large portion of the risk. We feel you should look for solutions that leverage existing, proven, high-performance technologies that are cost-efficient, familiar, easily upgradeable, and extensible. This includes processors, storage devices, and operating systems. With that in mind, now you can plan on flexibly scaling your performance, redundancy and capacity based on your budget and business needs, rather than the other way around.
Back to the Choices
Let's compare the SAN virtualization alternatives and see how each one ranks against our criteria.
Multi-host Storage Arrays
A multi-host array (Figure 1) puts the pooling responsibility at the storage subsystem level, usually with RAID controller firmware. This implementation offers favorable performance, as well as high availability configurations. Connectivity to many flavors of hosts is supported, but you can only buy the disks that come with the array. Perhaps the biggest drawback of this approach is that the size and makeup of the pool are limited to the array's monolithic enclosures. Spilling over means running multiple pools and losing allocation freedom and centralization. Although some vendors might offer centralized management for multiple arrays, there are unanswered questions about multi-vendor support.
Host-based LUN Masking Filters
One means of enabling storage pooling is to install specialized device drivers on each host to prevent that host from accessing storage resources that it doesn't "own." These LUN masking drivers (Figure 2) are typically configured using a central management application that can be either host-based or outboard. Although this method might work well for small, controlled configurations, it introduces several complexities and costs in large data center and enterprise SAN operations. First, the LUN masking support must span a potentially wide spectrum of server platforms - as we noted earlier, this presents a significant challenge for the vendor to adequately supply and maintain. Also, because every single host must have the LUN masking driver, there is a performance hit to the host and therefore the network. Plus, change management across numerous hosts is tedious, costly and slow. Perhaps even more disconcerting is the ability of any "rogue" host without the proper LUN masking software to defeat the security controls of the shared resources and corrupt others' disks in the storage pool.
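The weakness of host-side enforcement can be shown in a few lines. This is a simplified sketch: the LUN numbers, host names, and the `visible_luns` function are invented for illustration, and a real masking driver sits in the host's device stack rather than in application code.

```python
# All LUNs physically reachable over the SAN (hypothetical inventory).
ALL_LUNS = {0: "mail", 1: "database", 2: "web"}

# Masks pushed to each host's driver by the management application (assumed layout).
MASKS = {"host_a": {0}, "host_b": {1, 2}}


def visible_luns(host: str, has_mask_driver: bool = True) -> set[int]:
    """Return the LUNs a host can address.

    The filter runs on the host itself, so a host that never installed
    the driver is not filtered at all.
    """
    if not has_mask_driver:
        return set(ALL_LUNS)  # nothing on the SAN side stops it
    return MASKS.get(host, set()) & set(ALL_LUNS)
```

A cooperating `host_a` sees only LUN 0, but the same call with `has_mask_driver=False` returns every LUN in the pool, which is exactly the rogue-host exposure described above.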
File System Redirectors
A third type of pooling technique involves the use of file system redirector software (Figure 3). Basically, file access control travels over the LAN, but disk data I/O moves over the high-speed SAN. Each host on the SAN requires software to facilitate the mapping of file names to block addresses, all brokered by an external metadata controller or file system manager. To be fair, these products are really targeted at offloading disk I/O traffic from LANs, rather than general purpose virtualized storage pooling. We've included them in our virtualization roundup since there is a level of storage abstraction in the design. Like LUN masking software, file system redirection is tied to specific operating environments, and components must be installed on every host. Though the file sharing services offer value, they are not the best solution for general SAN virtualization and storage management. You should overlay file redirection software on a virtualized storage pooling service to get the best of both worlds.
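The split between the control path and the data path can be sketched as follows. The `MetadataController` class, the extent layout of `(lun, start_block, block_count)`, and the tiny 4-byte block size are assumptions made purely for illustration.

```python
class MetadataController:
    """Outboard broker that maps file names to block extents (control path)."""

    def __init__(self):
        self.extents = {}  # file name -> list of (lun, start_block, block_count)

    def register(self, name, extents):
        self.extents[name] = list(extents)

    def lookup(self, name):
        """Answer a host's query; in practice this would travel over the LAN."""
        return list(self.extents[name])


def read_file(name, controller, san_luns, block_size=4):
    """Host-side redirector: ask for the block map, then read the blocks directly."""
    data = b""
    for lun, start, count in controller.lookup(name):  # control path (LAN)
        begin = start * block_size
        end = (start + count) * block_size
        data += san_luns[lun][begin:end]                # data path (SAN)
    return data
```

Only the small lookup crosses the LAN in this model. The bulk bytes come straight from the LUNs, which is why these products are effective at offloading disk traffic from the LAN.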
Specialized In-band Virtualization Engines
These products provide virtualized storage pooling by consolidating the storage allocation and security functions on dedicated platforms that sit between the hosts and the physical storage (thus "in-band"). Typically, no additional software is required on the hosts, allowing the engines to support the diverse range of popular open systems servers. The virtualization engine (Figure 4) can incorporate a wide range of components and features. At one end are the entry-level products that strictly address simple storage pooling needs and require the purchase of external switches and storage devices to complete the picture. Others choose to embed switching support in the "appliance" bundle. Still others include disks, and appear very similar to multi-host arrays, but potentially at lower price points with greater configuration flexibility. The particular components of the appliance are not necessarily measures of quality, merely options.
You should note that there is a war raging between the out-of-band (outside the data path) and the in-band virtualization camps. Some argue that in-band products slow data access down, and that the failure of the virtualization platform could compromise availability. This is only true if the product is carelessly designed. The successful, intelligent storage control suppliers have proven that you can use caching and alternate paths to achieve big performance and availability payoffs. We've seen dramatically enhanced I/O response firsthand from JBODs and disk arrays that were supplemented with in-band virtualization engines sporting advanced caching. As for survivability, configuring alternate paths is a long-standing, proven method for continuous availability that can be implemented for storage networks.
So, the true measure of success for these appliances lies in their ability to confront and deal with these technical challenges. Lesser implementations will expose themselves as single points of failure; intelligent ones will provide alternate paths and multi-node redundancy through classical networking techniques proven in the LAN and WAN space. Weaker products will experience a significant performance hit as data travels through the appliance; successful solutions will possess sophisticated, robust read and write caching algorithms that actually improve the performance of the physical disks under their control, while also leveraging the cache already built into the disk arrays.
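The two remedies named here, caching and alternate paths, can be sketched together. The `InBandEngine` class below is hypothetical, models reads only, and uses a simple least-recently-used cache as a stand-in for the far more sophisticated algorithms a real product would need.

```python
from collections import OrderedDict


class InBandEngine:
    """Toy in-band engine: LRU read cache in front of redundant paths to disk."""

    def __init__(self, paths, cache_blocks=2):
        self.paths = paths            # callables: block number -> bytes, may raise OSError
        self.cache = OrderedDict()    # block number -> data, oldest entry first
        self.cache_blocks = cache_blocks
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return self.cache[block]
        self.misses += 1
        data = self._read_from_disk(block)
        self.cache[block] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data

    def _read_from_disk(self, block):
        """Try each path in turn so one failed link does not stop the I/O."""
        for path in self.paths:
            try:
                return path(block)
            except OSError:
                continue
        raise OSError("all paths to storage failed")
```

With a failed primary path and a healthy alternate, the first read of a block succeeds over the alternate and a repeat read is served from cache without touching the disk at all.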
Storage Domain Servers
A storage domain server (Figure 5) is a commercial server platform dedicated to the virtualization and allocation of disk storage to the hosts. The virtualization function is implemented in software that runs as a network storage control layer on top of the platform's native operating system. This allows it to leverage many of the operating system's networking, volume management, device interoperability and security features. Some storage domain servers are designed to collaborate over the SAN. In this way, they distribute the load and management chores for a large storage pool while maintaining centralized administration. The hardware performance and number of storage domain servers can be optimized to site-specific requirements.
Storage domain servers are capable of adding value to the I/O stream by optionally performing host- and storage device-independent caching, in-band performance and load monitoring, snapshot and remote mirroring services, to name a few. The richer the feature set, the simpler it becomes to institute LAN-free and server-less backups, disaster recovery programs and decision support practices across the entire storage pool without regard to the supplier of the physical SAN components. The end result is a huge reduction in acquisition, administrative and upgrade costs with high return on investment (ROI) for nearly any type of SAN environment.
The similarities to specialized virtualization engines are not coincidence; many specialized appliances are simply storage domain servers with hardware and software add-ons. While they lose some of the flexibility of a storage domain server, these appliances bundle the necessary services in a plug-and-play solution at targeted price points. In the end, just as the deployment of network domain servers delivered significant advancement for LANs, storage domain servers promise to deliver the most compelling advantages of disk virtualization for SANs.
The recent flood of storage virtualization products presents an abundance of choices - and a fair share of confusion. To cut through the chaos, we've tried to identify the key factors that will influence each offering's long-term success and viability for both the end user and the vendor. These are summarized in Table 1 below. Ultimately, the best product for you will provide complete freedom of choice and high performance at a reasonable cost. That way, you - not the vendor - have control over your storage environment.