Wednesday, September 03, 2008

Clouds, Private Clouds and Data Center Automation

As part of Pacific Crest’s Mosaic Expert team, I had the opportunity to attend their annual Technology Leadership Forum in Vail last month. I participated in half-a-dozen panels and was fortunate to meet with several contributors in the technology research and investment arena. Three things seemed to rank high on everyone’s agenda: cloud computing and its twin enablers - virtualization and data center automation. The cloud juggernaut is making everyone want a piece of the action – investors want to invest in the next big cloud (pun intended!), researchers want to learn about it and CIOs would like to know when and how to best leverage it.

Interestingly, even “old-world” hosting vendors like Savvis and Rackspace are repurposing their capabilities to become cloud computing providers. In a similar vein InformationWeek recently reported some of the telecom behemoths like AT&T and Verizon with excess data center capacity have jumped into the fray with Synaptic Hosting and Computing as a Service - their respective cloud offerings. And to add to the mix, terms such as private clouds are floating around to refer to organizations that are applying SOA concepts to data center management making server, storage and application resources available as a service for users, project teams and other IT customers to leverage (complete with resource metering and billing) – all behind the corporate firewall.

As already stated in numerous publications, there are obvious concerns around data security, compliance, performance and uptime predictability. But the real question seems to be: what makes an effective cloud provider?

Google’s Dave Girourad was a keynote presenter at Pacific Crest and he touched upon some of the challenges facing Google as they opened up their Google Apps offering in the cloud. In spite of pouring hundreds of millions of dollars on cloud infrastructure, they are still grappling with stability concerns. It appears that size of the company and type of cloud (public or private) is less relevant, and more relevant is the technology components and corresponding administrative capabilities behind the cloud architecture.

Take another example: Amazon. They are one of the earliest entrants to cloud clouding and have the broadest portfolio of services in this space. Their AWS (Amazon Web Services) offering includes storage, queuing, database and a payment gateway in addition to core computing resources. Similar to Google, they have invested millions of dollars, yet are prone to outages.

In my opinion, while concerns over privacy, compliance and data security are legitimate and will always remain, the immediate issue is around scalability and predictability of performance and uptime. Clouds are being touted as a good way for smaller businesses and startups to gain resources, as well as for businesses with cyclical resource needs (e.g., retail) to gain incremental resources at short notice. I believe the current crop of larger cloud computing providers such as Amazon, Microsoft and Google can do a way better job with compliance and data security than the average startup/small business. (Sure, users and CIOs need to weigh their individual risk versus upside prior to using a particular cloud provider.) However for those businesses that rely on the cloud for their bread-and-butter operations whether cyclical or around-the-year, uptime and performance considerations are crucial. If the service is not up, they don’t have a business.

Providing predictable uptime and performance always boils down to a handful of areas. If provisioned and managed correctly, cloud computing has the potential to be used as the basis for real-time business (rather than being relegated to the status of backup/DR infrastructure.) But the key question that CIOs need to ask their vendors is: what is behind the so-called cloud architecture? How stable is that technology? How many moving parts does it have? Can the vendor provide component-level SLA and visibility? As providers like AT&T and Verizon enter the fray, they can learn a lot from Amazon and Google’s recent snafus and leverage technologies that can simplify the environment enabling it to operate in lights-out mode – making the difference behind a reliable cloud offering and one that’s prone to failures.

The challenge however, as Om Malik points out on his GigaOm blog, is that much of cloud computing infrastructure is fragile because providers are still using technologies built for a much less strenuous web. Data centers are still being managed with a significant amount of manual labor. “Standards” merely imply processes documented across reams of paper and plugged into Sharepoint-type portals. No doubt, people are trained to use these standards. But documentation and training doesn’t always account for those operators being plain forgetful, or even sick, on vacation or leaving the company and being replaced (temporarily or permanently) with other people who may not have the same operating context within the environment. Analyst studies frequently refer to the fact that over 80% of outages are due to human errors.

The problem is, many providers while issuing weekly press releases proclaiming their new cloud capabilities, haven’t really transitioned their data center management from manual to automated. They may have embraced virtualization technologies like VMware and Hyper-V, but they are still grappling with the same old methods combined with some very hard-working and talented people. Virtualization makes deployment fast and easy, but it also significantly increases the workload for the team that’s managing that new asset behind the scenes. Because virtual components are so much easier to deploy, it results in server and application sprawl and demands for work activities such as maintenance, compliance, security, incident management and service request management go through the roof. Companies (including the well-funded cloud providers) do not have the luxury of indefinitely adding head-count, nor is throwing more bodies at the problem always a good idea. They need to examine each layer in the IT stack and evaluate it for cloud readiness. They need to leverage the right technology to manage that asset throughout its lifecycle in lights-out mode – right from provisioning to upgrades and migrations, and everything in between.

That’s where data center automation comes in. Data center automation technologies have been around now for almost as long as virtualization and are proven to have the kind of maturity required for reliable lights-out automation. Data center automation products from companies such as HP (on the server, storage and network levels) and Stratavia (on the server, database and application levels) make a compelling case for marrying both physical and virtual assets behind the cloud with automation to enable dynamic provisioning and post-provisioning life-cycle management with reduced errors and stress on human operators.

Data center automation is a vital component of cloud computing enablement. Unfortunately, service providers (internal or external) that make the leap from antiquated assets to virtualization to the cloud without proper planning and deployment of automation technologies tend to provide patchy services giving a bad name to the cloud model. Think about it… Why can some providers offer dynamic provisioning and real-time error/incident remediation in the cloud, while others can’t? How can some providers be agile in getting assets online and keeping them healthy, while others falter (or don’t even talk about it)? Why do some providers do a great job with offering server cycles or storage space in the cloud, but a lousy job with databases and applications? The difference is, well-designed and well-implemented data center automation - at every layer across the infrastructure stack.


Mahesh said...

It is interesting that with all this talk about cloud computation and the related automation nobody talks about the kinds of applications that will work well on this computing ecostructure. E-Mail and related applications scale well on a cloud computing platform due to the limited functionality it offers and the ease of partitioning of the application. All application don't lend well to this platform. For example, badly written applications will perform poorly whether you run it in a cloud or not. recognized this and created the Apex structure for writing scalable application in such a hosted environment.


Venkat Devraj said...

Valid point, Mahesh. That's one of the reasons I guess, why we are seeing newer apps being designed for virtualized containers and cloud computing, while the majority of legacy apps stay hard-wired to physical servers. The cloud is not meant to be a panacea for ill-designed apps.