Tuesday, February 23, 2010

What’s the “Right Way” to Automate the Application Stack?

Cloud computing is forcing IT organizations to rethink automation. Early adopters started out with the delivery of servers, storage and network connectivity in self-service mode via an internal cloud. While this helped reduce delivery delays, real service-level improvements weren’t forthcoming. Application owners and end users were still experiencing significant elapsed time between service request and delivery. It was becoming obvious to IT thought leaders that merely provisioning and offering up “ping, power and pipe” quickly wasn’t going to have a meaningful impact; that just shifted the bottleneck up the stack into the application layers. The new focus of the cloud revolves around rapid deployment and mass management of applications. Without the ability to provision and manipulate discrete application services, the value of the cloud is stunted.

As the recent acquisition of Phurnace indicates, the traditional data center tools vendors have started figuring it out and are attempting to offer solutions that deliver automation aimed at the application layers. The advent of these vendors, with their mainstay strength in server provisioning, is bringing forth multiple interesting approaches to application automation. These range from the composite, policy-based (but hands-off) VMware vApps strategy, to the automation of selective admin tasks for specific app components such as Java code deployment (e.g., Phurnace), to broader automation platforms with out-of-the-box modular content that addresses the entire administrative lifecycle of various app components (e.g., Stratavia). The key question that emerges for customers is: what’s the right way to automate applications? Conventional wisdom says that if your only weapon is a hammer, every problem looks like a nail. Armed with robust server virtualization and provisioning toolsets, some of the larger vendors are approaching the application stack with (no surprise!) a focus on provisioning. However, at the risk of sounding clichéd, I have to say that automating application administration is a very different paradigm.

You see, at the server layer and below (including storage and networking), significant admin time is taken up in provisioning. Post-provisioning activities such as patching and configuration management are often handled via provisioning – i.e., by reimaging the OS with an updated (patched/reconfigured) image. So all in all, provisioning is the key administrative function at these lower layers.

However, as you go above the server, you encounter discrete application layers such as web servers, application servers and databases, to name the most popular components. Each of these dictates a different operations lifecycle, one that is not dominated by provisioning and configuration management. In fact, provisioning barely takes up 15-20% of the typical App Admin’s time. The remaining 80-85% is spent on post-provisioning activities such as maintenance, incident response and service requests (the graphic below provides examples of these task categories).

But then the traditional vendors ask, why do these App Admins do all these things in the first place? Why can’t they, like Sys Admins and Network Admins, handle these other activities via Provisioning/Re-provisioning? For instance, instead of applying a new patch or performing a maintenance operation or doing a code release, why can’t the App Admin just provision a new image that has the desired changes? That will eliminate the need for these other post-provisioning activities and free up App Admin time.

This type of argument does not work because it goes against the grain of real-world application management. Instances of an application component frequently develop a unique fingerprint over the course of their use. This fingerprint is based on several factors including security requirements, performance adjustments, and the experience and skill level (a.k.a. best practices) of the Admins managing them. To make things worse, these fingerprints can be dynamic in nature. The same application may look different at different times of the day. For instance, a database may serve as a transactional data store during regular business hours and may be configured specifically to facilitate smaller read/write I/O operations, whereas at night the same database may be converted into a batch database with different buffer sizes, log file locations, etc., to facilitate bulk writes. Blanket categorization and re-imaging of the application server (say, to apply a new application patch) by the Sys Admins causes much of this dynamic application fingerprint to be lost, creating in turn a lot more work for the App Admins to restore the fingerprint as best they can (assuming they themselves remember all the changes and can get it right the first time!).
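To make the day/night reconfiguration concrete, here is a minimal sketch of what such an automation job might look like. Everything in it is hypothetical: the profile names, parameter keys, values and cutover hours are illustrative placeholders, not tied to any particular DBMS or automation product.

```python
# Hypothetical sketch: one database instance that carries two parameter
# profiles -- a daytime OLTP profile tuned for small read/write I/O, and a
# nighttime batch profile tuned for bulk writes. All names/values are made up.
from datetime import time

PROFILES = {
    "oltp_day": {
        "buffer_cache_mb": 512,          # smaller cache for many short transactions
        "log_dir": "/logs/oltp",
        "io_mode": "small_rw",
    },
    "batch_night": {
        "buffer_cache_mb": 4096,         # larger cache for bulk loads
        "log_dir": "/logs/batch",
        "io_mode": "bulk_write",
    },
}

def select_profile(now: time) -> str:
    """Pick the profile from the clock: business hours -> OLTP, otherwise batch."""
    return "oltp_day" if time(8, 0) <= now < time(18, 0) else "batch_night"

def apply_profile(name: str) -> dict:
    """In a real tool this step would push the parameters to the database
    instance; here it simply returns the chosen parameter set."""
    return PROFILES[name]
```

The point of the sketch is that an automation platform has to apply these changes in place, on a schedule, to a running instance; a reimaging approach would wipe out whichever profile happened to be active.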

The other challenge is that a single server frequently hosts multiple application types and instances; this is common even in large-scale production environments. Each of these instances may have its own maintenance window and need to be patched, reconfigured, upgraded and managed individually based on instance-specific standards and dependencies. Reimaging the entire server doesn’t afford granular control of individual instances. One may argue that this was more prevalent in the olden days when servers were expensive, and that with virtualization now commoditized, each application instance can reside on its own server image, eliminating the problem. But reality goes deeper than that. It’s not just server resource optimization that called for multiple application types and instances to reside on the same server; cohabitation was also tied to performance, security and other considerations. For example, in the case of a performance-sensitive application that uses federated databases, a DBA may elect to keep some of the associated databases on the same physical OS to minimize the context switching and network latency incurred by separating them onto different servers. Regardless of whether the servers and underlying network adapters are physical or virtual, the location of the underlying databases can make the difference between a data-join operation responding in sub-seconds versus minutes. Thus, without a proper understanding of the various application types, their design considerations (such as transaction types, application access methods, data volume, affinity, partitioning, etc.) and best practices, choosing the wrong automation method can result in degraded service levels.

Attempting to offer application management in the cloud with just an application image provisioning model is akin to showing up to a gunfight with a rock. Proponents of this model can claim that rapid provisioning and policy-based reimaging of the relevant application components is the new way of application management in the cloud. While this approach may work for a handful of admin functions, it does not offer granular control or a pragmatic framework for most mandatory post-provisioning tasks (represented in the graph above) and hence will be discarded by savvy App Admins. Only solution providers that have true application management DNA (with a deep understanding of the task patterns associated with various application components) and that offer automated application management capabilities out-of-the-box can win legitimate mind-share in the near term and sustainable market share in the long run.

2 comments:

Unknown said...

Good article Venkat. Your lifecycle visual is right on the money! Talking of hammers, it’s hard to tell who’s at the top of the pile: BMC or VMware. BMC just might pip VMware with their tunnel-visioned view of configuration management. For them, every problem is a config problem and/or can be solved by adjusting the server config. Yeah right!

Ed Lind said...

Change is in the air at BMC. Dunno if you heard, but the president of their Enterprise Service Management division (and former CEO of their BladeLogic acquisition), Dev Ittycheria, just quit. The fresh blood (or the old guard) might expand their focus beyond config mgmt, especially if the recent Phurnace acquisition is anything to go by.