Friday, December 28, 2007

Gartner Show Highlights Need for More Intelligence in Data Center Automation

Gartner’s recent Data Center Conference was a blast! Between technical sessions, booth duty and a joint presentation on Data Center Economics with a client (Xcel Energy), I managed to sneak in a round of golf at Bali Hai (first time I ever played with a caddie…). While attendance at the event (maybe 3,000 to 4,000) was certainly not the highest in recent years, the conference put the spotlight squarely on three key areas: virtualization, automation and green data centers (in that order). Automation has long been treated as the red-headed stepchild (at least compared to virtualization), so it was good to see it finally beginning to garner its fair share of attention (and IT budgets!).

There were dozens of tools and technologies with an automation theme. Not surprisingly, automation is also basking in the ripple effects of virtualization and green data centers. With virtualization boosting the number of “server environments”, each of which needs to be individually monitored and managed, automation is increasingly regarded as the other side of the virtualization coin. Similarly, with awareness of green data centers growing, I’m beginning to see innovation even in areas as obscure as automatically hibernating or shutting down machines during periods of non-use to reduce energy consumption.

So given the plethora of tools and products, is every automation capability that is needed already invented and available? Not quite. One of the biggest things lacking is “intelligence” in automation. Take, for example, run book automation toolsets. While many of them offer a nice GUI for administrators to define and orchestrate ITIL processes, they also introduce a degree of rigidity into those processes. For instance, if there are three run books or process workflows that deal with server maintenance, the Maintenance Window is often hard-coded within each of the workflows. If the Window changes, all three workflows have to be individually updated, which creates significant maintenance overhead. Similarly, if a workflow includes a step to open a ticket in BMC Remedy, the ticketing system’s details such as product type and version are often hard-coded. If the customer upgrades the Remedy ticketing system or migrates from Remedy to HP Peregrine, the workflow simply stops working! An intelligent process workflow engine would avoid these traps.
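To make that concrete, here is a minimal, hypothetical sketch (the policy names and workflow functions are invented for illustration, not taken from any particular product) of the difference between hard-coding the Maintenance Window in every workflow and looking it up from a central policy store:

```python
from datetime import time

# Hard-coded approach: each of the three workflows carries its own copy of
# the window, so a schedule change means editing all three by hand.
def patch_web_tier_hardcoded():
    window_start, window_end = time(1, 0), time(3, 0)  # duplicated everywhere
    print(f"Patching web tier between {window_start} and {window_end}")

# Policy-driven approach: workflows resolve the window (and the ticketing
# system details) at run time from one central store.
POLICIES = {
    "maintenance_window": {"start": time(1, 0), "end": time(3, 0)},
    "ticketing_system": {"product": "Remedy", "version": "7.0"},
}

def get_policy(name):
    """Stand-in for a central policy service; a dict here, a database or
    CMDB-backed service in a real automation engine."""
    return POLICIES[name]

def patch_web_tier_policy_driven():
    window = get_policy("maintenance_window")
    ticketing = get_policy("ticketing_system")
    print(f"Opening change ticket in {ticketing['product']} {ticketing['version']}")
    print(f"Patching web tier between {window['start']} and {window['end']}")

patch_web_tier_policy_driven()
```

Change the Window once in the policy store and every dependent workflow picks it up; swap the ticketing policy and not a single workflow needs to be rewritten.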

An intelligent automation engine is often characterized by the following attributes:

  • Centralized policy-driven automation – Policies allow human input to be recorded in a single (central) location and made available to multiple downstream automation routines. The Maintenance Window example above is a great candidate for such policy-driven automation. Besides service level policies (such as the Maintenance Window), areas such as configuration management, compliance, and security are well suited for being cast as policies.


  • Metadata injection & auto-discovery – Metadata is data describing the state of the environment. It is important for automation routines to have access to state data and, especially, to be notified when there is a state change. For example, there is no point in starting the midnight “backup_to_tape” process as a 4-way stream when 2 of the 4 tape drives have suddenly failed or are offline. The automation engine needs to be aware of what is available so it can launch dependent processes to optimally leverage existing resources (a small sketch of this kind of pre-flight check follows the list). Such state data can be auto-discovered either natively or via an existing CMDB, where one exists.


  • Event correlation and root cause analysis – The ability to acknowledge and correlate events from multiple sources, and to leverage that information to identify problem patterns and root cause(s), makes automated error resolution far more precise (see the toy correlation sketch after this list).


  • Rules processing – Being able to process complex, event-driven rule-sets (not just simple Boolean logic) allows automation to be triggered in the right environments at the right times (an example rule appears after this list).


  • Analytics and modeling – Being able to apply dynamic thresholding, as well as analytical and mathematical models, to metadata to discern future resource behavior is key to averting performance hiccups and application downtime (a dynamic-thresholding sketch closes out the examples below).
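To illustrate the metadata and auto-discovery point, here is a minimal, hypothetical pre-flight check; the discovery function is a stand-in for whatever native discovery or CMDB lookup an engine actually has, and the drive names are invented:

```python
def discover_online_tape_drives():
    """Stand-in for auto-discovery: a real engine would query the environment
    natively or pull current state from a CMDB."""
    return ["drive0", "drive1"]  # only 2 of the 4 drives are online tonight

def start_backup_to_tape(planned_streams=4):
    drives = discover_online_tape_drives()
    streams = min(planned_streams, len(drives))
    if streams == 0:
        raise RuntimeError("No tape drives online; do not start the backup")
    print(f"Starting backup as a {streams}-way stream on {drives[:streams]}")

start_backup_to_tape()  # adapts to a 2-way stream instead of failing as a 4-way job
```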
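For event correlation and root cause analysis, a toy sketch of the idea; the sample events and the “lowest layer, earliest event” heuristic are made up purely for illustration:

```python
# (timestamp_seconds, source_host, layer, message) -- invented sample events
EVENTS = [
    (100, "san01", "storage", "LUN latency spike"),
    (102, "db01", "database", "I/O wait threshold exceeded"),
    (104, "app01", "app", "Transaction timeouts"),
    (500, "web02", "app", "Isolated 404 burst"),
]

LAYER_ORDER = {"storage": 0, "database": 1, "app": 2}

def correlate(events, window=60):
    """Group events that occur within `window` seconds of each other, then
    flag the earliest, lowest-layer event in each group as the likely root cause."""
    groups, current = [], []
    for event in sorted(events):
        if current and event[0] - current[-1][0] > window:
            groups.append(current)
            current = []
        current.append(event)
    if current:
        groups.append(current)
    return [min(g, key=lambda e: (LAYER_ORDER[e[2]], e[0])) for g in groups]

for root_cause in correlate(EVENTS):
    print("Root-cause candidate:", root_cause)
```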
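For rules processing, a small hypothetical rule that goes beyond a single Boolean check: it fires only when a condition has held across several samples and the system is outside the (illustrative) maintenance window:

```python
from datetime import datetime, time

def sustained_cpu_rule(samples, now):
    """Fire only if CPU has been above 90% for the last 3 samples AND we are
    outside the 1am-3am maintenance window (values are illustrative)."""
    sustained = len(samples) >= 3 and all(s > 90 for s in samples[-3:])
    in_window = time(1, 0) <= now.time() <= time(3, 0)
    return sustained and not in_window

def evaluate(rules, samples, now):
    return [rule.__name__ for rule in rules if rule(samples, now)]

print(evaluate([sustained_cpu_rule], [85, 93, 95, 97], datetime(2007, 12, 28, 14, 0)))
# -> ['sustained_cpu_rule']: sustained high CPU during business hours triggers automation
```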
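And for analytics and modeling, the simplest possible flavor of dynamic thresholding; a rolling mean plus a few standard deviations stands in here for whatever analytical model a real engine would apply:

```python
import statistics

def dynamic_threshold(history, sigmas=3):
    """Derive an adaptive threshold from recent history (mean + N standard
    deviations) instead of relying on a fixed, static value."""
    return statistics.mean(history) + sigmas * statistics.stdev(history)

history = [40, 42, 38, 41, 43, 39, 40]  # invented response times, in milliseconds
current = 75

if current > dynamic_threshold(history):
    print("Anomalous latency: trigger remediation before users feel the pain")
```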


During the Gartner show, I walked by many of the booths looking at what these supposedly “bleeding edge” vendors offered. Suffice it to say, I wasn’t dazzled. Rather than more of the asinine glue (which, unfortunately, is what most of the current crop of process automation and orchestration tools are reduced to) for piecing together multiple scripts and third-party tool interfaces and calling it “automation”, the customers I work with are increasingly interested in seeing offerings that deliver the capabilities above. Other than Data Palette, I don’t know of any products that do so. But then, you already knew that!