Tuesday, April 29, 2008

Run Book Automation Gets Smarter

Over a year ago, I wrote about different data center automation options, including Run Book Automation (RBA) or, as some call it, IT Process Automation. Since then there’s been a lot of activity in that space. Many IT organizations that were merely curious about this kind of technology in earlier months have now begun to earmark specific budgets to evaluate and deploy these tools, and I’m seeing more and more RFPs in this area. Even though some of the vendors in this space, like iConclude and RealOps, have been acquired and seem to have lost some of the core talent and drive behind the technology, the area continues to see a tremendous amount of innovation, driven primarily by other startups. In fact, it almost seems that a new version of run book automation has evolved. Depending on the complexity of the use case being automated, analysts are describing the enhancements with glowing labels such as “intelligent process automation,” “Run Book Automation version 2.0” and, for scenarios like disaster recovery automation, “Decision Automation.”

There appear to be primarily two catalysts for the emergence of RBA 2.0:
1. RBA 1.0 was too simplistic (and limiting). RBA by itself wasn’t meant to introduce any new automation functionality. It was merely designed to string together existing tools and scripts in the right sequence to automate specific low-level and mundane IT processes. Prominent examples are automation of Help Desk (a.k.a. “Tier 1”) work patterns such as trouble ticket enrichment and basic alert-flow triage and response. Given that the primary user base for this kind of technology was junior IT operators, the assumption was that they typically wouldn’t know much coding or scripting and would need generic out-of-the-box functions to automate things such as opening and closing tickets in popular ticketing systems, rebooting servers, changing job schedules, and so on. Frequently, these products wouldn’t even expose the source code within these steps for any kind of customization – all to keep complexity at bay. Depending on what you intended to accomplish, you would end up buying different “integration packs” that allowed you to interact with specific toolsets already deployed within your environment and chain together the requisite steps.

2. RBA 1.0 was too static in nature. When you defined and rolled out a workflow, it made specific assumptions about the platform, version and state of the environment and the underlying toolsets that it connected to. If any of those underlying components changed, then all too often the workflow would cease to function in the manner expected, producing unreliable results and diluting the value of automation. Some of the RBA 1.0 products queried the state of the environment at runtime to avoid this problem, but that resulted in bulky workflow steps that consumed higher CPU, memory and I/O resources on the target environments (the impact was especially evident in the case of frequently run workflows).
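Both limitations show up even in a toy version of an RBA 1.0 workflow. The sketch below (every function and name here is hypothetical, not taken from any actual product) chains opaque, pre-built steps in a fixed order, with the environment’s platform assumptions baked directly into a step:

```python
# A minimal sketch of an RBA 1.0-style workflow: a fixed sequence of
# pre-built steps with hard-coded assumptions about the environment.
# All names are illustrative, not from any real product.

def enrich_ticket(ticket):
    # A vendor "integration pack" step: append canned diagnostic data.
    ticket["notes"].append("ping ok, disk 82% full")
    return ticket

def restart_service(host):
    # Hard-coded assumption: every host is UNIX-like and uses init scripts.
    # If the host migrates to another platform, this step silently breaks.
    return f"ssh {host} /etc/init.d/app restart"

def close_ticket(ticket):
    ticket["status"] = "closed"
    return ticket

def run_workflow(ticket, host):
    # RBA 1.0 simply strings the steps together in a fixed order.
    ticket = enrich_ticket(ticket)
    command = restart_service(host)
    ticket = close_ticket(ticket)
    return ticket, command

ticket, cmd = run_workflow({"id": 101, "status": "open", "notes": []}, "db01")
```

The fixed sequencing is exactly the point of RBA 1.0 for Tier 1 work; the trouble starts when the hard-coded assumptions inside the steps stop matching the environment.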

Companies that had deployed RBA version 1 began to run into these limitations as they attempted to move up the IT food chain – to Tier 2 and Tier 3 teams and their increasingly complex activities. Most Tier 2/3 areas, such as database administration, systems administration or application support, called for these two deficiencies to be overcome.

Consequently, RBA 2.0 showed up with specific enhancements to address the above two areas. (I'm going to lead with the solution for problem #2 since that's a wee bit more challenging.)
- More intelligent and dynamic workflows – Dynamic workflows evolve in a pre-defined (read: pre-approved) way as the environment undergoes changes. This is accomplished in RBA 2.0 via the introduction of a metadata repository between the automation workflows and the target environment. This metadata layer captures the current state of the target environment (along with historical data for comparison and trend analysis purposes) and injects this information into the workflows just prior to, or during, runtime (a process referred to as “metadata injection”).

This centralized metadata repository gets populated via relevant collections – either natively or via integration with a CMDB if one exists in the environment. (Note: Even if a CMDB exists, native collections are still relevant since the metadata required for advanced process automation goes way deeper than the metadata found in a conventional CMDB. For instance, in the case of database automation, the collections include not only the DBMS platform, version, patch level and configuration settings, but also functional aspects such as sessions logged into the database, wait events experienced, locks held, etc. – all of which may be required to either trigger or drive real-time automation behavior.)
Environmental attributes that cannot be auto-discovered can be specified within the metadata repository via a central policy engine.
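To make the repository idea concrete, here is a hypothetical sketch (not any vendor's actual design) of a metadata store fed by a native collector, keeping timestamped history so that trend analysis and comparisons remain possible:

```python
# Hypothetical sketch of native metadata collection feeding a central
# repository. Each snapshot is timestamped so history is retained for
# comparison and trend analysis. All names are illustrative.

import time

class MetadataRepository:
    def __init__(self):
        self._store = {}  # target name -> list of (timestamp, snapshot)

    def record(self, target, snapshot):
        self._store.setdefault(target, []).append((time.time(), snapshot))

    def current(self, target):
        # Return the most recent snapshot for a target.
        return self._store[target][-1][1]

def collect_database_metadata(target):
    # A real collector would query the DBMS itself; this static example
    # shows the deeper, functional attributes mentioned in the note above.
    return {
        "platform": "Oracle",
        "version": "10.2.0.3",
        "active_sessions": 42,
        "top_wait_event": "db file sequential read",
    }

repo = MetadataRepository()
repo.record("db01", collect_database_metadata("db01"))
state = repo.current("db01")
```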

Metadata injection is one of the key differentiators within RBA 2.0 for bringing about the requisite changes in automation behavior. It allows workflows to acquire a degree of dynamism (in a pre-approved way) without getting bloated with all kinds of ad-hoc runtime checks or, worse, becoming stale (and having to be maintained and updated manually).
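The injection mechanism itself can be sketched in a few hypothetical lines: the engine, not the workflow step, fetches current state from the repository and hands it to the step at runtime, so the step branches on injected state instead of hard-coded assumptions or its own runtime probing:

```python
# Minimal sketch of "metadata injection": the automation engine injects
# current environment state into the workflow just prior to execution.
# Step names, repository layout and commands are all hypothetical.

def restart_database_step(metadata):
    # Behavior branches on injected state; only pre-approved paths exist.
    if metadata["platform"] == "Oracle":
        return ["shutdown immediate", "startup"]
    elif metadata["platform"] == "SQL Server":
        return ["net stop MSSQLSERVER", "net start MSSQLSERVER"]
    raise ValueError("no approved procedure for " + metadata["platform"])

def execute(workflow_step, repository, target):
    # The engine fetches state from the metadata repository and injects it;
    # the step itself never probes the environment at runtime.
    metadata = repository[target]
    return workflow_step(metadata)

repo = {"db01": {"platform": "Oracle"}, "db02": {"platform": "SQL Server"}}
commands = execute(restart_database_step, repo, "db02")
```

The same workflow definition thus produces different (but pre-approved) behavior as the underlying environment changes, without the bloated in-step checks described above.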

- Higher flexibility to accommodate higher complexity – RBA version 2 often exposes the source code beneath the steps for editing and allows new steps to be added in whatever scripting language the user prefers. It is not uncommon to see two disparate Tier 2 teams, each with its own scripting language preference (if you don’t believe me, just look at any Sys Admin team that manages both UNIX and Windows boxes, or any DBA team that manages both Oracle and SQL Server… these preferences all too often have a tendency to get religious). The ability to view and modify the code in the underlying workflow steps, along with support for disparate scripting languages within the same workflow, allows admin teams to inspect out-of-the-box functionality and make the relevant modifications to existing workflow templates to fit their more advanced requirements.
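One hypothetical way to support disparate scripting languages in a single workflow is to have each step declare its own interpreter alongside its (fully exposed, editable) source. In this sketch, `sh` and `python3` stand in for whatever each team prefers (PowerShell, Perl, PL/SQL, etc.):

```python
# Sketch of mixed-language workflow steps: each step carries its own
# interpreter and its editable source code, so steps written by
# different teams in different languages coexist in one workflow.
# Illustrative only; assumes sh and python3 are on the PATH.

import subprocess

class Step:
    def __init__(self, name, interpreter, source):
        self.name = name
        self.interpreter = interpreter  # e.g. ["sh", "-c"] or ["python3", "-c"]
        self.source = source            # exposed source, editable by the admin

    def run(self):
        result = subprocess.run(self.interpreter + [self.source],
                                capture_output=True, text=True)
        return result.stdout.strip()

workflow = [
    Step("unix_side_check", ["sh", "-c"], "echo unix-ok"),
    Step("python_side_check", ["python3", "-c"], "print('py-ok')"),
]
outputs = [step.run() for step in workflow]
```

Because the source travels with the step, an admin team can replace a vendor-supplied snippet with its own script without touching the rest of the workflow.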

Speaking of products offering RBA version 2, Stratavia’s Data Palette is leading the charge in this area via its central metadata repository and decision automation capabilities. (The product just picked up another award today for making the top software innovators list at the Interop/Software 2008 event.)

Deploying RBA is a strategic decision for many organizations, and you should expect to live with your choice for quite some time. Before you place your bet on a specific solution, take a broad look at the representative IT processes (if required, across multiple tiers and teams) you expect to automate today as well as over the next 24 months, and ensure you are investing in a platform that comes closest to supporting your organization’s ambitions.


Shekhar said...

This is good info. Wish we had this a year ago. We are in the process of implementing Opalis, which is an RBA ver 1 tool. We have more than a dozen workflows deployed on a couple hundred environments. That's only 40% of our production, but it's already turning into a maintenance nightmare. We may need to evaluate where we are with them before we invest more in this tool. Thanks again for this data.

Ed Lind said...

At my company, we have spoken to a few run book vendors. There is an understated problem that does not bode well for RBA version 1 in more advanced IT admin areas: its premise of reusing vendor-supplied steps. This goes against the grain of senior IT administrators, who prefer their own scripts and code to anything a 3rd party can come up with. They like solving problems their own way on the servers they manage. Most programmers and admins will tell you plug and play sounds cool but does not work. Admins have to be able to reengineer vendor code snippets and make them their own. Vendors may downplay this as a political “not invented here” syndrome and brush it off, but it erects an effective barrier against widespread use of RBA version 1 in advanced IT areas. I'm encouraged that RBA version 2 exposes the underlying code and allows you to create your own steps.

Anonymous said...

Hey great blog Venkat. Can you share some sample products for RBA 1 and 2? In addition to Data Palette of course ;)

Venkat Devraj said...

Ed - You make a valid point. Plug-and-play is good in certain cases, but doesn't always live up to its potential. There almost seems to be an inverse relationship between process complexity and plug-and-play potential.

For instance, if I have to write a routine within my app to interface with a printer, I'm going to make an API call to an existing routine (within the library my chosen language offers) that has already figured out how to interface with that printer. All the printer-related complexities are abstracted from me, and that actually makes me more productive. RBA intends to bring the same capability to IT operations process automation.

However it can be a tad more challenging to bring about such abstraction to more complex IT processes. For instance, a relatively basic admin action such as starting up a server can be accomplished in 5 lines of code in some environments, whereas in others, the action may be more complex (or just broader in scope) and require 20 steps and 200 lines of code. Who’s to say what should be the basis for a “generic/reusable" step?

Astute vendors will not view comments like yours as a symptom of the “not invented here” syndrome and dismiss them, but will instead treat them as a real market need and ensure their products comply.

Venkat Devraj said...

Hello anonymous - Rather than giving out a few product names, I will do one better. In the near future, I will publish a few guidelines to help you figure out whether a particular RBA product has 2.0 attributes or not.

You can then do a quick Google search on "Run Book Automation" and then apply the guidelines to the product(s) you choose to look at.