Friday, December 29, 2006

A Layperson’s Guide to IT Automation Market Speak

My company was a sponsor at the recent Gartner Data Center conference in Las Vegas (http://www.gartner.com/2_events/conferences/lsc25.jsp). Sure enough, the words “IT automation” were all-pervasive at the event. There seemed to be so many companies crawling out of the woodwork, all vying for total market dominance. It is energizing to see that automation has truly gone beyond being just a buzzword. However, when I spoke to many of the folks who attended the show, one common frustration came up across the board: the amount of irrelevant, at times bordering on nonsensical, marketing chatter in this space.

Imagine you are a senior IT manager (pick your own fancy title) and you are interested in enhancing the level of operational maturity in your environment. (Nice!) Your organization is already following ITIL standards closely and you are keen on seeing what else is out there that might help you. So you attend the next IT/data center optimization show, or simply do a Google search, and… boom! A ton of offerings with terms ranging from the simple (“lights out monitoring”) to the arcane (“autonomics”, “adaptive infrastructure”, "run book automation") hit you between the eyes. (Where, oh where, were many of these companies and offerings just a year or two ago?) Almost all of them are flush with venture capital and slick marketing, no doubt, but confusing as heck! So you go to your favorite analyst’s website and guess what, it’s a total letdown. There is no single comprehensive source covering all these offerings; in fact, each analyst seems to have a fetish for a certain buzzword or two. And pitifully, some of the big analysts are still caught up in old-world IT agendas, blissfully unaware of the progress in automation; if anything, they are scrambling to come up to speed just as quickly as the rest of us mere mortals.

After much time and research, I noticed that each automation vendor and offering is focused on a horizontal or vertical subset of IT. However, from their marketing collateral, it appears as though each one does pretty much everything under the sun – obviously, a lot of redundancy. So what’s an IT manager to do? Who do they believe? Where does one offering end and the other start? Obviously, it’s not practical to try to evaluate each and every vendor and attempt a POC to determine fit…

Here is a guide I built to navigate my way through the different kinds of offerings out there and make the right recommendations for my clients. Hopefully you will agree that it is simple to grasp (and without the usual glib marketing speak!). Based on all the companies I saw during the conference, I was able to categorize them and their offerings into six clean “buckets”.

1. The first category is caveman-style “script-based automation”. Prolific though it is, it is the dumbest way to attempt to automate anything because (as I have mentioned in prior postings) raw scripts are difficult and time-consuming to write, maintain and deploy, especially across large server farms.

2. The second category is “simple repair automation” – the kind brought about by GUI monitoring and administrative tools such as BMC Patrol, Quest Central, Embarcadero dbArtisan / Performance Center and Oracle Enterprise Manager (OEM). Again, as I have mentioned in prior postings, these GUI tools are awesome when it comes to monitoring and performing one-off tasks, but ill-suited for carrying out even the simplest of repairs in a consistent way across multiple servers. The main problem is that they make specific assumptions about the underlying environment which may not be applicable everywhere. Since they have “canned repair logic”, they do not accommodate custom business rules or IT policies very well. Some of these tools do allow scripts to be executed when certain conditions are detected; however, this approach runs into most, if not all, of the problems associated with scripts mentioned above. As such, the so-called automation capabilities within these products are of limited practical utility.

3. The third category comprises “automated provisioning and configuration management” tools. While automating provisioning and configuration management is useful, these activities comprise a rather slim portion of the average IT administrator’s workload. There are several other tasks emanating from user requests, environmental changes (unanticipated changes), change control requests (anticipated changes), software releases and alerts that these tools are not meant to deal with. Such limited utility means customers are forced to introduce a bunch more tools into their environment (to automate tasks besides provisioning). Not ideal.

4. So a fourth category, aptly named “run book automation” (heavily “supported” by Gartner), has come into existence. These tools offer a framework for automating activities driven by custom business logic via a workflow GUI, along with pre-built integration into commercial monitoring and ITIL packages (especially incident management and ticketing). However, the underlying business logic code often has to be written by the IT administrators themselves. Thus, while this approach is superior to script-based automation by centralizing script logic onto a workflow GUI, it still suffers from the biggest deficiency of scripts, i.e., it requires an expert administrator to code the business logic into the workflow engine, thereby competing with regular day-to-day tasks for the administrator’s time. If the administrator doesn’t have the bandwidth to properly instrument and deploy these products, they end up as shelfware (which is more than likely, since administrator bandwidth is the very resource these tools are meant to free up, yet in the short run they consume more of it than they return). Companies that are usually successful in leveraging run book automation are ones that had already attained significant operational maturity prior to deploying these tools. Companies that are still struggling with daily fires are better off spending time shoring up their internal processes and gradually looking to the other categories here to attain efficiencies.

5. Given the drawbacks of run book automation, another category that is fast emerging as a more viable solution is “domain knowledge automation”. Products in this category offer a run book automation platform with built-in expertise for one or more horizontal areas in the IT stack, such as server administration, network administration, application administration and/or database administration. Via this approach, products eliminate much of the problem with script-based automation and, more importantly, reduce the need for administrators to code custom logic from scratch. StrataVia’s Data Palette™ is a good example of products in this category. Data Palette packages domain expertise in database automation by providing a library of standard operating procedures (SOPs) pertaining to common DBA tasks. The entire SOP code-base is exposed in an open-source manner such that administrators can tweak the business logic code to fit their environments if the pre-existing SOPs aren’t suited to the task. Products in this category support several scripting environments to avoid the administrator having to learn a new language. In order to build domain knowledge and awareness, products in this category also need the capability to detect external events, along with decision automation prowess to deal with specific changes in event states. This capability is useful for triggering pre-configured SOPs at opportune times without human intervention.
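To make this concrete, here is a minimal sketch, in Python, of what a “tablespace nearly full” standard operating procedure might look like. This is not actual Data Palette code; the event payload, policy thresholds and repair SQL are assumptions chosen for illustration, but they show how pre-built repair logic can still be left open for administrators to tweak.

# A hypothetical "standard operating procedure" (SOP) for a common DBA task:
# respond to a tablespace-nearly-full event. The event payload, policy values
# and repair SQL below are illustrative assumptions, not Data Palette APIs.

POLICY = {
    "pct_used_threshold": 90,    # act only above this utilization
    "growth_increment_mb": 512,  # how much space to add per invocation
    "max_datafile_mb": 8192,     # site-specific ceiling an admin could tweak
}

def tablespace_full_sop(event, policy=POLICY):
    """Decide on and describe a repair action for a tablespace alert."""
    if event["pct_used"] < policy["pct_used_threshold"]:
        return None  # below threshold: nothing to do
    if event["datafile_mb"] + policy["growth_increment_mb"] > policy["max_datafile_mb"]:
        # escalate instead of blindly growing past the site-defined ceiling
        return {"action": "page_dba", "reason": "datafile ceiling reached"}
    new_size = event["datafile_mb"] + policy["growth_increment_mb"]
    return {
        "action": "resize_datafile",
        "sql": "ALTER DATABASE DATAFILE '%s' RESIZE %dM" % (event["datafile"], new_size),
    }

if __name__ == "__main__":
    # A monitoring trap (SNMP, log scraper, etc.) would normally supply this event.
    sample_event = {"database": "ORCL1", "tablespace": "USERS",
                    "datafile": "/u02/oradata/orcl1/users01.dbf",
                    "datafile_mb": 2048, "pct_used": 93}
    print(tablespace_full_sop(sample_event))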

6. The sixth and last category is self-managing software, or what is often referred to as “autonomics”. This is true automation nirvana. Products in this category are aware of themselves, their environments and the interplay thereof. They use this awareness to optimally install, configure, maintain, update and recover themselves. Much of this functionality can be built into products to make them autonomic from the ground up, or layered in via an external service to make existing software products autonomic. For instance, Data Palette uses the latter approach to make databases autonomic. The good news is that most of the large vendors like IBM, Microsoft and HP are heading in this direction, even though they may label this functionality differently. (IBM coined the term autonomic computing, HP calls it adaptive infrastructure and Microsoft calls it the dynamic systems initiative.) However, the bad news (not surprisingly) is that most of the large vendors focus on their own products and their partner ecosystems and, worse, take a low-level hardware or operating-system approach to autonomics. In other words, they are mostly about different components of the IT stack collaborating with each other, which in my opinion just doesn’t work – because the level of collaboration required is impractical and doesn’t happen. Even if it does, it doesn’t keep up consistently past a release or two; there are just too many moving parts. Of all the six categories I have listed, autonomics is probably in the most nascent stage.

Hopefully this six-bucket categorization helps the average customer understand what’s actually out there; even if a vendor uses different marketing speak (which is more than likely), they can still read between the lines and place each product in the right bucket. Furthermore, it is my hope that these six buckets will let customers take a step back, decide which category is most prudent for their organization to evaluate, and focus on offerings in that category, allowing them to move faster towards their preferred area of IT optimization.

Just one final word of advice to vendors – don’t even bother appeasing all the big analyst groups out there and rigging your marketing message. As long as your customers know where your offering fits in the above categories, the result is a more mature market for IT automation. Such a market has clean segmentation as its primary attribute, allowing different products to fit into appropriate segments and to play nicely with others in complementary categories. In the end, this approach makes it a lot less painful for customers to find what they are looking for, thereby accelerating user evaluation and adoption. Vendors that do not come to this realization and hide their offerings behind a cloud of ambiguity are not only doing themselves a disservice, but also holding the market back by a decade.

Tuesday, November 28, 2006

DBA Script Exodus to Python or Ruby Expected in the Near Term

During the last year and a half, I have been observing an increase in the number of DBAs heading in the direction of learning and using “new age” scripting languages. I refer to Ruby and Python specifically. However, migration itself is not a new phenomenon; the move to such new age scripting was preceded by the migration from the “proprietary age” to the “axial age”, and in turn to the “open age”. And if anything, this migration towards Ruby and Python is not just a passing fad; I expect it to turn into a full-fledged exodus, enabling higher scripter productivity and fostering more automation in the DBA world.

During the proprietary age, I’m sure most of us encountered DBAs specializing in a single (proprietary) DBMS language like PL/SQL or T-SQL. The main problem with that approach was the “only tool being a hammer” situation (where, if the only tool you have is a hammer, every problem begins to look like a nail). DBAs started using their favorite proprietary language everywhere, even to address scripting needs outside the database. For instance, in a Windows/SQL Server environment, I would run into T-SQL routines doing OS-level backups via NetBackup. Granted, you could do backups this way; however, backups done via T-SQL would need to be driven off a SQL Server environment. That was fine for backing up database servers (and even then, only in environments comprised predominantly of SQL Server), but for other (non-database) servers like application, mail and web servers it was grossly inefficient and somewhat of a misuse of T-SQL’s capabilities.

I saw this leading to the axial age, wherein DBAs began using DBMS-independent scripting languages such as Korn Shell and VBScript. These routines served as the “axis” (axes) connecting a variety of operating system, database and application related functionality. This worked pretty well in handling both DBMS and non-DBMS routines and tasks alike; however, the scripts were usually confined to a particular operating system platform. For instance, Korn Shell scripts wouldn’t work on Windows and VBScript wouldn’t function on UNIX without a lot of heavy lifting. (Certain 3rd parties came up with specialized wrappers to allow shell scripts to work on Windows, but that required separate purchase and installation of such layers, and they had their fair share of problems in deployment, invocation and runtime exception handling.)

Thus came the open age, wherein DBAs could go with an “open” scripting language such as Perl, Tcl/Tk or even PHP that would run on a plethora of operating system platforms (Windows and/or anything ending with an “x”), including thin clients. Most of these languages were great for writing one-off scripts; however, they severely lacked the strengths of a full-blown language such as C# or Java (notably real object-oriented capabilities like inheritance and polymorphism). Hence, while their applicability was high, their reusability remained rather limited. DBAs had to pretty much start from scratch each time they wrote newer routines. At most, they could take an existing script and hack it until it met their requirements (turning it into a whole different script in the process). There was no library of reusable code routines that they could reference to put together newer functionality from existing code blocks. That greatly restrained DBA productivity and also resulted in a plethora of one-off scripts – each of which needed to reside on each target server, increasing complexity and maintenance overhead.

The new age scripting languages have emerged as a viable solution. Python and Ruby both provide an eclectic and formidable mix of the capabilities of the open age languages, combined with the versatility of the axial languages. They retain all the power of their scripting predecessors, yet put almost the entire power of Java and C# (more, some would argue) in the DBA’s hands. DBAs can truly write code libraries once and reference them (note, reference them, not copy and hack them) many times. A single line of Python or Ruby can do so much, resulting in tightly written code with clean, elegantly stated constructs and documented classes and methods that is fairly easy to understand. For instance, look at the following line of Ruby code:

1.upto(3) { databaseStartup.retry }

One doesn’t need years of scripting experience to understand what it’s attempting to do. One doesn’t have to mess around with complex structures and code just to manipulate a simple string. And best of all, there appears to be an unprecedented growth in the communities of followers around both Python and Ruby. Larger efforts based on these languages, such as Ruby on Rails and Django, are gaining mainstream support, resulting in significant value-add to these languages via more reusable code libraries, classes, methods and documentation. Based on this momentum, neither of these languages appears to be going away anytime soon. So if you are a DBA that’s already embraced one or both of these scripting languages, kudos: you now have more options in your toolbox to create and deploy industrial-strength automation, especially coupled with a centralization platform like Data Palette*. However, if you are still on the fence or worse, dedicated to playing with isolated Korn shell scripts and T-SQL routines, it may be time to consider giving yourself an upgrade!


(* Incidentally, Data Palette’s SOP automation framework allows automation routines to be written in any scripting language, including T-SQL if that’s your flavor of choice. However, it is my experience that the reusability and power of each SOP increase manifold when the SOPs are written in Python or Ruby. Have fun scripting in the new age!)
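To illustrate the “write the library once, reference it many times” point in Python terms, here is a small hypothetical sketch; the DatabaseStartup class and the stand-in startup command are placeholders I made up, not part of any real product. It mirrors the Ruby one-liner above: the team writes and tests the helper once, and every script thereafter simply imports and references it.

# A sketch of a reusable DBA helper that lives in one shared module rather than
# being copied and hacked onto every server. Class name and command are made up.

import subprocess
import time

class DatabaseStartup(object):
    """Startup helper every DBA on the team can import, not copy."""

    def __init__(self, sid, command=("echo", "startup")):  # stand-in for real startup logic
        self.sid = sid
        self.command = list(command)

    def retry(self, attempts=3, delay_secs=5):
        """Roughly the Python analogue of the Ruby one-liner shown earlier."""
        for attempt in range(1, attempts + 1):
            if subprocess.call(self.command) == 0:
                return True
            time.sleep(delay_secs)
        return False

if __name__ == "__main__":
    # Any script on any server can now reference the same tested class.
    print(DatabaseStartup("ORCL1").retry(attempts=3, delay_secs=1))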

Wednesday, November 22, 2006

Need for orchestration managers in IT automation

I was having a conversation earlier this week with the chief architect of a multinational (Top 3) IT services company regarding our company’s Data Palette automation product. He is no stranger to IT complexity, since his company manages the IT departments and business processes of literally hundreds of organizations in the private, public and government sectors. He asked me an interesting question: could Data Palette be used as an “orchestration platform”, calling different standard operating procedures not just related to database administration per se, but for managing other components of the IT stack such as system-level and network-level products?

That took me a bit by surprise since, all along, I had been focused just on database management while talking with him. I told him that Data Palette has traditionally been used to call other DB-related product APIs and 3rd party routines from within different SOPs (Data Palette’s standard operating procedures) whenever that 3rd party or DBMS product routine is the best way to accomplish a given task. For instance, a Data Palette SOP could call StatsPack for Oracle DBAs that are used to utilizing that functionality for macro-level performance diagnosis, or, in a more complex situation, execute an SOP upon receiving an SNMP trap from enterprise monitoring tools like HP Openview or Mercury Sitescope. He mentioned that while this was useful, Data Palette could potentially serve a larger role in the environment, wherein its SOP Module could be deployed as a central administration and audit console and provide an abstraction layer around other niche applications.
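To make that idea a little more tangible, here is a rough sketch of what such an orchestration step might look like in Python. The tool commands, the trap OID and the routing table are hypothetical placeholders, not Data Palette APIs or real integrations; the point is simply that one platform dispatches to whatever niche tool owns each layer of the stack and keeps a central audit trail.

# A sketch of the "orchestration platform" idea: one SOP runner that fans out to
# whichever tool handles each kind of task. All commands and OIDs are made up.

import subprocess

TOOL_REGISTRY = {
    "db_stats_snapshot": ["echo", "exec statspack.snap"],    # e.g. hand off to StatsPack
    "restart_web_tier":  ["echo", "service httpd restart"],  # e.g. hand off to a sys-admin tool
}

def run_sop(task_name):
    """Execute a registered task through its designated tool and audit the result."""
    cmd = TOOL_REGISTRY[task_name]
    rc = subprocess.call(cmd)
    print("task=%s rc=%d" % (task_name, rc))  # the central audit trail lives here
    return rc

def on_snmp_trap(trap_oid):
    """Map an incoming monitoring event to the SOP that should handle it."""
    routing = {".1.3.6.1.4.1.9999.1": "db_stats_snapshot"}  # hypothetical OID
    task = routing.get(trap_oid)
    return run_sop(task) if task else None

if __name__ == "__main__":
    on_snmp_trap(".1.3.6.1.4.1.9999.1")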

Come to think of it, there's nothing to really stop an organization from using Data Palette (or any similar product) in such a capacity, and I told him so. In fact, this approach would help companies in two ways:
1. Allow companies that rely on a plethora of tools to simplify their environment by orchestrating tool usage via a central automation platform (Data Palette was the catalyst in my conversation, however any similar automation product could be used in this capacity).

2. Allow teams situated around the world, as well as newer hires, to get up to speed faster by just having to learn one product interface, as opposed to having to come up to speed on several tools and point solutions.

Thinking further about it, one of the reasons automation efforts are seen as unrealistic in many complex environments is that multiple people do things their own way, using whichever tool they feel is appropriate for their situation. Such an orchestration platform allows everyone (not just DBAs or specific administration staff) to be on the same page about which tools are meant for which use, and to stick to that standard via a single command-and-control interface. If a team member really has a favorite tool for, say, server provisioning, they can even introduce that tool to a larger audience via such orchestration and make it part of the standard. This encourages multiple team members to have a conversation about why a certain approach or tool is better for a task and win over others to their way of operating.

The chief architect’s suggestion is a stellar one that will allow IT environments with higher complexity to embrace automation in a faster and more meaningful manner. These are the kind of innovative thinkers and strong champions that our IT industry in general, and automation efforts in particular, need at this moment to deal with the complexity caused by silos of people and ongoing change. It’s kinda like having your cake and eating it too: retaining the tools and technologies one is most familiar with while weaving them into larger-scale automation efforts.

Sunday, November 12, 2006

Autonomic Oracle 10g Installs? Forget about it!

DBAs and users alike eagerly await the benefits of automation and autonomics, which will translate into less routine administration and faster completion of work. An autonomic database environment is marked by self-managing software covering everything from installs and maintenance all the way to disaster recovery. But unfortunately, it seems software vendors like Oracle do everything they can to keep this goal elusive and breed an ever-increasing population of mundane and under-appreciated DBAs who need to stay married to their Blackberries and do a lot of nocturnal heavy lifting.

Take my own very recent example. Last week, I was requested to install an Oracle 10g Release 2 database on a Linux RHEL 4 64-bit x86 server for a new Web application that needs to go live shortly. The install and database creation needed to be completed prior to Monday morning. Simple enough. I was so confident about this process, which I had probably done dozens of times with many different versions of Oracle, that I didn’t even start on the work until Saturday night (last night).

I installed the database in silent mode via the Unattended Install SOP within Data Palette (SOP = Standard Operating Procedure, a set of documented task plans and corresponding automation routines and workflows within the Data Palette automation platform). I was happy to see it work like a charm! (Look Ma, no hands...). Then I was off to create the database. Since I didn’t have a canned SOP to do the task (StrataVia is due to release a Database Creation SOP shortly), I figured I would do it manually. I started up sqlplus with the credentials ‘/ as sysdba’. Since I was logged into the box as oracle:dba, I expected to get to the SQL> command-line prompt so I could execute the CREATE DATABASE command. (Being an old-world sort of DBA, I prefer the command-line to the GUI tools like the Database Assistant that take 3 minutes just to start up.)

Immediately, I got an error message: “error while loading shared libraries: libaio.so.1: cannot open shared object file: No such file or directory
ERROR: ORA-12547: TNS:lost contact”.

Thinking that for some reason, the libaio RPMs weren’t available, I ran the “ldd” command to figure out the shared library dependencies:

$ ldd $ORACLE_HOME/bin/oracle

Here’s a subset of the output:
libaio.so.1 => file not found
libdl.so.2 => /lib64/libdl.so.2 (0x0000003d1b700000)
libm.so.6 => /lib64/tls/libm.so.6 (0x0000003d1b900000)
libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003d1bf00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003d1ef00000)
libc.so.6 => /lib64/tls/libc.so.6 (0x0000003d1b400000)
/lib64/ld-linux-x86-64.so.2 (0x000000552aaaa000)

Aha! I found the problem (the “file not found” line above). The shared library that Oracle was expecting was not to be found. I figured maybe our Sys Admin had not installed all the prerequisite RPMs beforehand. The Data Palette Unattended Install SOP does check for these RPMs and optionally installs them; however, I figured one had somehow been missed since our Sys Admin prefers to install RPMs manually. So I manually checked for the existence of the libaio RPM:

$ rpm -q libaio
libaio-0.3.105-2

OK, so it did exist. This was now puzzling. Why wasn’t Oracle using the libaio rpm that was on the system?

One thought was to just disable Async I/O and relink oracle so it didn’t try to utilize the libaio shared library.

(Side note, but one way I have done this in the past for 10g R2 is as follows:
$ su - oracle
$ . oraenv
$ [ Ensure that all databases using this ORACLE_HOME and related services are down ]
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk async_off

[ Now update relevant init.ora parameters such as disk_asynch_io=false and filesystemio_options=none ].

Now one can confirm that async I/O is not in use by Oracle (or any other apps on the machine) by grepping the /proc/slabinfo file for “kioctx” and “kiocb” (egrep "kioctx|kiocb" /proc/slabinfo) and ensuring the first couple of columns show zeros.

If other apps might need async I/O, you can check whether Oracle is still using async I/O by rerunning the ldd command (ldd $ORACLE_HOME/bin/oracle | grep libaio) to ensure that the output doesn’t still refer to the libaio file. The output of the nm command (nm $ORACLE_HOME/bin/oracle | grep io_getevent) can also be verified to ensure no object file symbols refer to LIBAIO.) [END OF SIDE NOTE.]

In this particular situation, I couldn’t refrain from using AIO since this Web application was expected to be I/O intensive and I wanted to keep both asynch I/O and direct I/O enabled. So the only option was to dig in, see why Oracle was not accepting the existing libaio RPM, and see if I could install the right RPM version.

I called up our Sys Admin, explained what I was running into and requested sudo access so I could deal with this shared library problem. I told him I would only reinstall or relink libraries pertaining to Oracle. As he graced me with sudo access, he reminded me that he would check the sudo logs to ensure I didn't mess with anything else, and also mentioned that he didn’t think the problem had to do with the Red Hat OS, and that all RPMs were backward compatible. He even referred to the Oracle Installation Guide, which states that one should ensure the XYZ or higher version of an RPM exists before proceeding with the install.

Fair enough, the Oracle documentation can’t ever be wrong, right? I started Googling the “error while loading shared libraries: libaio.so.1:” error message and found quite a few 3rd party sites talking about having the libaio-0.3.96-3 RPM. However, I had a newer version of this RPM; if the Oracle documentation were accurate, I shouldn’t have been having this problem.

I then sudo’d in, copied the .src file for the older RPM from the Red Hat site and tried installing it. However, it errored out stating a newer version was already installed. Then I retried using the “--replacepkgs” flag. Again, to no avail. Finally, after some vigorous head-scratching, I uninstalled (erased) the existing newer RPM and installed the older version.

That did the trick! The ldd and rpm -q commands revealed the following:

$ ldd $ORACLE_HOME/bin/oracle
libaio.so.1 => /usr/lib64/libaio.so.1 (0x0000002a96f43000)

$ rpm -q libaio
libaio-0.3.96-3

After this, I could create the database I wanted with the appropriate configuration. But by the time it was all done, it was past 3:30 in the morning. I couldn’t believe I had just spent almost 6 hours on this insignificant issue. Autonomic installs – yeah right! I couldn’t believe Oracle wouldn’t even bother updating its software install archive and documentation regarding this issue so our Sys Admin would have installed the right RPMs in the first place (or update the Data Palette Unattended Install SOP to check for and install the right RPM version).

Based on this experience, I feel that even with their latest 10g release, Oracle has done very little to make their software self-managing enough that even novices can install and maintain it with ease. During the recent OpenWorld 2006, they were already announcing Oracle 11g and its whopping 482 new features! (http://www.oracle.com/technology/events/oracle-openworld-2006/oow_tuesday.html).

I think I echo the sentiment of many tired DBAs when I say: give us a break, guys! We don’t wanna see yet another whizbang release loaded with cool marketing features (aka grid computing) that businesses aren’t quite ready to use. We don’t care about the newest release of OEM/Grid Control and its RAC monitoring features (which, by the way, also have some serious bugs). Instead, just give us a release that is stable when it comes to basic functionality (the 20% of functionality that's used 80% of the time) and that provides a layer of abstraction around mundane administration. Make your documentation current and make the database a little smarter, sparing us the late nights and abuse associated with installing and managing your product, so that DBAs can work on things that are more relevant to our own, our customers’ and our users' businesses rather than worrying about which version of a shared library Oracle expects in order to function smoothly.

For my part, I have to state, at the risk of sounding perverse, that I was actually somewhat glad to go through this pain. After it was all done, I poured myself another cup of coffee, documented the problem and solution, and sent it off to the Data Palette Engineering team at StrataVia so they could build this scenario and the corresponding fix into their new Database Creation SOP, so that other users of Data Palette (especially less experienced personnel) do not have to battle the same issues I did last night. But talk about loss of sleep and wasted productivity the next day… Yeesh!!!
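For what it's worth, the gist of what I sent over amounts to a pre-flight check along these lines. This is only a rough Python sketch of the idea, not the actual SOP code; the required RPM version reflects this particular RHEL 4 x86-64 box and would have to be maintained per platform.

# Verify the exact shared-library RPM the Oracle binaries expect before any
# CREATE DATABASE is attempted. The required version below is an assumption
# based on the box described in this post.

import subprocess

REQUIRED_RPMS = {"libaio": "0.3.96-3"}

def installed_rpm_version(package):
    """Return the installed version-release of a package, or None if absent."""
    proc = subprocess.Popen(["rpm", "-q", "--qf", "%{VERSION}-%{RELEASE}", package],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode().strip() if proc.returncode == 0 else None

def preflight():
    problems = []
    for package, wanted in REQUIRED_RPMS.items():
        found = installed_rpm_version(package)
        if found != wanted:
            problems.append("%s: wanted %s, found %s" % (package, wanted, found))
    return problems

if __name__ == "__main__":
    for line in preflight():
        print("PREREQ MISMATCH - " + line)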

Tuesday, November 07, 2006

State of database tools is appalling!

When one thinks of “database tools”, an overly crowded image comprising lots and lots of GUI utilities and point solutions comes to mind. Given all these tools, one would think the database tools market is a mature one, such that all that is to be invented in this area has already been invented and is available. Right? On the contrary, the truth couldn’t be further from it.

Almost all of the tools in the market today focus on just two things:
1. Monitoring and alerting
2. Administrative GUIs to carry out ad-hoc/one-off tasks.

There are a handful of large companies out there that specialize in this area and they produce all kinds of tools – utilities to point and click and create users, compare schemas, add space, roll out releases, tune SQL, produce wait event graphs and other such performance metrics, etc. But at the end of the day, all of these tools belong to one or both of the above categories in various permutations and combinations. (You name a tool and I will tell you which of the above categories it focuses on.) There is no earth-shattering innovation here. Add to this the fact that many DBAs prefer the flexibility of the command-line anyway for the two categories above, and many of these tools eventually morph into shelfware, grossly under-delivering on the investment made in them.

Don’t get me wrong – these large tools providers do their fair share of R&D (approx. 18 to 28% of their overall revenue, as per last year’s SEC filings of heavyweights BMC, Quest Software and Embarcadero… whoo hoo!). But it looks like most, if not all, of their R&D is relegated to figuring out better and nicer-looking GUIs for monitoring. That’s great! I’m a big fan of usability improvements that result in nice intuitive interfaces. However, think of this: if I want to create a user in one database, I might as well use the command-line (unless I’m too lazy to memorize the syntax or look it up in the online SQL Reference guide). If I’m dealing with, say, half a dozen databases where I need to create the same user, I would love to use these fancy GUI tools and point and click my way to glory. But what happens if I need to propagate this same new user and privileges to a couple dozen databases, or even, hold your breath, a hundred databases? That would be a lot of pointing and clicking, no matter how easy to use that tool is. Now add to these 100 databases a variety of mundane/repeatable tasks – such as installs, configs, refreshes and cloning, maintenance plans, patches, upgrades, migrations, managing load processes, DDL releases, and so on. That makes for a very busy DBA team working around the clock, and no amount of point solutions and nice GUIs can change that. None of these mainstream tools are even architected for that kind of functionality and task replication.
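To put the hundred-database scenario in perspective, here is roughly what the task looks like once it is scripted rather than pointed-and-clicked. This is a minimal sketch assuming the cx_Oracle driver, a list of connect descriptors and an admin account with the necessary privileges; the DSNs, user name, passwords and grants are placeholders to be replaced with your own.

# Propagate the same user and grants across many databases in one pass,
# collecting failures instead of stopping at the first one.

import cx_Oracle

TARGET_DSNS = ["db%02d.example.com/ORCL" % i for i in range(1, 101)]  # hypothetical
ADMIN_USER, ADMIN_PASSWORD = "admin", "change_me"

DDL = [
    "CREATE USER report_app IDENTIFIED BY \"Temp#Pass1\"",
    "GRANT CREATE SESSION TO report_app",
    "GRANT SELECT ANY TABLE TO report_app",  # substitute your real privilege set
]

def propagate_user():
    failures = []
    for dsn in TARGET_DSNS:
        try:
            conn = cx_Oracle.connect(ADMIN_USER, ADMIN_PASSWORD, dsn)
            cur = conn.cursor()
            for statement in DDL:
                cur.execute(statement)
            conn.close()
        except cx_Oracle.DatabaseError as exc:
            failures.append((dsn, str(exc)))  # keep going; report stragglers at the end
    return failures

if __name__ == "__main__":
    for dsn, error in propagate_user():
        print("FAILED %s: %s" % (dsn, error))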

Now some of these tools vendors are getting smart. They are introducing newer "automation" features. Yaay! But guess what, much of this automation is built on top of the monitoring functionality they previously had. And it is restricted to certain vanilla automation, or to allowing a script to be kicked off whenever an alert fires. In the latter case, the script still has to be written and tested by a DBA. DBAs in complex, heterogeneous environments are already super busy. They don’t have the time to write these complex scripts. The most they may do is look on Google for a particular script and, if found, pick it up. But again, they don’t have the bandwidth to test these recently downloaded scripts in all the database environments they support. And oh by the way, the DBA that wrote or downloaded a particular script may be well versed in how it works, but none of the other DBAs in the team are likely to be familiar with it. So now each DBA starts building her own toolbox of scripts. Most of these scripts are not documented in a central location and there is no pre-defined (commonly agreed upon) methodology regarding which script to use in which circumstances. Worse yet, each of these scripts usually needs to reside locally on each of the (100) database servers. Even if one line of code needs to change in one script to accommodate a change in the environment, someone needs to manually log into each of the DB servers and make the change. Not a bad approach to follow, if fat-fingering and typos were non-existent!!!

Now you begin to see what many veteran DBAs and IT managers have been seeing for years – the DBA tools industry is in a state of despair without any light at the end of the tunnel. All that’s coming into the tunnel is more monitoring tools – they have become more lightweight, they now carry out extra audit functions, etc. But at the end of the day, all they do is monitoring and ad-hoc administration.

In the meantime, some of the DBMS vendors have been introducing automation capabilities into their core products in recent years. However, the bulk of that automation is also vanilla in nature and hence fairly limited in its capability. Let's understand what this "vanilla automation" means. Take, for instance, Oracle's datafile auto-extend capability. Oracle has had this feature for some time now. However, the feature is quite set in its ways: it just makes the datafile grow until it reaches a pre-set limit or until the underlying mount-point/drive gets full. It does not accommodate sys admin policies prevalent in most organizations, such as implicit quotas on different mount-points (due to diverse applications and databases sharing the same storage devices as the DBMS) or gracefully growing onto the next appropriate mount-point. In other words, vanilla automation capabilities built into DBMS products and 3rd party tools do not necessarily accommodate custom IT policies and environmental rules. Any kind of custom policy needs a script to be built by the DBA, and that approach runs into the script-related problems mentioned above. Worse yet, some of these tools require the DBA to learn a proprietary scripting language before they can build any custom automation capability (a notable example is BMC Patrol). Yeah, like they have the time to do that...
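To illustrate the difference between vanilla auto-extend and policy-aware space management, here is a rough Python sketch. The mount points, quota fractions and file naming convention are assumptions for illustration, and it presumes a Unix-like host; a real routine would also verify the tablespace, permissions and requested size before executing the generated statement.

# Pick the next mount point that still has headroom under a site-defined quota
# before emitting the ADD DATAFILE statement, instead of blindly growing in place.

import os

# Site policy: never let Oracle consume more than this fraction of each volume.
MOUNT_QUOTAS = {"/u02": 0.80, "/u03": 0.80, "/u04": 0.90}

def free_fraction(mount_point):
    """Fraction of the filesystem still unused."""
    stats = os.statvfs(mount_point)
    return float(stats.f_bavail) / stats.f_blocks

def pick_mount_point():
    """First mount point whose utilization is still below its quota."""
    for mount, quota in sorted(MOUNT_QUOTAS.items()):
        if not os.path.isdir(mount):
            continue  # skip volumes not present on this host
        if (1.0 - free_fraction(mount)) < quota:
            return mount
    return None

def add_datafile_sql(tablespace, size_mb=2048):
    mount = pick_mount_point()
    if mount is None:
        return None  # nothing left under quota: escalate to a human instead
    path = os.path.join(mount, "oradata", "%s_auto01.dbf" % tablespace.lower())
    return "ALTER TABLESPACE %s ADD DATAFILE '%s' SIZE %dM" % (tablespace, path, size_mb)

if __name__ == "__main__":
    print(add_datafile_sql("USERS"))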

Some of the DBMS vendors have also begun to provide DBA tools separate from their core DBMS product. But they are not exactly stellar either! For instance, Oracle has been dutifully releasing its Enterprise Manager product since version 7. Its “intelligent agent” technology has been difficult to install, unreliable and prone to failure. At times, it would suck up more system resources than the database itself (a la BMC Patrol or an HP Openview DBSPI). I know of an old-school DBA that has been burnt by OEM more than once on a production environment. He's sworn never to touch OEM with a barge pole again, no matter how much it evolves! It's just lost all credibility with him and others like him. He feels Oracle should stick to building databases rather than trying to do it all! Anyway, the latest incarnation of OEM, Grid Control, is certainly more reliable and powerful than its predecessors. However, its power mostly applies just to Oracle, and at that, only the latest versions of Oracle. For instance, one of the most powerful capabilities of Grid Control is dynamic RAC node provisioning and configuration for Oracle 10g, but that doesn’t work for Oracle 8i and Oracle 9i. OEM claims to support SQL Server and IBM DB2 as well, but only for monitoring (no automation features are included) and that too for a hefty surcharge! Well, that’s no surprise, right? Think about it: why would Oracle make it easy for customers to manage its competitors’ products? A lot of my customers (though they use Oracle databases for some of their applications) think that Oracle, being the crafty company that it always has been, is in the business of selling databases and applications and, along the way, cutting off the oxygen for competing products while ostensibly supporting them.

So what’s a customer to do? Continue to participate in the great monitoring rollout from these behemoths and hire human DBAs by the truckload to do the real work when the pagers go off? Sure, you need high quality DBAs. But merely adding head-count doesn’t scale. Talk to the companies that have hundreds of DBAs. They have already tried that approach and are consumed by their own hodgepodge of problems…

There is still hope yet. There are some smaller innovative companies that are rising up to the challenge – namely, companies like Opalis, Appworx, Realops, Opsware (this one isn’t so small any more!) and StrataVia (warning, implicit marketing push, since this is the company I work for) with its Data Palette product. All of these companies are beginning to address the above problems in their particular area of focus – application administration, systems administration, database administration, et al. Take a look at what they have to offer and see if it’s suitable for your organization. Maybe it is, maybe it is not. But you owe it to yourself and your shareholders to see if you can use their newer offerings to go beyond mere monitoring and ad-hoc administration and actually improve the operating efficiency of your IT environment.

The coming years should be interesting to see which of these innovative companies and products the market adopts and embraces and which ones wither away. But more than anything, it would be intriguing to see how long the 800-pound DBMS tools behemoths can continue to survive by focusing on improving monitoring and ad-hoc administration alone.

Monday, October 30, 2006

Ignore Pre-Automation Process Optimization at Your Own Peril!

I just came off a Pilot implementation of my company’s Data Palette automation product at a Fortune 100 company. The Pilot started off with optimistic jubilance at being given an opportunity to implement our technology in such a large and reputable organization, but soon that exhilaration changed to moroseness and even despair. Finally, by the time the Pilot was nearing completion, the prevailing mood changed back to a sense of positive achievement as the effort wound up successfully in spite of certain deficiencies. This roller-coaster of emotions could be attributed to the missed opportunities and changing goals evidenced during the Pilot.

When we started out, the primary goal of the Pilot was to prove that Data Palette could be made to work in the client environment. (Because the environment is very large and complex with several moving parts and heavy human intervention, automation was not considered easily doable.) I helped carry out a detailed discovery there and felt we could take on the challenge. However, I did so intending to optimize part of their processes along the way: to minimize the areas where human involvement existed but was really unnecessary. The client CIO agreed that optimizing their IT processes was a good thing prior to standardizing and automating them. However, once the Pilot started and we laid out the standard operating procedures, the client team operating at the ground level was resistant to any changes in their day-to-day processes. They felt human DBAs needed to be in the driver’s seat and any decision making needed to be kept manual, even the mundane decisions that made them get up in the middle of the night.

That’s when the demons of doubt started plaguing me. Could we still automate the processes the way they were and attain significant productivity gains? Sometimes the manual actions and logistics interspersed throughout a task take up more time than the task itself. If these time-consuming manual interchanges cannot be avoided, any automation of the task itself can seem insignificant. (I'm sure you have often heard IT personnel say "well, the actual task only takes about 20 minutes, but the logistics around it cause the task to take up 2 hours, so I'm not sure this task can be automated or whether it would be valuable to have it automated!")

I really wasn’t worried about Data Palette’s ability to accommodate manual decision making and control, and the end result proved this out. However, the extent of benefit to the client environment was bothering me. In certain areas, I wasn’t content to let them operate the way they were. One of the premises (and side benefits) of standardization and automation is that it needs to be preceded by process optimization to deliver the right amount of value. I was concerned that, due to the ground-level staff’s rigidity, value wasn’t being realized to its full potential.

Let me explain my concerns via a simple IT call center operation. When a call or alert comes in, it’s typically handled by a Help Desk or Tier 1 group. They document the problem in a trouble ticket, look at the nature of the problem and try to resolve it if possible. If they can’t, they assign it to the appropriate Tier 2 group and move on to the next call. Depending on the urgency of the problem, a Tier 2 person may be paged and assigned to the issue so it can be worked on. Now what happens if the Tier 1 person handling the issue is unsure about “when” to pass it on to the right Tier 2 group? Let’s say this person is obsessed with solving that problem and is unwilling to pass it on quickly. She takes hours trying to figure out what is causing it and it becomes a matter of personal pride. That’s very nice of the individual, but this kind of manual decision making hurts the caller and the company. The caller has to wait a lot longer to get a resolution. If there were a business rule stating that a Tier 1 person could spend no more than 5 minutes trying to identify the problem, after which they had to pass it on to a different group, that would help matters. It would allow the right (more senior) person to evaluate the problem and implement the solution. And it would free up the Tier 1 person to take additional calls, and the operation would run much more smoothly. The process would dictate when and how each call gets escalated rather than placing the onus on the Tier 1 individual to decide how to deal with a call.

Similarly, if the Tier 1 person had to go to a manager’s office and ask for her permission to assign the problem to a Tier 2 group, that would introduce an even higher level of inefficiency. Or if, during assignment of the ticket, the Tier 1 person had to call the Tier 2 person and, if unable to reach him, walk up to this person’s cube (say, in a different part of the office) to track him down or wait for him to get off the phone, again the process would be grossly inefficient. These are all areas that involve unnecessary human involvement and delays and pose a barrier to automation. With the right level of process optimization and decision automation, these inefficient areas can be completely eliminated.
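For what it's worth, once such a rule is agreed upon, the time-boxed escalation described above is trivial to express as decision automation. Here is a toy Python sketch; the five-minute time box and queue names are just the illustrative values from the example.

# The process, not the individual, decides when a ticket leaves Tier 1.

from datetime import datetime, timedelta

TIER1_TIME_BOX = timedelta(minutes=5)

def route_ticket(ticket, now=None):
    """Return the queue a ticket should sit in right now."""
    now = now or datetime.now()
    if ticket["status"] == "resolved":
        return "closed"
    if now - ticket["opened_at"] > TIER1_TIME_BOX:
        return "tier2"  # rule fires: escalate regardless of personal pride
    return "tier1"

if __name__ == "__main__":
    ticket = {"status": "open", "opened_at": datetime.now() - timedelta(minutes=7)}
    print(route_ticket(ticket))  # -> tier2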

As you peruse the above examples, some of you may feel "This is a no-brainer. Why would anyone tolerate such inefficiencies in their day-to-day IT processes??" Well, pause for a minute now. Take a deep breath and look hard inside your own environment. The inefficiencies may have manifested in a different way. Many managers and ground-level personnel in companies feel “We know what’s best for our business and there’s a reason why we have developed certain processes to accommodate our unique needs. Any automation efforts need to take those into consideration.” It’s easy to be rigid about these things. And some areas really do deserve such rigidity. But many areas really don't. When you have external specialists working with your internal experts, it’s a unique opportunity to re-evaluate the reasons for operating in a certain way and adjust those processes to yield maximum productivity and faster task completion prior to automating that method of accomplishment. The results are usually in black and white. If the productivity gain is demonstrable and not just a spreadsheet exercise, then the change should be embraced.

In this particular Pilot, we were able to eventually convince the staff that it was in their best interest to accept the process improvements in certain areas. That allowed the feeling of jubilation to return all around. However, it may last only until the next Pilot, since this arm wrestling over process optimization is somewhat of a recurring pattern...

Ignore process optimization at your own peril, since it could very well be the most valuable part of the automation effort. Such optimization often means the difference between a hefty 30%-plus efficiency gain and a mere 5~10% improvement.

Tuesday, October 24, 2006

Are IT Automation Initiatives Here to Stay?

A few weeks ago, I was one of the lucky few (well, relatively speaking!) speakers at the Dow Jones DataCenter Ventures 2006 conference in San Jose (http://datacenterventures.dowjones.com/Default.aspx?pageid=111). The event reviewed emerging technologies related to the data center space and provided innovative companies a platform to present their solutions especially around virtualization, automation and methods to block things that are generally considered nasty such as viruses, worms, spam and other such elements causing loss in productivity and revenue.

Ben Horowitz, CEO of Opsware, one of the bigger success stories at the conference, was a keynote speaker and shared how his team was able to get their company on the map, the IT automation map that is. Several venture capitalists, industry analysts, trade journalists and large company technology scouts were present. But what made the trip worthwhile was the presence of many technology leaders, including CIOs, CTOs and other real-world end-users with assorted titles.

Just about 18 months ago, a similar presentation from me at a CIO round-table resulted in barely 6 attendees showing up (about 18 were expected). It just seemed like CIOs had more pressing stuff to attend to than being at a boring “ping, power and pipe” event learning about ways to keep the crown jewels of the business accessible and secure. But now, things seem to have taken a turn for the better. Optimal data center management is beginning to be seen as sexy and glamorous, almost like business intelligence was 5 years ago. So much so that popular publications like eWeek are increasingly running articles around this topic (see “Five Biggest Data Center Concerns” at: http://www.eweek.com/article2/0,1895,2034035,00.asp?kc=EWEWKEMLP102306STR2).

So what triggered this behavioral change? Data center management has always been one of the more costly items in IT due to hosting space, power, and the cost of human capital (not necessarily in this order). So why the change in perception now? Mind you, I’m not complaining. This obviously bodes well for the IT industry in general and the database industry in particular - since databases are ubiquitous in any data center. However, I’m fairly curious about evolutions in the IT landscape and understanding their short-term and long-term impact on business. Is the current IT optimization movement a mere swerve or is it here to stay?

My personal opinion (and hope) is that this is for keeps. CIOs are beginning to realize just how much they are truly spending on IT administration. Worse still, they are tying their smartest resources to mundane problems rather than the biggest opportunities. Current popular press and word of mouth are causing them to take a hard look at emerging alternatives rather than treating the toll merely as a cost of running a business. If innovative technology can bring about even a 15% efficiency improvement (half of what many of these vendors promise) in these more expensive administrative areas, it would tremendously impact the bottom line, free up IT budgets and ripple through to positively affect the priority of relevant IT projects, which will be greatly appreciated by users and shareholders.

Besides my optimism, another indicator that this renewed focus on IT optimization is here to stay is the advancement in several sub-technologies that make it possible. For instance, key areas such as agent architecture, push/pull models, network and server security, autonomic computing, server virtualization and grid management, decision automation and expert systems have matured significantly over the last 10 years allowing vendors to apply these solutions to specific niche areas thereby culminating in the beneficial situation prevailing in the industry today.

So what does all this mean for the business besides just reducing costs? Well, such optimization results in lower downtime, higher scalability and more predictable performance. But more than anything, it affords businesses the opportunity to align their smartest people towards solving the business’s biggest challenges and leveraging the best opportunities rather than merely being the best command-line junkies. And that is the biggest win-win for the business and all stakeholders, including the employees.

Saturday, October 14, 2006

Baby steps, automation and OMM

I often hear this from my client IT managers, even battle-hardened ones - “We think automation and OMM are good things for us, but we don’t know how to really get started. We manage so many databases and applications here that it’s tough to standardize anything. We have no say in the matter. Our DBAs are busy working their tails off; it’s unrealistic to tell them to drop what they are doing and work on automating stuff. It’s just not going to happen!”

I know there are many environments out there that are probably struggling with the same issues. The desire is there to get to higher levels of operational maturity using broad-based processes such as ITIL and OMM, but the challenges are steep. (BTW, if you are unfamiliar with OMM or the Operations Maturity Model, my company has a white-paper on its website on the topic: http://www.stratavia.com/downloads/9_adaptiveimplementation.pdf.)

For many companies, there are a myriad of 3rd party ERP, CRM and custom home-grown applications. There is often more than one mainstream database platform in use - Oracle, SQL Server, and/or DB2. The developers are playing with open-source DBMS platforms and there’s a good chance your DBAs will need to start supporting them in the near future, if they aren’t already! There are hundreds, possibly thousands of tickets being worked on by the DBA team every month. Pretty soon, there will be new projects that will need to go live. You will need to hire another DBA or more to support those projects. Hopefully your CIO will give the green light to add that head-count. And then, there’s talk that your company might acquire that pain-in-the-derriere competitor of yours… Your top DBAs carry a ton of tribal knowledge inside their heads and they don’t always have the time to document things or train others. They are all running at 100 miles an hour. In spite of this, the business users don’t seem exactly pleased; heck - truth be told, they aren’t even remotely sympathetic. Your team is already working so hard, what else can they do? In such “saturated manpower” situations, is any improvement possible? Isn’t standardization merely a nice-to-have, a luxury that one just doesn’t have time for?? Is automation a realistic goal to chase, given the team’s lack of bandwidth?

My question to those managers usually is, is this situation sustainable? How long can you go on like this? And how do you not get smoked by your competitors that have already figured this out? If IT is not providing the competitive advantage your business needs, the writing is on the wall. Regardless of the fact that your environment is bursting at the seams, you need to act now ‘cause there’s never a good time to do this. You have to take some tangible steps to inject a sense of order in your environment and get things under control.

So how do you go about doing that? This is where baby steps come in.

The first step one can take is to simplify the environment. Don’t even think about relatively lofty initiatives such as automation yet. Focus whole-heartedly on identifying where complexity abounds in your environment and attempt to strike it down. For instance, if your DBA group uses a variety of tools, put together a “tools/needs analysis” spreadsheet that lists each tool, who uses it and why. (Send me an email if you would like me to share some of the spreadsheet templates I use in this regard.) Look for tools that are not actively in use, and those that are redundant. Similarly, make an inventory of what scripts are in use. Look for inefficiency indicators – such as different DBAs using different scripts to accomplish the same thing.
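For the script inventory itself, even a small throwaway helper can surface redundancy quickly. Here is a rough Python sketch that flags byte-identical copies of scripts living in more than one place; the directories and extensions are assumptions to adjust for your environment, and identical hashes are only a first pass (near-duplicates still need eyeballing).

# Walk the directories where DBA scripts live and group them by content hash,
# a cheap way to spot the same script stashed in multiple places.

import hashlib
import os
from collections import defaultdict

SCRIPT_DIRS = ["/home/oracle/scripts", "/opt/dba/bin"]  # hypothetical locations
EXTENSIONS = (".sh", ".ksh", ".sql", ".pl", ".py", ".rb")

def inventory(dirs=SCRIPT_DIRS):
    by_digest = defaultdict(list)
    for top in dirs:
        if not os.path.isdir(top):
            continue
        for root, _, files in os.walk(top):
            for name in files:
                if not name.endswith(EXTENSIONS):
                    continue
                path = os.path.join(root, name)
                with open(path, "rb") as handle:
                    by_digest[hashlib.md5(handle.read()).hexdigest()].append(path)
    return by_digest

if __name__ == "__main__":
    for digest, paths in inventory().items():
        if len(paths) > 1:  # identical content living in more than one place
            print("Duplicate copies: %s" % ", ".join(paths))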

Here I’m assuming that your environment already has decent processes for change control, release management and configuration management. If not, implement simple processes that help establish a level of stability and predictability in your environment without injecting additional layers of complexity. If your existing processes are too complex, try to get the relevant stakeholders to sufficiently dumb down the processes. To determine if a process is “simple” or not, try describing it on a single sheet of 8½ X 11 paper. If you can’t, chances are, the process is too convoluted.

Once you feel your environment is as simple as can be, then it’s time to embark on standardization. Compile a list of the top 5 most common (repetitive) tasks per platform. Make note of how different DBAs in your team carry out these tasks. There are different ways to accomplish this. The simplest way would be for the DBA to jot down notes on each step she is taking next time she is asked to perform that task. Now remember, it is important for her to do this WHILE she is performing the task rather than jotting down the steps from memory. The latter scenario leaves room for errors and omitting steps. If the DBAs are too busy or plain unwilling to do this and you are unwilling to MAKE THEM do it, you may need to watch over their shoulders and take notes while they do it. (In most cases, merely threatening them with watching over their shoulders will make them do it just to get you off their backs!)

Also, it’s been my experience that if you reassure them and keep them in the loop about what you are trying to achieve, they will support you because, after all, their quality of life goes up as well thanks to such initiatives.

Your goal should be to compile a spreadsheet listing each DBA and his/her task recipe for each of the 5 tasks. Look for areas of commonality or divergence. If multiple DBAs are carrying out the same task in different ways, determine the best method to carry out that task based on the success criteria that’s most relevant to you and your organization – indicators such as fewer errors (“first time right”), faster completion time (“on time delivery”), etc. Once the most efficient methods per task are nailed down via a formal SOP document (email me if you would like me to share a good SOP documentation template), coach all your DBAs to use that method each time they perform that specific task. Regardless of which time shift they work, or where they are physically located, they need to follow that SOP.

Once you have selected the best recipe for each of the Top 5 tasks, and everyone in the DBA team has signed off on it and is actively using it, it’s the right time to automate it. This can be achieved using an automation platform such as Data Palette.

BTW, this Simplification -> Standardization -> Automation cycle is not a one-time thing. It needs to be an iterative process. Once you have successfully done it for the top 5 tasks, look at the next 5 tasks. Continue doing this until most mundane tasks in your environment are running on auto-pilot, or to put it more succinctly, you have reached OMM Level 3 or higher.

One more interesting fact: you don’t have to automate a large percentage of your mundane tasks to gain value. Even just the top 5 or 10 tasks will give you great returns for your efforts!

Friday, October 06, 2006

Automation Attempts without Executive Sponsorship = Failure

I’m part of a team that works with numerous companies attempting to inject a higher degree of automation in their IT operations, specifically within database administration (DBA). We are more successful in making this happen in some companies than in others. Even companies in the same industry sometimes have dramatically varied results. I have often wondered why this is the case. After all, DBA work is DBA work and there is a huge amount of commonality – no matter what individuals working within a given company may think.

When I put my finger on the pulse of this issue, it boils down to one core element: executive management involvement, or the lack of it. While we typically go into a company from the top down (at the C-level or VP-level), the Executive often hands us over to middle management who, in some cases, then hands us over to the DBA team. We are very successful in implementing automation within companies where the Executive keeps track of where we are traversing within his/her organization and the reactions we are getting from his/her team, even after the hand-off. If the DBAs or their managers are not that automation-friendly and push back saying their environment is too complex to automate, a red flag goes off in the Executive’s head. He/she probes further - Why is this the case? Have we backed ourselves into a corner due to the variety of customizations we have done over the years? What can be done to simplify the environment? Can we still automate some of the tasks that are the most painful/time-consuming? Without this thought process occurring and without the Executive being engaged, it is difficult to attain positive results.

Does this mean that middle managers and DBAs are not to be trusted with evaluating automation options? Not necessarily… The middle managers and DBAs typically tasked with evaluating and validating automation-enabling products are already busy with their day-to-day work. Any free time is taken up by meetings. This makes it hard for even the DBA with the best of intentions to invest time in the process, causing it to die on the vine or get pushed out until no one remembers any longer what the effort was all about. In spite of automation having the capability to significantly enhance their environment and help them reduce their task burden, the effort never takes off.

There are seven simple things Executives can do to ensure this doesn’t happen:
1. Be up to date on virtualization and automation technologies and some of the activity in this space, especially from startups. (Interestingly, almost all innovation comes from startups; I feel industry giants are incapable of innovating. I will elaborate on that in a future blog…)

2. Talk to your peers to find out what they are doing to reduce the pain of managing large numbers of databases. Have they pursued any innovative approaches?

3. Talk to your favorite analysts about which companies they see as emerging stars in this space. (For instance, an analyst I follow closely is Noel Yuhanna from Forrester Research. He provides a rather unique perspective on this space. For example, see http://www.forrester.com/findresearch/results?Ntt=Noel+Yuhanna&Ntk=MainSearch&Ntx=mode+matchallany&N=0).

4. Don’t hesitate to get into the weeds. Meet with your line-level IT managers and DBAs at least once a quarter. Challenge them to think out of the box. Ask them to come up with solutions besides just throwing more bodies at the problem. The latter approach doesn’t scale in the long run. Make them feel comfortable that their jobs are not at stake. Show them by example that their value to the organization will only increase if they invest time in strategic initiatives such as standardization and automation.

5. Help your executive admins be more aware of technology areas that are interesting to you (such as data center management technologies, IT automation tools, etc.). That way, when vendors in this space attempt to reach you, your gatekeepers can properly vet them, see if their offerings fall into any of the “interesting” categories and, if so, bring them to your attention.

6. After the initial due diligence with these vendors, if any of them are brought in for a more detailed evaluation or a proof of concept, keep your ear to the ground on how their efforts are progressing. Have periodic status meetings with your internal people as well as the vendor representatives to understand each party’s perspectives and coach both sides to achieve success.

7. Work with your internal team and help them re-prioritize some of their activities so they have appropriate time and energy to work on these high-value areas with the vendor. Otherwise it becomes an exercise in futility if the attention span of the internal staff is very limited during the evaluation.

These steps may seem overly dewy-eyed, yet it is painful to see so many Executives ignore them and toss any automation-related messaging to their subordinates, expecting them to magically find the time to fit yet another thing into their already packed schedules. It’s a straight equation:

New Technology Initiatives without Adequate Exec Sponsorship = Failure!

With the above approach, Executives will be able to more effectively spearhead new technology initiatives and leverage the ones that really add value to their organizations in the shortest amount of time.

Wednesday, October 04, 2006

Don’t Let Security be a Deterrent to DBA Outsourcing

If I got a dollar each time an IT Manager in a medium-sized company said “we like the offering, but our security policies prevent us from outsourcing our database support…”, I would have hundreds of dollar bills stashed in my pockets! Most recently, I heard this from an experienced DBA Manager at a company in the entertainment industry. That made me wonder – many of the goliaths in the financial services industry, including large credit card processors, insurance providers and banks (with the notable exception of JP Morgan Chase, which actually backsourced its IT services from IBM; see http://www.cio.com/archive/090105/whiplash.html?action=print), rely every day on outsourced IT services, so why is this a problem for a company in the entertainment business?

Maybe it’s because the other organizations are much larger: they have the financial and professional clout to invest in the outsourcing relationship (or make the vendor pay for dedicated redundant leased lines, specialized staff, etc.) to get their security concerns addressed, and it’s the small and medium-sized businesses (SMBs) that encounter this problem.

But regardless of company size, I just don’t believe that security should be a deterrent to outsourcing. If good security policies are followed by the company and enforced by the outsourcing partner, the company can avail itself of all the benefits of outsourcing without fear of compromising security. The DBA outsourcing industry has matured to offer numerous advantages, and some of the better known companies in this space seem committed to addressing their customers’ diverse security requirements.

In my experience, three primary areas come into play to ensure the outsourcing vendor is not going to jeopardize your environment:
· Confidentiality agreements.
· A secure environment for the outsourcer to log in and work, regardless of where they are working from (in-house, from home or from their remote office). Make key security requirements part of the overall service level agreement.
· A tamper-proof multi-level audit trail.

A good vendor will walk you through these in detail and usually, will bring these tools and templates with them.

In the case of the confidentiality agreements, look at the confidentiality clauses in your standard employment agreement that an in-house DBA (employee) would sign, and ensure that the vendor agreement is a super-set of that agreement. (After all, every company out there trusts certain employees sufficiently to let them access their mission-critical systems and data.)

In terms of providing a secure environment, larger companies with the financial resources may request dedicated leased lines prior to commencing work. However, most security issues can be kept at bay by following good security practices internally, auditing the vendor to ensure they have similar or better policies, and having a good VPN solution with a key fob for any remote access that may be necessary. When it comes to remote access, regardless of your outsourcing philosophy, you need good policies to allow even your internal personnel to work from home, especially during off-hours. Ask your vendor for their security policy manual. If required, hire a third-party security consultant (or use your internal sys/network admin, if you have one and he/she is well-versed in infrastructure security requirements) to look for any gaps in their policies. (The following site has links to some really useful security-related publications pertaining to industry standards: http://www.csrc.nist.gov/publications/nistpubs.) Ensure the vendor agrees to address any significant gaps prior to commencing work. Once that is done, have an audit done at the vendor’s site to ensure they indeed implement everything they mention in their policy manual and that all gaps have been dealt with.

If required, you could even segregate DBA work such that vendor personnel cannot access any raw data in the database. Any work that requires access to data can be routed through internal personnel/managers. In the case of certain databases (like Oracle), it is also possible to control this at a granular level by granting privileges to carry out physical DBA tasks (sysoper) without granting access to everything (sysdba) or, specifically, to user data within the database.
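To make that concrete, here is a minimal sketch of what provisioning such a restricted vendor account might look like on Oracle. The account name, password and connect details are placeholders, and I’m assuming the cx_Oracle driver purely for illustration; the same statements can of course be run straight from SQL*Plus:

```python
# Minimal sketch: create a vendor DBA account that can perform physical DBA
# tasks (startup/shutdown, backup/recovery) via SYSOPER, but is deliberately
# NOT given SYSDBA or any access to application schemas.
# All names and credentials are placeholders.
import cx_Oracle

conn = cx_Oracle.connect("sys", "sys_password", "prod01", mode=cx_Oracle.SYSDBA)
cur = conn.cursor()

cur.execute("CREATE USER vendor_dba IDENTIFIED BY a_strong_password")
cur.execute("GRANT CREATE SESSION TO vendor_dba")
cur.execute("GRANT SYSOPER TO vendor_dba")   # physical DBA tasks only
# Deliberately not granted: SYSDBA, SELECT ANY TABLE, or any application schema privileges.

conn.close()
```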

Lastly, a good audit trail needs to be maintained both within and outside the database (at the operating system and network levels) so events such as login times, userids, etc. can be checked and correlated when required. Ideally, rather than just waiting for violations to show up and then investigating the root cause, it is advisable to set up events within the audit software to look for violation patterns and take appropriate automated actions, including sending out alerts. For database-level auditing, there are multiple tools that accomplish various things – some of these do not even require native database auditing to be turned on. Depending on the tool, they sniff SQL statements over the network, latch onto shared memory and/or periodically capture database activity and alert on problem signatures. All of these are effective methods to keep an eye on your environment and ensure the vendor is complying with your policies.
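Even without a dedicated auditing product, the native audit trail can be watched for obvious violation patterns. Here is a minimal sketch, assuming session auditing is enabled and the vendor logs in through a dedicated account; the account name (VENDOR_DBA), the agreed access window and the alerting hook are stand-ins for whatever you actually use:

```python
# Minimal sketch: scan the last day's logins by the vendor account and flag
# any that fall outside the agreed access window. Assumes AUDIT SESSION is on.
import cx_Oracle

ALLOWED_HOURS = set(range(22, 24)) | {0, 1}   # hypothetical window: 22:00-02:00

def alert(message):
    print("ALERT:", message)   # placeholder: hook into your paging/ticketing system

conn = cx_Oracle.connect("audit_reader", "reader_password", "prod01")
cur = conn.cursor()
cur.execute("""
    SELECT username, os_username, userhost, timestamp
      FROM dba_audit_trail
     WHERE action_name = 'LOGON'
       AND username = 'VENDOR_DBA'
       AND timestamp > SYSDATE - 1
""")
for username, os_user, host, logon_time in cur:
    if logon_time.hour not in ALLOWED_HOURS:
        alert(f"off-window login by {username} ({os_user}@{host}) at {logon_time}")
conn.close()
```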

In addition to these, if your business requires it, you can also make other requests of your outsourcing partner, such as not using offshore resources to work in your environment and so on. Experienced vendors will already be familiar with such requests and may have special packages to accommodate them (often at a premium; but that premium may be well worth it if the overall cost is still significantly lower, and the quality better, than doing it yourself – and you get to sleep at night).

Friday, September 29, 2006

Database support license = unfair trade practice… ??

The other day, I used the Oracle transportable tablespace mechanism at a customer site to rapidly copy a schema from one database to another. The customer DBA working with me stared in awe and said “This is slick! I have never used this new feature before...” Now it was my turn to be awe-struck. I gently reminded him that Oracle had released the transportable tablespace feature in Oracle8i back in 1999! And this was 2006 and we were working with Oracle10g… the transportable tablespace mechanism was not a new feature anymore – unless one has been hanging with Rip Van Winkle for the last 7 years.
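For anyone else who has been napping: the flow is short enough to script end to end. Here is a minimal sketch of the 10g-era sequence (instance names, file paths and credentials are placeholders, and a real job would also run the DBMS_TTS self-containment check, verify endianness and add proper error handling):

```python
# Minimal sketch of the transportable tablespace flow: make the tablespace
# read-only, export its metadata, copy the datafiles, plug them into the
# target and re-open everything read-write. All names/paths are placeholders.
import subprocess

TS = "APP_DATA"

def run_sql(connect_string, statement):
    """Run one SQL statement through sqlplus."""
    subprocess.run(["sqlplus", "-s", connect_string],
                   input=f"{statement};\nexit\n", text=True, check=True)

run_sql("sys/pwd@source as sysdba", f"ALTER TABLESPACE {TS} READ ONLY")

subprocess.run(["expdp", "system/pwd@source",
                f"transport_tablespaces={TS}",
                "directory=DATA_PUMP_DIR", "dumpfile=app_data.dmp"], check=True)

# Copy the datafile(s) and the metadata dump to the target host.
subprocess.run(["scp", "/u01/oradata/SOURCE/app_data01.dbf",
                "targethost:/u01/oradata/TARGET/"], check=True)
subprocess.run(["scp", "/u01/app/oracle/admin/SOURCE/dpdump/app_data.dmp",
                "targethost:/u01/app/oracle/admin/TARGET/dpdump/"], check=True)

subprocess.run(["impdp", "system/pwd@target",
                "directory=DATA_PUMP_DIR", "dumpfile=app_data.dmp",
                "transport_datafiles=/u01/oradata/TARGET/app_data01.dbf"], check=True)

run_sql("sys/pwd@target as sysdba", f"ALTER TABLESPACE {TS} READ WRITE")
run_sql("sys/pwd@source as sysdba", f"ALTER TABLESPACE {TS} READ WRITE")
```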

That little incident made me wonder... Many of our customers rely on multiple database platforms, tools and applications. Some of these applications use older database versions. Often, even the applications that use the newest database version pretend the database is an older version. In other words, the apps rarely use any newer feature/functionality from the latest releases. In such situations, I always wonder what advantage the customer gained by upgrading their database version.

When I ask the customer this question, they say “Oh, we had no choice, the vendor has desupported the earlier version.” I naively ask “Were you using their support pretty regularly then, and is that what prompted you to upgrade, since you were at risk of not being supported any more?” Pat comes the answer from their DBA – “We were already on the terminal release for that version; it was stable in our environment, so we weren’t really calling their support line at all. But then, we were anyway paying them support fees, the upgrade didn’t cost us anything, so we went ahead and upgraded.”

An upgrade didn’t cost anything? Yeah, right! When asked how long and how many resources it took them to upgrade, the answer is usually several man-months to provision new machines and to install, configure and test the newer database release and all the related apps, to ensure those apps can function in the new release without any kind of degradation. So now I’m even more confounded. If you weren’t using the vendor’s support services, why would you care whether you were de-supported or not? Then again comes the answer: “oh, it’s our corporate policy that we can’t have unsupported software”. Policy made by whom? Does this so-called policy-maker understand the ramifications of upgrading the database without a clear business need? Does this person really understand the ROI on the support fees your company has been paying the software vendor?

It seems somewhat of a catch-22. If you don’t upgrade your already stable release, then you are at risk of running on a de-supported release. So you spend lots of time and resources upgrading and then move to a newer release that may not be as stable as the prior release you were on. And on top of that, you barely use any (read zero!) of the newer features available in the new release. So besides having the peace of mind that you are running on a “supported” platform, what else do you gain? Nothing but the problems that are introduced with the newer release…

Oracle, Microsoft and IBM have all released newer versions of their databases in the last couple of years. They charge an average of 20% (an arm and a leg!) for support. I say, given how little of the newer functionality is actually being utilized, customers should just stop paying for support. If there are bugs in the product, the vendor should provide a patch regardless of whether the customer has purchased support or not. After all, didn’t the customer pay for the software expecting it to work? If it doesn’t work, whose fault is it? Why does the vendor provide a fix for that bug only if the customer has paid an extra 20% for this so-called support?

Imagine what would happen if the average retailer operated like the average software company… You pick up a book at Borders and find that it’s missing a couple of pages. You take it back asking for another copy and they ask you if you have purchased a support license…

If this scenario is not acceptable in retail, why is it the norm in the software industry? Is it because in the former case, people are dealing with their “own money” whereas in the latter, it’s just their “company’s resources”??

I say, it’s time for companies to rally together and file a class action suit against the software monoliths asking for support fees to be refunded back to them. If the software is buggy, then the software vendor needs to get it fixed and supply a patch pronto regardless of whether the customer has a support license or not! After all, they have paid for that software expecting it to be functional. No sane person pays for anything expecting it to be buggy.

As far as support fees funding “future innovation” goes, newer releases/upgrades could be charged for separately, and customers could, at their discretion, look at a new release, see if it has value for them and whether their apps would run on it, and accordingly make the decision to upgrade (or not!). They shouldn’t have to upgrade just because the older version is going to be “de-supported”. If the vendor is going to de-support an older release, the vendor should then provide the newer release at no additional charge to any customer that doesn’t otherwise have a business need to upgrade.

I don’t know if customers would be willing to go through the pain of litigation. But maybe they can just stop paying for support. Unless they put their foot down, the software vendors have no motivation to change and it’s business as usual all around, with the customer shelling out gobs of money for “support” that they use sparingly. The returns are just not there for the customer.

Software vendors should charge for the software or charge for the support; but charging for both under the above circumstances amounts to an unfair trade practice in my book. What am I missing?

Thursday, September 28, 2006

Commoditization of Database Administration - Will DBAs and IT Managers "Get it"?

Over the last 5 years, database administration has slowly but steadily been turning into a commodity, exposing numerous niches that need creative input and hard work from DBAs whose bandwidth has suddenly increased. There are three primary market forces working together to make this happen:
(i) Advances in DBMS technology from IBM, Microsoft, Oracle and open-source databases (in that order);
(ii) DBA outsourcing and offshoring;
(iii) Automation and virtualization technologies that create a layer of abstraction over DBA tasks ranging from simple ones like database server provisioning to relatively complex ones such as error-free nightly load process management.

The first stems from databases themselves turning into commodities. For instance, DBMS software is being given away for free not only by open-source DB support hawkers such as MySQL but also by the big 3 (Oracle, IBM and M$). IBM announced a free version of DB2 back in Jan '06 (http://www.varbusiness.com/article/showArticle.jhtml?articleId=177105213). Likewise, Oracle has been offering its free Express Edition for a little while now (http://www.crn.com/showArticle.jhtml?articleID=172901482).

As databases turn into commodities, what is the impact on database administration? Rather than being thought of as something intrinsic to the company, it is rightfully being seen as a regular IT function - a necessary evil that is not necessarily a core competency of the business. Database administration has increasingly been separated from data management, with the need for specialists in each area being felt by the business. This separation has resulted in the DBA not having to deal directly with the data within the database. DBAs used to be viewed as the people with the keys to the kingdom - in terms of their proximity and access to business-critical data. But with newer audit capabilities and security models offered by Oracle (sysdba versus sysoper, etc.) and other vendors, it is easy to separate the Systems DBA role from data-centric responsibilities. This commoditization and separation has further made it conceivable for companies to entertain going after boutique/best-of-breed outsourcers and hand over support for the traditional Systems DBA role in a cost-effective and secure manner driven by metrics and service level agreements.

Many of these DBA outsourcing specialists have managed to add value via strong process models such as ITIL and monitoring/automation tools and integrated themselves within the companies they provide services to - further adding credibility to the fact that database administration is something that can be successfully separated and outsourced by astute IT managers.

The state of automation and virtualization frameworks and supporting sub-technologies has matured over the years - making automation of many mundane yet labor-intensive areas a reality. DBMS and tools vendors have largely shed their inhibitions about supporting solely their proprietary platforms and have taken on support for competing platforms - starting with basic monitoring support. For instance, Oracle has recently added support for SQL Server and DB2 databases within its OEM/Grid Control monitoring tool. This was plainly inconceivable just a few years ago! But as the industry evolves, mainstream DBMS vendors have no choice but to lean more and more towards offering better editions of their databases for free and including support for competing databases within their own tools.

While these events bode well for the average customer due to lowered product costs and higher administrative efficiency and quality, how does this impact the individuals themselves - the DBAs who have been depending on the various DBMSs and assorted tools for their livelihood? Do they have reason to start looking at alternative careers as mail-carriers or petting zoo attendants?

My opinion is that the commoditization of database administration is a golden opportunity for DBAs to finally separate out the mundane administrative areas, outsource and/or automate them, and elevate themselves up the food chain to focus on areas they previously never had the time to work on - the more user-visible stuff that, if worked on, will yield measurable results in terms of a reduced number of SLA violations, performance dips and outages. Areas such as quantitative performance management, service level management, data management, application/database interplay management, database/infrastructure optimization, new DBMS feature evaluation and other proactive management areas. In other words, the optimal career movement for the successful DBA needs to be “management” related (not necessarily people management, but managing performance and the related areas indicated above).

It will be interesting to see in the coming years how many DBAs actually "get" this, embrace the change and position themselves at the forefront of this once-in-two-decades kind of revolution versus how many continue to sit in their temporary trenches with arcane tools and scripts, fight progress and eventually miss the boat.

Tuesday, September 26, 2006

DBA Job Growth Statistics are Encouraging; the Insecure DBA's Attitude Towards Automation is Not!

I saw a report from InfoWorld today that showed database administration to be among the top 5 fastest-growing IT positions. I have seen similar reports in the recent past, ranging from the US Dept of Labor to independent/private reports. They all peg database administration to grow anywhere from 33% to 66% by the year 2014. In spite of this growth, the biggest concern is the lack of quality DBAs (database administrators). Sure, there are lots of bodies around the planet, but there just aren't enough qualified professionals to meet the growing requirements. People who can not only handle the job load, but can also communicate well with their users and peers. People who not only know the mechanics behind a task, but also know what needs to be done to a database, and when, to meet user requirements and business growth.

If these job growth statistics aren't a clarion call for leveraging automation, I don't know what is!

Yet I often encounter DBAs that shrivel up at the mere sound of the word "automation" or go on the defensive. (Let's call these folks the "insecure DBAs".) They claim database administration cannot be automated in THEIR environment due to the complexity of the tasks they do and/or due to the constantly changing underlying environment. News flash: database administration isn't rocket science... and btw, even rocket science leverages automation in more ways than one can imagine. Any task that comprises a specific sequence of steps (and last I checked, most if not all tasks in database administration can be boiled down to such sequences) can be automated.
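To illustrate the point rather than just argue it: the moment a task is written down as a specific sequence of steps with pass/fail checks, a machine can execute it, log it and escalate when something goes wrong. Here is a minimal sketch (the steps and the paging hook are hypothetical stand-ins for real DBA work):

```python
# Minimal sketch: a task expressed as an ordered sequence of steps, each with
# a pass/fail outcome, executed by code instead of by a person at 3 a.m.
def verify_last_backup():
    return True            # placeholder: check that last night's backup completed

def add_datafile():
    print("adding datafile to APP_DATA...")   # placeholder: run the real DDL
    return True

def notify_oncall(message):
    print("PAGE:", message)                   # placeholder: your paging system

RUNBOOK = [
    ("verify last backup", verify_last_backup),
    ("add datafile to APP_DATA", add_datafile),
]

for name, step in RUNBOOK:
    try:
        if step() is False:
            notify_oncall(f"step failed: {name}")
            break
        print(f"ok: {name}")
    except Exception as exc:
        notify_oncall(f"step raised an error: {name}: {exc}")
        break
```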

Some DBAs agree that yes, some of their tasks can be automated, but argue that they will end up forgetting how to carry out those tasks manually. And when (note their use of "when" and not "if") the automation system fails, the business would be in a conundrum since they would have forgotten how to do the task manually. Boo hoo... Wake up, guys, and smell the coffee. IT automation is happening in a big way. The companies that don't embrace it will be relegated to the dark ages. They just won't be able to compete effectively. Job loss is a natural consequence of companies going under... regardless of industry job growth statistics.

Instead of running around in their firefighter personas and conjuring up excuses as to why automation is not an option in "their" environment, it's time DBAs started working together to build standard operating procedures and leverage those procedures as blueprints for automation. Hiding underneath a stack of manual tasks and looking busy used to guarantee employment. Now, in the evolving technology landscape, it's a sure-shot way to keep oneself and one's company stuck in the annals of old-world IT (aka "luddite-dom"), replete with a plethora of one-off tools, commands and actions.

It's time to revel in the inherent job security provided by the growth in demand for quality DBAs. Using standardization and automation to handle those lower-level tasks, it's time to polish up those communication skills, move up the food chain and work on the exciting big projects that truly give your company a competitive advantage and propel it into the 21st century! That, more than anything, will assure the insecure DBA of continued employment.