Wednesday, January 28, 2009

Implementing a Simple Internal Database or Application Cloud - Part I

A “simple cloud”? That comes across as an oxymoron of sorts since there’s seemingly nothing simple about cloud computing architectures. And further, what do DBAs and app admins have to do with the cloud, you ask? Well, cloud computing offers some exciting new opportunities for both Operations DBAs and Application DBAs – models that are relatively easy to implement and that bring immense value to IT end-users and customers.

Typical large data center environments have already embraced a variety of virtualization technologies at the server and storage levels. Add-on technologies offering automation and abstraction via service-oriented architecture (SOA) are now allowing these environments to extend those capabilities up the stack – towards private database and application sub-clouds. These developments seem most pronounced in the banking, financial services and managed services sectors. However, while working on Data Palette automation projects at Stratavia, every once in a while I come across IT leaders, architects and operations engineering DBAs in other industries as well who are beginning to envision how specific facets of private cloud architectures can enable them to service their users and customers more effectively (while also compensating for the workload of colleagues who have exited their companies due to the ongoing economic turmoil). I specifically want to share here some of the progress in database and application administration with regard to cloud computing.

So, for those database and application admins who haven’t had a lot of exposure to cloud computing (which, BTW, is a common situation since most IT admins and operations DBAs are dealing with boatloads of “real-world hands-on work” rather than participating in the next evolution of database deployments), let’s take a moment to understand what it is and its relative benefits. An “application” in this context refers to any enterprise-level app – both 3rd-party (say, SAP or Oracle eBusiness Suite) and home-grown N-tier apps with a fairly large footprint. Those are the kinds of applications that get the maximum benefit from the cloud. Hence I use the term “data center asset” or simply “asset” to refer to any type of database or application. However, at times I do resort to specific database terminology and examples, which can be extrapolated to other application and middleware types as well.

Essentially, a cloud architecture refers to a collection of data center assets (say, database instances, or just schemas to allow more granularity) that are dynamically provisioned and managed throughout their lifecycle, based on pre-defined service levels. This lifecycle covers multiple areas, starting with deployment planning (e.g., capacity, configuration standards, etc.) and provisioning (installation, configuration, patching and upgrades), through maintenance (space management, logfile management, etc.), and extending all the way to incident and problem management (fire-fighting, responding to brown-outs and black-outs) and service request management (e.g., data refreshes, app cloning, SQL/DDL release management, and so on). All of these facets are managed centrally such that the entire asset pool can be viewed and controlled as one large asset (effectively virtualizing that asset type into a “cloud”).
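
To make the “many assets managed as one” idea a little more concrete, here is a minimal, purely illustrative sketch (in Python, and not any particular product’s actual API) of a pool that fans a single lifecycle operation out to every member asset based on its platform:

```
# Illustrative only: a pool of database assets treated as one logical asset.
# A lifecycle operation issued against the pool is applied to every member
# according to its platform and service level.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DatabaseAsset:
    name: str            # e.g., "ORCL_PROD_01"
    platform: str        # e.g., "oracle", "sqlserver"
    service_level: str   # e.g., "gold", "silver" -- would drive backup/DR policy


class AssetPool:
    """Presents a collection of database assets as a single managed asset."""

    def __init__(self, assets: List[DatabaseAsset]):
        self.assets = assets
        # Hypothetical handlers keyed by (platform, operation); in a real
        # deployment these would invoke standard operating procedures.
        self.handlers: Dict[Tuple[str, str], Callable[[DatabaseAsset], None]] = {}

    def register(self, platform: str, operation: str,
                 handler: Callable[[DatabaseAsset], None]) -> None:
        self.handlers[(platform, operation)] = handler

    def run(self, operation: str) -> None:
        """Apply one lifecycle operation (e.g., 'backup') across the whole pool."""
        for asset in self.assets:
            handler = self.handlers.get((asset.platform, operation))
            if handler:
                handler(asset)


# Usage: a single "backup" request is applied uniformly to the entire pool.
pool = AssetPool([DatabaseAsset("ORCL_PROD_01", "oracle", "gold"),
                  DatabaseAsset("MSSQL_QA_02", "sqlserver", "silver")])
pool.register("oracle", "backup", lambda a: print(f"RMAN backup for {a.name}"))
pool.register("sqlserver", "backup", lambda a: print(f"Native backup for {a.name}"))
pool.run("backup")
```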

Here’s a picture representing a fully baked database cloud implementation (if the picture is blurry, click on it to open up a clearer version):

As I mentioned in a prior blog entry, multiple components have come together to enable a cloud architecture. But more on that later. Let’s look at the database/application-specific attributes of a cloud (you could read this as a list of requirements for a database cloud).

  • Self-service capabilities: Database instances or schemas need to be capable of being rapidly provisioned to user specifications by administrators, or by the users themselves (in select situations – areas where the administrators feel comfortable giving control directly to the users). This provisioning can be done on existing or new servers (the term “OS images” is more appropriate given that most of the “servers” would be virtual machines rather than real bare metal) with appropriate configuration, security and compliance levels. Schema changes or SQL/DDL releases can be rolled out in a scheduled manner, or on demand. The bulk of these releases, along with other service requests (such as refreshes, cloning, etc.), should be capable of being carried out by project teams directly – with the right credentials (think role-based access control).
  • Real-time infrastructure: I'm borrowing a term from Gartner (specifically, distinguished analyst Donna Scott's vocabulary) to describe this requirement. Basically, the assets need to be maintained in real time per specific deployment policies (such as development versus QA or Stage environments): tablespaces and datafiles created per specific naming/size conventions and filesystem/mount-point affinity (accommodating specific SAN or NAS devices, and different LUN properties and RAID levels for reporting/batch databases versus OLTP environments); data backed up at the requisite frequency per the right backup plan (full, incremental, etc.); resource usage metered; failover/DR occurring as needed; and finally, environments archived and de-provisioned either after a specific time-frame (specified at the time of provisioning) or on demand – after the user or administrator indicates that the environment is no longer required (or after a specific period of inactivity). All of this needs to be subject to administrative/manual oversight and controls (think dashboards and reports, as well as the ability to interact with or override automated workflow behavior). A minimal sketch of what such policy-driven self-service might look like appears right after this list.
  • Asset type abstraction and reuse: One should be able to mix and match these asset types. For instance, one can roll out an Oracle-only database farm or a SQL Server-only estate. Alternatively, one can span multiple database and application platforms, allowing the enterprise to better leverage its existing (heterogeneous) assets. Thus, the average resource consumer (i.e., the cloud customer) shouldn’t have to be concerned about what asset types or sub-types are included therein – unless they want to override default decision mechanisms. The intra-cloud standard operating procedures take those physical nuances into account, thereby effectively virtualizing the asset type.
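
To illustrate the first two attributes, here is a minimal sketch of what policy-driven, self-service provisioning with role-based access control could look like. The roles, environment policies, naming prefixes and size limits are hypothetical placeholders rather than recommendations or any product’s actual configuration:

```
# Illustrative only: a self-service provisioning request validated against
# role-based access control and centrally managed deployment policies.
from dataclasses import dataclass

# Deployment policies keyed by environment type; these drive naming, sizing
# and backup behavior so the requester never has to know the physical details.
DEPLOYMENT_POLICIES = {
    "dev":  {"prefix": "DEV", "max_size_gb": 50,   "backup_plan": "weekly_full"},
    "qa":   {"prefix": "QA",  "max_size_gb": 200,  "backup_plan": "nightly_incremental"},
    "prod": {"prefix": "PRD", "max_size_gb": 1000, "backup_plan": "nightly_full_plus_archivelog"},
}

# Which roles may self-provision into which environments.
ROLE_PERMISSIONS = {
    "developer": {"dev"},
    "release_engineer": {"dev", "qa"},
    "dba": {"dev", "qa", "prod"},
}


@dataclass
class ProvisionRequest:
    requester_role: str
    environment: str      # "dev", "qa" or "prod"
    app_code: str         # e.g., "CRM"
    size_gb: int


def provision(req: ProvisionRequest) -> str:
    """Validate a request against RBAC and policy, then describe the action."""
    allowed = ROLE_PERMISSIONS.get(req.requester_role, set())
    if req.environment not in allowed:
        raise PermissionError(f"{req.requester_role} may not provision {req.environment}")

    policy = DEPLOYMENT_POLICIES[req.environment]
    if req.size_gb > policy["max_size_gb"]:
        raise ValueError(f"Requested size exceeds {policy['max_size_gb']} GB policy limit")

    # The naming convention is applied centrally, per the policy.
    db_name = f"{policy['prefix']}_{req.app_code}".upper()
    return (f"Provision {db_name} ({req.size_gb} GB) "
            f"with backup plan '{policy['backup_plan']}'")


print(provision(ProvisionRequest("developer", "dev", "crm", 20)))
```

The point is that the consumer only supplies intent (role, environment, application and size); the naming, sizing and backup details come from the centrally managed policy.
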
The benefits of a database cloud include empowering users to carry out diverse activities in self-service mode in a secure, role-based manner, which in turn enhances service levels. Activities such as having a database provisioned or a test environment refreshed can often take multiple hours or days. Those can be reduced to a fraction of their normal time, cutting latency especially in situations that require hand-offs and task turnover across multiple IT teams. In addition, the resource-metering and self-managing capabilities of the cloud allow better resource utilization, avoid waste, improve performance levels, reduce outages and remove other sources of unpredictability from the equation.

A cloud, while viewed as bleeding edge by some organizations, is being viewed by larger organizations as critical – especially in the current economic situation. Rather than treating each individual database or application instance as a distinct asset and managing it per its individual requirements, a cloud model allows virtual asset consolidation, thereby allowing many assets to be treated as one and promoting unprecedented economies of scale in resource administration. So as companies continue to scale out data and assets, but cannot afford to correspondingly scale up administrative personnel, the cloud helps them achieve non-linear growth.

Hopefully the attributes and benefits of a database or application cloud (and the tremendous underlying business case) are apparent by now. My next blog entry (or two) will focus on the requisite components and the underlying implementation methods to make this model a reality.

Friday, January 09, 2009

Protecting Your IT Operations from Failing IT Services Firms

The recent news about India-based IT outsourcing major Satyam and its top management’s admission of accounting fraud brings back shocking and desperate memories of an earlier time – when multiple US conglomerates such as Enron, Arthur Andersen and Tyco fell under similar circumstances, bringing down with them the careers and aspirations of thousands of employees, customers and investors. Ironically, Satyam (the name means “truth” in Sanskrit), whose management had been duping investors for several years (by their own admission), received the Recognition of Commitment award from the US-based Institute of Internal Auditors in 2006 and was featured in Thomas Friedman’s best-seller “The World is Flat”. Indeed, how the mighty have fallen…

As one empathizes with those affected, the key question that comes to mind is: how do we prevent another Satyam? However, that line of questioning seems rather idealistic. The key question should probably be: how can IT outsourcing customers protect themselves from these kinds of fallouts? Given how flat the world is, an outsourcing vendor’s fall from grace (especially one as ubiquitous as Satyam in this market) has reverberations throughout the global IT economy – directly in the form of failed projects, and indirectly in the form of lost credibility for customer CIOs who rely on these outsourcing partners for their critical day-to-day functioning.

Having said that, here are some key precautionary measures (in increasing order of maturity) that companies can take to protect themselves and their IT operations, beyond standard sane efforts such as using multiple IT partners, structured processes and centralized documentation/knowledge bases.
· Move from time & material (T&M) arrangements to fixed-price contracts
· Move from static knowledge-bases to automated standard operating procedures (SOPs)
· Own the IP associated with process automation

Let’s look at how each of these affords higher protection in situations such as the above:
· Moving from T&M arrangements to fixed-price contracts – T&M contracts rarely provide any incentive for the IT outsourcing vendor to bring in efficiencies and innovation. The more hours that are billed, the more revenue they make – so, where’s the motivation to reduce the manual labor? Moreover, T&M labor makes customers vulnerable to loss of institutional knowledge and gives them little to no leverage when negotiating rates or contract renewals, because switching out a vendor (especially one that holds much of the “tribal knowledge”) is easier said than done.

With fixed-price contracts, the onus of ensuring quality and timely delivery is on the IT services vendor (doing so profitably requires using as little labor as possible), and consequently one finds more structure (such as better documentation and process definition) and greater use of innovation and automation. All of this works in the customer’s favor and, in the case of a contractor or vendor no longer being available, makes it easier for a replacement to hit the ground running.

· Moving from static knowledge-bases to automated SOPs – It is no longer enough to have standard operating procedures documented within SharePoint-type portals. It is crucial to automate these static run books and documented SOPs via data center automation technologies, especially the newer run book automation product sets (a.k.a. IT process automation platforms) that allow definition and utilization of embedded knowledge within the process workflows. These technologies allow contractors to move static process documentation to workflows that use this environmental knowledge to actually perform the work. Thus, current process knowledge no longer merely resides in people’s heads, but gets moved to a central software platform, thereby mitigating the loss of key contractor personnel or vendors. (A minimal sketch of what such an executable SOP might look like appears at the end of this entry.)

· Owning the IP associated with such process automation platforms – Frequently, companies that use outsourced services ask, “why should I invest in automation software? I have already outsourced our IT work to company XYZ. They should be buying and using such software. Ultimately, we have no control over how they perform the work anyway…” The Satyam situation is a classic example of why it behooves end-customers to actually purchase and own the IP related to process automation software, rather than deferring it to the IT services partner. By having process IP defined within a software platform that the customer owns, it becomes conceivable to switch contractors and/or IT services firms. If the IT services firm owns the technology deployment, the corresponding IP walks out the door with the vendor, preventing the customer from getting the benefit of the embedded process knowledge.

It is advisable for the customer to have some level of control and oversight over how the work is carried out by the vendor. It is fairly commonplace for the customer to insist on the use of specific tools and processes such as ticketing systems, change control mechanisms, monitoring tools and so on. The process automation engine shouldn’t be treated any differently. The bottom line is, whoever holds the process IP carries the biggest stick during contract renewals. If owning the technology is not feasible for the customer, at least make sure that the embedded knowledge is in a format that can be retrieved and reused by the next IT services partner that replaces the current one.
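
To make the notion of automated, portable SOPs a bit more concrete, here is a minimal sketch of a run book captured as data (a vendor-neutral JSON document) and executed by a small engine. The step names, thresholds and ticket handling are hypothetical, and this is not the actual format of any particular run book automation product:

```
# Illustrative only: an SOP captured as data rather than prose in a wiki,
# executed step by step by a small engine.
import json

# The SOP definition -- the "embedded knowledge" -- kept in a portable format
# so it can be handed to a replacement services partner intact.
SOP_JSON = """
{
  "name": "tablespace_space_management",
  "steps": [
    {"action": "check_free_space", "params": {"threshold_pct": 10}},
    {"action": "add_datafile",     "params": {"size_mb": 2048}},
    {"action": "update_ticket",    "params": {"status": "resolved"}}
  ]
}
"""

# Hypothetical step implementations; a real platform would run SQL, call
# storage APIs and integrate with the ticketing system here.
def check_free_space(threshold_pct):
    print(f"Checking tablespaces below {threshold_pct}% free")

def add_datafile(size_mb):
    print(f"Adding a {size_mb} MB datafile per the naming convention")

def update_ticket(status):
    print(f"Updating the associated ticket to '{status}'")

ACTIONS = {"check_free_space": check_free_space,
           "add_datafile": add_datafile,
           "update_ticket": update_ticket}


def run_sop(definition: str) -> None:
    """Execute each step of the SOP in order."""
    sop = json.loads(definition)
    print(f"Running SOP: {sop['name']}")
    for step in sop["steps"]:
        ACTIONS[step["action"]](**step["params"])


run_sop(SOP_JSON)
```

Because the workflow definition is plain data, it stays with the customer even if the people or the vendor who wrote it move on.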

Friday, January 02, 2009

Zen and the Art of Automation Continuance

The new year is a good time to start thinking about automation continuance. Most of us initiate automation projects with a focus on a handful of areas – such as provisioning servers or databases, automating the patch process, and so on. While this kind of focus is helpful in ensuring a successful outcome for that specific project, it also has the effect of reducing overall ROI for the company – because once the project is complete, people move on to other day-to-day work patterns (relying on their usual manual methods), instead of continuing to identify, streamline and automate other repetitive and complex activities.

Just recently I was asked by a customer (a senior manager at a Fortune 500 company that has been using data center automation technologies, including HP/Opsware and Stratavia's Data Palette, for almost a year), "how do I influence my DBAs to truly change their behavior? I thought they had tasted blood with automation, but they keep falling back to reactive work. How do I move my team closer to spending the majority of their time on proactive work items such as architecture, performance planning, providing service level dashboards, etc.?” Sure enough, their DBA team started out impressively, automating over half a dozen tasks such as database installs, startup/shutdown processes, cloning, tablespace management, etc.; however, during the course of the year their overall reactive workload seems to have relapsed.

Indeed, it can seem an art to keep IT admins motivated towards continuing automation.

A good friend of mine in the Oracle DBA community, Gaja Krishna Vaidyanatha, coined the phrase “compulsive tuning disorder” to describe DBA behavior that involves spending a lot of time tweaking parameters, statistics and such in the database, almost to the point of negative returns. A dirty little secret in the DBA world is that this affliction frequently extends to areas beyond performance tuning, and could be called “compulsive repetitive work disorder”. Most DBAs I work with are aware of their malady, but do not know how to break the cycle. They see repetitive work as something that regularly falls on their plate and that they have no option but to carry out. Some of those activities may be partially automated, but overall, the nature of their work doesn’t change. In fact, they don’t know how to change, nor are they incented or empowered to figure it out.

Given this scenario, it’s almost unreasonable for managers to expect DBAs to change just because a new technology has been introduced in the organization. It almost requires a different management model, laden with heaps of work redefinition, coaching, oversight, re-training and, to cement the behavior modification, a different compensation model. In other words, the surest way to bring about change in people is to change the way they are paid. Leaving DBAs to their own devices and expecting change is not fair to them. Many DBAs are unsure how any work pattern changes will impact their users’ experience with the databases, and whether that change might even cost them their jobs. It’s just too easy to fall back to familiar reactive work patterns. After all, the typical long hours of reactive work show one as a hardworking individual, provide a sense of being needed and foster notions of job security.

In these tough economic times, however, sheer hard work doesn’t necessarily translate to job security. Managers are seeking individuals who can come up with smart and innovative ways to achieve non-linear growth. In other words, they are looking to do more with the same team - without killing off that team with super long hours, or having critical work items slip through the cracks.

Automation is the biggest enabler of non-linear growth. With the arrival of the new year, it is a good time to be talking about models that advocate changes to work patterns and corresponding compensation structures. Hopefully you can use the suggestions below to guide and motivate your team to get out of the mundane rut and continue down the path of more automation (assuming, of course, that you have invested in a broader application/database automation platform such as Data Palette that is capable of accommodating that path).

1. Establish a DBA workbook with weights assigned to different types of activity. For instance, “mundane activity” may get a weight of, say, 30, whereas “strategic work” (whatever that may be for your environment) may be assigned a weight of 70. Now break down both work categories into the specific activities that are required in your environment. Make streamlining and automating repetitive task patterns an intrinsic part of the strategic work category. Check your ticketing system to identify accurate and granular work items. Poll your entire DBA team to fill in any gaps (especially if you don’t have usable ticketing data). As a starting point, here’s a DBA workbook template that I had outlined in a prior blog.

2. Introduce a variable compensation portion to the DBAs’ total compensation package (if you don’t already have one) and link it to the DBA workbook – specifically to the corresponding weights. Obviously, this will require you to verify whether the DBAs are indeed living up to the activity in the workbook, which means having a method to evaluate that. Make sure that there are activity IDs and cause codes for each work pattern (whether it’s an incident, service request or whatever). Get maniacal about having DBAs create a ticket for everything they do and properly categorize that activity. Also integrate your automation platform with your ticketing system so you can measure what kind of mundane activity is being carried out in a lights-out manner. For instance, many Stratavia customers establish ITIL-based run books for repetitive DBA activities within Data Palette; as part of these automated run books, tickets get auto-created, auto-updated and auto-closed. That in turn ensures that automated as well as manual activities get properly logged and the relevant data is available for end-of-month (or quarterly or even annual) reconciliation of work goals and results – prior to paying out the bonuses. (A minimal reconciliation sketch appears after this list.)

If possible, pay out the bonuses at least quarterly. Getting them (or not!) will be a frequent reminder to the team of the work expected of them versus the work they actually do. If there are situations that truly require the DBAs to stick to mundane work patterns, identify them and get the DBAs to streamline, standardize and automate them in the near future so they no longer pose a distraction from the preferred work patterns.
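
For illustration, here is a minimal sketch of such a reconciliation: ticket cause codes are mapped to workbook categories, activity that was handled by automation is excluded from the DBA’s manual mix, and a weighted score is computed for the quarter. The categories, weights and cause codes are hypothetical examples, not prescriptions:

```
# Illustrative only: reconciling ticket data against a weighted DBA workbook
# at bonus time.

# Workbook weights: strategic work counts for more than mundane work.
WORKBOOK_WEIGHTS = {"strategic": 70, "mundane": 30}

# Map ticket cause codes (from the ticketing system) to workbook categories.
CAUSE_CODE_CATEGORY = {
    "capacity_planning": "strategic",
    "automation_rollout": "strategic",
    "sla_dashboard": "strategic",
    "space_management": "mundane",
    "db_restart": "mundane",
    "data_refresh": "mundane",
}


def quarterly_score(tickets, automated_cause_codes):
    """Weighted score of manual work; automated activity is excluded."""
    counted = [t for t in tickets if t not in automated_cause_codes]
    if not counted:
        return 0.0
    weighted = sum(WORKBOOK_WEIGHTS[CAUSE_CODE_CATEGORY[t]] for t in counted)
    return weighted / len(counted)


# Usage: tickets logged this quarter, with data refreshes fully automated.
tickets = ["capacity_planning", "space_management", "data_refresh",
           "automation_rollout", "db_restart"]
print(quarterly_score(tickets, automated_cause_codes={"data_refresh"}))
```

A score closer to 70 indicates a more strategic mix of manual work; automated mundane activity still shows up in the ticketing data but no longer counts against the DBA.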

Many companies already have bonus plans for their DBAs and other IT admins. However, they link those plans to areas such as company sales, profits or EBITDA levels. Get away from that! Those areas are not “real” to DBAs. IT admins rarely have direct control over company revenue or spending levels. Such linking, while safer for the company of course (i.e., no revenue/profits, no bonuses), does not serve it well in the long run. It does not influence employee behavior other than telling them to do “whatever it takes” to keep business users happy and the cash register ringing, which in turn promotes reactive work patterns. There is no motivation or time for IT admins to step back and think strategically. But changing the bonus and variable compensation criteria to areas that IT admins can explicitly control – such as sticking to a specific workbook with more onus on strategic behavior – brings about the positive change all managers can revel in, and in turn, better profits for the company.

Happy 2009!

PS => I do have a more formal white-paper on this subject titled “5 Steps to Changing DBA Behavior”. If you are interested, drop me a note at “dbafeedback at stratavia dot com”. Cheers!