
Wednesday, January 28, 2009

Implementing a Simple Internal Database or Application Cloud - Part I

A “simple cloud”? That comes across as an oxymoron of sorts since there’s nothing seemingly simple about cloud computing architectures. And further, what do DBAs and app admins have to do with the cloud, you ask? Well, cloud computing offers some exciting new opportunities for both Operations DBAs and Application DBAs – models that are relatively easy to implement, and bring immense value to IT end-users and customers.

The typical large data center environment has already embraced a variety of virtualization technologies at the server and storage levels. Add-on technologies offering automation and abstraction via service oriented architecture (SOA) now allow them to extend these capabilities up the stack – towards private database and application sub-clouds. These developments seem most pronounced in the banking, financial services and managed services sectors. However, while working on Data Palette automation projects at Stratavia, every once in a while I come across IT leaders, architects and operations engineering DBAs in other industries as well who are beginning to envision how specific facets of private cloud architectures can help them serve their users and customers more effectively (while also absorbing the workload of colleagues who have left their companies amid the ongoing economic turmoil). I want to share here some of the progress in database and application administration with regard to cloud computing.

So, for those database and application admins who haven’t had a lot of exposure to cloud computing (which, BTW, is a common situation since most IT admins and operations DBAs are dealing with boatloads of “real-world hands-on work” rather than participating in the next evolution of database deployments), let’s take a moment to understand what it is and its relative benefits. An “application” in this context refers to any enterprise-level app – both 3rd party (say, SAP or Oracle eBusiness Suite) and home-grown N-tier apps with a fairly large footprint. Those are the kinds of applications that get the maximum benefit from the cloud. I use the term “data center asset”, or simply “asset”, to refer to any type of database or application. At times, however, I do resort to specific database terminology and examples, which can be extrapolated to other application and middleware types as well.

Essentially a cloud architecture refers to a collection of data center assets (say, database instances, or just schemas to allow more granularity) that are dynamically provisioned and managed throughout their lifecycle – based on pre-defined service levels. This lifecycle covers multiple areas starting with deployment planning (e.g., capacity, configuration standards, etc.), provisioning (installation, configuration, patching and upgrades) and maintenance (space management, logfile management, etc.) extending all the way to incident and problem management (fire-fighting, responding to brown-outs and black-outs), and service request management (e.g., data refreshes, app cloning, SQL/DDL release management, and so on). All of these facets are managed centrally such that the entire asset pool can be viewed and controlled as one large asset (effectively virtualizing that asset type into a “cloud”).
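To make this lifecycle idea concrete, here is a minimal sketch in Python; all of the names and fields are hypothetical (not from Data Palette or any specific product), and the point is simply that service levels, provisioning parameters, maintenance rules and de-provisioning criteria can live in one declarative record per asset that an automation layer reads and enforces.

```python
# Hypothetical sketch: a declarative lifecycle policy that a cloud automation
# engine could evaluate for each database asset it manages. All names and
# fields here are illustrative, not taken from any real product.
from dataclasses import dataclass, field

@dataclass
class LifecyclePolicy:
    environment: str              # e.g. "dev", "qa", "prod"
    capacity_gb: int              # initial storage allocation
    config_standard: str          # named configuration/compliance baseline
    backup_plan: str              # e.g. "full-weekly+incr-daily"
    maintenance_tasks: list = field(default_factory=lambda: [
        "space_management", "logfile_rotation"])
    deprovision_after_days: int = 90   # archive and reclaim if idle this long

@dataclass
class DatabaseAsset:
    name: str
    platform: str                 # "oracle", "sqlserver", ...
    policy: LifecyclePolicy

# The "cloud" is then just the pool of such assets plus the engine that keeps
# each one in line with its policy -- provisioning, maintenance, incident
# response and eventual de-provisioning all key off this single record.
pool = [
    DatabaseAsset("ordersdb_qa", "oracle",
                  LifecyclePolicy("qa", 200, "baseline-v2", "full-weekly+incr-daily")),
]
```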

Here’s a picture representing a fully baked database cloud implementation:

As I mentioned in a prior blog entry, multiple components have come together to enable a cloud architecture. But more on that later. Let’s look at the database/application-specific attributes of a cloud (you could read this as a list of requirements for a database cloud).

  • Self-service capabilities: Database instances or schemas need to be capable of being rapidly provisioned to user specifications by administrators, or by the users themselves (in select situations where administrators feel comfortable handing control directly to users). This provisioning can be done on existing or new servers (the term “OS images” is more appropriate given that most of the “servers” would be virtual machines rather than real bare metal) with appropriate configuration, security and compliance levels. Schema changes or SQL/DDL releases can be rolled out on a schedule, or on-demand. The bulk of these releases, along with other service requests (such as refreshes, cloning, etc.), should be capable of being carried out by project teams directly – with the right credentials (think role-based access control). A minimal sketch of such a request appears after this list.
  • Real-time infrastructure: I'm borrowing a term from Gartner (specifically, distinguished analyst Donna Scott's vocabulary) to describe this requirement. Basically, the assets need to be maintained in real time per specific deployment policies (such as development versus QA or stage environments): tablespaces and datafiles created per specific naming/size conventions and filesystem/mount-point affinity (accommodating specific SAN or NAS devices, and different LUN properties and RAID levels for reporting/batch databases versus OLTP environments); data backed up at the requisite frequency per the right backup plan (full, incremental, etc.); resource usage metered; failover/DR occurring as needed; and finally, the asset archived and de-provisioned either on a specific time-frame (specified at the time of provisioning) or on-demand – after the user or administrator indicates that the environment is no longer required (or after a specific period of inactivity). All of this needs to be subject to administrative/manual oversight and controls (think dashboards and reports, as well as the ability to interact with or override automated workflow behavior).
  • Asset type abstraction and reuse: One should be able to mix and match asset types. For instance, one can roll out an Oracle-only database farm or a SQL Server-only estate. Alternatively, one can span multiple database and application platforms, allowing the enterprise to better leverage its existing (heterogeneous) assets. Thus, the average resource consumer (i.e., the cloud customer) shouldn’t have to be concerned about what asset types or sub-types are included therein – unless they want to override default decision mechanisms. The intra-cloud standard operating procedures take those physical nuances into account, thereby effectively virtualizing the asset type.
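As referenced above, here is a minimal sketch of a self-service request, again in Python with entirely hypothetical role names, placement policies and a made-up handle_request() function; it is meant only to show how role-based access control and environment-specific deployment policies could gate and shape a provisioning or refresh request, not how any particular product does it.

```python
# Hypothetical sketch of a self-service provisioning/service request handler.
# Roles, policies and function names are invented for illustration.

ROLE_PERMISSIONS = {
    "dba":       {"provision", "refresh", "clone", "release_ddl"},
    "developer": {"refresh", "release_ddl"},
    "qa":        {"refresh"},
}

# Environment-specific placement rules (storage tier, RAID level, backup plan).
PLACEMENT_POLICIES = {
    "dev":  {"storage_tier": "nas", "raid": "RAID5",  "backup": "weekly-full"},
    "qa":   {"storage_tier": "nas", "raid": "RAID5",  "backup": "daily-incr"},
    "prod": {"storage_tier": "san", "raid": "RAID10", "backup": "daily-full+archivelog"},
}

def handle_request(user_role: str, action: str, environment: str, spec: dict) -> str:
    """Gate the request on role, then apply the environment's placement policy."""
    if action not in ROLE_PERMISSIONS.get(user_role, set()):
        return f"DENIED: role '{user_role}' may not perform '{action}'"
    policy = PLACEMENT_POLICIES[environment]
    # In a real implementation this would invoke the provisioning workflow
    # (create instance/schema, apply config standard, register for monitoring).
    return (f"APPROVED: {action} in {environment} "
            f"on {policy['storage_tier']}/{policy['raid']}, backup={policy['backup']}, "
            f"size={spec.get('size_gb', 50)}GB")

print(handle_request("developer", "refresh", "qa", {"size_gb": 100}))
print(handle_request("qa", "provision", "dev", {}))
```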
The benefits of a database cloud include empowering users to carry out diverse activities in self-service mode in a secure, role-based manner, which in turn enhances service levels. Activities such as provisioning a database or refreshing a test environment can often take hours or even days; they can be reduced to a fraction of their normal time, cutting latency especially where hand-offs and task turnover span multiple IT teams. In addition, the resource-metering and self-managing capabilities of the cloud allow better resource utilization and avoid waste, improving performance levels, reducing outages and removing other sources of unpredictability from the equation.

A cloud, while considered bleeding edge by some organizations, is viewed by larger organizations as critical – especially in the current economic situation. Rather than treating each individual database or application instance as a distinct asset and managing it per its individual requirements, a cloud model allows virtual asset consolidation, letting many assets be treated as one and promoting unprecedented economies of scale in resource administration. So as companies continue to scale out data and assets but cannot afford to correspondingly scale up administrative personnel, the cloud helps them achieve non-linear growth.

Hopefully the attributes and benefits of a database or application cloud (and the tremendous underlying business case) are apparent by now. My next blog entry (or two) will focus on the requisite components and the underlying implementation methods to make this model a reality.

Wednesday, September 03, 2008

Clouds, Private Clouds and Data Center Automation

As part of Pacific Crest’s Mosaic Expert team, I had the opportunity to attend their annual Technology Leadership Forum in Vail last month. I participated in half-a-dozen panels and was fortunate to meet with several contributors in the technology research and investment arena. Three things seemed to rank high on everyone’s agenda: cloud computing and its twin enablers - virtualization and data center automation. The cloud juggernaut is making everyone want a piece of the action – investors want to invest in the next big cloud (pun intended!), researchers want to learn about it and CIOs would like to know when and how to best leverage it.

Interestingly, even “old-world” hosting vendors like Savvis and Rackspace are repurposing their capabilities to become cloud computing providers. In a similar vein, InformationWeek recently reported that telecom behemoths with excess data center capacity, like AT&T and Verizon, have jumped into the fray with Synaptic Hosting and Computing as a Service – their respective cloud offerings. And to add to the mix, the term “private cloud” is floating around to describe organizations that are applying SOA concepts to data center management, making server, storage and application resources available as a service for users, project teams and other IT customers to leverage (complete with resource metering and billing) – all behind the corporate firewall.

As already stated in numerous publications, there are obvious concerns around data security, compliance, performance and uptime predictability. But the real question seems to be: what makes an effective cloud provider?

Google’s Dave Girouard was a keynote presenter at Pacific Crest, and he touched upon some of the challenges facing Google as they opened up their Google Apps offering in the cloud. In spite of pouring hundreds of millions of dollars into cloud infrastructure, they are still grappling with stability concerns. It appears that the size of the company and the type of cloud (public or private) matter less than the technology components and corresponding administrative capabilities behind the cloud architecture.

Take another example: Amazon. They are one of the earliest entrants to cloud computing and have the broadest portfolio of services in this space. Their AWS (Amazon Web Services) offering includes storage, queuing, database and a payment gateway in addition to core computing resources. Similar to Google, they have invested millions of dollars, yet are prone to outages.

In my opinion, while concerns over privacy, compliance and data security are legitimate and will always remain, the immediate issue is the scalability and predictability of performance and uptime. Clouds are being touted as a good way for smaller businesses and startups to gain resources, as well as for businesses with cyclical resource needs (e.g., retail) to gain incremental resources at short notice. I believe the current crop of larger cloud computing providers such as Amazon, Microsoft and Google can do a way better job with compliance and data security than the average startup/small business. (Sure, users and CIOs need to weigh their individual risk versus upside prior to using a particular cloud provider.) However, for those businesses that rely on the cloud for their bread-and-butter operations, whether cyclical or year-round, uptime and performance considerations are crucial. If the service is not up, they don’t have a business.

Providing predictable uptime and performance always boils down to a handful of areas. If provisioned and managed correctly, cloud computing has the potential to serve as the basis for real-time business (rather than being relegated to the status of backup/DR infrastructure). But the key questions CIOs need to ask their vendors are: What is behind the so-called cloud architecture? How stable is that technology? How many moving parts does it have? Can the vendor provide component-level SLAs and visibility? As providers like AT&T and Verizon enter the fray, they can learn a lot from Amazon and Google’s recent snafus and leverage technologies that simplify the environment and enable it to operate in lights-out mode – the difference between a reliable cloud offering and one that’s prone to failures.

The challenge, however, as Om Malik points out on his GigaOm blog, is that much of cloud computing infrastructure is fragile because providers are still using technologies built for a much less strenuous web. Data centers are still being managed with a significant amount of manual labor. “Standards” merely imply processes documented across reams of paper and plugged into SharePoint-type portals. No doubt, people are trained to use these standards. But documentation and training don’t account for operators being plain forgetful, or sick, on vacation, or leaving the company and being replaced (temporarily or permanently) by people who may not have the same operating context within the environment. Analyst studies frequently point out that over 80% of outages are due to human error.

The problem is that many providers, while issuing weekly press releases proclaiming their new cloud capabilities, haven’t really transitioned their data center management from manual to automated. They may have embraced virtualization technologies like VMware and Hyper-V, but they are still grappling with the same old methods combined with some very hard-working and talented people. Virtualization makes deployment fast and easy, but it also significantly increases the workload for the team managing that new asset behind the scenes. Because virtual components are so much easier to deploy, the result is server and application sprawl, and demand for work such as maintenance, compliance, security, incident management and service request management goes through the roof. Companies (including the well-funded cloud providers) do not have the luxury of indefinitely adding head-count, nor is throwing more bodies at the problem always a good idea. They need to examine each layer in the IT stack and evaluate it for cloud readiness. They need to leverage the right technology to manage each asset throughout its lifecycle in lights-out mode – right from provisioning to upgrades and migrations, and everything in between.
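To illustrate the difference between a documented runbook and a lights-out one, here is a minimal sketch in Python; the checks, remediation steps and five-minute cadence are all hypothetical placeholders, but the point is that the same procedures an operator would follow from a wiki page are encoded and executed on a schedule, independent of who happens to be on shift.

```python
# Hypothetical sketch of a "lights-out" maintenance loop: the same checks a
# human operator would run from a documented runbook, executed automatically
# so they do not depend on anyone remembering (or being available) to run them.
# The check and remediation functions are placeholders, not a real product API.
import time

def tablespace_nearly_full(asset: str) -> bool:
    return False   # placeholder: would query the database's space metrics

def add_datafile(asset: str) -> None:
    print(f"{asset}: added datafile per naming/size convention")

RUNBOOK = [
    (tablespace_nearly_full, add_datafile),
    # (logfile_backlog, rotate_logs), (backup_missed, rerun_backup), ...
]

def lights_out_cycle(assets: list) -> None:
    # Evaluate every check against every asset and remediate where needed.
    for asset in assets:
        for check, remediate in RUNBOOK:
            if check(asset):
                remediate(asset)

if __name__ == "__main__":
    while True:
        lights_out_cycle(["ordersdb_qa", "billing_prod"])
        time.sleep(300)   # run every five minutes, around the clock
```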

That’s where data center automation comes in. Data center automation technologies have been around for almost as long as virtualization and have the kind of maturity required for reliable lights-out operation. Data center automation products from companies such as HP (at the server, storage and network levels) and Stratavia (at the server, database and application levels) make a compelling case for marrying both the physical and virtual assets behind the cloud with automation, enabling dynamic provisioning and post-provisioning lifecycle management with fewer errors and less stress on human operators.

Data center automation is a vital component of cloud computing enablement. Unfortunately, service providers (internal or external) that make the leap from antiquated assets to virtualization to the cloud without proper planning and deployment of automation technologies tend to provide patchy service, giving the cloud model a bad name. Think about it… Why can some providers offer dynamic provisioning and real-time error/incident remediation in the cloud, while others can’t? How can some providers be agile in getting assets online and keeping them healthy, while others falter (or don’t even talk about it)? Why do some providers do a great job offering server cycles or storage space in the cloud, but a lousy job with databases and applications? The difference is well-designed and well-implemented data center automation – at every layer of the infrastructure stack.