There is much emphasis on enterprise management tools being able to assure the availability of IT-delivered services from a centralized event console. Search the Web and you will find many solutions that claim to consolidate events into a single dashboard. However, availability is only part of the story when it comes to building an application-aware infrastructure that delivers always-on business services with better-than-expected service levels. A central management point for performance across all domains is just as essential in assuring the delivery of business services to customers.
Recent posts here have addressed how Web design models such as REST are setting the stage for standing up new services more quickly and providing some semblance of portability across private, public, and hybrid clouds. Standing on the brink of delivering on the promise of cloud computing, organizations need to ensure both the availability and the performance of business-critical applications as they move to non-traditional deployments that still support demanding customers, most of whom are uninterested in how the computing gets done.
The Root of the Problem
It is usually hard to see the forest for the trees in the legacy data center, with its often-discussed silo approach to compute, network, and storage. Organizational structures and tools are typically oriented to specific functions and domain expertise. While roles have been evolving toward a more horizontal approach, new tools have mostly focused on managing availability, not performance.
Centralized event management has evolved to encompass all events coming in from all domains (server, network, and storage) and platforms (physical, virtual, and converged). These events are consolidated into one view so we can quickly identify the root-cause of an issue.
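To make the consolidation idea concrete, merging per-domain event feeds into one severity-ordered view can be sketched in a few lines. The event fields, severity scale, and sample messages below are illustrative assumptions, not any particular product's schema:

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Event:
    # Lower number = higher severity, so critical events sort first.
    severity: int
    timestamp: float
    domain: str = field(compare=False)
    message: str = field(compare=False)

def consolidate(*domain_feeds):
    """Merge per-domain event feeds into one severity-ordered view."""
    merged = [e for feed in domain_feeds for e in feed]
    heapq.heapify(merged)
    return [heapq.heappop(merged) for _ in range(len(merged))]

# Hypothetical feeds from three silos.
server_events  = [Event(2, 100.0, "server",  "CPU above 90%")]
network_events = [Event(1, 101.5, "network", "Link down on core switch")]
storage_events = [Event(3,  99.0, "storage", "LUN latency elevated")]

for e in consolidate(server_events, network_events, storage_events):
    print(f"[{e.domain}] sev={e.severity} {e.message}")
```

With everything in one ordered stream, the highest-severity event (here, the network link failure) surfaces first regardless of which silo reported it, which is what makes rapid root-cause identification possible.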
Many data centers still use disparate tools, each with its own dashboard and little understanding of the impact on business services. These vendor-specific tools are often domain-oriented, appealing to and satisfying the needs of only the network or storage administrator, for example. This silo approach to performance management, and the resulting inability to centralize and monitor multi-domain performance, is characterized by:
- Limited visibility: The emphasis on service levels for business-critical as well as less important applications has created a growing need for better visibility. IT managers need views that reach from the application, virtual machine (VM), and server through network traffic flows down to the LUN, so they can identify and remedy any impact on performance, whether an outright outage or a degradation caused by an intermittent problem or inadequate capacity planning. This need is compounded by the growth of virtualization and the adoption of cloud, which introduce abstractions into the couplings between applications and the underlying compute, network, and storage resources.
- Restricted growth: Virtual machine and storage growth strains data center infrastructure, rendering many legacy performance monitoring tools and processes incapable of keeping pace with compute and storage expansion. Federated architectures and the combination of private and public cloud computing only make matters worse.
- Domain-oriented: Many legacy performance monitoring tools are oriented to administrator needs rather than service models. They lack multi-tenancy capabilities such as service classifications and integration with external service-management databases. Managing service-level agreements (SLAs) in real time is difficult, if not impossible, because information comes from many different sources and takes time to consolidate and analyze. Without a data-center-wide view, it is impossible to isolate problems; each domain believes another is at fault.
- Technology-specific: Performance monitoring is often limited to domain-related technologies, providing information on only one layer of the stack, whether compute, network, or storage. Getting a complete view of any performance issue requires manual data collection and spreadsheet analysis, which takes time and is subject to human error.
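As a small illustration of why the siloed view misleads, consider end-to-end latency assembled from hypothetical per-domain samples. Each domain's numbers look tolerable in isolation; only the data-center-wide sum reveals the SLA breach and which domain dominates it. All numbers and the 20 ms target are invented for the sketch:

```python
# Hypothetical per-domain latency samples (ms) for three requests
# on one transaction path.
samples = {
    "compute": [4.0, 5.0, 4.5],
    "network": [2.0, 2.2, 40.0],   # one outlier: intermittent network issue
    "storage": [6.0, 5.5, 6.2],
}

SLA_MS = 20.0  # assumed end-to-end latency target

# End-to-end latency per request is the sum across domains.
end_to_end = [sum(vals) for vals in zip(*samples.values())]

def worst_domain(i):
    """Attribute request i's latency to the domain with the largest share."""
    return max(samples, key=lambda d: samples[d][i])

for i, total in enumerate(end_to_end):
    if total > SLA_MS:
        print(f"request {i}: {total:.1f} ms breaches SLA; "
              f"dominant domain: {worst_domain(i)}")
```

Only when the three feeds are combined does request 2 stand out as a breach attributable to the network, rather than each silo reporting "mostly fine" on its own.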
Cross-domain performance management solutions should be flexible, scalable, and service-oriented, spanning servers, network, and storage as well as physical, virtual, and converged infrastructures. Let’s look more closely at some of the key capabilities for effective performance management:
- Flexible: Data collection, analysis, and reporting need to span your entire infrastructure in real time. Suspicious activity should be documented and alerts sent as soon as potential problems are detected. Reporting should encompass predefined, ready-to-use reports as well as the ability for your customers and operators to create custom analyses from the ground up (e.g., wizard-driven) with user-defined key performance indicators (KPIs).
- Scalable: Your performance tools must be able to monitor thousands of devices and process millions of indicators, and be designed to scale smoothly as you grow. They should also be architected to support open standards such as SOAP and REST, as well as cloud standards like OpenStack.
- Service-oriented: Access should be multi-tenant, with LDAP authentication support and profile-based security, delivered through a central portal where customizable roles/profiles control what service-level and performance information each of your users or user groups sees. Your customers should get service-level and performance summaries for their specific sites, while your operators get real-time and historical visibility for troubleshooting and capacity-trending purposes. Your executives should have access to clear and concise summary reports on demand.
- Cross-domain: Your performance management system should provide historical and trend reports to help you predict future requirements and aid in planning capital expenditures more accurately. You should also be able to track and predict usage across compute, network, and storage to support provisioning infrastructure-as-a-service (IaaS) in private and hybrid cloud deployments.
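A user-defined KPI of the kind described in the list above can be sketched as a derivation over collected metrics plus an optional alert threshold. Metric names, formulas, and thresholds here are all hypothetical:

```python
# Hypothetical raw metrics collected from three domains.
metrics = {
    "vm.cpu_util": 0.92,
    "net.throughput_mbps": 870.0,
    "lun.latency_ms": 18.5,
}

# A KPI pairs a derivation over raw metrics with an alert threshold;
# threshold None means report-only (no alert).
kpis = {
    "storage_pressure": (lambda m: m["lun.latency_ms"] / 20.0, 0.8),
    "compute_headroom": (lambda m: 1.0 - m["vm.cpu_util"], None),
}

def evaluate(kpis, metrics):
    """Return (kpi_name, value) pairs whose value exceeds the threshold."""
    alerts = []
    for name, (derive, threshold) in kpis.items():
        value = derive(metrics)
        if threshold is not None and value > threshold:
            alerts.append((name, value))
    return alerts

print(evaluate(kpis, metrics))
```

The point of the sketch is the separation of concerns: operators define KPIs declaratively over whatever metrics the collectors feed in, and the same evaluation loop serves every domain.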
On the Horizon
As cloud standards emerge and are adopted, the deployment possibilities of cloud computing become more real. But unlike virtualization, cloud computing abstracts not only on-premise computing but also where data, applications, and compute resources reside, opening up greater complexity and a greater need for effective data center performance monitoring and management.
Some technologies, like EMC Watch4net, are well positioned to complete the service-assurance model for the cloud computing era, providing the performance complement to the EMC Smarts (formerly EMC Ionix IT Operations Intelligence) availability solution. But before you make any investment, you should know what you need from performance management in a cross-domain world.