In this blog, I will cover a key strategic new addition to VMAX’s extensive database feature set called Database Storage Analyzer (DSA). This feature (some of you may have heard us call DB Classify) is integrated with Unisphere and allows database and storage administrators to have a view into storage using database objects. Its current capabilities allow for an unprecedented level of troubleshooting and diagnostics that can help avoid finger-pointing between database administrators and storage administrators. In subsequent generations, DSA hints will allow you to do much more.
In the rest of the blog, I will explain how DSA works and the key benefits it provides through three troubleshooting scenarios.
Introduction to DSA
DSA collects data from both the database and the storage. It runs as part of Unisphere for VMAX so any customer who runs Unisphere 8.0.0 with the new VMAX3 family or VMAX 10K, 20K or 40K has this functionality built in at no additional cost. Currently, DSA supports Oracle 10g and above; and support for Microsoft SQL server is expected later this year .
DSA collects data every five minutes using a “guest” user in the database, fetching data from Oracle dictionary tables (e.g. v$active_session_history). It also runs a lightweight process on the database server in order to map the Oracle files to the VMAX devices. DSA then correlates the key performance indicators (response time, IOs/s, MB/s and IO size) stored in Unisphere Performance Analyzer with the database files and objects. This correlated data easily allow identifying and troubleshooting performance issues.
We will further take advantage of this correlation capability in future releases1 of DSA by providing hints to the storage array and using those hints in conjunction with VMAX3 tiering
Having introduced DSA, let’s take a look at a few examples of typical database performance issues and how to tackle them by using DSA:
Use case #1: DB response time is high while the storage response time is low
In this case, the DBA observes increased read latency on the database, and tries to identify the issue. By looking at the two DSA charts, DBA realizes that the increased latency is caused by increased server load leading to “Ora wait CPU” event (Oracle sessions waiting for CPU) to go up. The “Average active session wait” view in DSA shows that sessions are waiting for CPU (the first chart below). The second chart shows the read latency observed by the database and by storage volume. As depicted, the storage latency remained the same (~2.9ms) while the database latency went up due to other server load.
To further troubleshoot the server issue, under the Analytics tab we can see the top Oracle programs that were impacted by the load on the server:
Looking at the server side we can see that there were processes running outside of the database consuming significant CPU resources causing the database sessions to wait for CPU. This is something a database admin can address working with a system/server/VM administrator.
Use case #2: Both DB and Storage Response time are high
In this case, as depicted in the next chart, both the database response time and the storage response time are relatively high (~11ms) which means that we need to take a closer look at the storage side.
Once DSA starts monitoring a DB, a “Database” entity that comprises of the database devices is created in Unisphere which means that the Storage Admin can now view all of his familiar dashboards in the context of a database under the performance tab in Unisphere Performance Analyzer. In this case, by looking at the dashboard for this specific database, we can quickly identify that the database is suffering from high queuing on the two front-end directors as well as a high utilization on the SATA drives
By looking at the performance tab in DSA, we can look at the back-end activities for the devices that are associated with the database to confirm that the IO load on 7.2K rpm drives is too high.
The following actions would address this situation:
- In VMAX3: Change the SLO setting from a low level (e.g. Silver or Bronze) to a higher level (e.g. gold, platinum or diamond)
- In VMAX 10K, 20K, 40K: Change or modify the FAST policy in VMAX to allow use of higher performing disk tiers
Use case #3: Redo log writes response time is high
By correlating database to storage, DSA has ability to distinguish VMAX volumes that store data files from those that store redo logs. Therefore, DSA can display redo write statistics from the database and the key performance indicators for VMAX devices.
In this case a database batch load started pushing the redo write response times to very high values (~80ms). In DSA, when we compare the redo log latency in database and write IO latency in VMAX, we can see that the response time on the storage side is the same level as it was prior to the batch load started. By looking at the IO Wait chart we can see that the database sessions are waiting for “configuration” class. Further investigation in the analytics tab shows that session are waiting for “log file switch” due to a bad redo log configuration and there is nothing wrong on the storage side.
In this blog, I introduced the new Database Storage Analyzer in Unisphere for VMAX along with several examples on how to troubleshoot performance issues. DSA is a major advance in correlation of database and storage metrics to enable fast, easy and straightforward troubleshooting.
 Any roadmap comments are subject to change, and do not constitute a commitment by the writer or EMC.
 Oracle best practice is to put the redo logs on dedicated volumes.