Saturday, June 27, 2009

tech: July 2009 SQL Server usergroup

The July 2009 SQL Server usergroup (at Microsoft London Office) was interesting. The format was informal, with plenty of Q&A and good audience participation.

  • 18:30 - 18:40 Intro, what's in the news, Q&A
  • 18:40 - 18:45 Nugget 1 %%lockres%% - James Rowland-Jones
  • 18:45 - 18:50 Nugget 2 - Common MDX Gotchas when aggregating inside Calculated Members - Ashwani Roy
  • 18:50 - 19:40 End-to-End SQL Server Performance Troubleshooting - Iain Kick
  • 20:00 - 21:00 Replaying Trace Files - Martin Bell


Iain Kick gave an excellent presentation on the various sources of information you should be using when troubleshooting SQL Server performance. There were striking parallels for me with the recent Oracle / SAP performance work I have been doing. I am going to go through a few of the key sources which Iain discussed, and their equivalents in my recent SAP/Basis investigations.

1) Monitor and Examine your logs

"The SQL Server Errors Logs should be monitored"

Actually, logs (error, system, job, ...) are an area I think he could have spoken more about. In the Oracle / SAP sphere I regularly check:

a) The Oracle Alert Logs - as an Oracle DBA I am comfortable going through the alert log, which can highlight important database errors (space management problems, database corruptions, file system problems, ...)

b) Within SAP R3 systems, I regularly check the "System Logs" (SM21) and for "Short Dumps" (ST22) - these are tricky to read; however, if you are unsure about a message, you should investigate. Often I check the historical logs - have we seen these messages before? Is this a warning or a more routine alert? Ideally your system should be tuned so that routine errors are not reported; for example, when running the database checks from SAP, you adjust the check parameters so that only genuine errors are reported. However, this is not always possible.

c) Checking Job Logs (e.g. SM37 in SAP) - Job Logs and abnormal execution times are another useful indicator of performance problems. Batch jobs can also provide a useful benchmark of overall system performance and give an early indication of general performance problems (e.g. stable/routine jobs start taking significantly longer, or certain jobs, perhaps at critical times, start taking much longer ...)

At the end of Iain's talk, he went on to tell us how the automated monitoring and analysis tools which Quest provide will analyse much of this and present the results in a more administrator-friendly format. Quest have a strong reputation; their Oracle product TOAD is very well regarded. Within a SAP system the Central Monitoring (RZ20) allows you to configure a large tree-like structure of alerts - with a simple traffic light approach i.e. Red, Yellow or Green. On top of this, SAP's Solution Manager product allows you to monitor all your SAP systems from a single central location. This sort of functionality is also provided within Oracle by Enterprise Manager.

2) Windows Perfmon/Sysmon (passive background monitoring)

I typically associate Perfmon with O/S level statistics (e.g. paging rates/sec, CPU % usage, and disc queue lengths). However Windows Perfmon goes beyond this, allowing you to collect metrics specifically for SQL Server.

Brent Ozar (also a SQL Server DBA working for Quest) gives a good overview of these in his podcast "Perfmon Tutorial Video" (http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/).

Brent recommends the "Best SQL Server Performance Monitor Counters to Analyze":

  • Memory – Available MBytes
  • Paging File – % Usage
  • Physical Disk – % Disk Time
  • Physical Disk – Avg. Disk Queue Length
  • Physical Disk – Avg. Disk sec/Read
  • Physical Disk – Avg. Disk sec/Write
  • Physical Disk – Disk Reads/sec
  • Physical Disk – Disk Writes/sec
  • Processor – % Processor Time
  • SQLServer:Buffer Manager – Buffer cache hit ratio
  • SQLServer:Buffer Manager – Page life expectancy
  • SQLServer:General Statistics – User Connections
  • SQLServer:Memory Manager – Memory Grants Pending
  • System – Processor Queue Length
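
Incidentally, many of the SQLServer: counters in Brent's list are also exposed inside the engine through the sys.dm_os_performance_counters DMV (SQL Server 2005 onwards), which is handy when you want to script checks rather than run the Perfmon GUI. A minimal sketch:

-- Read two Buffer Manager counters from T-SQL rather than Perfmon;
-- object_name is padded character data, hence the LIKE.
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Buffer cache hit ratio')
AND object_name LIKE '%Buffer Manager%'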

I have watched several of Brent's video podcasts and they are excellent.
Most of the systems I have been working on recently are running on Solaris or HP (a few have been on Windows Servers). Typically for this sort of background monitoring of O/S level statistics, I have been using:

  • vmstat - (Solaris/Linux) focusing on CPU queues and Paging
  • prstat - (Solaris/Linux) overall CPU % usage, CPU usage per process and CPU usage aggregated by user (prstat -a)
  • top - (HP) process monitoring

I always have my copy of "Unix for Oracle DBAs" (by Donald K. Burleson http://oreilly.com/catalog/9780596000660/) to hand at work. This is a great pocket guide for Database and Application Server administrators, showing how to perform routine administration tasks under the various flavours of Unix (Solaris, HP, AIX and Linux).

Returning to Iain's presentation, he also mentioned a couple of other practical pointers when working with Perfmon: (a) the default interval is 15 secs; for analysing SQL performance you may want to reduce the interval to 3 seconds, and (b) for readability you should not chart more than 10 counters at one time (although if you are recording to a file instead of charting, this is presumably not relevant).

NB The terms perfmon and sysmon are used synonymously; formally the product is called sysmon, but the term perfmon appears to be more popular.

3) SQL Profiler (active foreground monitoring)

This is a very handy tool for (a) pinpointing problems within the database and (b) sampling live production loads which can then be played back in a PVT (Performance Volume Testing) system.

Iain also discussed filtering: for example, if a particular user is experiencing performance problems you can extract just the SQL they are running, with details of executions and associated costs - enabling you to identify the source of the problem much more precisely.
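
As a hedged sketch of the idea (the file path and login below are hypothetical), a captured trace file can be loaded and filtered with the built-in fn_trace_gettable function:

-- Load a captured trace file and pull out one user's statements,
-- most expensive first; TextData, Duration, CPU, Reads and Writes
-- are standard trace columns.
SELECT TextData, Duration, CPU, Reads, Writes
FROM sys.fn_trace_gettable('C:\traces\capture.trc', DEFAULT)
WHERE LoginName = 'problem_user'
ORDER BY Duration DESC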

Next Iain gave a nice demonstration of how to review/step through a captured SQL profile and monitor sysmon metrics at the same time! As you step through the "critical period" (a few seconds or a few minutes) where you have captured your bottleneck/deadlock/performance problem, on the lower half of the screen you can see a chart displaying the progress of your pre-selected sysmon metrics. I have seen this before; it is a great feature enabling you to pinpoint and accurately assess your "resource problem": CPU utilisation, CPU queues, Paging Rates, Disc Queue Lengths (Data drives, Log drives, System/Software drives), locking contention ...

There are tracing tools built into Oracle, with which you can trace individual sessions or instances. These are powerful monitoring tools, but in my experience they are not so easy to use and do not directly link into non-database monitoring tools like sysmon.

4) SQL Execution Plans (SSMS - SQL Server Management Studio)

Some points on how to read SQL Execution Plans:

a) Start by reading the plan from the bottom right-hand corner, i.e. the lowest levels of access like "index reads" or "table scans"; then work backwards up the page, where the various data-sources are joined and aggregated until the final result-set is returned.

b) When using the graphical tools to represent the various elements making up an execution plan, the "fatter linking arrows" show immediately where the most expensive elements are.

c) Alternatively you can use a text-based, programmatic approach:

SET STATISTICS TIME { ON | OFF }
SET STATISTICS IO { ON | OFF }
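
A minimal sketch of how these might be used in practice (the table is from the AdventureWorks sample database used later in this post):

-- Report I/O and timing figures for the statement on the Messages tab.
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT COUNT(*) FROM Sales.SalesOrderDetail

SET STATISTICS IO OFF
SET STATISTICS TIME OFF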


d) "In general avoid full table scans, index reads are better". This is a generalisation which often annoys me as developers and middle management hold this as an article of faith but can be wrong/misleading. I believe the rule of thumb for optimizers, is that if the optimizer calculates more than 20% of the rows are to be returned then a full table scan is more efficient then index reads (as for each row in an "index read" you need to read the index page/block then the data page/block. So seeing a full table scan is not necessarily an indicator of poor performance!

On the other hand, I recently worked on a performance problem where replacing a full table scan with an index read improved performance by a factor of 4000.

Understanding these points helps us understand why good statistics are critical - for a large table, if the stats indicate that only a few rows should be returned, then index reads will be used; however, if the stats are wrong, then the optimizer may not choose a full table scan which would be much quicker (I have seen this sort of performance problem before as well).
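
In SQL Server terms, if stale statistics are suspected, refreshing them lets the optimizer re-cost the scan-versus-seek decision. A minimal sketch, again assuming the AdventureWorks sample table:

-- Rebuild the statistics on one table from a full scan of its rows,
-- so the row-count estimates feeding the plan choice are up to date.
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN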

e) Estimated vs Actual query plans

I wasn't the only one surprised when Iain briefly mentioned the icons for the estimated query plan and the actual query plan.

I had thought that unless there is a change in the structure of the database (e.g. adding a new index), fresh database statistics are gathered, or an execution plan is aged out, then the execution plan doesn't change? I'm not aware of any dynamic feedback whereby the DBMS engine automatically detects what it perceives to be "slow/inefficient SQL plans", uses the information gathered during the execution, and potentially comes up with a new plan!? Yes, modern database engines capture workload which can be analysed using automatic tools to highlight performance bottlenecks, but this is not directly fed back into the execution plan.
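
You can see this plan stability for yourself in the plan cache. A minimal sketch (SQL Server 2005 DMVs); a rising usecounts value shows the same cached plan being reused rather than re-derived on each execution:

-- Most-reused cached plans together with their SQL text.
SELECT cp.usecounts, cp.objtype, t.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) t
ORDER BY cp.usecounts DESC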

Unfortunately, when the panel were asked (by someone else in the audience) to explain what the difference between estimated and actual query plans was, there was quite a lot of speculation around the idea that "the statistics and the actual data might be widely different", but no one gave an answer satisfactory to me. I did add "my two pence worth": that I wasn't aware that the actual execution can be used to feed back directly into the execution plan because the statistics were misleading? However, I let this point drop and decided to "google it" later. I think the following explanation by Microsoft's "SQL Server Customer Advisory Team" is helpful:

While the “Estimated” and “Actual” query plans are the same in most cases, there are some exceptions. If a table (temporary or permanent) is created in the same batch with the query, SQL Server has to recompile the query, since the query definition is unknown when the batch is compiled the first time.


The authors also give a nice example of where the two query plans do diverge (see link above).
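
As a hedged illustration of the kind of batch the SQLCAT team describe (their exact example is behind the link; this one is my own):

-- #t does not exist when the batch is first compiled, so the plan
-- estimated up-front and the plan actually used after the recompile
-- triggered by CREATE TABLE can differ.
CREATE TABLE #t (id INT PRIMARY KEY, val VARCHAR(50))
INSERT INTO #t VALUES (1, 'example')
SELECT val FROM #t WHERE id = 1
DROP TABLE #t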

In terms of Oracle execution plans, there are direct equivalents - showing the expected cost of each element in the plan.

5) Dynamic Management Views

Oracle has always had its V$ views, which allow DBAs to script performance and general database monitoring commands. These views show a huge gamut of database operational metrics (since instance start-up). Being able to script the extraction of such information is sometimes much more practical than a GUI interface: easier to automate, easier to document, more repeatable ...

In SQL Server 2005, Microsoft introduced "dynamic management views":

The dynamic management views (DMVs) in SQL Server 2005 are designed to give you a window into what's going on inside SQL Server. They can provide information on what's currently happening inside the server as well as the objects it's storing. They are designed to be used instead of system tables and the various functions provided in SQL Server 2000. This article provides an introduction to DMVs and covers a few of the basic views and functions.


Iain briefly went through a few of the most useful views:

sys.dm_io_virtual_file_stats
Similar to fn_virtualfilestats in SQL Server 2000, this DMV function returns I/O statistics for data and log files for one or all databases and one or all files within the database(s).

sys.dm_os_sys_info
Returns a miscellaneous set of useful information about the computer, such as the hyper-thread ratio, the max worker count, and other resources used by and available to SQL Server.

sys.dm_os_sys_memory*
Returns a complete picture of memory at the operating system level, including information about total and available physical memory, total and available page memory, system cache, kernel space and so forth.

sys.dm_exec_requests
Returns one row for each request executing within SQL Server, but does not contain information for code that executes outside of SQL Server, such as distributed queries or extended stored procedures

sys.dm_tran_locks
Returns information about currently active requests to the lock manager, broken into a resource group and a request group. The request status may be active, convert, or waiting (wait), and includes details such as the resource on which the lock is wanted (for the resource group) or the lock request itself (for the request group).
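
A minimal sketch of the sort of ad-hoc query these views make possible - currently executing requests joined to their statement text (the session_id > 50 filter is the usual convention for skipping system sessions):

-- What is running right now, with its wait state and SQL text.
SELECT r.session_id, r.status, r.wait_type, t.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.session_id > 50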



Iain was rather pleased to highlight one metric where SQL Server has "stolen a lead" over Oracle (there appears to be a degree of rivalry between the Oracle and SQL teams at Quest, with the Oracle team being a little superior at times?). With the DMV SYS.DM_DB_INDEX_USAGE_STATS and a "not in" clause, you can easily identify any indexes which haven't been used since start-up:


-- List user-table indexes with no recorded usage in
-- SYS.DM_DB_INDEX_USAGE_STATS since instance start-up.
USE AdventureWorks
GO
DECLARE @dbid INT
SELECT @dbid = DB_ID(DB_NAME())

SELECT OBJECTNAME = OBJECT_NAME(I.OBJECT_ID),
       INDEXNAME = I.NAME,
       I.INDEX_ID
FROM SYS.INDEXES I
JOIN SYS.OBJECTS O
  ON I.OBJECT_ID = O.OBJECT_ID
WHERE OBJECTPROPERTY(O.OBJECT_ID, 'IsUserTable') = 1
  AND I.INDEX_ID NOT IN (SELECT S.INDEX_ID
                         FROM SYS.DM_DB_INDEX_USAGE_STATS S
                         WHERE S.OBJECT_ID = I.OBJECT_ID
                           AND I.INDEX_ID = S.INDEX_ID
                           AND DATABASE_ID = @dbid)
ORDER BY OBJECTNAME,
         I.INDEX_ID,
         INDEXNAME ASC
GO




This is probably a useful set of metrics on index usage, and I wouldn't be surprised if other database vendors (e.g. Oracle) provide these details soon ...


