Wednesday, January 2, 2008

Storing I/O Stats at Meaningful Intervals

Analyzing I/O statistics gathered during peak usage is much more informative than trying to work with numbers that mix peak and off-peak times of day. The solution presented in this section stores the raw information provided by fn_virtualfilestats so that it can be reported in user-selected time slices. To accomplish this task, I have created a table, dbaFilestats, and a stored procedure, dba_Perf_FilestatGathering, that I run every day as a scheduled job.

dbaFilestats is a table that I keep in my DBA database for gathering the statistics. Here is the CREATE TABLE script:

CREATE TABLE dbo.dbaFilestats (
    ID int IDENTITY (1, 1) NOT NULL,
    dbid int NOT NULL,
    fileid int NOT NULL,
    Reads bigint NULL,
    Reads_FromStart bigint NULL,
    Writes bigint NULL,
    Writes_FromStart bigint NULL,
    BytesRead bigint NULL,
    BytesRead_FromStart bigint NULL,
    BytesWritten bigint NULL,
    BytesWritten_FromStart bigint NULL,
    IostallMS bigint NULL,
    IostallMS_FromStart bigint NULL,
    RecordedDT datetime NULL,
    [timestamp] bigint NULL,
    IntervalMSec bigint NULL,
    FirstMeasure_FromStart bit NULL
)

I've chosen to keep both the interval measures, such as Reads and BytesWritten, and the cumulative measures since instance startup (the _FromStart columns). It would be possible to do away with the _FromStart numbers by keeping them in a temporary table while the stored procedure runs, but I have found that every so often I want to see them.
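For what it's worth, the interval deltas can always be reconstructed from the _FromStart columns alone. Here is a sketch of such a query (my own illustration, not part of the procedure), which pairs each row with the previous recording for the same file:

-- Sketch: recover interval deltas from the cumulative columns by
-- joining each row to the prior recording for the same dbid/fileid.
SELECT cur.dbid, cur.fileid,
       cur.Reads_FromStart - prev.Reads_FromStart           AS Reads,
       cur.Writes_FromStart - prev.Writes_FromStart         AS Writes,
       cur.BytesWritten_FromStart - prev.BytesWritten_FromStart
                                                            AS BytesWritten,
       DATEDIFF(ms, prev.RecordedDT, cur.RecordedDT)        AS IntervalMSec
FROM dbo.dbaFilestats cur
JOIN dbo.dbaFilestats prev
  ON prev.dbid = cur.dbid AND prev.fileid = cur.fileid
 AND prev.ID = (SELECT MAX(p.ID) FROM dbo.dbaFilestats p
                WHERE p.dbid = cur.dbid AND p.fileid = cur.fileid
                  AND p.ID < cur.ID)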

The stored procedure to perform the information gathering is dba_Perf_FilestatGathering. It takes four parameters, which are listed in Table 3.

Table 3 Parameters to dba_Perf_FilestatGathering

Parameter     Data Type  Description
@Duration     datetime   The amount of time to collect data.
@IntervalSec  int        The interval in seconds between data gatherings.
@WhichDB      int        DB_ID of the database to gather for; -1 for all databases.
@WhichFile    int        File_ID to gather information for; -1 for all files.

To create the procedure on your system, you will need a UDF, udf_DT_AddTime, which you will find at this URL: http://www.novicksoftware.com/UDFofWeek/Vol1/T-SQL-UDF-Volume-1-Number-38-udf_DT_Addtime.htm. Here is the CREATE PROC script:
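In case you can't retrieve the function, here is a minimal sketch of what udf_DT_AddTime needs to do: add a time-of-day value stored in a datetime to a starting datetime. This is my reconstruction for illustration, not necessarily the published implementation:

-- Sketch only: adds the time portion of @TimeToAdd (e.g. '00:10:21')
-- to @StartDT. The published udf_DT_AddTime may differ in detail.
CREATE FUNCTION dbo.udf_DT_AddTime (
    @StartDT datetime,
    @TimeToAdd datetime  -- time-of-day value, e.g. '00:10:21'
)
RETURNS datetime
AS BEGIN
    -- Milliseconds between midnight (1900-01-01 00:00:00) and
    -- @TimeToAdd, added onto the starting point.
    RETURN DATEADD(ms,
                   DATEDIFF(ms, CONVERT(datetime, '00:00:00', 108), @TimeToAdd),
                   @StartDT)
END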

CREATE PROC dbo.dba_Perf_FilestatGathering

  @Duration datetime = '01:00:00'  -- Duration of data collection;
                                   -- a very long duration gathers almost forever
, @IntervalSec int = 600  -- Approximate seconds between gatherings
, @WhichDB int = -1       -- DB_ID to monitor, -1 for all
, @WhichFile int = -1     -- File_ID of file to monitor, -1 for all
/*
* Procedure to monitor fn_virtualfilestats and record their value at
* intervals in the dbaFilestats table.
*
* Example:
-- TRUNCATE TABLE DBAFILESTATS
DECLARE @RC int, @StartTime datetime, @thisDBID int
SELECT @StartTime = getdate(), @thisDBID = db_id()
EXEC @RC = dba_Perf_FilestatGathering '00:10:21', 60, @ThisDBID , 1
select * from dbafilestats where RecordedDT > @StartTime
and dbid=@thisDBID and fileid = 1 Order by ID
********************************************************************/

AS

DECLARE @TerminateGatheringDT datetime -- when to stop gathering
, @WaitFor_Interval datetime
, @LastRecordingDT datetime
, @RecordingDT datetime
, @myDBID int
, @myError int -- Local copy of @@ERROR
, @myRowCount int -- Local copy of @@RowCount
, @msgText nvarchar(4000) -- for error messages


SELECT @WaitFor_Interval = DATEADD (s, @IntervalSec
, CONVERT (datetime, '00:00:00', 108)
)
, @TerminateGatheringDT =
dbo.udf_DT_AddTime (getdate(), @Duration)
, @myDBID = db_id()

WHILE getdate() <= @TerminateGatheringDT BEGIN

SELECT @LastRecordingDT = @RecordingDT
SELECT @RecordingDT = getdate()

-- Get Stats for All files
INSERT INTO dbo.dbaFilestats (
dbid
, fileid
, Reads
, Reads_FromStart
, Writes
, Writes_FromStart
, BytesRead
, BytesRead_FromStart
, BytesWritten
, BytesWritten_FromStart
, IostallMS
, IostallMS_FromStart
, RecordedDT
, [timestamp]
, IntervalMSec
, FirstMeasure_FromStart
)
SELECT vfs.dbid -- Database ID
, vfs.fileid -- File ID
, vfs.numberreads - dbaf.reads_fromstart as Reads
, vfs.numberreads as Reads_FromStart
, vfs.numberwrites - dbaf.writes_fromstart
as Writes
, vfs.numberwrites as Writes_FromStart
, vfs.bytesread - dbaf.bytesRead_FromStart
as BytesRead
, vfs.bytesread as BytesRead_FromStart
, vfs.bytesWritten - dbaf.BytesWritten_FromStart
as BytesWritten
, vfs.byteswritten as BytesWritten_FromStart
, vfs.IOStallMS - dbaf.IOStallMS_FromStart
as IOStallMS
, vfs.iostallms as IOStallMS_FromStart
, @RecordingDT
, vfs.[timestamp]
, CASE WHEN @LastRecordingDT IS NULL
THEN NULL
ELSE DATEDIFF
(ms, dbaf.RecordedDT, @RecordingDT)
END as IntervalMSec
, CASE WHEN @LastRecordingDT IS NULL THEN 1 ELSE 0 END
      as FirstMeasure_FromStart
FROM ::fn_virtualfilestats(@WhichDB, @WhichFile) vfs
LEFT OUTER JOIN dbaFilestats dbaf
    ON vfs.dbid = dbaf.dbid AND vfs.fileid = dbaf.fileid
    -- Restricting the join to the last recording here (rather than
    -- filtering in WHERE) avoids matching older rows on the first
    -- pass when the table already contains data for these files.
    AND dbaf.RecordedDT = @LastRecordingDT
SELECT @myError = @@Error, @myRowCount = @@RowCount
IF @myError != 0 BEGIN
SET @msgText = 'Error ' + CONVERT(varchar(9), @myError)
+ ' in dba_Perf_FilestatGathering during INSERT'
RAISERROR (@msgText, 16, 1) WITH LOG
RETURN @myError
END

WAITFOR DELAY @WaitFor_Interval -- delay

END

I usually run for 10 minutes and 21 seconds ('00:10:21'), gathering information every 60 seconds for all databases (-1) and all files (-1). The extra 21 seconds is there because I want 10 measurements, and it allows for the time it takes to make and store each measurement.
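Since the procedure runs as a daily scheduled job, it could be set up in SQL Server Agent along these lines. This is a sketch; the job name, schedule time, and DBA database name are my own placeholders:

-- Sketch: schedule dba_Perf_FilestatGathering to run daily at 09:00.
USE msdb
EXEC dbo.sp_add_job @job_name = 'Gather file stats'
EXEC dbo.sp_add_jobstep @job_name = 'Gather file stats'
    , @step_name = 'Run gathering proc'
    , @subsystem = 'TSQL'
    , @database_name = 'DBA'   -- wherever dbaFilestats lives
    , @command = 'EXEC dbo.dba_Perf_FilestatGathering ''00:10:21'', 60, -1, -1'
EXEC dbo.sp_add_jobschedule @job_name = 'Gather file stats'
    , @name = 'Daily during peak'
    , @freq_type = 4             -- daily
    , @freq_interval = 1
    , @active_start_time = 90000 -- 09:00:00
EXEC dbo.sp_add_jobserver @job_name = 'Gather file stats'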

Let's take a look at some of the raw data from a run against my database 16, which had a fairly heavy load at the time:

SELECT Top 5 [ID], dbid, [fileid] as [File]
, Reads, Writes, BytesRead, BytesWritten, IOStallMS
, IntervalMSec as [MSec]
FROM dbaFilestats
WHERE FirstMeasure_FromStart = 0
ORDER BY [ID]
Go
(Results)
ID   dbid File Reads Writes BytesRead BytesWritten IOStallMS MSec
---- ---- ---- ----- ------ --------- ------------ --------- -----
2    16   1    1243  564    35774464  25452544     74530     72846
3    16   1    1911  847    75489280  48136192     62485     64186
4    16   1    244   1356   37617664  53551104     3890      62263
5    16   1    1393  1315   64258048  38379520     40297     63500
6    16   1    152   1485   13426688  66273280     4733      62703

As you can see from the MSec column, it took longer than 60 seconds to get through each of the measurements. The actual time will depend on the load on the system and the number of databases.
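To see how far the actual cadence drifted from the requested 60 seconds, you can average the stored intervals. This is an ad hoc check of my own, not part of the original procedure:

-- Average, minimum, and maximum actual interval (in seconds) per file,
-- ignoring the first measurement, which has no interval.
SELECT dbid, fileid,
       AVG(IntervalMSec) / 1000.0 AS AvgIntervalSec,
       MIN(IntervalMSec) / 1000.0 AS MinIntervalSec,
       MAX(IntervalMSec) / 1000.0 AS MaxIntervalSec
FROM dbo.dbaFilestats
WHERE FirstMeasure_FromStart = 0
GROUP BY dbid, fileid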

Looking at the raw numbers from dbaFilestats is helpful. However, I find the information most useful when the measurements are turned into rates. This query does the job:

SELECT Top 5 
CAST(Reads/(IntervalMSec/1000.0) as Numeric (18,2))
as [Reads/Sec]
, CAST(Writes/(IntervalMsec/1000.0) as Numeric (18,2))
as [Writes/Sec]
, (Reads+Writes) as [I/O Count]
, CAST((Reads+Writes)/(IntervalMsec/1000.0) as Numeric(18,2))
as [I/O /sec]
, Iostallms/CASE WHEN (reads+writes)>0
THEN reads+writes
ELSE 1 END AS [IoStallMS/Op]
FROM dbaFilestats
WHERE FirstMeasure_FromStart = 0
AND dbid = 16 and fileid=1
ORDER BY [id]
GO
(Results)
Reads/Sec Writes/Sec I/O Count I/O /sec IoStallMS/Op
--------- ---------- --------- -------- ------------
17.06     7.74       1807      24.81    41
29.77     13.20      2758      42.97    22
3.92      21.78      1600      25.70    2
21.94     20.71      2708      42.65    14
2.42      23.68      1637      26.11    2

These rates indicate that there may be a problem with the disk(s) that database 16 is using. I know that because of the high IoStallMS/Op measurement (I/O stall milliseconds per I/O operation): anything over twenty indicates some type of problem. That is in spite of the fact that the I/O rates are moderate, in the range of 25-43 operations per second, while a modern server disk should be able to achieve 85 to 100 operations per second. The situation warrants further investigation.
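A quick way to spot which files crossed that twenty-millisecond threshold during a run is a query like this one, my own follow-up to the rule of thumb above:

-- Files whose average I/O stall per operation exceeded 20 ms
-- over the whole gathering run.
SELECT dbid, fileid,
       SUM(IostallMS) / SUM(Reads + Writes) AS AvgStallMSPerOp
FROM dbo.dbaFilestats
WHERE FirstMeasure_FromStart = 0
GROUP BY dbid, fileid
HAVING SUM(Reads + Writes) > 0
   AND SUM(IostallMS) / SUM(Reads + Writes) > 20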

Conclusion

Input/Output is at the heart of what SQL Server does. This article has shown several methods for getting I/O statistics from SQL Server as an aid in diagnosing performance issues. I find that the most useful technique is to gather statistics during peak usage at one-minute intervals. The key indicators are the I/O operations per second and the IoStallMS/Op. You might want to investigate other ratios for your server.
