Fast File Copy With Managed Code: UBCopy update!

If you have been following my trials with working with C# and files you know it hasn’t been a bed of roses. I ran into a roadblock when I was trying to build a high performance file copy tool in the CLR. I eventually found a solution. It works but it isn’t very clean. So, I did a little more work, removed a broken bit and simplified another bit to make it a little more stable. I’m not done yet. I know there are a couple of places I can clean the code up a bit more and a way to speed things up a bit more at the cost of using more memory, but that is for another blog post.

Logging, to the rescue

The first major change was adding a logging facility to the UBCopy program via log4net. I had stayed away from adding it fearing to much overhead but after some tweaking it seems pretty darn fast. This allowed me to do some tight timings and log debug information to a log file to track long running copies without much fuss. It also allows me to turn off all console messaging since this is used in an automated fashion if something fails I’ve got the log file to look at.

Please don’t lock my file

There are a few places where the file target file being written to could potentially be locked. Right now I have to create the file and set the file length then close it before opening it for unbuffered writes. I haven’t done anything about this grab yet. The second one is when I am writing the last block to the file. Since you have to write in page aligned multiples I was closing the file, reopening it in a buffered mode and flushing the last partial buffer to disk. I haven’t fixed these up yet but I am working on a solution. I haven’t encountered a problem yet where the file gets locked during this process but it is still a potential threat that must be dealt with. Even though the file stream is seekable and writeable you can’t issue a setlength command on it. I plan on looking deeper into the filestream code and see what they are doing when you issue a setlength. If I can eliminate the open,close and open routine it will speed things up a bit and keep anyone from grabbing my file handle.
Here’s and example of what I’m talking about, after the set file size is done there is a very small window of opportunity here.

//open output file set length to prevent growth and file fragmentation and close it.
            //We do this to prevent file fragmentation and make the write as fast as possible.
            try
            {
                _outfile = new FileStream(_outputfile, FileMode.Create, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough);
                //set file size to minimum of one buffer to cut down on fragmentation
                _outfile.SetLength(_infilesize > CopyBufferSize ? _infilesize : CopyBufferSize);
                _outfile.Close();
                _outfile.Dispose();
            }
            catch (Exception e)
            {
                throw;
            }
            //open file for write unbuffered
            try
            {
                _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough | FileFlagNoBuffering);

            }
            catch (Exception e)
            {
                throw;
            }

Unbuffered, mostly

Since unbuffered writes prevent you from writing data that isn’t memory page aligned (usually 4096), you get an error when writing to the file. To get around this all writes are the same size period, this really only effects the last write which may have a partially full buffer. It will write down the full buffer, including invalid data. Luckily, an easy fix is to reset the end of file pointer back to the correct location and everything is happy again. This allows me to eliminate an additional bit of code. here is the old code.

//open file for write buffered We do this so we can write the tail of the file
//it is a cludge but hey you get what you get in C#
outfile = new FileStream(outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
             FileOptions.WriteThrough);
//go to the right position in the file
outfile.Seek(infilesize - bytesRead1, 0);
//flush the last buffer syncronus and buffered.
outfile.Write(Buffer1, 0, bytesRead1);

And now the new, faster and better way!

//close the file handle that was using unbuffered and write through and move the EOF pointer.
Log.Debug("Close Write File Unbuffered");
_outfile.Close();
_outfile.Dispose();

try
{
    if (IsDebugEnabled)
    {
        Log.Debug("Open File Set Length");
    }
    _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                FileOptions.WriteThrough);
    _outfile.SetLength(_infilesize);
    _outfile.Close();
    _outfile.Dispose();
}
catch (Exception e)
{
    if (IsDebugEnabled)
    {
        Log.Debug("Failed to open for write set length");
        Log.Debug(e);
    }
    throw;
}

Buffered or Unbuffered

If the file was smaller than the size of the buffer I was just copying the file buffered. This was a check and route to a diffrent code path for copying files. Not a huge deal but just another place for problems to creep up over time. So, I removed it when I changed the way unbuffered multi-threaded writes were happening I was able to get rid of the buffered copy routine all together. Now, it will initially set the file size to a single buffer. Write the data and reset the file pointer to the original size.

You want a MOVE but get a COPY

There was also a bug that on small files UBCopy wouldn’t do a move just a copy in some cases. There was an error in the code that prevented the delete of the source file after the copy to the new location due to a lock that was being held by the program. Not the end of the world, just wasn’t working like it should 100% of the time.

So, here is the new AsyncUnbuffCopy routine. Calling it is super easy. If you want to use the code as-is you will need to download log4net and include it in your project.

//
// AsyncUnbuffCopy.cs
//
// Authors:
//  Wesley D. Brown <wes@planetarydb.com>
//
// Copyright (C) 2010 SQLServerIO (http://www.SQLServerIO.com)
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to
// the following conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
//

using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using log4net;

namespace UBCopy
{
    internal class AsyncUnbuffCopy
    {
        private static readonly ILog Log = LogManager.GetLogger(typeof(AsyncUnbuffCopy));
        private static readonly bool IsDebugEnabled = Log.IsDebugEnabled;

        //file names
        private static string _inputfile;
        private static string _outputfile;

        //checksum holders
        private static string _infilechecksum;
        private static string _outfilechecksum;

        //show write progress
        private static bool _reportprogress;

        //cursor position
        private static int _origRow;
        private static int _origCol;

        //number of chunks to copy
        private static int _numchunks;

        //track read state and read failed state
        private static bool _readfailed;

        //syncronization object
        private static readonly object Locker1 = new object();

        //buffer size
        public static int CopyBufferSize;
        private static long _infilesize;

        //buffer read
        public static byte[] Buffer1;
        private static int _bytesRead1;

        //buffer overlap
        public static byte[] Buffer2;
        private static bool _buffer2Dirty;
        private static int _bytesRead2;

        //buffer write
        public static byte[] Buffer3;

        //total bytes read
        private static long _totalbytesread;
        private static long _totalbyteswritten;

        //filestreams
        private static FileStream _infile;
        private static FileStream _outfile;

        //secret sauce for unbuffered IO
        const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

        private static void AsyncReadFile()
        {
            //open input file
            try
            {
                _infile = new FileStream(_inputfile, FileMode.Open, FileAccess.Read, FileShare.None,
CopyBufferSize, FileFlagNoBuffering);
            }
            catch (Exception e)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Failed to open for read");
                    Log.Debug(e);
                }
                throw;
            }
            //if we have data read it
            while (_totalbytesread < _infilesize)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Read _buffer2Dirty    : " + _buffer2Dirty);
                }
                _bytesRead1 = _infile.Read(Buffer1, 0, CopyBufferSize);
                Monitor.Enter(Locker1);
                try
                {
                    while (_buffer2Dirty) Monitor.Wait(Locker1);
                    Buffer.BlockCopy(Buffer1, 0, Buffer2, 0, _bytesRead1);
                    _buffer2Dirty = true;
                    _bytesRead2 = _bytesRead1;
                    _totalbytesread = _totalbytesread + _bytesRead1;
                    Monitor.PulseAll(Locker1);
                    if (IsDebugEnabled)
                    {

                        Log.Debug("Read       : " + _totalbytesread);
                    }
                }
                catch (Exception e)
                {
                    Log.Fatal("Read Failed.");
                    Log.Fatal(e);
                    _readfailed = true;
                    throw;
                }
                finally { Monitor.Exit(Locker1); }
            }
            //clean up open handle
            _infile.Close();
            _infile.Dispose();
        }

        private static void AsyncWriteFile()
        {
            //open output file set length to prevent growth and file fragmentation and close it.
            //We do this to prevent file fragmentation and make the write as fast as possible.
            try
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Open File Set Length");
                }
                _outfile = new FileStream(_outputfile, FileMode.Create, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough);

                //set file size to minimum of one buffer to cut down on fragmentation
                _outfile.SetLength(_infilesize > CopyBufferSize ? _infilesize : CopyBufferSize);

                _outfile.Close();
                _outfile.Dispose();
            }
            catch (Exception e)
            {
                Log.Fatal("Failed to open for write set length");
                Log.Fatal(e);
                throw;
            }

            //open file for write unbuffered
            try
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Open File Write Unbuffered");
                }
                _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough | FileFlagNoBuffering);

            }
            catch (Exception e)
            {
                Log.Fatal("Failed to open for write unbuffered");
                Log.Fatal(e);
                throw;
            }

            var pctinc = 0.0;
            var progress = pctinc;

            //progress stuff
            if (_reportprogress)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Report Progress : True");
                }
                pctinc = 100.00 / _numchunks;
            }
            if (IsDebugEnabled)
            {
                Log.Debug("While Write _totalbyteswritten          : " + _totalbyteswritten);
                Log.Debug("While Write _infilesize - CopyBufferSize: " + (_infilesize - CopyBufferSize));
            }
            while ((_totalbyteswritten < _infilesize) && !_readfailed)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Write Unbuffered _buffer2Dirty    : " + _buffer2Dirty);
                }
                lock (Locker1)
                {
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Write Unbuffered Lock");
                    }
                    while (!_buffer2Dirty) Monitor.Wait(Locker1);
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Write Unbuffered _buffer2Dirty    : " + _buffer2Dirty);
                    }
                    Buffer.BlockCopy(Buffer2, 0, Buffer3, 0, _bytesRead2);
                    _buffer2Dirty = false;
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Write Unbuffered _buffer2Dirty    : " + _buffer2Dirty);
                    }
                    _totalbyteswritten = _totalbyteswritten + CopyBufferSize;
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Written Unbuffered : " + _totalbyteswritten);
                    }
                    Monitor.PulseAll(Locker1);
                    //fancy dan in place percent update on each write.

                    if (_reportprogress && !IsDebugEnabled)
                    {
                        Console.SetCursorPosition(_origCol, _origRow);
                        if (progress < 101 - pctinc)
                        {
                            progress = progress + pctinc;
                            Console.Write("%{0}", Math.Round(progress, 0));
                        }
                    }
                }
                try
                {
                    _outfile.Write(Buffer3, 0, CopyBufferSize);
                }
                catch (Exception e)
                {
                    Log.Fatal("Write Unbuffered Failed");
                    Log.Fatal(e);
                    throw;
                }
            }

            //close the file handle that was using unbuffered and write through
            Log.Debug("Close Write File Unbuffered");
            _outfile.Close();
            _outfile.Dispose();

            try
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Open File Set Length");
                }
                _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough);
                _outfile.SetLength(_infilesize);
                _outfile.Close();
                _outfile.Dispose();
            }
            catch (Exception e)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Failed to open for write set length");
                    Log.Debug(e);
                }
                throw;
            }
        }

        public static int AsyncCopyFileUnbuffered(string inputfile, string outputfile, bool overwrite,
bool movefile, bool checksum, int buffersize, bool reportprogress)
        {
            if (IsDebugEnabled)
            {
                Log.Debug("inputfile      : " + inputfile);
                Log.Debug("outputfile     : " + outputfile);
                Log.Debug("overwrite      : " + overwrite);
                Log.Debug("movefile       : " + movefile);
                Log.Debug("checksum       : " + checksum);
                Log.Debug("buffersize     : " + buffersize);
                Log.Debug("reportprogress : " + reportprogress);
            }
            //report write progress
            _reportprogress = reportprogress;

            //set file name globals
            _inputfile = inputfile;
            _outputfile = outputfile;

            //setup single buffer size, remember this will be x3.
            CopyBufferSize = buffersize * 1024 * 1024;

            //buffer read
            Buffer1 = new byte[CopyBufferSize];

            //buffer overlap
            Buffer2 = new byte[CopyBufferSize];

            //buffer write
            Buffer3 = new byte[CopyBufferSize];

            //clear all flags and handles
            _totalbytesread = 0;
            _totalbyteswritten = 0;
            _bytesRead1 = 0;
            _buffer2Dirty = false;

            //if the overwrite flag is set to false check to see if the file is there.
            if (File.Exists(outputfile) && !overwrite)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Destination File Exists!");
                }
                Console.WriteLine("Destination File Exists!");
                return 0;
            }

            //create the directory if it doesn't exist
            if (!Directory.Exists(outputfile))
            {
                try
                {
                    // ReSharper disable AssignNullToNotNullAttribute
                    Directory.CreateDirectory(Path.GetDirectoryName(outputfile));
                    // ReSharper restore AssignNullToNotNullAttribute
                }
                catch (Exception e)
                {
                    Log.Fatal("Create Directory Failed.");
                    Log.Fatal(e);
                    Console.WriteLine("Create Directory Failed.");
                    Console.WriteLine(e.Message);
                    throw;
                }
            }

            //get input file size for later use
            var inputFileInfo = new FileInfo(_inputfile);
            _infilesize = inputFileInfo.Length;

            //get number of buffer sized chunks used to correctly display percent complete.
            _numchunks = (int)((_infilesize / CopyBufferSize) <= 0 ? (_infilesize / CopyBufferSize) : 1);

            if (IsDebugEnabled)
            {
                Log.Debug("File Copy Started");
            }
            Console.WriteLine("File Copy Started");

            //create read thread and start it.
            var readfile = new Thread(AsyncReadFile) { Name = "ReadThread", IsBackground = true };
            readfile.Start();

            if (IsDebugEnabled)
            {
                //debug show if we are an even multiple of the file size
                Log.Debug("Number of Chunks: " + _numchunks);
            }

            //create write thread and start it.
            var writefile = new Thread(AsyncWriteFile) { Name = "WriteThread", IsBackground = true };
            writefile.Start();

            if (_reportprogress)
            {
                //set fancy curor position
                _origRow = Console.CursorTop;
                _origCol = Console.CursorLeft;
            }

            //wait for threads to finish
            readfile.Join();
            writefile.Join();

            //leave a blank line for the progress indicator
            if (_reportprogress)
                Console.WriteLine();

            if (IsDebugEnabled)
            {
                Log.Debug("File Copy Done");
            }

            Console.WriteLine("File Copy Done");

            if (checksum)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Checksum Source File Started");
                }
                Console.WriteLine("Checksum Source File Started");
                //create checksum read file thread and start it.
                var checksumreadfile = new Thread(GetMD5HashFromInputFile) {
Name = "checksumreadfile", IsBackground = true };
                checksumreadfile.Start();

                if (IsDebugEnabled)
                {
                    Log.Debug("Checksum Destination File Started");
                }
                Console.WriteLine("Checksum Destination File Started");
                //create checksum write file thread and start it.
                var checksumwritefile = new Thread(GetMD5HashFromOutputFile) {
Name = "checksumwritefile", IsBackground = true };
                checksumwritefile.Start();

                //hang out until the checksums are done.
                checksumreadfile.Join();
                checksumwritefile.Join();

                if (_infilechecksum.Equals(_outfilechecksum))
                {
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Checksum Verified");
                    }
                    Console.WriteLine("Checksum Verified");
                }
                else
                {
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Checksum Failed");
                        Log.DebugFormat("Input File Checksum : {0}", _infilechecksum);
                        Log.DebugFormat("Output File Checksum: {0}", _outfilechecksum);
                    }
                    Console.WriteLine("Checksum Failed");
                    Console.WriteLine("Input File Checksum : {0}", _infilechecksum);
                    Console.WriteLine("Output File Checksum: {0}", _outfilechecksum);
                }
            }

            if (movefile && File.Exists(inputfile) && File.Exists(outputfile))
                try
                {
                    File.Delete(inputfile);
                }
                catch (IOException ioex)
                {
                    if (IsDebugEnabled)
                    {
                        Log.Error("File in use or locked cannot move file.");
                        Log.Error(ioex);
                    }
                    Console.WriteLine("File in use or locked");
                    Console.WriteLine(ioex.Message);
                }
                catch (Exception ex)
                {
                    if (IsDebugEnabled)
                    {
                        Log.Error("File Failed to Delete");
                        Log.Error(ex);
                    }
                    Console.WriteLine("File Failed to Delete");
                    Console.WriteLine(ex.Message);
                }
            return 1;
        }

        //hash input file
        public static void GetMD5HashFromInputFile()
        {
            var fs = new FileStream(_inputfile, FileMode.Open, FileAccess.Read, FileShare.None,
CopyBufferSize);
            MD5 md5 = new MD5CryptoServiceProvider();
            byte[] retVal = md5.ComputeHash(fs);
            fs.Close();

            var sb = new StringBuilder();
            for (var i = 0; i < retVal.Length; i++)
            {
                sb.Append(retVal[i].ToString("x2"));
            }
            _infilechecksum = sb.ToString();
        }

        //hash output file
        public static void GetMD5HashFromOutputFile()
        {
            var fs = new FileStream(_outputfile, FileMode.Open, FileAccess.Read, FileShare.None,
CopyBufferSize);
            MD5 md5 = new MD5CryptoServiceProvider();
            byte[] retVal = md5.ComputeHash(fs);
            fs.Close();

            var sb = new StringBuilder();
            for (var i = 0; i < retVal.Length; i++)
            {
                sb.Append(retVal[i].ToString("x2"));
            }
            _outfilechecksum = sb.ToString();
        }
    }
}

SQLDIY: Alert On Blocking Chains

Continuing the SQLDIY monitoring project we will take a look at tracking blocking events. I actually received a request to update this from a script I had made available at SQLServerCentral. This was a script from the grand old days of SQL Server 2000 and actually used xp_smtp_sendmail That Gert Drapers wrote about a million years ago in SQL Server time. With the advent of database mail it only made sense to update the procedure to use it. This still relies on sysprocesses but since it isn’t deprecated yet I’ve got at least three versions of SQL Server before I need to fix that. I did replace fn_get_sql with sys.dm_exec_sql_text since it is deprecated. Also, it simplified things by reducing the temporary table count by one thanks to the power of cross apply.

This module serves two purposes. One, to alert you based on criteria and thresholds. Two, track blocking issues over time. I found that concurrency issues crop up over time due to increase in data sizes. Tracking these blocking events may help you get in front of what may be a design flaw before it cripples your application.

As always, we create a new table BlockingChains in our local management database.

use Management
GO
drop table BlockingChains
go
CREATE TABLE [dbo].[BlockingChains] (
	[ServerName] [varchar] (255) NOT NULL,
	[SampleTime] [datetime] NOT NULL ,
	[Spid] [int] NULL ,
	[SpidBlocked] [int] NULL ,
	[WaitType] [varchar] (255) NULL ,
	[WaitTime] [bigint] NULL ,
	[PhysicalIO] [bigint] NULL ,
	[CPUInSeconds] [bigint] NULL ,
	[MemoryUsed] [bigint] NULL ,
	[Name] [nvarchar] (128)  NOT NULL ,
	[NumberOfOpenTransactions] [tinyint] NULL ,
	[Status] [varchar] (20)  NULL ,
	[HostName] [varchar] (50)  NULL ,
	[ProgramName] [varchar] (100)  NULL ,
	[CommandIssued] [varchar] (100)  NULL ,
	[DomainName] [varchar] (100)  NULL ,
	[DomainUserName] [varchar] (200)  NULL ,
	[LoginName] [varchar] (100)  NULL ,
	[EventTpe] [varchar] (255)  NULL ,
	[Parameters] [varchar] (255)  NULL ,
	[EventInfo] [varchar] (4000)  NULL ,
	[CommandText] [varchar] (max)  NULL
)
GO

Next comes the stored procedure. We have to get creative if we want to capture blocking chains under a minute. To do that we side step the sql agent a bit and have it start our job and let it run. The procedure then enters a loop and waits.

The procedure is pretty self explanatory and follows a pattern you will see me use again for gathering information at a resolution less than one minute. @Recivers should be a list of emails separated by a semicolon(;). the “ignore” variables are all comma separated lists. You may wish to ignore some things to keep the noise level down.

USE management

GO

CREATE PROCEDURE Alertonblocking @Duration          DATETIME = '08:00:00',
                                 -- Duration of data collection in hours.
                                 @IntervalSec       INT = 30,
                                 -- Approximate time in seconds the gathering interval.
                                 @MaxWaitTime       INT = 28000,
                                 -- This is in milliseconds.
                                 @Recivers          VARCHAR(8000) =
'test@email.com',-- Who all gets the emails.
                                 @ProcessesToIgnore VARCHAR(8000) = '',
                                 -- Ignore any processes that you don't want to trigger an alert.
                                 @HostsToIgnore     VARCHAR(8000) = '',
                                 -- Ignore any host that you don't want to trigger an alert.
                                 @LoginsToIgnore    VARCHAR(8000) = ''
-- Ignore any login that you don't want to trigger an alert.
AS
  SET nocount ON

  CREATE TABLE #active_spids
    (
       spid           INT,
       blocked        INT,
       waittype       VARCHAR(255),
       waittime       BIGINT,
       physical_io    BIGINT,
       cpu            BIGINT,
       memusage       BIGINT,
       [dbid]         INT,
       open_tran      TINYINT,
       [status]       VARCHAR(20),
       hostname       VARCHAR(50),
       [program_name] VARCHAR(100),
       cmd            VARCHAR(100),
       nt_domain      VARCHAR(100),
       nt_username    VARCHAR(200),
       loginame       VARCHAR(100),
       [sql_handle]   [BINARY] (20) NOT NULL,
       [stmt_start]   [INT] NOT NULL,
       [stmt_end]     [INT] NOT NULL,
       [sql_text]     [VARCHAR] (MAX)
    )

  CREATE TABLE #active_spids_info
    (
       spid           INT,
       blocked        INT,
       waittype       VARCHAR(255),
       waittime       BIGINT,
       physical_io    BIGINT,
       cpu            BIGINT,
       memusage       BIGINT,
       [dbid]         INT,
       open_tran      TINYINT,
       [status]       VARCHAR(20),
       hostname       VARCHAR(50),
       [program_name] VARCHAR(100),
       cmd            VARCHAR(100),
       nt_domain      VARCHAR(100),
       nt_username    VARCHAR(200),
       loginame       VARCHAR(100),
       [sql_handle]   [BINARY] (20) NOT NULL,
       [stmt_start]   [INT] NOT NULL,
       [stmt_end]     [INT] NOT NULL,
       eventtype      VARCHAR(255),
       parameters     VARCHAR(255),
       eventinfo      VARCHAR(4000),
                [VARCHAR] (MAX)
    )

  CREATE TABLE #event_info
    (
       spid         INT,
       eventtype    VARCHAR(255),
       [Parameters] VARCHAR(255),
       eventinfo    VARCHAR(4000)
    )

  DECLARE @TerminateGatheringDT DATETIME,-- when to stop gathering
          @WaitFor_Interval     DATETIME,
          @LastRecordingDT      DATETIME,
          @RecordingDT          DATETIME,
          @myError              INT,-- Local copy of @@ERROR
          @myRowCount           INT,-- Local copy of @@RowCount
          @msgText              NVARCHAR(4000),-- for error messages
          @dbname               VARCHAR(255),
          @svrname              VARCHAR(255),
          @datestart            AS DATETIME,
          @tstamp               VARCHAR(255),
          @spid1                VARCHAR(255),
          @dbname1              VARCHAR(255),
          @status               VARCHAR(255),
          @hostname             VARCHAR(255),
          @programname          VARCHAR(255),
          @cmd                  VARCHAR(255),
          @nt_domain            VARCHAR(255),
          @nt_username          VARCHAR(255),
          @loginame             VARCHAR(255),
          @text                 VARCHAR(8000),
          @msg                  VARCHAR(8000),
          @sub                  VARCHAR(8000),
          @timestamp            AS DATETIME,
          @spid                 INT,
          @sqlhandle            BINARY(20),
          @tsqlhandle           AS VARCHAR(255),
          @waittime             VARCHAR(255),
          @waittype             VARCHAR(255),
          @buffer               VARCHAR(255),
          @diffmsec             BIGINT

  --SET @Duration = '08:00:00' -- Duration of data collection
  --SET @IntervalSec = 30 -- Approx sec in the gathering interval
  --SET @MaxWaitTime = 28000 -- This is in miliseconds!!!
  --SET @Recivers = '' --who all gets the emails
  SET @diffmsec = Datediff(ms, CONVERT(DATETIME, '00:00:00', 8), @Duration)

  SELECT @WaitFor_Interval = Dateadd (s, @IntervalSec,
                             CONVERT (DATETIME, '00:00:00', 108
                                    )),
         @TerminateGatheringDT = Dateadd(ms, @diffmsec, Getdate())

  WHILE Getdate() <= @TerminateGatheringDT
    BEGIN
        TRUNCATE TABLE #active_spids

        TRUNCATE TABLE #active_spids_info

        TRUNCATE TABLE #event_info

        INSERT INTO #active_spids
        SELECT spid,
               blocked,
               waittype,
               waittime,
               physical_io,
               cpu,
               [memusage],
               a.dbid,
               open_tran,
               a.status,
               hostname,
               [program_name],
               cmd,
               nt_domain,
               nt_username,
               loginame,
               [sql_handle],
               [stmt_start],
               [stmt_end],
               
        FROM   (SELECT spid,
                       blocked,
                       'waittype' = CASE
                                      WHEN waittype = 0x0001 THEN
                                      'Exclusive table lock'
                                      WHEN waittype = 0x0003 THEN
                                      'Exclusive intent lock'
                                      WHEN waittype = 0x0004 THEN
                                      'Shared table lock'
                                      WHEN waittype = 0x0005 THEN
                                      'Exclusive page lock'
                                      WHEN waittype = 0x0006 THEN
                                      'Shared page lock'
                                      WHEN waittype = 0x0007 THEN
                                      'Update page lock'
                                      WHEN waittype = 0x0013 THEN
                                      'Buffer resource lock (exclusive) request'
                                      WHEN waittype = 0x0013 THEN
                       'Miscellaneous I/O (sort, audit, direct xact log I/O)'
                       WHEN waittype = 0x0020 THEN 'Buffer in I/O'
                       WHEN waittype = 0x0022 THEN 'Buffer being dirtied'
                       WHEN waittype = 0x0023 THEN 'Buffer being dumped'
                       WHEN waittype = 0x0081 THEN 'Write the TLog'
                       WHEN waittype = 0x0200 THEN 'Parallel query coordination'
                       WHEN waittype = 0x0208 THEN 'Parallel query coordination'
                       WHEN waittype = 0x0420 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0421 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0422 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0423 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0424 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0425 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0800 THEN 'Network I/O completion'
                       WHEN waittype = 0x8001 THEN 'Exclusive table lock'
                       WHEN waittype = 0x8003 THEN 'Exclusive intent lock'
                       WHEN waittype = 0x8004 THEN 'Shared table lock'
                       WHEN waittype = 0x8005 THEN 'Exclusive page lock'
                       WHEN waittype = 0x8006 THEN 'Shared page lock'
                       WHEN waittype = 0x8007 THEN 'Update page lock'
                       WHEN waittype = 0x8011 THEN
                       'Buffer resource lock (shared) request'
                       ELSE 'OLEDB/Miscellaneous'
                                    END,
                       waittime,
                       physical_io,
                       cpu,
                       [memusage],
                       sp.dbid,
                       open_tran,
                       status,
                       hostname,
                       [program_name],
                       cmd,
                       nt_domain,
                       nt_username,
                       loginame,
                       [sql_handle],
                       [stmt_start],
                       [stmt_end],
                       
                FROM   MASTER.dbo.sysprocesses sp WITH(nolock)
                       CROSS APPLY sys.Dm_exec_sql_text([sql_handle]))a
        WHERE  blocked > 0
               AND waittime > @MaxWaitTime
        UNION ALL
        SELECT spid,
               blocked,
               waittype,
               waittime,
               physical_io,
               cpu,
               [memusage],
               a.dbid,
               open_tran,
               a.status,
               hostname,
               [program_name],
               cmd,
               nt_domain,
               nt_username,
               loginame,
               [sql_handle],
               [stmt_start],
               [stmt_end],
               
        FROM   (SELECT spid,
                       blocked,
                       'waittype' = CASE
                                      WHEN waittype = 0x0001 THEN
                                      'Exclusive table lock'
                                      WHEN waittype = 0x0003 THEN
                                      'Exclusive intent lock'
                                      WHEN waittype = 0x0004 THEN
                                      'Shared table lock'
                                      WHEN waittype = 0x0005 THEN
                                      'Exclusive page lock'
                                      WHEN waittype = 0x0006 THEN
                                      'Shared page lock'
                                      WHEN waittype = 0x0007 THEN
                                      'Update page lock'
                                      WHEN waittype = 0x0013 THEN
                                      'Buffer resource lock (exclusive) request'
                                      WHEN waittype = 0x0013 THEN
                       'Miscellaneous I/O (sort, audit, direct xact log I/O)'
                       WHEN waittype = 0x0020 THEN 'Buffer in I/O'
                       WHEN waittype = 0x0022 THEN 'Buffer being dirtied'
                       WHEN waittype = 0x0023 THEN 'Buffer being dumped'
                       WHEN waittype = 0x0081 THEN 'Write the TLog'
                       WHEN waittype = 0x0200 THEN 'Parallel query coordination'
                       WHEN waittype = 0x0208 THEN 'Parallel query coordination'
                       WHEN waittype = 0x0420 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0421 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0422 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0423 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0424 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0425 THEN 'Buffer I/O latch'
                       WHEN waittype = 0x0800 THEN 'Network I/O completion'
                       WHEN waittype = 0x8001 THEN 'Exclusive table lock'
                       WHEN waittype = 0x8003 THEN 'Exclusive intent lock'
                       WHEN waittype = 0x8004 THEN 'Shared table lock'
                       WHEN waittype = 0x8005 THEN 'Exclusive page lock'
                       WHEN waittype = 0x8006 THEN 'Shared page lock'
                       WHEN waittype = 0x8007 THEN 'Update page lock'
                       WHEN waittype = 0x8011 THEN
                       'Buffer resource lock (shared) request'
                       ELSE 'OLEDB/Miscellaneous'
                                    END,
                       waittime,
                       physical_io,
                       cpu,
                       [memusage],
                       sp.dbid,
                       open_tran,
                       status,
                       hostname,
                       [program_name],
                       cmd,
                       nt_domain,
                       nt_username,
                       loginame,
                       [sql_handle],
                       [stmt_start],
                       [stmt_end],
                       
                FROM   MASTER.dbo.sysprocesses sp WITH(nolock)
                       CROSS APPLY sys.Dm_exec_sql_text([sql_handle])
                WHERE  spid IN (SELECT blocked
                                FROM   MASTER.dbo.sysprocesses WITH(nolock)
                                WHERE  blocked > 0
                                       AND waittime > @MaxWaitTime)) a
        ORDER  BY blocked

        --loop through the spids without a cursor
        WHILE (SELECT COUNT(spid)
               FROM   #active_spids) > 0
          BEGIN
              SET @spid = (SELECT TOP 1 spid
                           FROM   #active_spids
                           ORDER  BY spid)

              --grab the top spid
              INSERT INTO #active_spids_info
                          (spid,
                           blocked,
                           waittype,
                           waittime,
                           physical_io,
                           cpu,
                           [memusage],
                           dbid,
                           open_tran,
                           status,
                           hostname,
                           [program_name],
                           cmd,
                           nt_domain,
                           nt_username,
                           loginame,
                           [sql_handle],
                           [stmt_start],
                           [stmt_end],
                           )
              SELECT TOP 1 spid,
                           blocked,
                           waittype,
                           waittime,
                           physical_io,
                           cpu,
                           [memusage],
                           dbid,
                           open_tran,
                           status,
                           hostname,
                           [program_name],
                           cmd,
                           nt_domain,
                           nt_username,
                           loginame,
                           [sql_handle],
                           [stmt_start],
                           [stmt_end],
                           [sql_text]
              FROM   #active_spids
              ORDER  BY spid

              INSERT INTO #event_info
                          (eventtype,
                           parameters,
                           eventinfo)
              EXEC('DBCC INPUTBUFFER (' + @spid + ') WITH NO_INFOMSGS')

              --get the inputbuffer
              EXEC('update #event_info set spid = '+@spid+' where spid IS NULL')

              --add the spid to the input buffer data
              SELECT @sqlhandle = sql_handle
              FROM   #active_spids
              WHERE  spid = @spid

              DELETE FROM #active_spids
              WHERE  spid = @spid
          --remove the spid processed
          END

        UPDATE #active_spids_info
        SET    #active_spids_info.eventtype = #event_info.eventtype,
               #active_spids_info.parameters = #event_info.parameters,
               #active_spids_info.eventinfo = #event_info.eventinfo
        FROM   #active_spids_info,
               #event_info
        WHERE  #active_spids_info.spid = #event_info.spid

        --join all the info into one table
        SET @timestamp = Getdate()

        --select statement to return results
        INSERT INTO management.dbo.blockingchains
        SELECT @@SERVERNAME,
               @timestamp       AS tstamp,
               a.spid,
               a.blocked,
               a.waittype,
               a.waittime,
               a.physical_io,
               ( a.cpu / 1000 ) AS cpu_in_seconds,
               a.[memusage],
               b.[name],
               a.open_tran,
               a.status,
               a.hostname,
               a.[program_name],
               a.cmd,
               a.nt_domain,
               a.nt_username,
               a.loginame,
               a.eventtype,
               a.parameters,
               a.eventinfo,
               a.TEXT
        FROM   #active_spids_info a
               INNER JOIN MASTER.dbo.sysdatabases b
                 ON a.dbid = b.dbid

        IF ( (SELECT MAX(sampletime)
              FROM   management.dbo.blockingchains
              WHERE  spidblocked = 0
                     AND programname NOT IN( @ProcessesToIgnore )
                     AND hostname NOT IN( @HostsToIgnore )
                     AND ( domainname NOT IN( @LoginsToIgnore )
                            OR loginname NOT IN( @LoginsToIgnore ) )) =
             @timestamp
           )
          BEGIN
              SELECT @sub = 'Blocking Issues - ' + @@SERVERNAME

              SELECT @tstamp = sampletime,
                     @spid1 = spid,
                     @status = status,
                     @hostname = Isnull(hostname, ''),
                     @programname = Isnull([programname], ''),
                     @cmd = Isnull(commandissued, ''),
                     @nt_domain = Isnull(domainname, ''),
                     @nt_username = Isnull(domainusername, ''),
                     @loginame = Isnull(loginname, ''),
                     @text = Isnull(commandtext, ''),
                     @waittime = (SELECT MAX(waittime)
                                  FROM   management.dbo.blockingchains
                                  WHERE  sampletime = (SELECT MAX(sampletime)
                                                       FROM
                                         management.dbo.blockingchains)),
                     @waittype = Isnull(waittype, ''),
                     @buffer = Isnull(eventinfo, '')
              FROM   management.dbo.blockingchains
              WHERE  sampletime = (SELECT MAX(sampletime)
                                   FROM   management.dbo.blockingchains)
                     AND spidblocked = 0

              SELECT @msg =
  'The user below is at the head of the blocking chain on the listed server:'
  +
         CHAR(13) +
  '__________________________________________________________________________'
  +
                CHAR(13) + 'Server Name:' + @@SERVERNAME + CHAR(13) +
                'TimeStamp: ' + @tstamp + CHAR(13) + 'SPID: ' + @spid1 + CHAR(13)
  + 'Login Name: ' + @loginame + CHAR(13) + 'NT Domain: ' + @nt_domain + CHAR(13)
  + 'NT Username: ' + @nt_username + CHAR(13) + 'Host Name: ' + @hostname +  CHAR(13) 
  + 'Command: ' + @cmd + CHAR(13) + 'Program Name: ' +
                @programname + CHAR(13) + 'Wait Type: ' + @waittype + CHAR(13)
  +
                'Maximum Wait Time For Blocked Thread: ' + @waittime + CHAR(13) +
                'Input Buffer: ' + @buffer + CHAR(13) + 'Status: ' + @status +
                CHAR(13) + 'SQL String:' + CHAR(13) +
                '--WARNING CAN BE LONG AND MAY NOT BE THE WHOLE TEXT!!!--' +
  CHAR(13) + @text

  EXEC msdb.dbo.Sp_send_dbmail
    @recipients = @Recivers,
    @body = @msg,
    @subject = @sub;
  END

  WAITFOR delay @WaitFor_Interval -- delay
  END

  DROP TABLE #active_spids

  DROP TABLE #active_spids_info

  DROP TABLE #event_info

If you find a bug or have an issue just add a comment here and I’ll address it.

Updated:
Fixed bug where EventInfo was too small move from 255 to 4000

My Upcoming Schedule, Come Get Your Learn On!

It’s official, I’ll be speaking at SQLSaturday #63 April 2nd. It is something near and dear to my heart, solid state storage. If you have seen me speak recently at any of the other SQLSaturdays you know I love talking about I/O. This time around I’ll be focusing on one of the new technologies, flash storage. It always surprises me just how little some people understand a technology before pushing it into service! Just deciding to implement solid state disks without doing your due diligence can be a very expensive mistake. I have spent quite a bit of time over the last couple of years getting to know solid state disks and the underlying technology of NAND flash. I’ve been writing about it and I’ve been fortunate enough to have first hand experience with it as well. I’m really exited to teach about this new technology with a SQL Server slant to it.

It’s also been officially announced that I’ll be speaking at SQLRally in Orlando! I cannot tell you how exited I am about being a speaker at the very first SQLRally. To be honest with you I was a little stunned and a lot humbled by the nomination committee and your votes. Just as I was getting use to the idea of a “regular” 60 minute session, I was asked if I could expand it to 90 minutes and do a “deep dive” session instead. This will be an extended version of Understanding Storage Systems and SQL Server with more coverage on SAN’s and monitoring IO. 

What Is PASS SQLRally?

PASS’ new regional event fills the gap between our free PASS SQLSaturday 1-day training events and PASS Summit, the world’s largest conference for the SQL Server community. At SQLRally you’ll meet with industry experts, authors, Microsoft MVPs, and other people just like you to learn about best practices, effective troubleshooting, how to prevent issues, save money, and build a better SQL Server environment for your company or clients.

Fundamentals of Storage Systems, IO Latency and SQL Server

A Thousand Men Marching Still Only March As Fast As One Man.

la·ten·cy – Computers . the time required to locate the first bit or character in a storage location, expressed as access timeminus word time.

Often when talking to people about performance they get rapped around the MB/Sec number and ignore a critical factor, latency. SQL Server is one of those few applications that is very sensitive to disk and network latency. Latency is what the end user sees. If your SQL Server is waiting around for disk or network they will start to complain. In an OLTP environment SQL Server accesses data on disk in a nonlinear fashion, aka random IO’s. The size of these IO request can be pretty small. In a good application you really try and limit the amount of data returned to keep things speedy. The down side of small Random IO’s is the system will never be faster than a single seek operation on your disk. So, if you have a 15k SAS drive that is around 2.5ms. Caching and buffering schemes aside for now, 2.5ms is your floor. You will never be faster than that. Depending on the size of the IO request you spend more time waiting for the seek operation than you do actually transferring the data from the disk. We group disks together in larger arrays to give us more aggregate throughout and higher operations per second, but you are ever only as fast as your slowest disk. You can always get more aggregate throughput up to several gigabytes a second but you still have to wait on that first bit to travel to you.

To get around these short comings, SQL Server buffers as much data as it can in memory. We also add large amounts of cache to our SAN’s and disk controllers. This can mask some of the problem, but at some point it still needs to come from your disk drives.

On the network side things are actually better. With latency under a millisecond on a LAN you usually waiting on disk to deliver the data. There are other factors like the speed of the network equipment and number of hops across interfaces you have to make can be more significant than the actual transmittion rate. TCP/IP can be a factor as well. Out of the box SQL Server is configured at 4KB(4096 byte) packet. This is a good general setting for most workloads. If you are working on a highly tuned OLTP system you may want to set that to something smaller or align it with the TCP packet size for your network, usually 1500 bytes. If it is a OLAP system with lots of streaming throughput latency will make up a very small part of the overall transmission time and going with a larger packet size possibly aligned to the 8KB page size with increase throughput and decrease the time to transmit overall. If you do set a large packet size you should consider enabling jumbo frames on your network card. Make sure that your network equipment can support the jumbo frame from end to end.

Another place where we are starting to see more latency issues is with database mirroring. In synchronous mode, the default setting, you are now adding network latency plus the disk latency of the other server to the overall transaction time.

Mirroring isn’t the only game in town. We have had SAN level replication for quite a while as well. In most of the scenarios where we were using SAN level replication it was site to site across several miles. To give you an idea of how much latency can be added in this situation go ping yahoo.com or google.com, I’ll wait….. Ok, from my location ether of them is 45ms~75ms, or 18 times slower than your spinning disks. All the sudden, your network is the major player in delaying transactions. I’ve used fibre optics to connect to sites and the latency can still be a killer for OLTP systems. The general rule of thumb is 7.5 microseconds for every  1 1/2 miles. If our target SAN is 125 miles away we just added 2ms of latency to the 4ms of latency the two sets of disks are providing. In reality, it is worse than that when you again figure in network equipment. I personally haven’t seen synchronous setups more than 50 miles apart.

Just something to keep in mind as you plan your SQL Server infrastructure. Latency in its myriad forms is king.

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems
  10. Latency – You are here!

SQLDIY: Manage and Monitor SQL Server Yourself

As an old school DBA and always having cheap frugal employers, I have rolled my own management and monitoring solutions. I’ve shared bits and pieces of it over the years to colleagues and to the community but never the whole thing at one go. I did try and build a commercial version. With the birth of my son, doing contract work and building a company from scratch just wasn’t in the cards. So, my loss is your gain! I’ve decided to take what I have built up over the years, clean it up and post it as a series of articles. The code is hosted at github. It may take a while to unwind all the pieces, update it for SQL Server 2008 and clean it up for general consumption but when it is done you should have a solid foundation for a solution of your own.

My goal as a DBA is to provide secure, accurate, consistent and  efficient data access to all the databases under my care.

To meet those goals I should:

  1. Provide monitoring and alerting for critical errors.
  2. Provide reliable maintenance on all databases.
  3. Perform backup and recovery operations, including testing backups to guarantee recoverability.
  4. Analyze performance issues and be proactive about possible issues.
  5. Secure the databases and monitor for any breaches in security policies and procedures.

If you have a few SQL Server instances to manage a single person may be able to do these things without any level of automation. As you scale to dozens or hundreds of instances it becomes impossible to throw enough people at the problem. You get inconsistency across servers in configuration and management. Introduce inaccuracy in your monitoring and long term growth projections. Add a whole new level of unreliability as some servers just fall off the radar and your DBA team spends all their time reacting to problems.

My core philosophy for dealing with these issues is pretty straight forward.

Manage By Exception
This means setting up your monitoring and alerting systems to work for you not against you. To many alerts are just as deadly as too few. As you handle problems feed the solution back into your automated system.

Automate Everything
Make your servers work for you and not against you! There is always a cost up front to automate the management of your environment. The time you get back by not having your very valuable and highly skilled workers not doing menial tasks more than makes up for it.

Standardize Across Servers
As servers come on line you may find that they aren’t setup to the company policies. Fixing all of this by hand is also a huge time waster. Define a standard and modify it as needed. Through automation you can make a single change and then deploy it across your entire environment.

To meet these needs I’ve written several modules, a centralized system for managing them and the servers in my care. I will update this list with links as the articles come on line. If I use any code that I haven’t written I’ll be very clear who did write it, if I made changes to it and what the license to redistribute the code falls under.

Data Gathering

Monitoring And Alerting

  • Server Blocking
  • Check Backup Space
  • Server Database Transaction Rate
  • Server Poll
  • System Database Roles
  • Long Running Queries
  • Long Running Stored Procedures
  • Log Shipping

Administration

  • Backup Database
  • Update Statistics
  • Re-indexing
  • QA Refresh
  • Test Backups
  • Log Shipping

Reporting

  • Daily Status
  • Data Dictionary
  • Long Running Queries
  • Long Running Stored Procedures

Trend Analysis

  • Backup/Restore Failure Analysis
  • Backup Growth
  • Drive Space Detail
  • Server Blocking
  • Database I/O
  • Database Transaction Rate
  • Server Wait Statistics
  • Long Running Queries
  • Long Running Stored Procedures