Monthly Archives: September 2010

Software Review: Idera’s Virtual Database

Ever since the upgrade from SQL Server 6.5 to 7.0, one of the features I’ve heard people complain about losing the most is the ability to back up a single table. During my tenure as Product Manager for Quest’s SQL Litespeed, extracting tables was one of the things we were constantly asked about. We eventually built it, and it worked. So, believe me when I tell you it is no easy feat to extract data from a backup file, let alone make it look like a normal database and attach it to a running SQL Server instance. The holy grail to me was always “select * from backup”. Virtual Database is as close to using a backup file as a real database as I have ever seen.

Setting Up Your Virtual Database

[Screenshot: vdb1]

The main screen isn’t your traditional style of GUI. This is where everything happens. It has options to attach backup files, either a single full backup or multiple files to a point in time. There is a Run Queries option, which is redundant since you can use Management Studio to access the database like you would any other database. The Tips & Tricks section is a little pointless since this is installed on your server, and if you are like me it is locked down pretty tight, so cruising web sites is out. Finally, Help & More is a lot more important than you think.

[Screenshot: vdb2]

Attaching a full backup is dead simple. Give it the file name, the SQL Server instance name, and a new database name.

[Screenshot: vdb3]

If you used Idera’s SQL Safe to take the initial backup and had it generate the content map file, the attach is fast. I didn’t test with native backup files since we don’t do any native backups where I work. If the content file isn’t there it will have to generate one, and depending on the size of the backup file that can take a while.

[Screenshot: vdb4]

Attaching multiple files is a little more difficult. The wizard steps you through it though.

[Screenshot: vdb5]

If you are accessing files from a network share, it will prompt you for credentials so it can read the backup files. Virtual Database fully supports mounting backups from UNC paths.

[Screenshot: vdb12]

If you have encrypted your backups using SQL Safe, it will prompt you for the password as well. You are encrypting your backups, right?

[Screenshot: vdb6]

It looks through the backup files to pull out key information needed to do the point in time recovery. This is also very fast. I actually had to do it a few times to get this screen shot!

[Screenshot: vdb7]

Once you have the files added it will show them sorted. Any files it thinks there is a problem with are highlighted in red for you. In my case these were LSN-out-of-sync errors. It doesn’t stop you from going to the next screen, though.

[Screenshot: vdb8]

Once you have settled on a file list you can choose your point in time. You can specify multiple full, differential, or transaction log backup files, and when you choose a point in time it selects only the backup files needed to do the recovery. If you have done what I just did and selected everything in a folder, it won’t apply three full backups plus all the transaction log files.
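
For comparison, this is the same decision you would make by hand with native T-SQL: restore the most recent full backup before your target time, then roll the logs forward with STOPAT. A rough sketch of the native equivalent, with hypothetical database and file names (MOVE clauses omitted):

-- Point-in-time restore chain the wizard is working out for you
RESTORE DATABASE SalesCopy
    FROM DISK = N'\\backupshare\Sales_full.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG SalesCopy
    FROM DISK = N'\\backupshare\Sales_log_1.trn'
    WITH NORECOVERY, STOPAT = '2010-09-15 08:00:00';

RESTORE LOG SalesCopy
    FROM DISK = N'\\backupshare\Sales_log_2.trn'
    WITH RECOVERY, STOPAT = '2010-09-15 08:00:00';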

[Screenshot: vdb9]

Like the single file attachment process, you still need to provide an instance name and database name.

[Screenshot: vdb10]

Finally, it gives you a summary of what is going to happen. After that the database is either attached, or the attach fails. If it fails you have to start all over.

Configuration Options

There are very few configuration changes that you can make to the product.

[Screenshot: vdb11]

I’m not thrilled with this screen layout. It’s got a lot going on, and some of the functions, like most of the help content, don’t work on my locked-down server.

[Screenshot: vdb13]

Where you put your support files may have an impact on performance. SQLvdb has some support files it needs to do its magic. This is also where it records any changes you make. If you plan on making a lot of changes you may want to move the location from the C drive to another, faster and larger drive.

Some files aren’t deleted when you drop the virtual database and require a manual cleanup step. You can do it here or through the command line tool.

[Screenshot: vdb14]

The drive letter is basically a protection mechanism. SQLvdb is looking for a specific set of file extensions like .mxf or .lxf. So, don’t use those extensions for anything else. You can change it if you do have a file extension collision.

The service connection allows you to change the port but not the server name. This means I can’t have the GUI installed on my desktop and access the SQL Server where the services and driver are installed. I don’t know if this is supposed to be this way or if it is a bug.

The Good

What can I say, it is select * from backup. You mount it and like magic your backup file is a live database again.

You can also create objects in this new database. That’s right, you can create a new table and put data in it. That data and structure aren’t stored in the backup file, though, and if you remove the database all changes are lost. You can, however, do a backup of the “new” database just like you would a normal one, saving your changes to a new backup file.
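
For example (a sketch, assuming the backup is mounted as a virtual database named SalesVdb; all names here are hypothetical), you can add a scratch table and then persist everything, changes included, to a new backup file:

USE SalesVdb;

-- This table only exists in the virtual database, not in the original backup
CREATE TABLE dbo.ScratchNotes (Id INT IDENTITY PRIMARY KEY, Note VARCHAR(200));
INSERT INTO dbo.ScratchNotes (Note) VALUES ('only lives in the virtual database');

-- Back up the virtual database like any other database to keep the changes
BACKUP DATABASE SalesVdb TO DISK = N'C:\Backups\SalesVdb_with_changes.bak';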

You can also drop tables, alter stored procedures, you name it. I haven’t found a DDL statement I couldn’t execute yet. And if you need to reset the database, just drop it and re-attach it; like magic you have a clean slate again.

It works across the network. If you don’t have enough space on your machine for the backup file you can still mount it via UNC. This allows me to look at backup files from my development or test servers without having to consume the space to house the database. Don’t expect the performance to be great, though; you are accessing a file across the network.

Any tool that works with SQL Server works with this. If you have a favorite scripting tool, you can use it. Written your own programs? No problem, you can use them too.

Never having to restore to a test or development environment again. When you are working with large databases, getting an exact duplicate for functional testing is a huge undertaking, not to mention the expense: you still have to buy all the disk space. Before SQLvdb I would restore the database, set it to simple recovery, issue a checkpoint, and then shrink all the files to the bare minimum. On our largest database this would take eleven hours or more. Now, I copy the backup file and attach the virtual database. Copying takes about an hour and the attachment takes about 20 minutes. This is a 380GB backup file of a 1.2TB database. If I didn’t mind accessing it from a network share it would cut that down to 10 or 20 minutes tops.
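
For the curious, the old process looked roughly like this (a sketch with hypothetical database, file, and path names; the real script had a MOVE clause per data file):

-- The eleven-hour way: full restore, flip to simple recovery, checkpoint, shrink
RESTORE DATABASE BigDb_Dev
    FROM DISK = N'\\backupshare\BigDb.bak'
    WITH MOVE 'BigDb_Data' TO N'E:\Data\BigDb_Dev.mdf',
         MOVE 'BigDb_Log'  TO N'F:\Log\BigDb_Dev.ldf',
         REPLACE, STATS = 5;

ALTER DATABASE BigDb_Dev SET RECOVERY SIMPLE;

USE BigDb_Dev;
CHECKPOINT;
DBCC SHRINKFILE (N'BigDb_Data', 1024);
DBCC SHRINKFILE (N'BigDb_Log', 128);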

Never having to restore the entire database to get a single object from a backup again. Same as above, but now I don’t have the boss standing behind me for several hours asking “Is it done restoring yet?”
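
With the backup mounted as a virtual database, pulling one object back is just a cross-database query (a sketch, assuming the backup is attached as SalesVdb and the live database is Sales):

-- Copy a single table out of the mounted backup into the live database
SELECT *
INTO   Sales.dbo.Orders_Recovered
FROM   SalesVdb.dbo.Orders;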

Doing it all from the command line. This means I can automate a ton of stuff and build processes around SQLvdb.

The Bad

The User Interface

I really don’t like the GUI. It is a far enough departure from what we are familiar with that it makes the product difficult to use. Why split up single-file and multiple-file attachment? It took me some time to find where to change the configuration options as well.

Do I need a Run Queries option? The whole point of the product is being able to use SSMS or whatever you need to access the database. What if I don’t have SSMS installed on my production server?

Tips & Tricks is also a waste of screen space. Why isn’t this in the help system? And as far as I can find, it doesn’t point out the more creative uses, like offsetting restores to development boxes.

You can’t use the console remotely. The option is built in but greyed out. With a locked-down server, getting remote access may be a problem, not to mention all the help is built into the GUI as well.

The wizard for multiple-file attach will warn you in one stage about missing or invalid backup files but doesn’t in the second stage. It will let you attempt to attach the virtual database and then fail. You can’t edit the virtual database either; your only choice is to drop it and start over.

The Command Line

There is almost zero documentation on the command line options. I had to root around and play with them to figure out that some switches are order dependent.

The GUI allows you to kill all users out of a SQLvdb before dropping it; the CLI doesn’t seem to have that option. This is just an extra step I have to put into my automation scripts and hope I get right.

Other Issues

You can’t run DBCC CHECKDB on it. Since CHECKDB uses sparse files and snapshots to get a consistent look at the data, it can’t run against a SQLvdb, which is itself using sparse files to do some of the things it does. What’s worse is the DBCC command just hangs out and looks like it is running OK. I let it run for a couple of hours before killing it.

Doing anything in Management Studio that requires it to pull a list of databases locks up SSMS while the attach is in progress. If you leave SSMS alone it will come back. If you don’t, and keep clicking around, it can lock up or crash altogether. I’ve been told they are working on a fix for this.

I ran into some other odd issues that may be bugs or may be something wrong with my server; I don’t know yet. I am working with Idera to identify the problem, and when I do I’ll update this post.

Final Thoughts

I can honestly say that this is a great product. It is everything I had hoped it would be. Aside from the small issues it has been a solid purchase. I don’t think Idera sees all the uses for SQLvdb. Being able to get a table or stored procedure from a backup file is nice, but that is an insurance policy. Being able to offset eleven hours of restores to development every week and cut disk space usage by 70%, well, that is money in the bank.

Idera SQL Doctor First Look

I recently saw a tweet about Idera’s new SQL Doctor tool that is currently in beta. This differs from other tools you may think of, like Diagnostic Manager. DM and tools like it gather some of the same information but are geared toward real-time alerts. This is more like the Best Practices Analyzer. It takes a look at several key points on your SQL Server instance and the OS, and it makes recommendations on how to fix any problems it identifies. You may ask, does this do anything I can’t do on my own? No, it doesn’t. If you are doing this in house with your own tool set you may not need this, or any other tool like it. To be honest, you aren’t the target audience for SQL Doctor. This is really designed for shops that either don’t have enough DBAs to watch everything, or don’t have a full-time DBA at all. That isn’t to say it has no use even in a highly monitored and optimized shop. I picked a horribly abused development box and let SQL Doctor tell me just how bad things were.

[Screenshot: Idera_sql_doctor1]

The interface is clean and simple. First thing we have to do is pick a server and connect.

[Screenshot: Idera_sql_doctor2]

Next it starts the interview process. It isn’t extremely in depth, and it isn’t supposed to be.

[Screenshot: Idera_sql_doctor3]

I would be a little worried if I didn’t know if the server was in production or not. But, the person running SQL Doctor may not be the DBA.

[Screenshot: Idera_sql_doctor4]

I’m sure they wrestled with this one. How do you describe OLTP workload vs. OLAP workload if the person running the tool doesn’t know if the server is production or not?

[Screenshot: Idera_sql_doctor5]

That’s pretty much all the questions. You can change some things on the settings tab, like which databases it examines.

[Screenshot: Idera_sql_doctor6]

I’d say the time estimate can be a bit optimistic. It took over an hour to run on this box. The progress bar is a tad misleading as well since it spent the bulk of that time basically at 99% complete.

[Screenshot: Idera_sql_doctor7]

Looks scary! Some of these problems are. They do rank them according to severity and push the less likely stuff to the bottom for you. It is a lot of information to look at, though. 600 recommendations, wow. Lots of index issues and query issues. Time to dig in.

[Screenshot: Idera_sql_doctor8]

They do a good job of grouping issues together. Like other tools in this category, sometimes the recommendations aren’t 100%. They throw disk queue length around quite a bit but not disk latency, which is really what we are worried about here.

It does pick up some pretty nifty stuff that other tools I have looked at don’t really catch. It warned me that a NIC setting optimized for file sharing could put memory pressure on the box. It also hit on some other fun OS-level memory settings, like lock pages in memory, that I bet plenty of folks aren’t using. I was pretty happy with the level of OS recommendations overall.

I am concerned that some of the recommendations seem to be out of date. It recommended a file per CPU core for tempdb, for example. I would recommend that they get a few more SQL Server folks to look at the recommendations and submit tweaks to them. It will also script the SQL Server changes for you or point you to Microsoft KB articles to get them fixed up.

Again, my worry with this kind of tool is that someone is just going to blanket-run every recommendation, and some of them, like disabling the CLR, could be detrimental if that feature is on for a reason. I know it’s hard to get everyone to read the warnings. Let’s be honest, though: if they don’t know whether it is a production OLTP system, they probably don’t know if the CLR is on for a reason either.
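
A couple of its checks are easy to verify yourself before letting it script anything. For example (standard catalog queries, not SQL Doctor’s own code):

-- How many tempdb data files are there really?
SELECT COUNT(*) AS tempdb_data_files
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';

-- Is the CLR enabled, and is anything actually using it, before a tool turns it off?
SELECT name, value_in_use FROM sys.configurations WHERE name = 'clr enabled';
SELECT name, permission_set_desc FROM sys.assemblies; -- more than the built-in system assembly means something depends on it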

I have to say, for a beta and for a tool of this type, I am impressed. With a little extra work I think it will be a worthy addition to any shop that is lacking in deep SQL Server expertise.

At The End of the IO Road With C#

Previously I’ve written about doing fun IO stuff in C#. I found out that some of my old tricks still worked in C# but….

Now, having done a lot of C++, I knew about asynchronous and un-buffered IO and could have made unmanaged code calls to open or create the file and pass the handle back, but just like it sounds, that is kind of a pain to set up, and if you are going down that path you might as well code it all up in C++ anyway.

I was mostly right. I have been working on a file sync tool for managing all my SQL Server backup files. Naturally, I wanted it to be as fast as humanly possible. Wanting that speed and getting it from the CLR are two completely different things. I know how to do asynchronous IO, and with a little trick you can do un-buffered IO as well. The really crappy part is you can’t do both in the CLR.

From my previous post, you know that SQL Server does asynchronous, un-buffered IO on reads and writes. The CLR allows you to do asynchronous reads with a fun bit of coding and a callback structure. I took this code from one of the best papers on C# and IO, Sequential File Programming Patterns and Performance with .NET, made some minor changes, and cleaned up the code a bit.

internal class AsyncFileCopy
    {
        // globals
        private const int Buffers = 8; // number of outstanding requests
        private const int BufferSize = 8*1024*1024; // request size, eight megabytes
        public static FileStream Source; // source file stream
        public static FileStream Target; // target file stream
        public static long TotalBytes; // total bytes to process    
        public static long BytesRead; // bytes read so far    
        public static long BytesWritten; // bytes written so far
        public static long Pending; // number of I/O's in flight
        public static Object WriteCountMutex = new Object[0]; // mutex to protect count
        // Array of buffers and async results.  
        public static AsyncRequestState[] Request = new AsyncRequestState[Buffers];

        public static void AsyncBufferedFileCopy(string inputfile, string outputfile)
        {
            Source = new FileStream(inputfile, // open source file
                                    FileMode.Open, // for read
                                    FileAccess.Read, //
                                    FileShare.Read, // allow other readers
                                    BufferSize, // buffer size
                                    FileOptions.Asynchronous); // use async
            Target = new FileStream(outputfile, // create target file
                                    FileMode.Create, // create or overwrite the target
                                    FileAccess.Write, // will write the file
                                    FileShare.None, // exclusive access
                                    BufferSize, // buffer size
                                    FileOptions.Asynchronous); // use async
            TotalBytes = Source.Length; // Size of source file
            Target.SetLength(TotalBytes); // Set target file length up front to avoid synchronous file growth
            var writeCompleteCallback = new AsyncCallback(WriteCompleteCallback);
            for (int i = 0; i < Buffers; i++) Request[i] = new AsyncRequestState(i);
            // launch initial async reads
            for (int i = 0; i < Buffers; i++)
            {
                // no callback on reads.                     
                Request[i].ReadAsyncResult = Source.BeginRead(Request[i].Buffer, 0, BufferSize, null, i);
                Request[i].ReadLaunched.Set(); // say that read is launched
            }
            // wait for the reads to complete in order, process buffer and then write it. 
            for (int i = 0; (BytesRead < TotalBytes); i = (i + 1)%Buffers)
            {
                Request[i].ReadLaunched.WaitOne(); // wait for flag that says buffer is reading
                int bytes = Source.EndRead(Request[i].ReadAsyncResult); // wait for read complete
                BytesRead += bytes; // process the buffer <your code goes here>
                Interlocked.Increment(ref Pending); // track the write we are about to issue
                Target.BeginWrite(Request[i].Buffer, 0, bytes, writeCompleteCallback, i); // write it
            } // end of reader loop
            while (Interlocked.Read(ref Pending) > 0) Thread.Sleep(10); // wait for all the writes to complete
            Source.Close();
            Target.Close(); // close the files                     
        }

        // Asynchronous Callback completes writes and issues next read
        public static void WriteCompleteCallback(IAsyncResult ar)
        {
            lock (WriteCountMutex)
            {
                // protect the shared variables
                int i = Convert.ToInt32(ar.AsyncState); // get request index
                Target.EndWrite(ar); // mark the write complete
                Interlocked.Decrement(ref Pending); // this write is no longer in flight
                BytesWritten += BufferSize; // advance bytes written
                Request[i].BufferOffset += Buffers*BufferSize; // stride to next slot 
                if (Request[i].BufferOffset < TotalBytes)
                {
                    // if not all read, issue next read
                    Source.Position = Request[i].BufferOffset; // issue read at that offset
                    Request[i].ReadAsyncResult = Source.BeginRead(Request[i].Buffer, 0, BufferSize, null, i);
                    Request[i].ReadLaunched.Set();
                }
            }
        }

        #region Nested type: AsyncRequestState

        public class AsyncRequestState
        {
            // data that tracks each async request
            public byte[] Buffer; // IO buffer to hold read/write data
            public long BufferOffset; // buffer strides thru file BUFFERS*BUFFER_SIZE
            public IAsyncResult ReadAsyncResult; // handle for read requests to EndRead() on.
            public AutoResetEvent ReadLaunched; // Event signals start of read 

            public AsyncRequestState(int i)
            {
                // constructor    
                BufferOffset = i*BufferSize; // offset in file where buffer reads/writes
                ReadLaunched = new AutoResetEvent(false); // semaphore says reading (not writing)
                Buffer = new byte[BufferSize]; // allocates the buffer
            }
        }

        #endregion
    }

The Fun bit about this code is you don’t need to spawn your own threads to do the work. All of this happens from a single thread call and the async happens in the background. I do make sure and grow the file to prevent dropping back into synchronous mode on file growths.

This next bit is the un-buffered stuff.

internal class UnBufferedFileCopy
{
    public static int CopyBufferSize = 8 * 1024 * 1024;

    public static byte[] Buffer = new byte[CopyBufferSize];

    const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

    public static int CopyFileUnbuffered(string inputfile, string outputfile)
    {
        var infile = new FileStream(inputfile, FileMode.Open, FileAccess.Read, FileShare.None,
                                    8, FileFlagNoBuffering | FileOptions.SequentialScan);
        var outfile = new FileStream(outputfile, FileMode.Create, FileAccess.Write,
                                     FileShare.None, 8, FileOptions.WriteThrough);

        int bytesRead;
        while ((bytesRead = infile.Read(Buffer, 0, CopyBufferSize)) != 0)
        {
            outfile.Write(Buffer, 0, bytesRead);
        }

        outfile.Close();
        outfile.Dispose();
        infile.Close();
        infile.Dispose();
        return 1;
    }
}

Since this is a synchronous call, I’m not worried about extending the file for performance, though there is still the fragmentation issue to think about. Without the async plumbing the code is a bit cleaner. The secret sauce on this one is creating your own file option and passing it in.

const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

I hear you asking now, where did this thing come from? Well, that is simple: it is a regular flag you can pass in when you create a file handle in C or C++. I got curious as to what the CLR was actually doing in the background. It has to make a call to the OS at some point, and that means unmanaged code.

internal class UnmanagedFileCopy
{
    public static int CopyBufferSize = 8 * 1024 * 1024;

    public static byte[] Buffer = new byte[CopyBufferSize];

    private const int FILE_FLAG_NO_BUFFERING = unchecked(0x20000000);
    private const int FILE_FLAG_OVERLAPPED = unchecked(0x40000000);
    private const int FILE_FLAG_SEQUENTIAL_SCAN = unchecked(0x08000000);
    private const int FILE_FLAG_WRITE_THROUGH = unchecked((int)0x80000000);
    private const int FILE_FLAG_NONE = unchecked(0x00000000);

    public static FileStream infile;
    public static SafeFileHandle inhandle;
    public static FileStream outfile;
    public static SafeFileHandle outhandle;

    [DllImport("KERNEL32", SetLastError = true, CharSet = CharSet.Auto, BestFitMapping = false)]
    private static extern SafeFileHandle CreateFile(String fileName,
                                                    int desiredAccess,
                                                    FileShare shareMode,
                                                    IntPtr securityAttrs,
                                                    FileMode creationDisposition,
                                                    int flagsAndAttributes,
                                                    IntPtr templateFile);

    public static void CopyUnmanaged(string inputfile, string outputfile)
    {
        outhandle = CreateFile(outputfile,
                   (int)FileAccess.Write,
                   (int)FileShare.None,
                   IntPtr.Zero,
                   FileMode.Create,
                   FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                   IntPtr.Zero);

        inhandle = CreateFile(inputfile,
                                  (int)FileAccess.Read,
                                  (int)FileShare.None,
                                  IntPtr.Zero,
                                  FileMode.Open,
                                  FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                                  IntPtr.Zero);

        outfile = new FileStream(outhandle, FileAccess.Write, 8, false);
        infile = new FileStream(inhandle, FileAccess.Read, 8, false);

        int bytesRead;
        while ((bytesRead = infile.Read(Buffer, 0, CopyBufferSize)) != 0)
        {
            outfile.Write(Buffer, 0, bytesRead);
        }

        outfile.Close();
        outfile.Dispose();
        outhandle.Close();
        outhandle.Dispose();
        infile.Close();
        infile.Dispose();
        inhandle.Close();
        inhandle.Dispose();
    }
}

If I were building my own unmanaged calls, this would be it. When you profile the managed code for object creates and destroys, you see that it is making calls to SafeFileHandle. Being the curious guy I am, I did a little more digging. For those of you who don’t know, there is an open source implementation of the Common Language Runtime called Mono. That means you can download the source code and take a look at how things are done. Poking around in FileStream and the associated code, I saw that it had all the file flags defined, but the un-buffered one was commented out. Now I had a mystery on my hands.

I tried to implement asynchronous un-buffered IO using all unmanaged code calls and couldn’t do it. There is a fundamental difference between a byte array in the CLR and what I can set up in native C++. One of the things you have to be able to do if you want asynchronous un-buffered IO is sector-align all reads and writes, including the in-memory buffers. You can’t do that in C#; you have to allocate an unmanaged segment of memory and handle the reads and writes through that buffer. At the end of the day, you would have written all the C++ needed to do the file copy and simply wrapped it in a managed code loop.

So, you can do asynchronous OR un-buffered, but not both. From Sequential File Programming Patterns and Performance with .NET:

the FileStream class does a fine job. Most applications do not need or want un-buffered IO. But, some applications like database systems and file copy utilities want the performance and control un-buffered IO offers.

And that is a real shame; I’d love to write some high-performance IO stuff in C#. I settled on doing un-buffered IO, since these copies go from a SQL Server, which will always be under some kind of memory pressure, to the file server. If I could do both asynchronous and un-buffered I could get close to wire speed, around 105 to 115 megabytes a second. Just doing un-buffered gets me around 80 megabytes per second. Not horrible, but not the best.