Category Archives: Programming

Giving Back, SQLMeetings.com Is Going Live Soon

I’ve been pretty quite since the PASS Summit and with good reason. Every year we have a chapter leader meeting. Every year, there is a laundry list of things that chapters would like PASS HQ to do for them. Time and again I’ve watched other people in the community step up and build something to fill a void in the PASS structure. In the early days SQLServerCentral.com hosted websites for chapters until PASS HQ got the infrastructure in place. We lean on other tools like Google groups or meetup.com to get other things done as well but aren’t controlled by PASS or PASS HQ. It always strikes me as odd that infrastructure related items are always on the list and are always backlogged. We are a professional organization of technology people and deal with stuff like this every single day.

So, I have decided to help out. I’m staring up SQLMeetings.com. The goals are pretty simple to start with.

Provide an easy way to email your user group.
Provide an easy way for your users to RSVP.
Provide an easy way for group leaders to manage lists.
Provide an easy way for group leaders to track RSVP’s.

Provide an easy way to email your user group.
Sounds pretty straight forward. Just fire up outlook and BCC your group the email field. For a long time that is basically what I did. It was a pain to manage email changes, RSVP’s and bounced email. That led me to setup an email list server setup and moved that list there as a read only list. That was better, it provided bounce management but it was still hard to get people on to the list. Luckily, it used a SQL Server back end and I wrote an integration point with our then DotNetNuke website. If you singed up via DNN it automatically added you to the email list. If your email ever bounced you were deactivated from the list. You couldn’t change your email though, or RSVP easily ether. Now that we have moved off DNN I’ve lost the signup integration point and have fallen back to telling people to subscribe using cactuss_meetings@wesworld.net again.

Provide and easy way for users to RSVP.
I got nothing on this one. I’ve tried using meetup.com but it pushes your users to another website from your own and another barrier for them to easily RSVP. Basically, I get emails from people saying they will be there I pad the numbers and add some fudge in and order the food.

Provide an easy way from group leaders to manage lists.
If you have ever used a traditional list server everything is done via email with commands embedded in the body of the mail. It isn’t the slickest of user experiences. I was just editing the list server tables by hand, being a SQL Server expert and all.

Provide an easy way for group leaders to track RSVP’s.
Over time, you like to see how your RSVP’s stack up to actual attendance and watch the growth of your group over time. Again, I did this with a high tech piece of kit, pen and paper.

This is my goal for “1.0” feature sets.
user groups can have a personalized email.
lists are <email>@sqlmeetings.com. For example my local UG would be cactuss@sqlmeetings.com. This account is used to receive, process, and send all emails.
List owners are the only people allowed to email the list for distribution. You can have multiple list owners so one person isn’t stuck sending out the email.
List management is handled two ways. If you want to subscribe via email you send an email with subscribe in the subject to cactuss@sqlmeetings.com and it handles the rest. If you are a list owner you can manage the list via the web. Things like adding users, marking users as list owners and deactivating users is done via web.
To RSVP the only thing you have to do is hit the reply button. The list server tracks what users have replied to what email. As a list owner you can look at response rate via the web site.

The list server part is done. I wrote a windows service that handles processing the emails. The web UI will be done by the end of the week(Keep your fingers crossed). I am horrible at web stuff and have asked a couple of other folks to help out. This first iteration is beta stuff to flesh out functionality.

This is where you come in. Do you need something like this? If you want to use it just drop me an email admin (at) wesworld (dot) net or hit me up on twitter @WesBrownSQL. I need some folks to test out the base functions and start suggesting things for the 2.0 like twitter integration and post to posterous.

Oh, did I mention this is free? It is something I needed for my UG and figured others would like it too.

At The End of the IO Road With C#? Pave New Road!

Not being one for letting a problem get the best of me, I took another look at the asynchronous overlapped IO problem. If you read my last post on the subject, you know I’ve done a lot of work on this already. None of the things I said last time have changed at all. If you want to do asynchronous and un-buffered IO in C# using the native file stream calls you can’t… So, I rolled my own. The kicker is, I don’t use any unmanaged code to do this. No call to VirtualAlloc() or anything else using DLL imports. Oh, and the speed is spectacular.

The Goal

My ultimate goal was to build a routine that would do un-buffered asynchronous IO. That means I don’t want the OS doing any buffering or funny stuff with the IO’s I issue. That goes for reads and writes. SQL Server uses this method to harden writes to the disk and it also performs well with excellent predictability. If you have ever use windows to do a regular copy you will see it eating up memory to buffer both reads and writes. If you copy the same file a couple of times you will notice that the first time it runs in about the speed you expect it, but the second time it may run twice as fast. This is all Windows, buffering as much data and holding on to that buffer. That’s great for smaller files but if you are pushing around multi-gigabyte files it is a disaster. As the system becomes starved for memory it pages then starts throttling back. Your 100MB/sec copy is now crawling along at 20MB/sec.

Where we left off..

I had settled on a simple routine that would allow me to do un-buffered reads from a file and write to a buffered file ether on disk or across the network.

internal class UnBufferedFileCopy
{
	public static int CopyBufferSize = 8 * 1024 * 1024;
	public static byte[] Buffer = new byte[CopyBufferSize];
	const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

	public static int CopyFileUnbuffered(string inputfile, string outputfile)
	 {
		var infile = new FileStream(inputfile, FileMode.Open, FileAccess.Read
		, FileShare.None, 8, FileFlagNoBuffering | FileOptions.SequentialScan);
		var outfile = new FileStream(outputfile, FileMode.Create, FileAccess.Write
		, FileShare.None, 8, FileOptions.WriteThrough);

		int bytesRead;
		while ((bytesRead = infile.Read(Buffer, 0, CopyBufferSize)) != 0)
		{
			outfile.Write(Buffer, 0, bytesRead);
		}

		outfile.Close();
		outfile.Dispose();
		infile.Close();
		infile.Dispose();
		return 1;
	}
}

There are two problems with this routine. First off, only the read from source is truly un-buffered. C# offers the write through flag and I thought that would be enough. I fired up process monitor and watched the IO issued on writes and it wasn’t buffer sized requests, it was always broken up into 64k chunks. So, the read request would fetch say 16MB of data and pass that to the write request who would then break that up into chunks. This wasn’t the behavior I was going for! Doing some additional research I found adding the no buffering flag to the write through flag gave me the results I was after, almost. You can’t do un-buffered writes. Synchronous or asynchronous doesn’t matter. To do a un-buffered write the buffer area that you build from the byte array must be page aligned in memory and all calls must return a multiple of the page size. Again, this just isn’t possible in managed code. So, I investigated a horrible kludge of a solution. I do un-buffered writes until I get to the last block of data. Then I close and reopen the file in a buffered mode and write the last block. It isn’t pretty but it works. It also means that I can’t use write through and un-buffered on a file smaller than the buffer size. Not a huge deal but something to be aware of if you are doing a lot of small files. If you are going the small file route the first routine will probably be OK.

internal class UnBufferedFileCopy
{
	public static int CopyBufferSize = 8 * 1024 * 1024;
	public static byte[] Buffer1 = new byte[CopyBufferSize];
	const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

	public static int CopyFileUnbuffered(string inputfile, string outputfile)
	{
		var infile = new FileStream(inputfile, FileMode.Open, FileAccess.Read
			, FileShare.None, 8, FileFlagNoBuffering | FileOptions.SequentialScan);
		//open output file set length to prevent growth and file fragmentation and close it.
		//We have to do it this way so we can do unbuffered writes to it later
		outfile = new FileStream(outputfile, FileMode.Create, FileAccess.Write
			, FileShare.None, 8, FileOptions.WriteThrough);
		outfile.SetLength(infilesize);
		outfile.Dispose();

		//open file for write unbuffered
		outfile = new FileStream(outputfile, FileMode.Open, FileAccess.Write
			, FileShare.None, 8, FileOptions.WriteThrough | FileFlagNoBuffering);
		long totalbyteswritten;
		int bytesRead1;
		//hold back one buffer
		while (totalbyteswritten &lt; infilesize - CopyBufferSize)
		{
			bytesRead1 = _infile.Read(Buffer1, 0, CopyBufferSize);
			totalbyteswritten = totalbyteswritten + CopyBufferSize;
			outfile.Write(Buffer1, 0, bytesRead1);
		}

		//close the file handle that was using unbuffered and write through
		outfile.Dispose();

		//open file for write buffered We do this so we can write the tail of the file
		//it is a cludge but hey you get what you get in C#
		outfile = new FileStream(outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
		FileOptions.WriteThrough);

		//go to the right position in the file
		outfile.Seek(infilesize - bytesRead1, 0);
		//flush the last buffer syncronus and buffered.
		outfile.Write(Buffer1, 0, bytesRead1);

		outfile.Dispose();
		infile.Dispose();
		return 1;
	}
}

This is as close to fully un-buffered IO on both the read and write side of things. There is a lot going on, but it is still synchronous all the way. If you look at performance monitor it will show you a saw tooth pattern as you read then write since you are only ever doing one or the other. Using this to copy a file across the LAN to another server never got better than 75MB/Sec throughput. Not horrible but a long way from the 105MB/Sec I get from something like FastCopy or TerraCopy. Heck, it’s not even close to the theoretical 125MB/Sec a gigabit connection could support. That leaves the last piece of the puzzle, going asynchronous.

Threading in C#, To Produce or Consume?

We know that using the asynchronous file IO built into C# isn’t an option. That doesn’t mean we can’t pattern something of our own like it. I’ve done quite a bit of threading in C#. It isn’t as difficult as C/C++ but you can still blow your foot off. It adds a whole other level of complexity to your code. This is where a little thought and design on paper and using a flow chart can help you out quite a bit. Also, it’s good to research design patterns and multi-threading. A lot of smart people have tackled these problems and have developed well designed solutions. Our particular problem is a classic producer consumer pattern, a simple one at that. We have a producer, the read thread, putting data in a buffer. We have a consumer, the write thread, that takes that data and writes it to disk. My first priority is to model this as simply as possible. I’m not worried with multiple readers or writers. I am concerned with locking and blocking. Keeping the time something has to be locked to a minimum is going to be key. That lead me to a simple solution. One read thread and the buffer it reads into, one write thread and the buffer it reads from and one intermediate buffer to pass data between them. Basically, an overlap buffer that is the same size as the read and write buffer. To give you a better visual example before showing you the code here are a couple of flow charts.

Read File
http://www.lucidchart.com/documents/view/4cac057f-d81c-472e-9764-52c00afcbe04

Write File
http://www.lucidchart.com/documents/view/4cac0726-dd14-46a6-8d44-53710afcbe04

There a few of things you need to be aware of. There is no guarantee of order on thread execution. That is why I’m using a lock object and a semaphore flag to let me know if the buffer is actually available to be written or read from. Keep the lock scope small. The lock can be a bottleneck and basically drop you back into a synchronous mode. Watch for deadlocks. With the lock and the semaphore flag in play if your ordering is wrong you can get into a deadlock between the two threads where they just sit and spin waiting for ether the lock or the flag to clear. At this point I’m confident I don’t have any race or deadlocking situations.

Here is a simplified sample, I’m serious this is as small a sample as I could code up.

internal class AsyncUnbuffCopy
{
	//file names
	private static string _inputfile;
	private static string _outputfile;
	//syncronization object
	private static readonly object Locker1 = new object();
	//buffer size
	public static int CopyBufferSize;
	private static long _infilesize;
	//buffer read
	public static byte[] Buffer1;
	private static int _bytesRead1;
	//buffer overlap
	public static byte[] Buffer2;
	private static bool _buffer2Dirty;
	private static int _bytesRead2;
	//buffer write
	public static byte[] Buffer3;
	//total bytes read
	private static long _totalbytesread;
	private static long _totalbyteswritten;
	//filestreams
	private static FileStream _infile;
	private static FileStream _outfile;
	//secret sauce for unbuffered IO
	const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

	private static void AsyncReadFile()
	{
		//open input file
		_infile = new FileStream(_inputfile, FileMode.Open, FileAccess.Read, FileShare.None, CopyBufferSize,
		FileFlagNoBuffering);
		//if we have data read it
		while (_totalbytesread &lt; _infilesize)
		{
			_bytesRead1 = _infile.Read(Buffer1, 0, CopyBufferSize);
			lock (Locker1)
			{
				while (_buffer2Dirty)Monitor.Wait(Locker1);
				Buffer.BlockCopy(Buffer1, 0, Buffer2, 0, _bytesRead1);
				_buffer2Dirty = true;
				Monitor.PulseAll(Locker1);
				_bytesRead2 = _bytesRead1;
				_totalbytesread = _totalbytesread + _bytesRead1;
			}
		}
		//clean up open handle
		_infile.Close();
		_infile.Dispose();
	}

	private static void AsyncWriteFile()
	{
		//open output file set length to prevent growth and file fragmentation and close it.
		//We have to do it this way so we can do unbuffered writes to it later
		_outfile = new FileStream(_outputfile, FileMode.Create, FileAccess.Write, FileShare.None, 8,
		FileOptions.WriteThrough);
		_outfile.SetLength(_infilesize);
		_outfile.Close();
		_outfile.Dispose();
		//open file for write unbuffered
		_outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
		FileOptions.WriteThrough | FileFlagNoBuffering);
		while (_totalbyteswritten &lt; _infilesize - CopyBufferSize)
		{
			lock (Locker1)
			{
				while (!_buffer2Dirty) Monitor.Wait(Locker1);

				Buffer.BlockCopy(Buffer2, 0, Buffer3, 0, _bytesRead2);
				_buffer2Dirty = false;
				Monitor.PulseAll(Locker1);
				_totalbyteswritten = _totalbyteswritten + CopyBufferSize;
			}
			_outfile.Write(Buffer3, 0, CopyBufferSize);
		}
		//close the file handle that was using unbuffered and write through
		_outfile.Close();
		_outfile.Dispose();
		lock (Locker1)
		{
			while (!_buffer2Dirty) Monitor.Wait(Locker1);
			//open file for write buffered We do this so we can write the tail of the file
			//it is a cludge but hey you get what you get in C#
			_outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
			FileOptions.WriteThrough);
			//this should always be true but I haven't run all the edge cases yet
			if (_buffer2Dirty)
			{
				//go to the right position in the file
				_outfile.Seek(_infilesize - _bytesRead2, 0);
				//flush the last buffer syncronus and buffered.
				_outfile.Write(Buffer2, 0, _bytesRead2);
			}
		}
		//close the file handle that was using unbuffered and write through
		_outfile.Close();
		_outfile.Dispose();
	}
	
	public static int AsyncCopyFileUnbuffered(string inputfile, string outputfile, int buffersize)
	{
		//set file name globals
		_inputfile = inputfile;
		_outputfile = outputfile;
		//setup single buffer size, remember this will be x3.
		CopyBufferSize = buffersize * 1024 * 1024;
		//buffer read
		Buffer1 = new byte[CopyBufferSize];
		//buffer overlap
		Buffer2 = new byte[CopyBufferSize];
		//buffer write
		Buffer3 = new byte[CopyBufferSize];
		//get input file size for later use
		var f = new FileInfo(_inputfile);
		long s1 = f.Length;
		_infilesize = s1;

		//create read thread and start it.
		var readfile = new Thread(AsyncReadFile) { Name = &quot;ReadThread&quot;, IsBackground = true };
		readfile.Start();

		//create write thread and start it.
		var writefile = new Thread(AsyncWriteFile) { Name = &quot;WriteThread&quot;, IsBackground = true };
		writefile.Start();

		//wait for threads to finish
		readfile.Join();
		writefile.Join();
		Console.WriteLine();
		return 1;
	}
}

As you can see, we have gotten progressively more complex with each pass until we have finally arrived at my goal. With zero unmanaged code and only one undocumented flag I’ve built a C# program that actually does fast IO like the low level big boys. To handle the small file issue I just drop back to my old copy routine to move these files along. You can see a working sample at http://github.com/SQLServerIO/UBCopy It also has an MD5 verification built in as well.

So, how well does it work?

FastCopy 1.99r4
TotalRead = 1493.6 MB
TotalWrite = 1493.6 MB
TotalFiles = 1 (0)
TotalTime= 15.25 sec
TransRate= 97.94 MB/s
FileRate = 0.07 files/s

UBCopy 1.5.2.1851 — Managed Code
File Copy Started
%100
File Copy Done
File Size MB : 1493.62
Elapsed Seconds : 15.26
Megabytes/sec : 102.63
Done.

I think it will due just fine.

At The End of the IO Road With C#

Previously I’ve written about doing fun IO stuff in C#. I found out that some of my old tricks still worked in C# but….

Now having done a lot of C++ I knew about async IO buffered and un-buffered and could have made unmanaged code calls to open or create the file and pass the handle back, but just like it sounds it is kind of a pain to setup and if you are going down that path you might as well code it all up in C++ anyway.

I was mostly right. I have been working on a file sync tool for managing all my SQL Sever backup files. Naturally, I wanted to be as fast as humanly possible. Wanting that speed and getting it from the CLR are two completely different things. I know how to do asynchronous IO, and with a little trick, you can do un-buffered IO as well. The really crappy part is you can’t do both in the CLR.

From my previous post, you know that SQL Server does asynchronous, un-buffered IO on reads and writes. The CLR allows you to so asynchronous reads with a fun bit of coding and an call back structure. I took this code from one of the best papers on C# and IO: Sequential File Programming Patterns and Performance with .NET I made some minor changes and cleaned up the code a bit.

internal class AsyncFileCopy
    {
        // globals
        private const int Buffers = 8; // number of outstanding requests
        private const int BufferSize = 8*1024*1024; // request size, one megabyte
        public static FileStream Source; // source file stream
        public static FileStream Target; // target file stream
        public static long TotalBytes; // total bytes to process    
        public static long BytesRead; // bytes read so far    
        public static long BytesWritten; // bytes written so far
        public static long Pending; // number of I/O's in flight
        public static Object WriteCountMutex = new Object[0]; // mutex to protect count
        // Array of buffers and async results.  
        public static AsyncRequestState[] Request = new AsyncRequestState[Buffers];

        public static void AsyncBufferedFileCopy(string inputfile, string outputfile)
        {
            Source = new FileStream(inputfile, // open source file
                                    FileMode.Open, // for read
                                    FileAccess.Read, //
                                    FileShare.Read, // allow other readers
                                    BufferSize, // buffer size
                                    FileOptions.Asynchronous); // use async
            Target = new FileStream(outputfile, // create target file
                                    FileMode.Create, // fault if it exists
                                    FileAccess.Write, // will write the file
                                    FileShare.None, // exclusive access
                                    BufferSize, // buffer size
                                    FileOptions.Asynchronous); //unbuffered async
            TotalBytes = Source.Length; // Size of source file
            Target.SetLength(TotalBytes); //Set target file lenght to avoid file growth
            var writeCompleteCallback = new AsyncCallback(WriteCompleteCallback);
            for (int i = 0; i < Buffers; i++) Request[i] = new AsyncRequestState(i);
            // launch initial async reads
            for (int i = 0; i < Buffers; i++)
            {
                // no callback on reads.                     
                Request[i].ReadAsyncResult = Source.BeginRead(Request[i].Buffer, 0, BufferSize, null, i);
                Request[i].ReadLaunched.Set(); // say that read is launched
            }
            // wait for the reads to complete in order, process buffer and then write it. 
            for (int i = 0; (BytesRead < TotalBytes); i = (i + 1)%Buffers)
            {
                Request[i].ReadLaunched.WaitOne(); // wait for flag that says buffer is reading
                int bytes = Source.EndRead(Request[i].ReadAsyncResult); // wait for read complete
                BytesRead += bytes; // process the buffer <your code goes here>
                Target.BeginWrite(Request[i].Buffer, 0, bytes, writeCompleteCallback, i); // write it
            } // end of reader loop
            while (Pending > 0) Thread.Sleep(10); // wait for all the writes to complete                 
            Source.Close();
            Target.Close(); // close the files                     
        }

        // structure to hold IO request buffer and result.

        // end AsyncRequestState declaration
        // Asynchronous Callback completes writes and issues next read
        public static void WriteCompleteCallback(IAsyncResult ar)
        {
            lock (WriteCountMutex)
            {
                // protect the shared variables
                int i = Convert.ToInt32(ar.AsyncState); // get request index
                Target.EndWrite(ar); // mark the write complete
                BytesWritten += BufferSize; // advance bytes written
                Request[i].BufferOffset += Buffers*BufferSize; // stride to next slot 
                if (Request[i].BufferOffset < TotalBytes)
                {
                    // if not all read, issue next read
                    Source.Position = Request[i].BufferOffset; // issue read at that offset
                    Request[i].ReadAsyncResult = Source.BeginRead(Request[i].Buffer, 0, BufferSize, null, i);
                    Request[i].ReadLaunched.Set();
                }
            }
        }

        #region Nested type: AsyncRequestState

        public class AsyncRequestState
        {
            // data that tracks each async request
            public byte[] Buffer; // IO buffer to hold read/write data
            public long BufferOffset; // buffer strides thru file BUFFERS*BUFFER_SIZE
            public IAsyncResult ReadAsyncResult; // handle for read requests to EndRead() on.
            public AutoResetEvent ReadLaunched; // Event signals start of read 

            public AsyncRequestState(int i)
            {
                // constructor    
                BufferOffset = i*BufferSize; // offset in file where buffer reads/writes
                ReadLaunched = new AutoResetEvent(false); // semaphore says reading (not writing)
                Buffer = new byte[BufferSize]; // allocates the buffer
            }
        }

        #endregion
    }

The Fun bit about this code is you don’t need to spawn your own threads to do the work. All of this happens from a single thread call and the async happens in the background. I do make sure and grow the file to prevent dropping back into synchronous mode on file growths.

This next bit is the un-buffered stuff.

internal class UnBufferedFileCopy
{
    public static int CopyBufferSize = 8 * 1024 * 1024;

    public static byte[] Buffer = new byte[CopyBufferSize];

    const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

    public static int CopyFileUnbuffered(string inputfile, string outputfile)
    {
        var infile = new FileStream(inputfile,
                                    FileMode.Open, FileAccess.Read, FileShare.None, 8
, FileFlagNoBuffering | FileOptions.SequentialScan);
        var outfile = new FileStream(outputfile, FileMode.Create, FileAccess.Write,
                                     FileShare.None, 8, FileOptions.WriteThrough);

        int bytesRead;
        while ((bytesRead = infile.Read(Buffer, 0, CopyBufferSize)) != 0)
        {
            outfile.Write(Buffer, 0, bytesRead);
        }

        outfile.Close();
        outfile.Dispose();
        infile.Close();
        infile.Dispose();
        return 1;
    }
}

Since this is a synchronous call I’m not worried about extending the file for performance. There is the fragmentation issue to worry about. Without that the code is a bit cleaner. The secret sauce on this one is creating your own file option and passing it in.

const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

I hear you asking now, where did this thing come from? Well, that is simple it is a regular flag you can pass in if you are doing things in C or C++ when you create a file handle. I got curious as to what the CLR was actually doing in the background. It has to make a call to the OS at some point and that means unmanaged code.

internal class UnmanagedFileCopy
{
    public static int CopyBufferSize = 8 * 1024 * 1024;

    public static byte[] Buffer = new byte[CopyBufferSize];

    private const int FILE_FLAG_NO_BUFFERING = unchecked(0x20000000);
    private const int FILE_FLAG_OVERLAPPED = unchecked(0x40000000);
    private const int FILE_FLAG_SEQUENTIAL_SCAN = unchecked(0x08000000);
    private const int FILE_FLAG_WRITE_THROUGH = unchecked((int)0x80000000);
    private const int FILE_FLAG_NONE = unchecked(0x00000000);

    public static FileStream infile;
    public static SafeFileHandle inhandle;
    public static FileStream outfile;
    public static SafeFileHandle outhandle;

    [DllImport("KERNEL32", SetLastError = true, CharSet = CharSet.Auto, BestFitMapping = false)]
    private static extern SafeFileHandle CreateFile(String fileName,
                                                    int desiredAccess,
                                                    FileShare shareMode,
                                                    IntPtr securityAttrs,
                                                    FileMode creationDisposition,
                                                    int flagsAndAttributes,
                                                    IntPtr templateFile);

    public static void CopyUnmanaged(string inputfile, string outputfile)
    {
        outhandle = CreateFile(outputfile,
                   (int)FileAccess.Write,
                   (int)FileShare.None,
                   IntPtr.Zero,
                   FileMode.Create,
                   FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                   IntPtr.Zero);

        inhandle = CreateFile(inputfile,
                                  (int)FileAccess.Read,
                                  (int)FileShare.None,
                                  IntPtr.Zero,
                                  FileMode.Open,
                                  FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN,
                                  IntPtr.Zero);

        outfile = new FileStream(outhandle, FileAccess.Write, 8, false);
        infile = new FileStream(inhandle, FileAccess.Read, 8, false);

        int bytesRead;
        while ((bytesRead = infile.Read(Buffer, 0, CopyBufferSize)) != 0)
        {
            outfile.Write(Buffer, 0, bytesRead);
        }

        outfile.Close();
        outfile.Dispose();
        outhandle.Close();
        outhandle.Dispose();
        infile.Close();
        infile.Dispose();
        inhandle.Close();
        inhandle.Dispose();
    }
}

If I was building my own unmanaged calls this would be it. When you profile the managed code for object creates/destroys you see that it is making calls to SafeFileHandle. Being the curious guy I am I did a little more digging. For those of you who don’t know there is an open source implementation of the Common Language Runtime called Mono. That means you can download the source code and take a look at how things are done. Poking around in the FileStream and associated code I saw that had all the file flags in the code but commented out un-buffered… Now I had a mystery on my hands. I tried to implement asynchronous un-buffered IO using all unmanaged code calls and couldn’t do it. There is a fundamental difference between a byte array in the CLR and what I can setup in native C++. One of the things you have to be able to do if you want asynchronous un-buffered IO is to sector align all reads and writes, including in and out of memory buffers. You can’t do it in C#. You have to allocate an unmanaged segment of memory and handle the reads and writes through that buffer. At the end of the day, you have written all the C++ you need to do the file copy stuff and rapped it in a managed code loop.

So, you can do asynchronous OR un-buffered but not both. From Sequential File Programming Patterns and Performance with .NET

the FileStream class does a fine job. Most applications do not need or want un-buffered IO. But, some applications like database systems and file copy utilities want the performance and control un-buffered IO offers.

And that is a real shame, I’d love to write some high performance IO stuff in C#. I settled on doing un-buffered IO since these copies are from a SQL Server which will always be under some kind of memory pressure, to the file server. If I could do both asynchronous and un-buffered I could get close to wire speed, around 105 to 115 megabytes a second. Just doing un-buffered gets me around 80 megabytes per second. Not horrible, but not the best.

What happens when Windows is the step-child? Adventures in Ruby on Rails.

Like many of you I’ve heard the developer community going on about Rails for quite a while now. It wasn’t until recently I had any reason to dip into that world. Over at Nitosphere the website is all run on Rails. We got a inexpensive web host and it was pretty easy to get it up and running. Like most shared web host, it is all linux/open source based. We have now grown to the point that hosting our own server would be cheap enough and give us complete control over the box. As a Microsoft ISV I thought it would be nice to have our new box be a Windows box. It would also be nice to hook in to SQL Server instead of MySQL as well. After a little digging I did find that Microsoft is sponsoring IronRuby, a Ruby clone that runs on the .net platform. Unfortunately, it isn’t completely compatible with one of the packages that we need to run the website on. So, back to Ruby. There is also a gem to run Rails apps against SQL Server, It isn’t compatible with some of the stuff on our website ether. Finally, I fell back to ODBC to connect to SQL Server. Everything wired up but there was still an incompatibility issue. I’ll keep trying to work it out but our fall back was MySQL.

You will need:

Source Control:

If you plan on getting the source for anything you will need ether GIT or Subversion.

GIT for windows:

http://code.google.com/p/msysgit/

http://msysgit.googlecode.com/files/Git-1.7.0.2-preview20100309.exe

Subversion clients:

http://www.sliksvn.com/en/download/ basic client 32bit or 64bit

some folks prefer tortoisesvn

http://tortoisesvn.tigris.org/ 

 

Ruby core:

http://rubyinstaller.org/

rubyinstaller-1.8.7-p249-rc2.exe

Ruby 1.8.7 has the most compatibility with existing gems. Get the latest installer if the one linked isn’t it. There are installers for 1.8.6 and 1.9.1. Again, check to see if they will support the gems you will need to get your site up and running!

http://rubyforge.org/frs/download.php/66888/devkit-3.4.5r3-20091110.7z

install the dev kit if you would like to compile some gems instead of manually downloading them. If you don’t install the devkit some gems will fail to install since they can’t compile to a native extension. To get around that you can also use –platform=mswin32 when you install a gem.

Example: gem install fastercsv –platform=mswin32 will fetch the precompiled windows gem if it exists. for more information on the devkit check out this link. http://www.akitaonrails.com/2008/7/26/still-playing-with-ruby-on-windows

 

Databases:

Sqlite

http://www.sqlite.org/download.html

http://www.sqlite.org/sqlite-3_6_23.zip

http://www.sqlite.org/sqlitedll-3_6_23.zip

By default when you create an application Ruby on Rails defaults to sqlite3. If you want to use Sqlite3 you will need to download the dll’s and the command line executable.

MySQL

http://www.mysql.com/downloads/mysql/5.1.html 

Since I couldn’t get our site to talk to SQL Server we are staying on MySQL for now. You can use the latest installer 64 bit or 32 bit but you must have the 32 bit library for Ruby to work properly. The libmySQL.dll from the 5.0.15 install did the trick for me.

http://downloads.mysql.com/archives.php?p=mysql-5.0&v=5.0.15 

SQL Server

If you would like to try the SQL Server adapter just do a gem install activerecord-sqlserver-adapter to get it.

 

HTTP Ruby Server:

I went with mongrel, it may not be the best but it was pretty easy to setup and get up and running. I am running version 1.1.5 right now since that seems to work best with mongrel_service which you will need if you don’t want to stay logged into your web server with a dos prompt running. I opted for the latest beta of mongrel_service since it cut out some dependencies and seems pretty stable at the moment. As always adding –pre gets the latest beta gem. Also, –include-dependencies will grab everything the gem will need to run, including mongrel.

 

Gems specific to my install:

Substruct uses rmagick for thumbnail generation, which requires image magic to do the actual work.

http://www.imagemagick.org/script/binary-releases.php?ImageMagick=dv31jd0gev1d3lk182a4pma8i6#windows

http://www.imagemagick.org/download/binaries/ImageMagick-6.6.0-7-Q8-windows-dll.exe

Redcloth is a textile markup language for Ruby. If you didn’t install the devkit don’t for get to add –platform=mswin32 to your gem install commands.

http://rubyforge.org/frs/?group_id=216&release_id=36337

 

My installation steps:

Install Ruby

Make sure c:rubybin (or where you installed it to) is in the path. I recommend a path with no spaces so no c:program files.

extract the devkit to the c:ruby directory.

extract the sqlite exe and dlls to c:rubybin

extract libmySQL.dll from the 32 bit 5.0.15 archive

Open an command prompt with administrator privileges.

Issue these commands:

gem update –system

gem install rails –no-ri –no-rdoc

gem install sqlite3-ruby –no-ri –no-rdoc

gem install mysql –no-ri –no-rdoc

gem install mongrel_service –no-ri –no-rdoc –platform mswin32 –include-dependencies –pre

 

After that install any gems you need for your Rails app. Make sure and test that your app works in production mode with mongrel before anything else. There will be some kinks to work out I’m sure. Once you are happy that everything is running as expected you can install your mongrel service.

mongrel_rails service::install -N MyAppsServiceName -c c:appmyapp -p 3000 -e production

The –N is the service name. –c is where the app will be served from. –p is the port number that it will listen on. –e is the mode it will run in like development or production. I chose a few high ports 3000 to 3008 for my services to run in.

You can always remove a service if something is wrong.

mongrel_rails service::remove -N MyAppsServiceName

 

Setting up IIS7:

Install the application request routing 2.0 and URL Rewrite plug-ins using the Web Platform Installer

http://www.microsoft.com/web/Downloads/platform.aspx

Once that is done you will need to create a new web farm.

 create farm

Next you will need to add at least one server entry. You may want to edit your host file and add additional aliases to your IP Address so you can run multiple copies of mongrel to service all request. The recommendation is one per cpu/core.

add server

The last step the wizard ask if you want to add the routing rules. The answer is yes.

add rules

You can confirm the routing rules are in place.

edit inbound rule

Make sure you have a website in IIS running and listening on port 80. Without this there is nothing for IIS to route to your new server farm.

 

If you have any questions post them up. I’m not a Rails expert but I have just been through the pain of Rails on Windows!

Sometimes, you have to fix it yourself

The Problem

SQL Server is a huge product with lots of moving parts. Bugs happen. Microsoft has a place to voice your issues or problems. They allow you to vote on the issue and then decide when or if it will get fixed. I’ve used Connect when I hit a bug and I have voted on items that were important to me. Recently I hit a bug in sp_createstats. I use this system stored procedure generate statistics in an automated process I’ve got that manages statistics. I added a new vendor database to the system and on the first run hit “Column ‘DAYSOPEN’ in table ‘dbo.TBL_OPPORTUNITY’ cannot be used in an index or statistics or as a partition key because it is non-deterministic.”. Well, we all know you can’t create stats on a computed column! I quickly went to the connect site and someone else had already entered it. The down side was it had so few votes it was only slated to go into the next cumulative update/service pack. When I hit this issue they hadn’t yet announced service pack 4. I already had this procedure coded into my routines and really didn’t want to rewrite them to get past this one problem.

The Solution

!!WARNING!!

By doing what I am about to describe could break at a later date or randomly kill baby kittens.

Since it is a system stored procedure I am loathe to make any changes to it directly. There are ways to modify some system stored procedures but they involve the installation CD and creativity. With that door closed there was only one avenue open to me. Create my own system stored procedure with the fix in it. There is a problem with this solution as well, if it gets dropped due to a service pack or an upgrade anything calling it will break. The first thing I did was to see if the procedure text was available by executing sp_helptext sp_createstats. Luckily it was! Now all I had to do was figure out where it was broken. The procedure is pretty simple and uses some cursors to loop through all the objects and create column statistics where they don’t exist.

declare ms_crs_cnames cursor local for select c.name from sys.columns c  
     where c.object_id = @table_id  
     and (type_name(c.system_type_id) not in ('xml'))  
     and c.name not in (select col_name from #colpostab where col_pos = 1)  
     and ((c.name in (select col_name from #colpostab)) or (@indexonly <> 'INDEXONLY'))
    -- populate temporary table of all (column, index position) tuples for this table  

It was pretty easy to spot. The weren’t checking to see if the column was computed so I added a line to the where clause.

and c.is_computed = 0

That’s it. One little check to see if it is a computed column. Now that I had fixed it I created a new procedure named sp_createstats_fixed in the master database. Just creating it in master doesn’t make it act like the original procedure or make it a system stored procedure. For that I had to execute EXECUTE sp_MS_marksystemobject ‘sp_createstats_fix’. This is an undocumented stored procedure and could change or go way any time. The only way to unmark it in SQL Server 2005 is to drop the procedure and recreate it. Now it acts just like the old procedure. Next I had to replace all references to the old proc with the new one. I made an entry into our bug tracking system about the change so we would have a record of what I did and why.

Conclusions

This wasn’t the most elegant solution. It could break later. The upside is it only took me about 30 minutes to fix and deploy versus the hours of re-coding and then testing that I would have had to do before. Do I think you should go around creating your own system stored procedures? Not at all. I don’t recommend you put anything in the master database period. If the problem had been more complex I would have redone the original routines to exclude the broken procedure. This time it just happened to be a very quick fix to a non-critical part of our system.