Category Archives: IO

SQLSaturday #63, Great Event!

So,

I actually had a early morning sessions and gave my Solid State Storage talk and had a great time. The audience was awesome asked very smart questions and I didn’t run over time. The guys and gals here in Dallas have put on another great event and it isn’t even lunch time yet!

As promised here is the slide deck from todays session. As always if you have any questions please drop me a line.

Solid State Storage Deep Dive

Changing Directions

I See Dead Tech….

Knowing when a technology is dying is always a good skill to have. Like most of my generation we weren’t the first on the computer scene but lived through several of it’s more painful transitions. As a college student I was forced to learn antiquated technologies and languages. I had to take a semester of COBOL. I also had to take two years of assembler for the IBM 390 mainframe and another year of assembler for the x86 focused on the i386 when the Pentium was already on the market. Again and again I’ve been forced to invest time in dying technologies. Well not any more!

Hard drives are dead LONG LIVE SOLID STATE!

I set the data on a delicate rinse cycle

I’m done with spinning disks. Since IBM invented them in nineteen and fifty seven they haven’t improved much over the years. They got smaller and faster yes but they never got sexier than the original. I mean, my mom was born in the fifties, I don’t want to be associated with something that old and way uncool. Wouldn’t you much rather have something at least invented in the modern age in your state of the art server?

Don’t you want the new hotness?

I mean seriously, isn’t this much cooler? I’m not building any new servers or desktop systems unless they are sporting flash drives. But don’t think this will last. You must stay vigilant, NAND flash won’t age like a fine wine ether. There will be something new in a few years and you must be willing to spend whatever it takes to deploy the “solid state killer” when it comes out.

Tell Gandpa Relational is Soooo last century

The relational model was developed by Dr. EF Codd while at IBM in 1970, two years before I was born. Using some fancy math called tuple calculus he proved that the relational model was better at seeking data on these new “hard drives” that IBM had laying around. That later tuned into relational algebra that is used today. Holy cow! I hated algebra AND calculus in high school why would I want to work with that crap now?

NoSQL Is The Future!

PhD’s, all neck ties and crazy gray hair.

Internet Scale, web 2.0 has a much better haircut.

In this new fast paced world of web 2.0 and databases that have to go all the way to Internet scale, the old crusty relational databases just can’t hang. Enter, NoSQL! I know that NoSQL covers a lot of different technologies, but some of the core things they do very well is scale up to millions of users and I need to scale that high. They do this by side stepping things like relationships, transactions and verified writes to disk. This makes them blazingly fast! Plus, I don’t have to learn any SQL languages, I can stay with what I love best javascript and JSON. Personally, I think MongoDB is the best of the bunch they don’t have a ton of fancy PhD’s, they are getting it done in the real world! Hey, they have a Success Engineer for crying out loud!!! Plus if you are using Ruby, Python, Erlang or any other real Web 2.0 language it just works out of the box. Don’t flame me about your NoSQL solution and why it is better, I just don’t care. I’m gearing up to hit all the major NoSQL conferences this year and canceling all my SQL Server related stuff. So long PASS Summit, no more hanging out with people obsessed with outdated skills.

Head in the CLOUD

Racks and Racks of Spaghetti photo by: Andrew McKaskill

Do you want this to manage?

Or this?

With all that said, I probably won’t be building to many more servers anyway. There is a new way of getting your data and servers without the hassle of buying hardware and securing it, THE CLOUD!

“Cloud computing is computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Parallels to this concept can be drawn with the electricity grid where end-users consume power resources without any necessary understanding of the component devices in the grid required to provide the service.” http://en.wikipedia.org/wiki/Cloud_computing

Now that’s what I’m talking about! I just plug in my code and out comes money. I don’t need to know how it all works on the back end. I’m all about convenient, on-demand network access to a shared pool of configurable computing resources. You know, kind of like when I was at college and sent my program to a sysadmin to get a time slice on the mainframe. I don’t need to know the details just run my program. Heck, I can even have a private cloud connected to other public and private clouds to make up The Intercloud(tm). Now that is sexy!

To my new ends I will be closing this blog and starting up NoSQLServerNoIOTheCloud.com to document my new jersey, I’ll only be posting once a year though, on April 1st.

See you next year!

Fast File Copy With Managed Code: UBCopy update!

If you have been following my trials with working with C# and files you know it hasn’t been a bed of roses. I ran into a roadblock when I was trying to build a high performance file copy tool in the CLR. I eventually found a solution. It works but it isn’t very clean. So, I did a little more work, removed a broken bit and simplified another bit to make it a little more stable. I’m not done yet. I know there are a couple of places I can clean the code up a bit more and a way to speed things up a bit more at the cost of using more memory, but that is for another blog post.

Logging, to the rescue

The first major change was adding a logging facility to the UBCopy program via log4net. I had stayed away from adding it fearing to much overhead but after some tweaking it seems pretty darn fast. This allowed me to do some tight timings and log debug information to a log file to track long running copies without much fuss. It also allows me to turn off all console messaging since this is used in an automated fashion if something fails I’ve got the log file to look at.

Please don’t lock my file

There are a few places where the file target file being written to could potentially be locked. Right now I have to create the file and set the file length then close it before opening it for unbuffered writes. I haven’t done anything about this grab yet. The second one is when I am writing the last block to the file. Since you have to write in page aligned multiples I was closing the file, reopening it in a buffered mode and flushing the last partial buffer to disk. I haven’t fixed these up yet but I am working on a solution. I haven’t encountered a problem yet where the file gets locked during this process but it is still a potential threat that must be dealt with. Even though the file stream is seekable and writeable you can’t issue a setlength command on it. I plan on looking deeper into the filestream code and see what they are doing when you issue a setlength. If I can eliminate the open,close and open routine it will speed things up a bit and keep anyone from grabbing my file handle.
Here’s and example of what I’m talking about, after the set file size is done there is a very small window of opportunity here.

//open output file set length to prevent growth and file fragmentation and close it.
            //We do this to prevent file fragmentation and make the write as fast as possible.
            try
            {
                _outfile = new FileStream(_outputfile, FileMode.Create, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough);
                //set file size to minimum of one buffer to cut down on fragmentation
                _outfile.SetLength(_infilesize > CopyBufferSize ? _infilesize : CopyBufferSize);
                _outfile.Close();
                _outfile.Dispose();
            }
            catch (Exception e)
            {
                throw;
            }
            //open file for write unbuffered
            try
            {
                _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough | FileFlagNoBuffering);

            }
            catch (Exception e)
            {
                throw;
            }

Unbuffered, mostly

Since unbuffered writes prevent you from writing data that isn’t memory page aligned (usually 4096), you get an error when writing to the file. To get around this all writes are the same size period, this really only effects the last write which may have a partially full buffer. It will write down the full buffer, including invalid data. Luckily, an easy fix is to reset the end of file pointer back to the correct location and everything is happy again. This allows me to eliminate an additional bit of code. here is the old code.

//open file for write buffered We do this so we can write the tail of the file
//it is a cludge but hey you get what you get in C#
outfile = new FileStream(outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
             FileOptions.WriteThrough);
//go to the right position in the file
outfile.Seek(infilesize - bytesRead1, 0);
//flush the last buffer syncronus and buffered.
outfile.Write(Buffer1, 0, bytesRead1);

And now the new, faster and better way!

//close the file handle that was using unbuffered and write through and move the EOF pointer.
Log.Debug("Close Write File Unbuffered");
_outfile.Close();
_outfile.Dispose();

try
{
    if (IsDebugEnabled)
    {
        Log.Debug("Open File Set Length");
    }
    _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                FileOptions.WriteThrough);
    _outfile.SetLength(_infilesize);
    _outfile.Close();
    _outfile.Dispose();
}
catch (Exception e)
{
    if (IsDebugEnabled)
    {
        Log.Debug("Failed to open for write set length");
        Log.Debug(e);
    }
    throw;
}

Buffered or Unbuffered

If the file was smaller than the size of the buffer I was just copying the file buffered. This was a check and route to a diffrent code path for copying files. Not a huge deal but just another place for problems to creep up over time. So, I removed it when I changed the way unbuffered multi-threaded writes were happening I was able to get rid of the buffered copy routine all together. Now, it will initially set the file size to a single buffer. Write the data and reset the file pointer to the original size.

You want a MOVE but get a COPY

There was also a bug that on small files UBCopy wouldn’t do a move just a copy in some cases. There was an error in the code that prevented the delete of the source file after the copy to the new location due to a lock that was being held by the program. Not the end of the world, just wasn’t working like it should 100% of the time.

So, here is the new AsyncUnbuffCopy routine. Calling it is super easy. If you want to use the code as-is you will need to download log4net and include it in your project.

//
// AsyncUnbuffCopy.cs
//
// Authors:
//  Wesley D. Brown <wes@planetarydb.com>
//
// Copyright (C) 2010 SQLServerIO (http://www.SQLServerIO.com)
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
// permit persons to whom the Software is furnished to do so, subject to
// the following conditions:
//
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
// MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
// LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
// OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
// WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
//

using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using log4net;

namespace UBCopy
{
    internal class AsyncUnbuffCopy
    {
        private static readonly ILog Log = LogManager.GetLogger(typeof(AsyncUnbuffCopy));
        private static readonly bool IsDebugEnabled = Log.IsDebugEnabled;

        //file names
        private static string _inputfile;
        private static string _outputfile;

        //checksum holders
        private static string _infilechecksum;
        private static string _outfilechecksum;

        //show write progress
        private static bool _reportprogress;

        //cursor position
        private static int _origRow;
        private static int _origCol;

        //number of chunks to copy
        private static int _numchunks;

        //track read state and read failed state
        private static bool _readfailed;

        //syncronization object
        private static readonly object Locker1 = new object();

        //buffer size
        public static int CopyBufferSize;
        private static long _infilesize;

        //buffer read
        public static byte[] Buffer1;
        private static int _bytesRead1;

        //buffer overlap
        public static byte[] Buffer2;
        private static bool _buffer2Dirty;
        private static int _bytesRead2;

        //buffer write
        public static byte[] Buffer3;

        //total bytes read
        private static long _totalbytesread;
        private static long _totalbyteswritten;

        //filestreams
        private static FileStream _infile;
        private static FileStream _outfile;

        //secret sauce for unbuffered IO
        const FileOptions FileFlagNoBuffering = (FileOptions)0x20000000;

        private static void AsyncReadFile()
        {
            //open input file
            try
            {
                _infile = new FileStream(_inputfile, FileMode.Open, FileAccess.Read, FileShare.None,
CopyBufferSize, FileFlagNoBuffering);
            }
            catch (Exception e)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Failed to open for read");
                    Log.Debug(e);
                }
                throw;
            }
            //if we have data read it
            while (_totalbytesread < _infilesize)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Read _buffer2Dirty    : " + _buffer2Dirty);
                }
                _bytesRead1 = _infile.Read(Buffer1, 0, CopyBufferSize);
                Monitor.Enter(Locker1);
                try
                {
                    while (_buffer2Dirty) Monitor.Wait(Locker1);
                    Buffer.BlockCopy(Buffer1, 0, Buffer2, 0, _bytesRead1);
                    _buffer2Dirty = true;
                    _bytesRead2 = _bytesRead1;
                    _totalbytesread = _totalbytesread + _bytesRead1;
                    Monitor.PulseAll(Locker1);
                    if (IsDebugEnabled)
                    {

                        Log.Debug("Read       : " + _totalbytesread);
                    }
                }
                catch (Exception e)
                {
                    Log.Fatal("Read Failed.");
                    Log.Fatal(e);
                    _readfailed = true;
                    throw;
                }
                finally { Monitor.Exit(Locker1); }
            }
            //clean up open handle
            _infile.Close();
            _infile.Dispose();
        }

        private static void AsyncWriteFile()
        {
            //open output file set length to prevent growth and file fragmentation and close it.
            //We do this to prevent file fragmentation and make the write as fast as possible.
            try
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Open File Set Length");
                }
                _outfile = new FileStream(_outputfile, FileMode.Create, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough);

                //set file size to minimum of one buffer to cut down on fragmentation
                _outfile.SetLength(_infilesize > CopyBufferSize ? _infilesize : CopyBufferSize);

                _outfile.Close();
                _outfile.Dispose();
            }
            catch (Exception e)
            {
                Log.Fatal("Failed to open for write set length");
                Log.Fatal(e);
                throw;
            }

            //open file for write unbuffered
            try
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Open File Write Unbuffered");
                }
                _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough | FileFlagNoBuffering);

            }
            catch (Exception e)
            {
                Log.Fatal("Failed to open for write unbuffered");
                Log.Fatal(e);
                throw;
            }

            var pctinc = 0.0;
            var progress = pctinc;

            //progress stuff
            if (_reportprogress)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Report Progress : True");
                }
                pctinc = 100.00 / _numchunks;
            }
            if (IsDebugEnabled)
            {
                Log.Debug("While Write _totalbyteswritten          : " + _totalbyteswritten);
                Log.Debug("While Write _infilesize - CopyBufferSize: " + (_infilesize - CopyBufferSize));
            }
            while ((_totalbyteswritten < _infilesize) && !_readfailed)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Write Unbuffered _buffer2Dirty    : " + _buffer2Dirty);
                }
                lock (Locker1)
                {
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Write Unbuffered Lock");
                    }
                    while (!_buffer2Dirty) Monitor.Wait(Locker1);
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Write Unbuffered _buffer2Dirty    : " + _buffer2Dirty);
                    }
                    Buffer.BlockCopy(Buffer2, 0, Buffer3, 0, _bytesRead2);
                    _buffer2Dirty = false;
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Write Unbuffered _buffer2Dirty    : " + _buffer2Dirty);
                    }
                    _totalbyteswritten = _totalbyteswritten + CopyBufferSize;
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Written Unbuffered : " + _totalbyteswritten);
                    }
                    Monitor.PulseAll(Locker1);
                    //fancy dan in place percent update on each write.

                    if (_reportprogress && !IsDebugEnabled)
                    {
                        Console.SetCursorPosition(_origCol, _origRow);
                        if (progress < 101 - pctinc)
                        {
                            progress = progress + pctinc;
                            Console.Write("%{0}", Math.Round(progress, 0));
                        }
                    }
                }
                try
                {
                    _outfile.Write(Buffer3, 0, CopyBufferSize);
                }
                catch (Exception e)
                {
                    Log.Fatal("Write Unbuffered Failed");
                    Log.Fatal(e);
                    throw;
                }
            }

            //close the file handle that was using unbuffered and write through
            Log.Debug("Close Write File Unbuffered");
            _outfile.Close();
            _outfile.Dispose();

            try
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Open File Set Length");
                }
                _outfile = new FileStream(_outputfile, FileMode.Open, FileAccess.Write, FileShare.None, 8,
                                          FileOptions.WriteThrough);
                _outfile.SetLength(_infilesize);
                _outfile.Close();
                _outfile.Dispose();
            }
            catch (Exception e)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Failed to open for write set length");
                    Log.Debug(e);
                }
                throw;
            }
        }

        public static int AsyncCopyFileUnbuffered(string inputfile, string outputfile, bool overwrite,
bool movefile, bool checksum, int buffersize, bool reportprogress)
        {
            if (IsDebugEnabled)
            {
                Log.Debug("inputfile      : " + inputfile);
                Log.Debug("outputfile     : " + outputfile);
                Log.Debug("overwrite      : " + overwrite);
                Log.Debug("movefile       : " + movefile);
                Log.Debug("checksum       : " + checksum);
                Log.Debug("buffersize     : " + buffersize);
                Log.Debug("reportprogress : " + reportprogress);
            }
            //report write progress
            _reportprogress = reportprogress;

            //set file name globals
            _inputfile = inputfile;
            _outputfile = outputfile;

            //setup single buffer size, remember this will be x3.
            CopyBufferSize = buffersize * 1024 * 1024;

            //buffer read
            Buffer1 = new byte[CopyBufferSize];

            //buffer overlap
            Buffer2 = new byte[CopyBufferSize];

            //buffer write
            Buffer3 = new byte[CopyBufferSize];

            //clear all flags and handles
            _totalbytesread = 0;
            _totalbyteswritten = 0;
            _bytesRead1 = 0;
            _buffer2Dirty = false;

            //if the overwrite flag is set to false check to see if the file is there.
            if (File.Exists(outputfile) && !overwrite)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Destination File Exists!");
                }
                Console.WriteLine("Destination File Exists!");
                return 0;
            }

            //create the directory if it doesn't exist
            if (!Directory.Exists(outputfile))
            {
                try
                {
                    // ReSharper disable AssignNullToNotNullAttribute
                    Directory.CreateDirectory(Path.GetDirectoryName(outputfile));
                    // ReSharper restore AssignNullToNotNullAttribute
                }
                catch (Exception e)
                {
                    Log.Fatal("Create Directory Failed.");
                    Log.Fatal(e);
                    Console.WriteLine("Create Directory Failed.");
                    Console.WriteLine(e.Message);
                    throw;
                }
            }

            //get input file size for later use
            var inputFileInfo = new FileInfo(_inputfile);
            _infilesize = inputFileInfo.Length;

            //get number of buffer sized chunks used to correctly display percent complete.
            _numchunks = (int)((_infilesize / CopyBufferSize) <= 0 ? (_infilesize / CopyBufferSize) : 1);

            if (IsDebugEnabled)
            {
                Log.Debug("File Copy Started");
            }
            Console.WriteLine("File Copy Started");

            //create read thread and start it.
            var readfile = new Thread(AsyncReadFile) { Name = "ReadThread", IsBackground = true };
            readfile.Start();

            if (IsDebugEnabled)
            {
                //debug show if we are an even multiple of the file size
                Log.Debug("Number of Chunks: " + _numchunks);
            }

            //create write thread and start it.
            var writefile = new Thread(AsyncWriteFile) { Name = "WriteThread", IsBackground = true };
            writefile.Start();

            if (_reportprogress)
            {
                //set fancy curor position
                _origRow = Console.CursorTop;
                _origCol = Console.CursorLeft;
            }

            //wait for threads to finish
            readfile.Join();
            writefile.Join();

            //leave a blank line for the progress indicator
            if (_reportprogress)
                Console.WriteLine();

            if (IsDebugEnabled)
            {
                Log.Debug("File Copy Done");
            }

            Console.WriteLine("File Copy Done");

            if (checksum)
            {
                if (IsDebugEnabled)
                {
                    Log.Debug("Checksum Source File Started");
                }
                Console.WriteLine("Checksum Source File Started");
                //create checksum read file thread and start it.
                var checksumreadfile = new Thread(GetMD5HashFromInputFile) {
Name = "checksumreadfile", IsBackground = true };
                checksumreadfile.Start();

                if (IsDebugEnabled)
                {
                    Log.Debug("Checksum Destination File Started");
                }
                Console.WriteLine("Checksum Destination File Started");
                //create checksum write file thread and start it.
                var checksumwritefile = new Thread(GetMD5HashFromOutputFile) {
Name = "checksumwritefile", IsBackground = true };
                checksumwritefile.Start();

                //hang out until the checksums are done.
                checksumreadfile.Join();
                checksumwritefile.Join();

                if (_infilechecksum.Equals(_outfilechecksum))
                {
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Checksum Verified");
                    }
                    Console.WriteLine("Checksum Verified");
                }
                else
                {
                    if (IsDebugEnabled)
                    {
                        Log.Debug("Checksum Failed");
                        Log.DebugFormat("Input File Checksum : {0}", _infilechecksum);
                        Log.DebugFormat("Output File Checksum: {0}", _outfilechecksum);
                    }
                    Console.WriteLine("Checksum Failed");
                    Console.WriteLine("Input File Checksum : {0}", _infilechecksum);
                    Console.WriteLine("Output File Checksum: {0}", _outfilechecksum);
                }
            }

            if (movefile && File.Exists(inputfile) && File.Exists(outputfile))
                try
                {
                    File.Delete(inputfile);
                }
                catch (IOException ioex)
                {
                    if (IsDebugEnabled)
                    {
                        Log.Error("File in use or locked cannot move file.");
                        Log.Error(ioex);
                    }
                    Console.WriteLine("File in use or locked");
                    Console.WriteLine(ioex.Message);
                }
                catch (Exception ex)
                {
                    if (IsDebugEnabled)
                    {
                        Log.Error("File Failed to Delete");
                        Log.Error(ex);
                    }
                    Console.WriteLine("File Failed to Delete");
                    Console.WriteLine(ex.Message);
                }
            return 1;
        }

        //hash input file
        public static void GetMD5HashFromInputFile()
        {
            var fs = new FileStream(_inputfile, FileMode.Open, FileAccess.Read, FileShare.None,
CopyBufferSize);
            MD5 md5 = new MD5CryptoServiceProvider();
            byte[] retVal = md5.ComputeHash(fs);
            fs.Close();

            var sb = new StringBuilder();
            for (var i = 0; i < retVal.Length; i++)
            {
                sb.Append(retVal[i].ToString("x2"));
            }
            _infilechecksum = sb.ToString();
        }

        //hash output file
        public static void GetMD5HashFromOutputFile()
        {
            var fs = new FileStream(_outputfile, FileMode.Open, FileAccess.Read, FileShare.None,
CopyBufferSize);
            MD5 md5 = new MD5CryptoServiceProvider();
            byte[] retVal = md5.ComputeHash(fs);
            fs.Close();

            var sb = new StringBuilder();
            for (var i = 0; i < retVal.Length; i++)
            {
                sb.Append(retVal[i].ToString("x2"));
            }
            _outfilechecksum = sb.ToString();
        }
    }
}

Fundamentals of Storage Systems, IO Latency and SQL Server

A Thousand Men Marching Still Only March As Fast As One Man.

la·ten·cy – Computers . the time required to locate the first bit or character in a storage location, expressed as access timeminus word time.

Often when talking to people about performance they get rapped around the MB/Sec number and ignore a critical factor, latency. SQL Server is one of those few applications that is very sensitive to disk and network latency. Latency is what the end user sees. If your SQL Server is waiting around for disk or network they will start to complain. In an OLTP environment SQL Server accesses data on disk in a nonlinear fashion, aka random IO’s. The size of these IO request can be pretty small. In a good application you really try and limit the amount of data returned to keep things speedy. The down side of small Random IO’s is the system will never be faster than a single seek operation on your disk. So, if you have a 15k SAS drive that is around 2.5ms. Caching and buffering schemes aside for now, 2.5ms is your floor. You will never be faster than that. Depending on the size of the IO request you spend more time waiting for the seek operation than you do actually transferring the data from the disk. We group disks together in larger arrays to give us more aggregate throughout and higher operations per second, but you are ever only as fast as your slowest disk. You can always get more aggregate throughput up to several gigabytes a second but you still have to wait on that first bit to travel to you.

To get around these short comings, SQL Server buffers as much data as it can in memory. We also add large amounts of cache to our SAN’s and disk controllers. This can mask some of the problem, but at some point it still needs to come from your disk drives.

On the network side things are actually better. With latency under a millisecond on a LAN you usually waiting on disk to deliver the data. There are other factors like the speed of the network equipment and number of hops across interfaces you have to make can be more significant than the actual transmittion rate. TCP/IP can be a factor as well. Out of the box SQL Server is configured at 4KB(4096 byte) packet. This is a good general setting for most workloads. If you are working on a highly tuned OLTP system you may want to set that to something smaller or align it with the TCP packet size for your network, usually 1500 bytes. If it is a OLAP system with lots of streaming throughput latency will make up a very small part of the overall transmission time and going with a larger packet size possibly aligned to the 8KB page size with increase throughput and decrease the time to transmit overall. If you do set a large packet size you should consider enabling jumbo frames on your network card. Make sure that your network equipment can support the jumbo frame from end to end.

Another place where we are starting to see more latency issues is with database mirroring. In synchronous mode, the default setting, you are now adding network latency plus the disk latency of the other server to the overall transaction time.

Mirroring isn’t the only game in town. We have had SAN level replication for quite a while as well. In most of the scenarios where we were using SAN level replication it was site to site across several miles. To give you an idea of how much latency can be added in this situation go ping yahoo.com or google.com, I’ll wait….. Ok, from my location ether of them is 45ms~75ms, or 18 times slower than your spinning disks. All the sudden, your network is the major player in delaying transactions. I’ve used fibre optics to connect to sites and the latency can still be a killer for OLTP systems. The general rule of thumb is 7.5 microseconds for every  1 1/2 miles. If our target SAN is 125 miles away we just added 2ms of latency to the 4ms of latency the two sets of disks are providing. In reality, it is worse than that when you again figure in network equipment. I personally haven’t seen synchronous setups more than 50 miles apart.

Just something to keep in mind as you plan your SQL Server infrastructure. Latency in its myriad forms is king.

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems
  10. Latency – You are here!

Moore’s Law May Be The Death of NAND Flash

"It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so." -  Mark Twain

I try and keep this quote in my mind whenever I’m teaching about new technologies. You often hear the same things parroted over and over again long after they quit being true. This problem is compounded by fast moving technologies like NAND Flash.

If you have read my previous posts about Flash memory you are already aware of NAND flash endurance and reliability. Just like CPU’s manufacturing processes flash receive boost in capacity as you decrease the size of the transistors/gates used on the device. In CPU’s you get increases in speed, on flash you get increases in size. The current generation of flash manufactured on a 32nm process. This nets four gigabytes per die. Die size isn’t the same as chip, or package size. Flash dies are actually stacked in the actual chip package giving us sixteen gigabytes per package. With the new die shrink to 25nm we double the size to eight gigabytes and thirty two gigabytes respectively. That sounds great, but there is a dark side to the ever shrinking die. As the size of the gate gets smaller it becomes more unreliable and has less endurance than the previous generation. MLC flash suffers the brunt of this but SLC isn’t completely immune.

Cycles And Errors

One of the things that always comes up when talking about flash is the fact it wears out over time. The numbers that always get bantered about are SLC is good for 100,000 writes to a single cell and MLC dies at 10,000 cycles. This is one of those things that just ain’t so any more. Right now the current MLC main stream flash based on the 32nm process write cycles are down to 5000 or so. 25nm cuts that even further to 3000 with higher error rates to boot.

Several manufactures has announced the transition to 25nm on their desktop drives. Intel and OCZ being two of the biggest. Intel is a partner with Micron. They are directly responsible for developing and manufacturing quite a bit of the NAND flash on the market. OCZ is a very large consumer of that product. So, what do you do to offset the issues with 25nm? Well, the same thing you did to offset that problem with 32nm, more spare area and more ECC. At 32nm it wasn’t unusual to see 24 bits of ECC per 512 bytes. Now, I’ve seen numbers as high as 55 bits per 512 bytes to give 25nm the same protection.

To give you an example here is OCZ’s lineup with raw and usable space listed.

Drive Model Production Process Raw Capacity (in GB) Affected Capacity (in GB)
OCZSSD2‐2VTXE60G 25nm 64 55
OCZSSD2‐2VTX60G 32nm 64 60
OCZSSD2‐2VTXE120G 25nm 128 118
OCZSSD2‐2VTX120G 32nm 128 120

As you can clearly see the usable space is significantly decreased. There is a second problem specific to the OCZ drives as well. Since they are now using higher density modules they are only using half as many of them. Since most SSD’s get their performance from multiple read/write channels cutting that in half isn’t a good thing.

SLC is less susceptible to this issue but it is happening. At 32nm SLC was still in the 80,000 to 100,000 range for write cycles but the error rate was getting higher. At 25nm that trend continues and we are starting to see some of the same techniques used in MLC coming to SLC as ECC creeps up from 1 bit per 512 bytes to 8 bits or more per 512 bytes. Of course the down side to SLC is it is half the capacity of MLC. As die shrinks get smaller SLC may be the only viable option in the enterprise space.

It’s Non-Volatile… Mostly

Another side effect of shrinking the floating gate size is the loss of charge due to voltage bleed off over time. When I say “over time” I’m talking weeks or months and not years or decades anymore. The data on these smaller and smaller chips will have to be refreshed every few weeks. We aren’t seeing this severe an issue at the 25nm level but it will be coming unless they figure out a way to change the floating gate to prevent it.

Smaller Faster Cheaper

If you look at trends in memory and CPU you see that every generation the die gets smaller, capacity or speed increases and they become cheaper as you can fit double the chips on a single wafer. There are always technical issues to overcome with every technology. But NAND flash is the only one that gets so inherently so unreliable at smaller and smaller die sizes. So, does this mean the end of flash? In the short term I don’t think so. The fact is we will have to come up with new ways to reduce writes and add new kinds of protection and more advanced ECC. On the pricing front we are still in a position where demand is outstripping supply. That may change somewhat as 25nm manufacturing ramps up and more factories come online but as of today, I wouldn’t expect a huge drop in price for flash in the near future. If it was just a case of SSD’s consuming the supply of flash it would be a different matter. The fact is your cell phone, tablet and every other small portable device uses the exact same flash chips. Guess who is shipping more, SSDs or iPhones?

So, What Do I Do?

The easiest thing you can do is read the label. Check what manufacturing process the SSD is using. In some cases like OCZ that wasn’t a straight forward proposition. In most cases though the manufacturer prints raw and formatted capacities on the label. Check the life cycle/warranty of the drive. Is it rated for 50 gigabytes of writes or 5 terabytes of writes a day? Does it have a year warranty or 5 years? These are indicators of how long the manufacturer expects the drive to last. Check the error rate! Usually the error rate will be expressed in unrecoverable write or read errors per bit. Modern hard drives are in the 10^15 ~ 10^17 range. Some enterprise SSDs are in the 10^30 range. This tells me they are doing more ECC than the flash manufacturer “recommends” to keep your data as safe as possible.