Category Archives: Syndicated

Fundamentals of Storage Systems, IO Latency and SQL Server

A Thousand Men Marching Still Only March As Fast As One Man.

la·ten·cy – Computers . the time required to locate the first bit or character in a storage location, expressed as access timeminus word time.

Often when talking to people about performance they get rapped around the MB/Sec number and ignore a critical factor, latency. SQL Server is one of those few applications that is very sensitive to disk and network latency. Latency is what the end user sees. If your SQL Server is waiting around for disk or network they will start to complain. In an OLTP environment SQL Server accesses data on disk in a nonlinear fashion, aka random IO’s. The size of these IO request can be pretty small. In a good application you really try and limit the amount of data returned to keep things speedy. The down side of small Random IO’s is the system will never be faster than a single seek operation on your disk. So, if you have a 15k SAS drive that is around 2.5ms. Caching and buffering schemes aside for now, 2.5ms is your floor. You will never be faster than that. Depending on the size of the IO request you spend more time waiting for the seek operation than you do actually transferring the data from the disk. We group disks together in larger arrays to give us more aggregate throughout and higher operations per second, but you are ever only as fast as your slowest disk. You can always get more aggregate throughput up to several gigabytes a second but you still have to wait on that first bit to travel to you.

To get around these short comings, SQL Server buffers as much data as it can in memory. We also add large amounts of cache to our SAN’s and disk controllers. This can mask some of the problem, but at some point it still needs to come from your disk drives.

On the network side things are actually better. With latency under a millisecond on a LAN you usually waiting on disk to deliver the data. There are other factors like the speed of the network equipment and number of hops across interfaces you have to make can be more significant than the actual transmittion rate. TCP/IP can be a factor as well. Out of the box SQL Server is configured at 4KB(4096 byte) packet. This is a good general setting for most workloads. If you are working on a highly tuned OLTP system you may want to set that to something smaller or align it with the TCP packet size for your network, usually 1500 bytes. If it is a OLAP system with lots of streaming throughput latency will make up a very small part of the overall transmission time and going with a larger packet size possibly aligned to the 8KB page size with increase throughput and decrease the time to transmit overall. If you do set a large packet size you should consider enabling jumbo frames on your network card. Make sure that your network equipment can support the jumbo frame from end to end.

Another place where we are starting to see more latency issues is with database mirroring. In synchronous mode, the default setting, you are now adding network latency plus the disk latency of the other server to the overall transaction time.

Mirroring isn’t the only game in town. We have had SAN level replication for quite a while as well. In most of the scenarios where we were using SAN level replication it was site to site across several miles. To give you an idea of how much latency can be added in this situation go ping yahoo.com or google.com, I’ll wait….. Ok, from my location ether of them is 45ms~75ms, or 18 times slower than your spinning disks. All the sudden, your network is the major player in delaying transactions. I’ve used fibre optics to connect to sites and the latency can still be a killer for OLTP systems. The general rule of thumb is 7.5 microseconds for every 1 1/2 miles. If our target SAN is 125 miles away we just added 2ms of latency to the 4ms of latency the two sets of disks are providing. In reality, it is worse than that when you again figure in network equipment. I personally haven’t seen synchronous setups more than 50 miles apart.

Just something to keep in mind as you plan your SQL Server infrastructure. Latency in its myriad forms is king.

Series To Date:

SQLDIY: Manage and Monitor SQL Server Yourself

As an old school DBA and always having cheap frugal employers, I have rolled my own management and monitoring solutions. I’ve shared bits and pieces of it over the years to colleagues and to the community but never the whole thing at one go. I did try and build a commercial version. With the birth of my son, doing contract work and building a company from scratch just wasn’t in the cards. So, my loss is your gain! I’ve decided to take what I have built up over the years, clean it up and post it as a series of articles. The code is hosted at github. It may take a while to unwind all the pieces, update it for SQL Server 2008 and clean it up for general consumption but when it is done you should have a solid foundation for a solution of your own.

My goal as a DBA is to provide secure, accurate, consistent and efficient data access to all the databases under my care.

To meet those goals I should:

Provide monitoring and alerting for critical errors.
Provide reliable maintenance on all databases.
Perform backup and recovery operations, including testing backups to guarantee recoverability.
Analyze performance issues and be proactive about possible issues.
Secure the databases and monitor for any breaches in security policies and procedures.

If you have a few SQL Server instances to manage a single person may be able to do these things without any level of automation. As you scale to dozens or hundreds of instances it becomes impossible to throw enough people at the problem. You get inconsistency across servers in configuration and management. Introduce inaccuracy in your monitoring and long term growth projections. Add a whole new level of unreliability as some servers just fall off the radar and your DBA team spends all their time reacting to problems.

My core philosophy for dealing with these issues is pretty straight forward.

Manage By Exception
This means setting up your monitoring and alerting systems to work for you not against you. To many alerts are just as deadly as too few. As you handle problems feed the solution back into your automated system.

Automate Everything
Make your servers work for you and not against you! There is always a cost up front to automate the management of your environment. The time you get back by not having your very valuable and highly skilled workers not doing menial tasks more than makes up for it.

Standardize Across Servers
As servers come on line you may find that they aren’t setup to the company policies. Fixing all of this by hand is also a huge time waster. Define a standard and modify it as needed. Through automation you can make a single change and then deploy it across your entire environment.

To meet these needs I’ve written several modules, a centralized system for managing them and the servers in my care. I will update this list with links as the articles come on line. If I use any code that I haven’t written I’ll be very clear who did write it, if I made changes to it and what the license to redistribute the code falls under.

Data Gathering

Database Space Detail
Space Detail At The Index Level
Server DB Configuration
Server Database I/O
Server Wait Stats
Database Backup Detail
Job Schedules
Table Statistics
Index Fragmentation Levels
Index Usage Statistics
Statistics Information Including Last Update

Monitoring And Alerting

Server Blocking
Check Backup Space
Server Database Transaction Rate
Server Poll
System Database Roles
Long Running Queries
Long Running Stored Procedures
Log Shipping

Administration

Backup Database
Update Statistics
Re-indexing
QA Refresh
Test Backups
Log Shipping

Reporting

Daily Status
Data Dictionary
Long Running Queries
Long Running Stored Procedures

Trend Analysis

Backup/Restore Failure Analysis
Backup Growth
Drive Space Detail
Server Blocking
Database I/O
Database Transaction Rate
Server Wait Statistics
Long Running Queries
Long Running Stored Procedures

Moore’s Law May Be The Death of NAND Flash

"It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so." - Mark Twain

I try and keep this quote in my mind whenever I’m teaching about new technologies. You often hear the same things parroted over and over again long after they quit being true. This problem is compounded by fast moving technologies like NAND Flash.

If you have read my previous posts about Flash memory you are already aware of NAND flash endurance and reliability. Just like CPU’s manufacturing processes flash receive boost in capacity as you decrease the size of the transistors/gates used on the device. In CPU’s you get increases in speed, on flash you get increases in size. The current generation of flash manufactured on a 32nm process. This nets four gigabytes per die. Die size isn’t the same as chip, or package size. Flash dies are actually stacked in the actual chip package giving us sixteen gigabytes per package. With the new die shrink to 25nm we double the size to eight gigabytes and thirty two gigabytes respectively. That sounds great, but there is a dark side to the ever shrinking die. As the size of the gate gets smaller it becomes more unreliable and has less endurance than the previous generation. MLC flash suffers the brunt of this but SLC isn’t completely immune.

Cycles And Errors

One of the things that always comes up when talking about flash is the fact it wears out over time. The numbers that always get bantered about are SLC is good for 100,000 writes to a single cell and MLC dies at 10,000 cycles. This is one of those things that just ain’t so any more. Right now the current MLC main stream flash based on the 32nm process write cycles are down to 5000 or so. 25nm cuts that even further to 3000 with higher error rates to boot.

Several manufactures has announced the transition to 25nm on their desktop drives. Intel and OCZ being two of the biggest. Intel is a partner with Micron. They are directly responsible for developing and manufacturing quite a bit of the NAND flash on the market. OCZ is a very large consumer of that product. So, what do you do to offset the issues with 25nm? Well, the same thing you did to offset that problem with 32nm, more spare area and more ECC. At 32nm it wasn’t unusual to see 24 bits of ECC per 512 bytes. Now, I’ve seen numbers as high as 55 bits per 512 bytes to give 25nm the same protection.

To give you an example here is OCZ’s lineup with raw and usable space listed.

Drive Model	Production Process	Raw Capacity (in GB)	Affected Capacity (in GB)
OCZSSD2‐2VTXE60G	25nm	64	55
OCZSSD2‐2VTX60G	32nm	64	60
OCZSSD2‐2VTXE120G	25nm	128	118
OCZSSD2‐2VTX120G	32nm	128	120

As you can clearly see the usable space is significantly decreased. There is a second problem specific to the OCZ drives as well. Since they are now using higher density modules they are only using half as many of them. Since most SSD’s get their performance from multiple read/write channels cutting that in half isn’t a good thing.

SLC is less susceptible to this issue but it is happening. At 32nm SLC was still in the 80,000 to 100,000 range for write cycles but the error rate was getting higher. At 25nm that trend continues and we are starting to see some of the same techniques used in MLC coming to SLC as ECC creeps up from 1 bit per 512 bytes to 8 bits or more per 512 bytes. Of course the down side to SLC is it is half the capacity of MLC. As die shrinks get smaller SLC may be the only viable option in the enterprise space.

It’s Non-Volatile… Mostly

Another side effect of shrinking the floating gate size is the loss of charge due to voltage bleed off over time. When I say “over time” I’m talking weeks or months and not years or decades anymore. The data on these smaller and smaller chips will have to be refreshed every few weeks. We aren’t seeing this severe an issue at the 25nm level but it will be coming unless they figure out a way to change the floating gate to prevent it.

Smaller Faster Cheaper

If you look at trends in memory and CPU you see that every generation the die gets smaller, capacity or speed increases and they become cheaper as you can fit double the chips on a single wafer. There are always technical issues to overcome with every technology. But NAND flash is the only one that gets so inherently so unreliable at smaller and smaller die sizes. So, does this mean the end of flash? In the short term I don’t think so. The fact is we will have to come up with new ways to reduce writes and add new kinds of protection and more advanced ECC. On the pricing front we are still in a position where demand is outstripping supply. That may change somewhat as 25nm manufacturing ramps up and more factories come online but as of today, I wouldn’t expect a huge drop in price for flash in the near future. If it was just a case of SSD’s consuming the supply of flash it would be a different matter. The fact is your cell phone, tablet and every other small portable device uses the exact same flash chips. Guess who is shipping more, SSDs or iPhones?

So, What Do I Do?

The easiest thing you can do is read the label. Check what manufacturing process the SSD is using. In some cases like OCZ that wasn’t a straight forward proposition. In most cases though the manufacturer prints raw and formatted capacities on the label. Check the life cycle/warranty of the drive. Is it rated for 50 gigabytes of writes or 5 terabytes of writes a day? Does it have a year warranty or 5 years? These are indicators of how long the manufacturer expects the drive to last. Check the error rate! Usually the error rate will be expressed in unrecoverable write or read errors per bit. Modern hard drives are in the 10^15 ~ 10^17 range. Some enterprise SSDs are in the 10^30 range. This tells me they are doing more ECC than the flash manufacturer “recommends” to keep your data as safe as possible.

SQLSaturday 57 In Houston, Here I Come!

Another year and staring it off right! I’ll be speaking at SQLSaturday in Houston, TX on the fundamentals of storage and SQL Server. I love this talk. I give it regularly and it’s like an old friend to me. Near and dear to my heart, IO subsystems and SQL Server have been my bread and butter for quite a while. So, if you are going to Houston come find me and say hi. There is only one thing more than educating people, and that is getting to know those people.

Understanding SQL Server and Storage Systems

The most important part of your SQL Server is also the slowest, Storage. This talk will take you through the fundamentals of your server’s Disk I/O System. From how hard drives work, through RAID configurations, and how to configure the file system. This session should give you a solid foundation over storage systems and help you understand why they are slow and how to overcome some of their limitations.

Session Goals

Understand the physical characteristics of IO hardware.
Understand the fundamentals of RAID.
Understand how to configure the file system.

See you there!

#SQLRally is coming, Go vote!

We are in the final stages of selecting the speakers for the SQLRally May 11th through the 13th in sunny Orlando Florida. The program selection is a little different than what we have done with the Summit. The committee narrowed the number of selections and is putting the rest up to a public vote. This is your opportunity to voice your opinion on what you would like to hear at this inaugural event! I’ve been fortunate enough to have two of my sessions put up for a vote. If you follow my blog you know I have a passion for moving bits of data around as fast as possible. Both my sessions focus on storage. As much as I would love to have your votes to see my sessions at SQLRally, I would like it even more if you voted on what YOU want to learn about the most. Having served on the program committee for Summit last year I know just how hard it can be choosing what I think people would like to learn about. having the opportunity to make your choice known directly is just awesome. I am very excited to see PASS expand and have training events that cover the gambit. Starting with local user groups and SQL Saturdays now growing with SQLRally and finishing it off with the Summit, there is something for every budget.

With that said, here are my abstracts so you can get a better idea of what I’m speaking on. GO VOTE!

Title:
Solid State Storage Deep Dive
Speaker:
Wesley Brown
Category:
Storage
Level:
100

Abstract:
If you have ever wanted to know how SSD’s and Flash memory works this talk is for you. We will cover the fundamentals of Flash in detail. I will also highlight some of the specific vendor implementations and what makes a particular SSD enterprise-ready vs. consumer grade. We will also cover SQL Server usage patterns what is a good fit for SSD’s and when it may be better to go with hard disks. Solid State Storage isn’t a cure-all for every situation, this presentation will give you the tools you need to make the right choice for your SQL Server environment.

Session Goals

Understand the fundamental building block of Flash memory.
Get a clear explanation of what makes some SSD’s robust enough for enterprise use.
Learn where SSD will and won’t make a real difference in your SQL Server environment.

Title:
Understanding Storage Systems and SQL Server
Speaker:
Wesley Brown
Category:
Storage
Level:
100

Abstract:
The most important part of your SQL Server is also the slowest, Storage. This talk will take you through the fundamentals of your server’s Disk I/O System. From how hard drives work, through RAID configurations, and how to configure the file system. This session should give you a solid foundation over storage systems and help you understand why they are slow and how to overcome some of their limitations.

Session Goals

Understand the physical characteristics of IO hardware.
Understand the fundamentals of RAID.
Understand how to configure the file system.