Author Archives: Wes Brown

About Wes Brown

Wes Brown is Vice President at Cantata Health and maintains his blog at http://www.sqlserverio.com. His previous experience includes Product Manager for SQL LiteSpeed by Quest Software and consultant to Fortune 500 companies. He specializes in high availability, disaster recovery, and very large database performance tuning. He is a frequent speaker at local user groups and SQLSaturdays.

The PASS Session Selection Process, My Experience Part 1

Well, I have just finished up my part in the abstract selection process for the PASS Summit. This was my first year as a volunteer in this process, so I don't have any specific previous experience to draw on. I do have some experiences that are similar. Some of you may know that I was actually a double major in Mass Communications and Theatre in college. I have experience putting together competitions and also judging at several events. So, while I might not have PASS experience, I have had to judge others on their abstract writing and presentation skills. This process isn't an easy one, and it gets more difficult every year just due to the volume of submissions. To put it into perspective, there were 7 regular-session professional development slots, with 4 alternates, and 47 submitters. We had more seasoned speakers than slots, for sure. I won't know if the sessions I promoted will be selected or not. Unlike Steve, I didn't get the impression my word was the final one. I understand why Steve isn't happy about the selection process; I just assumed that my recommendations were just that, recommendations, and someone else would have the final say. There are other factors I have no control over. If a speaker is chosen by other tracks, there is no guarantee that the speaker will be available for the PD track.

Abstracts

Quite a number of the abstracts were well written with clear goals. A number of them, though, needed some revisions or additions. I went to the PASS website, looked at the abstract submission guidelines, and found them incomplete. If some of these abstracts had been submitted without additional information, or not by a known speaker and leader in the community, I would have seen them as incomplete. Let's break it down.

The Title:
Being witty is fun, but if I don't know what you are actually talking about, the people seeing it in the program guide won't either, and they may skip you over.

The Bio:
Even if you are well known, a bio helps set that in stone. It also keeps me from doing two things: digging around the internet to see what you have done, or relying on my less-than-perfect memory of you. Without a bio you are trusting me to gather the information and set you apart from the other 40+ people wanting a slot.

The Abstract:
You don't have to write War and Peace, but you do need to be descriptive and clear. If you can do that in the space of a tweet, great, but you have 1,000 words, so don't be shy. Focus your topic. Don't mix and match several things that may not even be related in the same presentation. If the first third applies to me but the other two-thirds don't, I probably won't go to the presentation at all. Again, funny is great, but if you don't cut to the chase I don't know what your topic is really about.

Session Prerequisites:
Unless this is a 100-level session, you will have to detail the prerequisites your audience should have. A sense of humor or personality flaws aren't prerequisites. One year of Analysis Services experience is.

Session Goals:
Right now we have three slots on the form, and most folks didn't take full advantage of them to list their learning points. The goals should be tangible, something you can repeat on your own after sitting through the session. Feeling better about yourself is something you get from therapy; learning how to prioritize tasks is a goal. If there aren't enough slots, then use your abstract space for that as well.

Level:
This seems to be one of the more difficult areas, and it can be subjective. I would let your prerequisites and goals guide you. If there are a few specific prerequisites, that generally indicates something higher than novice. If it requires deep knowledge about a specific feature, that may be advanced or higher. If your goals are very specific about a feature or technique, that may also raise the session level.

Next post I’ll talk about the process and tools that we used to make these difficult decisions.

Fundamentals of Storage Systems – RAID and Hard Disk Reliability, Under the Covers

In the last RAID article we covered the basics. This is a little deeper dive into the underlying mechanics of RAID: exactly what it does, how it does it, and what it doesn't do that people assume it does. I cited David Patterson, Garth Gibson, and Randy Katz and their work at UC Berkeley on RAID. They describe something I've talked about before, the "pending I/O crisis." Of course it isn't pending anymore, it's here. One of the concerns has to do with Amdahl's Law and speeding up execution with parallel operations. As processors and memory speed up, hard disks remain an order of magnitude slower. Another aspect is Kryder's Law, which, like Moore's Law, is an estimate of the capacity growth of hard disks over time. Kryder's Law is starting to slow down just as Moore's Law is. The problem with hard drives has never really been capacity, it's speed. As areal density increases you do get an increase in data throughput; there is simply more data per square inch on the disk. You also get an improvement in I/Os because tracks are closer together. We haven't broken past the 15k RPM barrier yet. I've still got a Seagate Cheetah 15k.3 drive from 2002 with a max sequential throughput around 80 MB/sec, and I doubt we will see spinning disks faster than 15k. This is a real problem for scaling I/O up. Enter RAID. It's simple: get a bunch of disks and stripe data across them. One little problem creeps up: reliability goes down for each drive you add to the array. Using RAID 0 pretty much guarantees you will have an array failure. To overcome this, we start adding some way to make the data more redundant.
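To see how quickly that math gets ugly, here is a quick back-of-the-napkin sketch in Ruby. The 3% annual failure rate per drive is just a number I picked for illustration, not a measurement:

# Chance of at least one drive in an n-drive stripe set failing this year,
# assuming each drive independently has a 3% annual failure rate.
annual_failure_rate = 0.03   # assumption, purely for illustration
(1..14).each do |drives|
  survival = (1 - annual_failure_rate) ** drives
  puts format("%2d drives: %4.1f%% chance of at least one failure this year", drives, 100 * (1 - survival))
end

With RAID 0, any one of those failures takes the whole array with it, which is why striping alone makes things worse, not better.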

Hard Disk Reliability

People make a lot of assumptions about hard drives and their reliability. Hard disks break down into two classes: consumer grade, the drives you have in your desktop, and enterprise, the kind usually in your servers. There are misconceptions around both. Recently, Google and others have written papers based on long-term, large-batch failure rates and found the enterprise class drives don't last any longer than consumer class. This is perfectly valid from a physical reliability point of view; most drives are manufactured the same way in the same plants. Unlike the poor, misunderstood lemming, hard disks really do all jump off a cliff together. Studies have shown a strong correlation between disk failure and a shared manufacturing batch. Simply put, if drives are made around the same time and one has a failure, there is a likelihood, around 30%, that other drives in that batch will also suffer failures. So, what are we paying for with an enterprise drive besides speed? Data reliability. Enterprise level drives have more robust error correction than their consumer counterparts. On a normal hard drive the smallest piece of data that can be written is 512 bytes; this is the size of a sector. Enterprise drives usually have 520-byte sectors, and the extra 8 bytes are used to verify the data in that sector. This is the Data Integrity Field (DIF). DIF isn't 100% either; it is just more reliable than a consumer drive without it. You can still have write corruption for several reasons. Misdirected writes occur when data is written to the wrong location on disk and reported as a successful write; when the system goes to access it again you get a read fault. Torn pages, which we are familiar with, happen when an 8k page write is requested but only part of the 8k is actually written. Corruption can also originate outside the drive, where the controller makes a bad write request that is a perfectly legitimate I/O request at the hard drive level. With larger drives the odds of hitting one of these errors become a real possibility. Enterprise drives add that extra layer of protection, and your RAID HBA may also have additional error correction. The last thing I would like to touch on is write caching. Without a battery backup, or unless the cache is non-volatile in nature, you will lose data on a power failure if a write is in progress.
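The idea behind DIF is easy to sketch. This is not the real on-disk layout (the actual 8 bytes carry a CRC guard tag plus application and reference tags), just a Ruby illustration of why folding the block address into a checksum catches misdirected writes as well as flipped bits:

require 'zlib'

SECTOR = 'x' * 512   # pretend this is the 512 bytes of user data
LBA    = 42          # the logical block address we meant to write to

# Store a checksum of the data plus its intended address alongside the sector.
# On read, recompute it; a mismatch means corrupt or misplaced data.
def protect(data, lba)
  Zlib.crc32(data + [lba].pack('Q'))
end

def verify(data, lba, stored)
  Zlib.crc32(data + [lba].pack('Q')) == stored
end

crc = protect(SECTOR, LBA)
puts verify(SECTOR, LBA, crc)               # true  - right data in the right place
puts verify(SECTOR, LBA + 1, crc)           # false - misdirected write/read caught
puts verify(SECTOR.sub('x', 'y'), LBA, crc) # false - corrupted data caught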

RAID Host Bus Adapter Reliability

The adapter is as reliable as any other component in your system. Normally, the cache on the controller is ECC based. You also usually have the option of a battery module to supply the cache with power in case of an outage, so the data in cache can be written to the array when everything comes back up. Most of the issues I have seen with RAID HBAs are almost always driver or firmware related. You may also see inconsistent performance due to write caching and the battery backup unit. The unit has to be taken offline and conditioned to keep it in top condition, and the side effect is a temporary disabling of the write cache on the controller. You can override this setting on some controllers, but it is a dangerous proposition. A personal anecdote from my days at a large computer manufacturer: we started getting a larger volume of failed drive calls into support, so we started doing failure analysis. It all pointed back to a particular batch of hard drives. That was when the drive manufacturer made a change to its drives, removing a very small component. It shaved a few cents off the cost but had a dramatic effect. All the drives were technically good and would pass validation, but under enough load and attached to a particular RAID HBA they would randomly fall off line. It came down to that little component: it provided a bit of electrical noise suppression on the SCSI bus. Some cards were affected and others chugged along just fine. This is also confirmed by the Google paper; they observed the same behavior. They also point out that 20% to 30% of all returned drives have no detectable problems. The point is, validate your entire I/O stack. Any single component may be within specification but may not play well with the others.

RAID Parity, Mirroring, and Recoverability

Not to belabor the point, but RAID isn't bulletproof. People wrap RAID around themselves like Superman's cape, yet there are several issues that all the RAID schemes in the world don't protect against. With current hard disks in the two-terabyte range it is possible to build even a small RAID 5 array and have the potential for complete failure. The problem is the amount of data that has to be read for the rebuild process. Having a hot spare available reduces the time to replace a failed drive to zero, but that is only part of the equation; the much larger part is rebuild time. Let's say you have a 14-drive RAID 5 array with the new two-terabyte drives installed and suffer a failure. Even if there is no activity on the array and all the I/O is dedicated to the rebuild, it could still take two or three days to rebuild the array. During that time you are effectively running on a RAID 0 array that is now under load, and your chance of total array failure is near 100%. RAID by its very nature assumes a failure is a hard failure: a drive goes offline and the redundant part of the system takes over. It also makes the assumption that if a write succeeds then, barring a hardware failure, the read will also be valid. Data is only validated on writes, not on reads. If it were validated on reads, RAID 5 would be twice as slow on reads and four times as slow on writes as a single drive or RAID 0. With all the potential hidden write failures it is completely possible to have hidden corruption and not know it until it is way too late. RAID levels with striped parity are most susceptible to this kind of silent, creeping corruption. It is possible that the corrupted data is in the parity stripe, making it completely unusable for data reconstruction. If that particular piece of data doesn't change, you can go a very long time with a RAID 5 array with polluted parity. You know how to recover from a polluted parity stripe? Simple: copy all the data off the array, figure out which files are now corrupt, and restore them. RAID 6, with its dual parity stripes, makes it more likely you can recover your data when a single parity stripe becomes corrupt, but you do pay a price in write speed for that extra level of protection. RAID 1 and RAID 10 aren't perfect either. On a mirrored pair, if the write is assumed good there is no way to validate that on read; without a third piece of information, like a checksum, it would be a coin toss. If the read is successful there is no way to tell which drive has the bad data. It is possible to have a mirrored pair run just fine with one drive giving you corrupted data on reads all day long. It would manifest itself as file corruption or some other anomaly that could be difficult to track down. We are back to relying on the disk to tell us all is well. We often recommend RAID 10 over everything else for speed and reliability, and I still hold to that, but RAID 10 can still suffer a catastrophic failure when both drives in a single mirrored pair fail at the same time. With the probability of correlated disk failures, that can't be ignored.
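Where do two or three days come from? Here is a rough cut at the math in Ruby. The throughput numbers are assumptions, not measurements; controllers throttle rebuild rates well below a drive's sequential speed, and production I/O drags them down further:

# A 14-drive RAID 5 array of 2 TB drives: the rebuild has to read all 13
# surviving drives and write an entire 2 TB replacement drive.
drive_bytes  = 2.0 * 1_000_000_000_000
quiet_mb_sec = 30.0   # assumed effective rebuild rate on a quiet array
busy_mb_sec  = 10.0   # assumed rate once production I/O competes for the spindles

{ 'quiet array' => quiet_mb_sec, 'busy array' => busy_mb_sec }.each do |label, rate|
  hours = drive_bytes / (rate * 1_000_000) / 3600
  puts format("%s: %.0f hours (about %.1f days)", label, hours, hours / 24)
end

And remember, for that entire window every read on the degraded array has to reconstruct data from the other 13 drives, which is exactly when a second failure hurts the most.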

What Can We Do?

There are a few tools available to us that can help predict the failure of a drive or tell us something is wrong with the array. All modern drives support the SMART protocol. Even though Google found it wasn't 100% reliable as a predictor, closer to 30%, some warning is better than none in my opinion. All modern RAID HBAs also come with tools to detect parity errors. You do take a performance hit when you run these internal consistency checks, but just as you run maintenance on your databases via DBCC, your RAID arrays need checkups too. They are a necessary evil if you don't want any surprises one day when you have a failed drive in your RAID 5 array and can't rebuild it. If you have intermittent problems with a drive, don't mess around, replace it. The HBA almost always has the ability to send SNMP messages to something like Nagios or HP OpenView; use it. If you aren't running something like that, you can usually configure email alerts to go out on errors. Proactive is the name of the game.
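If you want to roll your own early warning on top of SMART, smartmontools makes it easy. A minimal Ruby sketch, assuming smartctl is installed and /dev/sda is the drive you care about:

# Quick-and-dirty SMART health poll using smartmontools' smartctl.
device = '/dev/sda'                 # assumption: change to the drive you care about
output = `smartctl -H #{device}`

if output.include?('PASSED')
  puts "#{device}: SMART overall health check passed"
else
  puts "#{device}: SMART did not report PASSED - investigate and alert"
  # hook your email or SNMP notification in here
end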

Don’t take my word for it….

Short list of papers to get you started on your path to paranoia.

Google Disk Failure analysis

Original RAID Paper

NetApp disk failure analysis

CERN data corruption tests

Silent Data Corruption in SATA arrays

Series To Date:
  1. Introduction
  2. The Basics of Spinning Disks
  3. The System Bus
  4. Disk Controllers, Host Bus Adapters and Interfaces
  5. RAID, An Introduction
  6. RAID and Hard Disk Reliability, Under The Covers – You are here!
  7. Stripe Size, Block Size, and IO Patterns
  8. Capturing IO Patterns
  9. Testing IO Systems

SQL Saturday #35 Notes and Observations

First of all, I want to congratulate all the volunteers that made this happen. It was a very well organized event and ran smoothly. I had a great time. It was nice meeting people that I couldn’t have met any other way.

 

As A Vendor…

The Good:

Ryan Adams did a very good job keeping things coordinated leading up to the event, making sure we got everything we were entitled to. He was always very responsive to emails and questions.

The day of the event, I always had Ryan or one of the volunteers stop by between sessions to check that everything was good. I have always had a good experience with PASS events, but I've never had so many people checking on us before!

Needs Improvement:

Table placement. I just didn't understand the flow and layout of the event until I saw the venue firsthand; I would have picked a different table. I don't think it hurt us, though, we had crazy foot traffic and lots of conversations.

It did bottleneck up sometimes around the vendor tables as sessions let out, but I think overall the placement was OK. There isn't much room to work with, and I don't know if I would have done much better in their shoes!

Overall:

As a vendor I was very happy with the event and the amount of time I got to spend talking to folks about Nitrosphere and what we do. As a new company, getting out and meeting people is very important.

Having only one or two big conferences a year is difficult, and they cost 5 to 10 times the money a SQLSaturday does.

 

As a Speaker…

The Good:

Again, well coordinated: no scheduling issues or anything like that. I found the different tracks laid out well. The meet and greet the night before was nice.

The speaker room was big enough. Internet access seemed fine to me.

Again, I was checked on by the staff over and over to make sure things were OK.

We also had a handler feeding us time to help keep us on track.

Needs Improvement:

Session recording was spotty. It was a last-minute thing, and most of us could have used a little hand-holding getting it right.

Overall:

As a speaker I was happy again with the organization and attention to detail.

 

As an attendee…

The Good:

Lots of tracks and sessions for everyone. I enjoyed seeing so many local and new speakers making the break.

Plenty of interaction between people and speakers.

The food was great. I NEVER get the chicken salad, but I ate two for lunch :). Oh, the ice cream… so evil.

Needs Improvement:

Bathroom queue lengths were a little long but did clear up.

Finding the stairs to the second floor was fun.

Overall:

Yet again, no real complaints. Plenty of seating, solid flow, and awesome choices. I still can't believe this was a free day of training!

 

I will be making room for other SQL Saturdays going forward.

What happens when Windows is the step-child? Adventures in Ruby on Rails.

Like many of you, I've heard the developer community going on about Rails for quite a while now. It wasn't until recently that I had any reason to dip into that world. Over at Nitrosphere the website is all run on Rails. We got an inexpensive web host and it was pretty easy to get it up and running. Like most shared web hosts, it is all Linux/open source based. We have now grown to the point that hosting our own server would be cheap enough and give us complete control over the box. As a Microsoft ISV I thought it would be nice to have our new box be a Windows box. It would also be nice to hook into SQL Server instead of MySQL. After a little digging I found that Microsoft is sponsoring IronRuby, an implementation of Ruby that runs on the .NET platform. Unfortunately, it isn't completely compatible with one of the packages we need to run the website. So, back to Ruby. There is also a gem to run Rails apps against SQL Server, but it isn't compatible with some of the stuff on our website either. Finally, I fell back to ODBC to connect to SQL Server. Everything wired up, but there was still an incompatibility issue. I'll keep trying to work it out, but our fallback was MySQL.

You will need:

Source Control:

If you plan on getting the source for anything you will need either Git or Subversion.

Git for Windows:

http://code.google.com/p/msysgit/

http://msysgit.googlecode.com/files/Git-1.7.0.2-preview20100309.exe

Subversion clients:

http://www.sliksvn.com/en/download/ basic client, 32-bit or 64-bit

Some folks prefer TortoiseSVN:

http://tortoisesvn.tigris.org/ 

 

Ruby core:

http://rubyinstaller.org/

rubyinstaller-1.8.7-p249-rc2.exe

Ruby 1.8.7 has the best compatibility with existing gems. Get the latest installer if the one linked here isn't it. There are also installers for 1.8.6 and 1.9.1. Again, check that they support the gems you will need to get your site up and running!

http://rubyforge.org/frs/download.php/66888/devkit-3.4.5r3-20091110.7z

Install the DevKit if you would like to compile gems instead of manually downloading them. If you don't install the DevKit, some gems will fail to install since they can't compile a native extension. To get around that you can also use --platform=mswin32 when you install a gem.

Example: gem install fastercsv --platform=mswin32 will fetch the precompiled Windows gem if it exists. For more information on the DevKit check out this link: http://www.akitaonrails.com/2008/7/26/still-playing-with-ruby-on-windows

 

Databases:

SQLite

http://www.sqlite.org/download.html

http://www.sqlite.org/sqlite-3_6_23.zip

http://www.sqlite.org/sqlitedll-3_6_23.zip

By default, when you create an application, Ruby on Rails uses SQLite3. If you want to use SQLite3 you will need to download the DLLs and the command-line executable.
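A quick way to prove the gem and the DLLs are wired up is a tiny smoke test once you have installed the sqlite3-ruby gem (the file name here is just a throwaway):

require 'sqlite3'

# Creates smoke_test.db in the current directory. If sqlite3.dll isn't on the
# path, the require or the calls below will fail, which is what we want to know.
db = SQLite3::Database.new('smoke_test.db')
db.execute('CREATE TABLE IF NOT EXISTS t (id INTEGER)')
db.execute('INSERT INTO t (id) VALUES (1)')
puts db.execute('SELECT COUNT(*) FROM t').inspect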

MySQL

http://www.mysql.com/downloads/mysql/5.1.html 

Since I couldn't get our site to talk to SQL Server, we are staying on MySQL for now. You can use the latest installer, 64-bit or 32-bit, but you must have the 32-bit client library for Ruby to work properly. The libmySQL.dll from the 5.0.15 install did the trick for me.

http://downloads.mysql.com/archives.php?p=mysql-5.0&v=5.0.15 
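A similar smoke test works for the mysql gem once the 32-bit libmySQL.dll is in place; the host, user, password, and database names below are placeholders for your own:

require 'mysql'

# Connect and print the server version; swap in your own credentials.
dbh = Mysql.real_connect('localhost', 'rails_user', 'secret', 'myapp_production')
puts dbh.get_server_info
dbh.close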

SQL Server

If you would like to try the SQL Server adapter just do a gem install activerecord-sqlserver-adapter to get it.

 

HTTP Ruby Server:

I went with Mongrel. It may not be the best, but it was pretty easy to set up and get running. I am running version 1.1.5 right now since that seems to work best with mongrel_service, which you will need if you don't want to stay logged into your web server with a command prompt running. I opted for the latest beta of mongrel_service since it cut out some dependencies and seems pretty stable at the moment. As always, adding --pre gets the latest beta gem. Also, --include-dependencies will grab everything the gem needs to run, including Mongrel itself.

 

Gems specific to my install:

Substruct uses rmagick for thumbnail generation, which requires ImageMagick to do the actual work.

http://www.imagemagick.org/script/binary-releases.php?ImageMagick=dv31jd0gev1d3lk182a4pma8i6#windows

http://www.imagemagick.org/download/binaries/ImageMagick-6.6.0-7-Q8-windows-dll.exe

RedCloth is a Textile markup language for Ruby. If you didn't install the DevKit, don't forget to add --platform=mswin32 to your gem install commands.

http://rubyforge.org/frs/?group_id=216&release_id=36337
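With ImageMagick in place, the gem installs for these two look like the rest. The gem names here are as published at the time, so double-check them against what your app actually wants:

gem install rmagick --no-ri --no-rdoc

gem install RedCloth --no-ri --no-rdoc --platform=mswin32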

 

My installation steps:

Install Ruby

Make sure c:\ruby\bin (or wherever you installed it) is in the path. I recommend a path with no spaces, so no c:\program files.

Extract the DevKit to the c:\ruby directory.

Extract the SQLite exe and DLLs to c:\ruby\bin.

Extract libmySQL.dll from the 32-bit 5.0.15 archive.

Open a command prompt with administrator privileges.
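Before going further, a quick sanity check that the path is right; both of these should print version numbers:

ruby -v

gem -v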

Issue these commands:

gem update --system

gem install rails --no-ri --no-rdoc

gem install sqlite3-ruby --no-ri --no-rdoc

gem install mysql --no-ri --no-rdoc

gem install mongrel_service --no-ri --no-rdoc --platform mswin32 --include-dependencies --pre
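Once those finish, two quick checks confirm what actually landed:

gem list

rails -v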

 

After that, install any gems you need for your Rails app. Make sure to test that your app works in production mode with Mongrel before anything else; there will be some kinks to work out, I'm sure. Once you are happy that everything is running as expected you can install your Mongrel service.

mongrel_rails service::install -N MyAppsServiceName -c c:\app\myapp -p 3000 -e production

-N is the service name, -c is where the app will be served from, -p is the port number it will listen on, and -e is the environment it will run in, like development or production. I chose a few high ports, 3000 to 3008, for my services to run on.
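For example, to run one Mongrel per core you could install a service per port and start each with net start; the service names and path here are made up:

mongrel_rails service::install -N MyApp3000 -c c:\app\myapp -p 3000 -e production

mongrel_rails service::install -N MyApp3001 -c c:\app\myapp -p 3001 -e production

net start MyApp3000

net start MyApp3001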

You can always remove a service if something is wrong.

mongrel_rails service::remove -N MyAppsServiceName

 

Setting up IIS7:

Install the Application Request Routing 2.0 and URL Rewrite plug-ins using the Web Platform Installer:

http://www.microsoft.com/web/Downloads/platform.aspx

Once that is done you will need to create a new web farm.

[screenshot: Create Farm]

Next you will need to add at least one server entry. You may want to edit your hosts file and add additional aliases for your IP address so you can run multiple copies of Mongrel to service all the requests. The recommendation is one per CPU core.

[screenshot: Add Server]
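If you go the alias route, the hosts file entries (c:\windows\system32\drivers\etc\hosts) look something like this, with the alias names being whatever you like; each alias then gets added as its own server entry in the farm, pointing at a different Mongrel port:

127.0.0.1    myapp-web1

127.0.0.1    myapp-web2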

In the last step the wizard asks if you want to add the routing rules. The answer is yes.

[screenshot: Add Rules]

You can confirm the routing rules are in place.

[screenshot: Edit Inbound Rule]

Make sure you have a website in IIS running and listening on port 80. Without this there is nothing for IIS to route to your new server farm.

 

If you have any questions post them up. I’m not a Rails expert but I have just been through the pain of Rails on Windows!

What I’ve Read and Recommend to Others – General Database and Theory by C.J. Date

Date on Database: Writings 2000-2006
This is a collection of writings by C.J. Date, one of the fathers of the relational model. It has a nice tribute to E.F. Codd, inventor of the relational model.
If you are looking for tips and insights into relational databases on a higher level this is a solid read. 
SQL and Relational Theory: How to Write Accurate SQL Code
Another solid text from Date. This one helps you understand the theory so you can write effective real-world code, and it has lots of solid examples. It covers topics like granting access directly to the data or through views, how NULLs affect the answers returned by your queries, and advanced coverage of constraints. I'm currently reading this book again; I don't think you could absorb all of the goodness this book has to offer in one pass.
Temporal Data & the Relational Model (The Morgan Kaufmann Series in Data Management Systems)
If you are working with dates and times, this book will teach you a new modeling technique. Unlike Domain-Key Normal Form, this builds on the previous normalization rules and is a logical extension of them. It does use Tutorial D to explain its examples, which is kind of a pain.
Database in Depth: Relational Theory for Practitioners
Another fundamentals book that transcends any particular product line and gets to the heart of the relational model. It is a short, concise read and one I generally recommend to people wanting to expand their theoretical base.
Refactoring Databases: Evolutionary Database Design
If you have ever had to refactor a database, this book is for you. I've read and re-read this book over the last few years. It isn't the easiest read in the world, but it can help you apply what you have learned from the books above to your current database without having to start from scratch.

 

As always if you have any questions or want to suggest a book let me know!