So long, and thanks for all the fish.

It has been a month of changes for me and my family. First we lost a very close and dear friend. Next, we are taking in that friends daughter so she can start her education. That meant we needed to quit looking for a house and buy one, which we did. Now, I have decided to leave eScan Data Systems and pursue a new opportunity with Dell.

I am very grateful for my time at eScan. I made lots of friends. Learned new things, and helped the company manage the explosive growth it has seen in the last two years. I’m sure that some of the connections I made at eScan will be life long ones. Now, I’m joining Dell as a Database Administrator Consultant. I’ll be working on some of the largest and most complicated SQL Server deployments in the world. It will be a challenge. There will be new friends to make and new things to learn. One of my friends at eScan has said since day one that I was over qualified to be there. I disagree, it was the right place for me to be for eScan and for my personal life. It allowed me to balance my work life and home life much better than I had ever done before. It also allowed me to use my expertise in an environment that thrived on data at its core.

It has been a time of sad endings and new beginnings. So long, and thanks for all the fish.

SQL Server, Storage and You

Just a note that I will start my three part webcast, SQL Server, Storage and You next week April 13th at 2PM CST. I’m excited to have this opportunity to speak to a much wider audience on something that I love so much. When Idera approached me last year about doing a three part series I was nervous to say the least. I’ve always taught in a live setting with students or attendees right in front of me. Luckily, this isn’t my first time doing something like this. As some of you know I was actually a mass communications/theater major in college and worked in radio. I’m having to reach back and dust off some of these skills. I am confident that it will go smoothly. Registration is free and they record the session for later viewing as well.

Register now

SQL Server, Storage and You – Part I: Storage Basics

Just like building a house we must first lay the foundation. This presentation will take you through low level fundamentals that we will use later on as we grow your storage knowledge. Starting with how data moves inside your server. How hard disks work. You will also get a primer on RAID configuration and how to mitigate drive failures and data loss. Wrapping up with a file system primer and how to configure your storage with SQL Server in mind.

Thanks again to Idera and MSSQLTips!

Idera logo MSSQL Tips logo

SQLSaturday #63, Great Event!

So,

I actually had a early morning sessions and gave my Solid State Storage talk and had a great time. The audience was awesome asked very smart questions and I didn’t run over time. The guys and gals here in Dallas have put on another great event and it isn’t even lunch time yet!

As promised here is the slide deck from todays session. As always if you have any questions please drop me a line.

Solid State Storage Deep Dive

Changing Directions

I See Dead Tech….

Knowing when a technology is dying is always a good skill to have. Like most of my generation we weren’t the first on the computer scene but lived through several of it’s more painful transitions. As a college student I was forced to learn antiquated technologies and languages. I had to take a semester of COBOL. I also had to take two years of assembler for the IBM 390 mainframe and another year of assembler for the x86 focused on the i386 when the Pentium was already on the market. Again and again I’ve been forced to invest time in dying technologies. Well not any more!

Hard drives are dead LONG LIVE SOLID STATE!

I set the data on a delicate rinse cycle

I’m done with spinning disks. Since IBM invented them in nineteen and fifty seven they haven’t improved much over the years. They got smaller and faster yes but they never got sexier than the original. I mean, my mom was born in the fifties, I don’t want to be associated with something that old and way uncool. Wouldn’t you much rather have something at least invented in the modern age in your state of the art server?

Don’t you want the new hotness?

I mean seriously, isn’t this much cooler? I’m not building any new servers or desktop systems unless they are sporting flash drives. But don’t think this will last. You must stay vigilant, NAND flash won’t age like a fine wine ether. There will be something new in a few years and you must be willing to spend whatever it takes to deploy the “solid state killer” when it comes out.

Tell Gandpa Relational is Soooo last century

The relational model was developed by Dr. EF Codd while at IBM in 1970, two years before I was born. Using some fancy math called tuple calculus he proved that the relational model was better at seeking data on these new “hard drives” that IBM had laying around. That later tuned into relational algebra that is used today. Holy cow! I hated algebra AND calculus in high school why would I want to work with that crap now?

NoSQL Is The Future!

PhD’s, all neck ties and crazy gray hair.

Internet Scale, web 2.0 has a much better haircut.

In this new fast paced world of web 2.0 and databases that have to go all the way to Internet scale, the old crusty relational databases just can’t hang. Enter, NoSQL! I know that NoSQL covers a lot of different technologies, but some of the core things they do very well is scale up to millions of users and I need to scale that high. They do this by side stepping things like relationships, transactions and verified writes to disk. This makes them blazingly fast! Plus, I don’t have to learn any SQL languages, I can stay with what I love best javascript and JSON. Personally, I think MongoDB is the best of the bunch they don’t have a ton of fancy PhD’s, they are getting it done in the real world! Hey, they have a Success Engineer for crying out loud!!! Plus if you are using Ruby, Python, Erlang or any other real Web 2.0 language it just works out of the box. Don’t flame me about your NoSQL solution and why it is better, I just don’t care. I’m gearing up to hit all the major NoSQL conferences this year and canceling all my SQL Server related stuff. So long PASS Summit, no more hanging out with people obsessed with outdated skills.

Head in the CLOUD

Racks and Racks of Spaghetti photo by: Andrew McKaskill

Do you want this to manage?

Or this?

With all that said, I probably won’t be building to many more servers anyway. There is a new way of getting your data and servers without the hassle of buying hardware and securing it, THE CLOUD!

“Cloud computing is computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the system that delivers the services. Parallels to this concept can be drawn with the electricity grid where end-users consume power resources without any necessary understanding of the component devices in the grid required to provide the service.” http://en.wikipedia.org/wiki/Cloud_computing

Now that’s what I’m talking about! I just plug in my code and out comes money. I don’t need to know how it all works on the back end. I’m all about convenient, on-demand network access to a shared pool of configurable computing resources. You know, kind of like when I was at college and sent my program to a sysadmin to get a time slice on the mainframe. I don’t need to know the details just run my program. Heck, I can even have a private cloud connected to other public and private clouds to make up The Intercloud(tm). Now that is sexy!

To my new ends I will be closing this blog and starting up NoSQLServerNoIOTheCloud.com to document my new jersey, I’ll only be posting once a year though, on April 1st.

See you next year!

The Mystery Of The Slow Last Row

Things Go South

Recently I was troubleshooting a piece of software that archives data out of a very active import table. It is a pretty simple app. It queries the table for any id’s that have data older than X days. It then queries the table again to pull the detail data generating a csv file, table schema script and compresses them for storage. Well, it was running slow. Much slower than expected in fact. It has to process around 5000 id’s and it was taking 50 to 240 seconds an id. This doesn’t sound slow but it adds up in a hurry. I started digging into the code looking at the normal culprits. Network slow? No. Disk IO a problem? No. It must be the compression engine! It takes the CPU to 100% and leaves it there. That metric by itself isn’t a bad thing but I just wasn’t seeing what the problem was. I plugged in other compression engines and tweaked the settings. Oddly enough, every engine performed about the same! I knew there was something I was missing. Stripped things away until I was left with one thing, the query that pulls the detail data from SQL Server.

The Big Boy(TM)

The table in question is big, due to the archive tool slowing down it has ballooned up to 240GB, in a single table, that shouldn’t be more than 10GB or 20GB at any one time. It gets most of the heft not in just rows but the width of the table. I took my sample slow query and started running permutations on it. In the grand scheme of things it wasn’t returning a large row count, around 183,000 or so. The data size was pretty large at 650MB. Still, not anything that would choke the network or our system by any means.

Turn The Knobs

I though it may be that the application was choking on the data size. So, I stepped out of the app and ran the query in SSMS. It ran in the same time that the program did as far as query execution. I put a TOP clause on it cutting the number of rows in half thinking I would get the data back in half the time. I was way off! Instead of the 40 seconds it was now taking around 5 seconds. I pulled the TOP clause off again and watched the data come back.  I noticed it returned data pretty quickly. As a matter of fact, it returned almost all the rows in just a few seconds. It would then spend the next say 30 seconds waiting for the last few rows. At this point I thought something was wrong with my workstation. I  ask a co-worker to run the same two queries, one with the TOP and one without. He got the exact same results I did! I started to panic a little bit. What if something was wrong with the server or there was corruption in the database? I decided to play with the TOP clause getting the count to within a couple of records before things would slow down dramatically. I looked at the crazy simple plan nothing to see here but a scan and return.

 Original Plan

Plan with TOP operator

Digging Deep, Pinging my Tweeps

If you aren’t on twitter you are missing out on a valuable resource. There are members of the SQL Server community that hang out watching #sqlhelp that might just be qualified to offer you an opinion on your problem. MVP’s, MCM’s and beyond! Sometimes they are just really smart people like Paul White (blogtwitter). He may not have any of the magic three letters in his title he is still extremely capable. If you aren’t reading his blog you need to start now. Anyway, I tweeted my dilemma and several people responded. I had already tried everything that people were suggesting Paul offered to take a look at the execution plans and see if he could figure it out. I was pretty sure it had something to do with the data layout on disk and the fact I only had a single index, the clustered key, on the table which I wasn’t searching on. What I expected to get was a full scan every time on the table. When I added the TOP clause SQL Server was smart enough to say “Stop scanning when X records are returned” So, when I returned all 182,853 records it had no choice to to scan the entire table.

I think Paul explains it pretty well:

It seems that the first 182,852 matches on (IDFK = 69468) appear relatively early in the clustered index (scanning in clustered key order).

The 182,853rd record appears to be much later in the clustered index – perhaps quite close to the end of the table, so SQL Server scans a large number of pages after finding the 182,852nd match before it encounters the next one.  If you select the clustered index key plus IDFK from the table, and number the rows using ROW_NUMBER(OVER ORDER BY (SELECT 0)) you should find a big gap in clustered key values between records 182,852 and 182.853.

Note – Which I did find – Wes

When you request the TOP (182852) records, the scan stops as soon as that many records are found – so the scan does not continue across the big gap in clustered index key values to find the 182,853rd value.

Note – I also played with SET ROWCOUNT and had the exact same results. – Wes

Side note.  The query does not request the results in clustered key order, but the storage engine has no other choice as the query is written.  If you add WITH (TABLOCK), WITH (NOLOCK), or otherwise modify the table so that no data changes are possible while the scan is running (perhaps by making the database read-only, or placing the table on a read-only file group) – the storage engine will be able to scan from the IAM pages, which will result in rows coming back in the order pages were allocated to the table.  Depending on the distribution of records with (IDFK = 69468), you might get better, or worse, performance – but the big gap between record #182,852 and #182,853 will almost certainly disappear.  Of course, the storage pattern might result in other such gaps!

Thanks Paul!

I did test the NOLOCK and TABLOCK, both had similar results that sped up the query. Alas, I couldn’t do ether since there is data changes taking place during the archive process. I opted for a covering index on the search key, which automatically includes the primary key and a bookmark lookup. Now it is faster and consistent on speed.

And that is how the mystery of the slow last row was solved!