SQLDIY: Gathering A Data Dictionary

In this installment we are looking at assembling a basic data dictionary from the column level meta data stored in SQL Server. This is a little different from the rest. You can still do quite a bit with the information provided but to leverage this to the fullest requires a bit of developer buy in. Or a bit more work for whomever is your schema designer. Extended properties have been available to you since the stone ages of SQL Server. Two fine fellows Mark Chaffin and Brian Knight(blog|twitter) expound on the power of extended properties in the article on Managing Metadata in SQL Server 2005. Well worth the read. Using a couple of stored procedures you can store some very rich metadata about every object in your database.

Here are some examples of how to add descriptions to your objects.

–Adding a description to a table
EXEC sp_addextendedproperty
@name = N’MS_Description’,
@value = N’New Description for the Server table’,
@level0type = N’SCHEMA’, @level0name = dbo,
@level1type = N’TABLE’, @level1name = Server;

GO

–adding a description to a column
EXEC sp_addextendedproperty
@name = N’MS_Description’
,@value = N’Login is trusted’
,@level0type = N’Schema’, @level0name = dbo
,@level1type = N’Table’,  @level1name = Server
,@level2type = N’Column’, @level2name = Trusted;
GO
–updating that column description
EXEC sp_updateextendedproperty
@name = N’MS_Description’
,@value = N’Login is windows authenticated’
,@level0type = N’Schema’, @level0name = dbo
,@level1type = N’Table’,  @level1name = Server
,@level2type = N’Column’, @level2name = Trusted;

Pretty simple, yet I’ve always received push back from both developers and database administrators. To me knowledge is power and building the documentation directly into the database metadata is an extremely powerful tool to have handy. At any point I can query sys.extended_properties on an object and find its usage, without having to guess by the object name what it may, or may not actually represent.

A second use case for having the metadata and a history of changes pretty straight forward, checking for schema changes. If you don’t want to implement DDL triggers or use a third party tool to watch for schema changes this is another handy tool in your tool belt. I’ve used this data to track schema deployed across servers to make sure that they are always in sync and if they are out of sync to send an alert.

Couple of quick notes for you. When you are querying sys.extended_properties minor_id is the column_id if you are looking at a class of 1. See sys.extended_properties on technet.

This one is a two in one enjoy!

Series to Date SQLDIY: Manage and Monitor SQL Server Yourself

Link to the script Gather Table Level Metadata

Link to the script Gather Database Level Metadata

I’m hosting all my scripts from this series on GitHub as I do with all my open source projects.

As always, if you find any bugs please let me know and I will correct them!

The Austin Fall Classic, SQLSaturday is Coming To Austin, TX

Come Get Your Learn On!

You heard right, on October 1st 2011 you need to be in Austin, TX. We have finalized our location and spot on the SQLSaturday list. After a rough start to the year I’ve managed to get my core team together and in action, settle my personal life down a bit and get this train back on the tracks. The call for speakers is open. This is your chance to start your speaking career. If you wish to volunteer for any part of the event please send an email to Michael.Kaplan@sqlposse.com. If you want to sponsor this event take a look at the sponsor page and let us know. You can email directly at wes.brown@sqlposse.com For more information please go to the SQLSaturday #97 page.

Expanding In Austin

To serve the needs of the SQL Server community in Austin, we are starting a second meeting time and place. We are talking with Experis, a leading technology recruiting firm, to use their offices at 301 Congress Ave. Response to the second meeting space and time has been very positive and I look forward to watching this monthly event grow. As soon as we are confirmed I will send out the inaugural meeting notification.

Watch out! Here Comes the POSSE

The SQL POSSE that is. To support our efforts to grow the SQL Server community in central Texas and the surrounding area I am proud to announce the formation of the Professional Organization of SQL Server Educators. Our purpose is to support local chapters with things like SQLSaturday or just helping out at the local user group level. Currently, CACTUSS and SALSSA are affiliated with POSSE and in the coming weeks I will be reaching out to other user groups to see if they are interested in pooling some resources. POSSE doesn’t replace your user group, it is a support facility to help get things done. POSSE is currently filing for federal non-profit status and state non-profit status in Texas. We should be fully recognized around the time of our SQLSaturday event. Once we have gone through this I will post full details on what we did, how we did it and what it cost to get it done. After that I’ll be more than happy to schedule a conference call with chapter leaders to discuss the details and answer questions. I’ve asked for years for anyone who has done this to share and got back cricket noises so I figured it was time to put up or shut up.

Its been a busy summer and it will be a busy fall too!

SQLDIY: Tracking Space Usage At the Index Level

I am a fan of information, the more the merrier. Anytime I’m tasked with tracking a production server over the long haul I try and gather as many data points as I can. You can capture database sizes but when your database blows up in size its nice to be able to drill down and find out if it was a new index or correlates with a change in code at a point in time. In this script I’m falling back on sp_spaceused to gather that information. I could query several DMV’s to get the same or more information. I could also put in DDL triggers for tracking things like schema changes if I need to the minute and alerting to go along with it. This particular view is more of a “close enough” look at your database in some detail. As we get into more detail index tracking you will see some more specific numbers on row counts, index size and other index specific details. You may see a pattern emerging here. My goal is to gather data in a timely and contestant fashion. We will also gather more detailed meta data on schema later as well.

Series to Date SQLDIY: Manage and Monitor SQL Server Yourself

Link to the script Gather Table Statistics

I’m hosting all my scripts from this series on GitHub as I do with all my open source projects.

As always, if you find any bugs please let me know and I will correct them!

UPDATE:

Aaron Bertrand sugested that I take a look at sys.dm_db_partition_stats instead of sp_spaceused and as usual he was spot on. By moving to sys.dm_db_partition_stats I was able to cut the table loop out completely and sped up the performance quite a bit. New version is up.

Understanding Benchmarks

That Means What?

Vizzini: HE DIDN’T FALL? INCONCEIVABLE.
Inigo Montoya: You keep using that word. I do not think it means what you think it means.
– Princess Bride

If you are like me, you are constantly reading up on the latest hardware. Each site has it’s own spin on what makes up its review. All of them use some kind of synthetic benchmarking software. Some don’t rely to heavily on them because they can show the real world performance using playback tools. This method is used heavily on gaming hardware sites like [H]ard|OCP where they decided long ago that using purely synthetic benchmarks were at best inaccurate and at worst flat misleading. In the graphics card and processor space this is especially so. Fortunately, on the storage side of the house things are a little simpler.

 

 

What’s In A Workload

In the processor space measuring performance is a complicated beast. Even though every processor may be able to run the same software they can vary wildly in how they do it. On the processor side of things I favor Geekbench right now since it uses known mathematical algorithms. John Poole is very open on how Geekbench works Are the benchmarks relevant to database workloads? I’ll be exploring that in a future post.

In the storage space we have a pretty standard benchmarking tool in Iometer. This tool was initially developed by Intel and spread like wildfire throughout the industry. Intel quit working on it but did something very rare, turned it over to the Open Source Development Lab for continued development. You may ask why I favor Iometer over SQLIO? The answer is simple, complexity. Iometer allows me to simulate diffrent read/write patterns in a very predictable manor. SQLIO doesn’t simulate complex patterns. It does reads or writes, random or sequential for a fixed duration. This is fine for finding the peak performance of a specific IO size but doesn’t really tell you how your storage system might respond under varying workloads. You my notice that they only sites that use SQLIO are SQL Server sites. While the rest of the world generally uses Iometer. The problem is none of the sites that I regularly visit publish the exact Iometer settings they used to get the results they publish. Tom’s Hardware, Anandtech, Ars Technica and Storage Review all use Iometer in some fashion. Doing some digging and testing like hard drives I think most of the sites are using a mix 67% reads 33% writes 100% random at an 2KB block which was defined by Intel and represents an OLTP workload. Storage Review did a nice writeup a decade ago on what they use for I/O patterns and Iometer. This isn’t the best fit for a purely SQL Server workload but isn’t the worst ether. By moving from a 2KB block to an 8KB block we are now squarely in SQL Server I/O land.

SQL Server Specific

Now we are starting to get to the root of the problem. All the main hardware review sites don’t focus on us at all. If we are lucky there will be a single column marked “Database workload”. So what do we do? You read, research and put together your own test suite. SQL Server I/O access patterns are pretty well documented.  So, I put those general patterns in a Iometer configuration file and keep it in my back pocket. I have posted a revised file in the My Tools section here on the site.

For the storage stuff that is fine but what about CPU and memory throughput? Things get a little murky here. Like Glenn Berry(blog|twitter) and I you can use Geekbench to get a baseline on those two things but again, this isn’t a SQL Server specific tool. In most cases sampling a workload via trace getting a baseline on performance then replaying that same workload on different servers will help but only tells you about your application. If you are looking for general benchmarks I personally wouldn’t put much stock in the old TPC-C tests anymore. They aren’t a realistic assessment of database hardware at this point. It is pretty easy to stack a ton of memory and throw a bunch of CPU’s at the test to get some ridiculous numbers. I personally look at TPC-E for OLTP tests since there is a decent sampling of SQL Server based systems and TPC-H for data warehouse style benchmarks. As always don’t expect the exact same numbers on your system that you see on the TPC benchmark scores. Even TPC tells you to take the numbers with a grain of salt.

My Personal Reader List

I personally follow Joe Chang (blog) for hard core processor and storage stuff. He has a keen mind for detail. I also read Glenn Berry(blog|twitter) he has some deep experience with large SQL Server deployments. Also, Paul Randal (blog|twitter) because he has more hardware at his house than I do and puts it to good use. I would advise you to always try and find out how the benchmark was performed before assuming that the numbers will fit your own environment.

What’s On My Todo List

I wrote a TPC-C style benchmark quite a while back in C#. I’m currently building up instructions for TPC-E and TPC-H using the supplied code and writing the rest myself in hopes of building up a benchmark database. This will be in no way an official TPC database or be without bias. I’m also always updating my Iometer and SQLIO tools as well with full instructions on how I run my tests so you can validate them yourself.

As always if you have any suggestions or questions just post them up and I’ll do my best to answer.

SQLDIY: Database Space Detail

Continuing to build up our data gathering library we shift our focus to tracking database file growth. This particular procedure compiles both data and log file usage. This is a fundamental aspect of tracking growth over time and being proactive about growing your databases yourself instead of letting the auto grow making  a mess of your databases. There is no clean and easy way to do this without temp tables so I just have to suck it up and use them. I considered going with a CLR routine and gather more data but we are assuming that you, the user, don’t have the CLR or OLE automation enabled. Personally, I wouldn’t have OLE automation enabled if at all possible. It is another surface area to secure and another source of odd memory issues. There are similar issues with the CLR. Other than the security concerns when there is memory pressure the CLR is ejected and may not be able to load again until the SQL Server process is recycled. With all that in mind I present to you Database Space Detail schema and script!

Series to Date SQLDIY: Manage and Monitor SQL Server Yourself

Link to the script Gather Database Space Detail

I’m hosting all my scripts from this series on GitHub as I do with all my open source projects.

As always, if you find any bugs please let me know and I will correct them!