As an old school DBA and always having cheap frugal employers, I have rolled my own management and monitoring solutions. I’ve shared bits and pieces of it over the years to colleagues and to the community but never the whole thing at one go. I did try and build a commercial version. With the birth of my son, doing contract work and building a company from scratch just wasn’t in the cards. So, my loss is your gain! I’ve decided to take what I have built up over the years, clean it up and post it as a series of articles. The code is hosted at github. It may take a while to unwind all the pieces, update it for SQL Server 2008 and clean it up for general consumption but when it is done you should have a solid foundation for a solution of your own.

My goal as a DBA is to provide secure, accurate, consistent and  efficient data access to all the databases under my care.

To meet those goals I should:

  1. Provide monitoring and alerting for critical errors.
  2. Provide reliable maintenance on all databases.
  3. Perform backup and recovery operations, including testing backups to guarantee recoverability.
  4. Analyze performance issues and be proactive about possible issues.
  5. Secure the databases and monitor for any breaches in security policies and procedures.

If you have a few SQL Server instances to manage a single person may be able to do these things without any level of automation. As you scale to dozens or hundreds of instances it becomes impossible to throw enough people at the problem. You get inconsistency across servers in configuration and management. Introduce inaccuracy in your monitoring and long term growth projections. Add a whole new level of unreliability as some servers just fall off the radar and your DBA team spends all their time reacting to problems.

My core philosophy for dealing with these issues is pretty straight forward.

Manage By Exception
This means setting up your monitoring and alerting systems to work for you not against you. To many alerts are just as deadly as too few. As you handle problems feed the solution back into your automated system.

Automate Everything
Make your servers work for you and not against you! There is always a cost up front to automate the management of your environment. The time you get back by not having your very valuable and highly skilled workers not doing menial tasks more than makes up for it.

Standardize Across Servers
As servers come on line you may find that they aren’t setup to the company policies. Fixing all of this by hand is also a huge time waster. Define a standard and modify it as needed. Through automation you can make a single change and then deploy it across your entire environment.

To meet these needs I’ve written several modules, a centralized system for managing them and the servers in my care. I will update this list with links as the articles come on line. If I use any code that I haven’t written I’ll be very clear who did write it, if I made changes to it and what the license to redistribute the code falls under.

Data Gathering

Monitoring And Alerting

  • Server Blocking
  • Check Backup Space
  • Server Database Transaction Rate
  • Server Poll
  • System Database Roles
  • Long Running Queries
  • Long Running Stored Procedures
  • Log Shipping

Administration

  • Backup Database
  • Update Statistics
  • Re-indexing
  • QA Refresh
  • Test Backups
  • Log Shipping

Reporting

  • Daily Status
  • Data Dictionary
  • Long Running Queries
  • Long Running Stored Procedures

Trend Analysis

  • Backup/Restore Failure Analysis
  • Backup Growth
  • Drive Space Detail
  • Server Blocking
  • Database I/O
  • Database Transaction Rate
  • Server Wait Statistics
  • Long Running Queries
  • Long Running Stored Procedures