Monthly Archives: July 2010

The Dangers of Sub-queries!!!!

Ok, now that I have your attention, this really should be titled the danger of not qualifying all objects, but that isn’t quite sensational enough to make you click through to here 🙂

Imagine, if you will: a developer expects ten records to be inserted into a table, and the query has been running for almost an hour. A quick glance at sp_who2 shows that it is using a lot of IO, I mean a lot. The first thing that happens is a command decision from the boss to kill the query. I took a quick peek and found the plan still in the plan cache while we waited for this thing to roll back. As I looked at the plan I saw something really wrong. It looked like the table receiving the inserts wasn’t getting the ten records we thought but 169 million records, every record from the table in the select clause.
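As an aside, one way to peek at the plan for a still-running statement is through the plan DMVs. A minimal sketch (the session_id of 52 is just a stand-in for whatever spid sp_who2 reported):

SELECT qp.query_plan
FROM   sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE  r.session_id = 52  -- the spid from sp_who2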

I look at the query and on the surface everything looks good. When I hit the check mark it compiles without error. Looking at the plan in detail, I notice a little icon that tips me off on where to look next.

[image: execution plan warning icon]

Oh, that’s bad. Warnings: No Join Predicate. Ouch. But this is a sub-query; how can I have no join predicate? Let’s take a look at an example.

IF OBJECT_ID('tempdb..#t1') IS NOT NULL
  DROP TABLE #t1
IF OBJECT_ID('tempdb..#t2') IS NOT NULL
  DROP TABLE #t2
GO
CREATE TABLE #t1 (
  t1id    INT   NOT NULL,
  t1name  VARCHAR(50)   NULL,
  t1place VARCHAR(50)   NULL)
  
ALTER TABLE #t1
 ADD   PRIMARY KEY ( t1id )
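
-- Note: #t2 mirrors #t1's name and place columns, but its key is t2id;
-- there is no t1id column in #t2, which is what makes the bug below possible.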
 
CREATE TABLE #t2 (
  t2id    INT   NOT NULL,
  t1name  VARCHAR(50)   NULL,
  t1place VARCHAR(50)   NULL)
  
ALTER TABLE #t2
 ADD   PRIMARY KEY ( t2id )
 
INSERT INTO #t1
SELECT 
  spid,
  loginame,
  hostname
FROM   
  master.dbo.sysprocesses
WHERE  hostname <> ''

INSERT INTO #t2
SELECT 
  spid,
  loginame,
  hostname
FROM   
  master.dbo.sysprocesses
WHERE  hostname <> ''

This gives us two tables with some data. We want to find all the records in #t1 that also exist in #t2, but instead of a join we will use an IN with a sub-query.

SELECT 
  t1name,
  t1place
FROM   
  #t1
WHERE  t1id IN (SELECT 
                 t1id
                FROM   
                 #t2)
 

Looks simple enough, and it will pass the compile test all day long even though t1id doesn’t exist in #t2. Since you can use columns from the top-level query inside the sub-query, this is a perfectly valid piece of T-SQL: the unqualified t1id simply resolves to #t1.t1id, so the sub-query returns a row for every row in #t2.
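If the column in the sub-query had been qualified, the mistake would have been caught at compile time instead. A minimal sketch (the exact error text is from memory):

SELECT 
  t1name,
  t1place
FROM   
  #t1
WHERE  t1id IN (SELECT 
                 #t2.t1id
                FROM   
                 #t2)

-- Msg 4104: The multi-part identifier "#t2.t1id" could not be bound.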

The unqualified query, though, gives us this plan when we look at it.

[image: execution plan showing the join with the No Join Predicate warning]

And there you have it: a join without an ON clause, as far as the optimizer is concerned. By not qualifying all the columns in the sub-query we opened ourselves up to this error. This isn’t the optimizer’s fault! If we re-write this as it was meant to be:

SELECT 
  t1name,
  t1place
FROM   
  #t1
WHERE  t1id IN (SELECT 
                 t2id
                FROM   
                 #t2)
 

We get a plan that is more to our liking.

[image: execution plan for the corrected query]

You could also have avoided the problem by re-writing this as a join instead of using the IN clause; a sketch follows below. Scary way to learn that SQL Server will do exactly what you tell it to, even if you are wrong!
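Here is what that join version might look like. Since t2id is the primary key of #t2 it is unique, so the inner join returns the same rows as the IN without multiplying them:

SELECT 
  t1.t1name,
  t1.t1place
FROM   
  #t1 AS t1
  INNER JOIN #t2 AS t2
    ON t1.t1id = t2.t2id

With every column qualified by an alias, a typo like the one above fails to compile instead of silently binding to the outer table.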

Fusion-IO releases new 2.1 driver and firmware

And it is well worth the upgrade. I recently had the opportunity to interview David Flynn, CEO of Fusion-IO, and that will be coming up soon. I have been beta testing the 2.0 driver for quite some time and have been very happy with the performance and the reduction in required system memory (by half!). The 2.1 driver is the official release of the 2.x series and shows gains even over the 2.0 driver I’ve been testing. I always do a little test run with HD Tach before diving into my other, more detailed tools, and right off the top the 2.1 driver is faster yet again than both the 1.27 and the 2.0 drivers. The blue is the 2.0, the red is the 2.1. I don’t know about you, but getting a performance bump from a firmware and driver upgrade is always a good thing!

 

[image: HD Tach results, 2.0 driver in blue vs. 2.1 driver in red]

Creating UDL Files On The Fly

Many years ago, in the dark ages of DTS, I created a little app that would take two parameters and build a UDL file. This and some crafty VBScript allowed me to loop through a list of servers on the fly. I hadn’t thought about this code in almost ten years until I came across John Paul Cook’s blog post on using UDL files. I thought I’d just post up the code; it is basic but got the job done! I did clean it up a bit since it was written when .NET 1.0 was all the rage. The secret sauce is writing the file encoded as Unicode and doing binary writes to get all the little bits in the correct place; the UDL file format is picky that way. Enjoy!

using System;
using System.IO;
using System.Text;

namespace CreateUDL
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length < 2)
            {
                Console.WriteLine("Usage: CreateUDL <path> <servername>");
                return;
            }

            string path = args[0];
            string serverName = args[1];
            string fileName = path + serverName + ".udl";

            try
            {
                using (FileStream output = new FileStream(fileName, FileMode.Create))
                using (BinaryWriter writer = new BinaryWriter(output))
                {
                    // UDL files must be little-endian Unicode; write the
                    // UTF-16 LE byte order mark (0xFEFF) first.
                    writer.Write((short)(-257));

                    WriteUnicode(writer, "[oledb]\r\n");
                    WriteUnicode(writer, "; Everything after this line is an OLE DB initstring\r\n");
                    WriteUnicode(writer,
                        "Provider=SQLOLEDB.1;Integrated Security=SSPI;" +
                        "Persist Security Info=False;" +
                        "Initial Catalog=master;Data Source=" + serverName + "\r\n");
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }

        // The binary writes are the secret sauce: each line goes out as raw
        // UTF-16 bytes, with no length prefix or re-encoding along the way.
        static void WriteUnicode(BinaryWriter writer, string s)
        {
            writer.Write(Encoding.Unicode.GetBytes(s));
        }
    }
}
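To use it, call the compiled executable with the target folder and the server name, something like CreateUDL.exe C:\ConnectionFiles\ MYSERVER (both names here are just examples). That should leave a MYSERVER.udl in that folder that opens straight into the Data Link Properties dialog.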



Building My New Home, SQL Server IO

In April of 2009 I decided it was time to blog about my SQL Server experiences in earnest. Steve Jones (twitter|blog) over at SQL Server Central was very supportive and gave me some space there. I’ve been playing with different blogging platforms over the last six months and have settled on WordPress. I’m also taking some advice from Brent Ozar (twitter|blog) and syndicating it out to the world. Again, Steve has been nice enough to just feed my new blog into my old blog space on SSC, so there won’t be any loss of old articles or comments on them. With that said, you can visit me at http://www.sqlserverio.com. I plan to grow it to be more than a blog, but who knows how that will turn out!

The PASS Session Selection Process, My Experience Part 3

In my last two posts I talked specifically about the process. This round I’m going to discuss the tools of the trade. Note: these are my views and opinions, not those of PASS in any way, shape, or form.

This was my first year on the program committee. From talking to other people on the committee, there have been several tools and methods used in the past to do the work of selecting abstracts. I can’t speak to the previous tools, just to this year’s. And it will change again next year; PASS is a dynamic, volunteer organization, and things can change pretty quickly when they need to. I also have to say that Elena Sebastiano and Jeremiah Peschka were both very helpful and responsive to questions. Lance Harra was also on the Professional Development track team and was easy to work with. This isn’t Lance’s first time doing this, and for that I was grateful. He helped me stay focused and really helped guide the selection process. In all, the final selection process was pretty smooth in that regard.

The Tool
At first blush the web-based tool we were to use seemed pretty simple. Jeremiah did a training session and I felt like I could use it without much fuss. The tool is integrated into the main PASS website, which is based on DotNetNuke. Since they have to work inside the DNN framework there are some limitations. It has a limited amount of space to display a lot of information, and in the inner panel you sometimes had to scroll down and to the right.

The main page has all the sessions listed, but across multiple pages. I actually missed this my first night and only went through the first page of submissions. I thought there were only about 20 submissions because I couldn’t see the page counter until I scrolled the frame all the way to the bottom.

The detail page was laid out pretty well; the first thing I would change, if possible, would be the column names. They looked just like that: camel-case column names. Secondly, the section to enter notes and set ratings was a little slow to use. You had to click a rate button, fill out the form, and submit, which requires a server round trip every time. It does keep you from losing anything you have put into the form so far, though.

When you are done with the detail ratings you are back on the main page for your final ratings. Again, every button push was a server call, and it got tiring at times. This is just your final rating and whether to approve or reject an abstract. You also have to set a final reason, which is ultimately used by the heads of the program committee to pick the session list.

This brings me to the list of reasons an abstract could be rejected. I have to say it was limited, which made choosing difficult. There isn’t a “You were awesome but there weren’t enough slots” option in the drop-down. We have to put in a reason, so I tried to pick the most appropriate one I could.

In all, the tool was functional and allowed us to do the work. Again, this is the first year for this tool and I’m sure it will undergo some changes.

Odds and Ends

One of the things I thought was odd was the lack of knowledge sharing. I could see my partner’s totals for each submission but not any of his notes. Since we aren’t in the same room, let alone in the same state, that pretty much means out-of-band emails or phone calls to talk about abstracts. Also, since this was my first time doing this, it would have been nice to see why Lance had rated an abstract the way he did.

After talking with some of the submitters, it appears that they don’t get any feedback on why they were or weren’t chosen, just what we picked in the reasons drop-down. I took notes on pretty much every abstract with the assumption that they would be fed back to the submitters, so if they chose to submit again next year they wouldn’t make some of the same mistakes.

Lastly, a speaker is limited in the number of sessions they can present. This guarantees that you don’t see the same three people the whole summit. The problem is we don’t know if someone has been chosen more than the allotted number of times. If we pick them and they then get pulled for another track, we have to depend on our alternate selection to fill the slot. We did some second-guessing on some folks with the assumption they would be gobbled up by other tracks. In hindsight, it would have been helpful to know that a person had put in, say, five submissions and to which tracks, so we could make our choices a little better. Possibly prioritize the tracks and publish downstream to the other tracks who is off the table. Maybe even allow submitters to put a preference on their submissions so we have just that little bit more information on what they would like to speak on as well.