When Technical Support Fails You

I have had the pleasure of being a vendor and of doing technical support for both hardware and software products. I know it isn’t easy. I know it isn’t always possible to fix everything. Even so, the level of support I’ve received from HP on my current issue is just unacceptable. This is made more frustrating by the lack of documentation. The technical documents show capacity, such as how many drives fit in an array and the maximum volume size, but nothing on throughput. Every benchmark they have seems to be relative to another product, with no hard numbers. For example, the P800 is 30% faster than the previous generation.

I’m not working with a complicated system. It’s a DL380 G5 with a P800 and two MSA70’s fully populated with 15k 73GB hard drives. 46 of them are in a RAID 10 array with a 128k stripe. I formatted it NTFS with a 64k block size and sector-aligned the partition. Read/write cache is set at 25%/75%. This server originally had just one MSA70. We added the second for capacity expansion and expected to see a boost in performance as well. As you can probably guess, there wasn’t any increase in performance at all.
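Sector alignment is easy to get wrong, so it is worth a quick sanity check. Here is a minimal sketch, assuming the partition's starting offset is known in bytes. The two offsets below are illustrative values, not readings from this server: 32,256 bytes is the old 63-sector default, and 1,048,576 bytes is a 1 MB offset.

```python
STRIPE_BYTES = 128 * 1024    # 128k stripe, as configured on the array
CLUSTER_BYTES = 64 * 1024    # 64k NTFS block size


def is_aligned(offset_bytes, unit_bytes):
    """Return True when the partition start falls on a unit boundary."""
    return offset_bytes % unit_bytes == 0


if __name__ == "__main__":
    # Hypothetical starting offsets, for illustration only.
    for offset in (63 * 512, 1024 * 1024):
        print(
            f"offset {offset:>8} bytes: "
            f"stripe aligned={is_aligned(offset, STRIPE_BYTES)}, "
            f"cluster aligned={is_aligned(offset, CLUSTER_BYTES)}"
        )
```

A partition that starts off a stripe boundary makes some 64k blocks straddle two stripes, which is why the check is done against both sizes.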

Here is what I have as far as numbers. Some of these are guesses based on similar products.

The P800 has two external miniSAS 4x connectors, for a maximum throughput of 2400 MB/sec (2400 Mbit/sec per link x 4 links per connector x 2 connectors).
The P800 uses a PCIe x8 connection to the system. I have that at 4,000 MB/sec in each direction, which is the PCIe 2.0 rate; at the 2.5 GHz PCIe 1.x rate, x8 is 2,000 MB/sec in each direction.
Attached to the controller are 46 15k 73GB 2.5” hard drives. At roughly 80 MB/sec each, that is a raw sequential speed of 3680 MB/sec (46 x 80 MB/sec). In RAID 10 every write goes to both halves of a mirror, so sequential writes top out at 1840 MB/sec (23 x 80 MB/sec). The per-drive figure is based on the Seagate 2.5” 73GB SAS 15.1k.

Expected write speed should be around 1200 megabytes a second.
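The arithmetic behind these ceilings can be sketched in a few lines of Python. The inputs are the figures listed above, guesses included. Two things are my own assumptions rather than anything HP documents: that mirrored writes cross the SAS links twice (which is one possible reading of the 1200 figure), and the per-lane PCIe rates of 250 MB/sec for PCIe 1.x and 500 MB/sec for PCIe 2.0.

```python
def write_ceilings(pcie_mb_per_lane):
    """Return the sequential write ceiling of each stage, in MB/sec."""
    sas_links = (2400 / 8) * 4 * 2  # 300 MB/sec per link, 4 links, 2 connectors
    return {
        "PCIe x8": 8 * pcie_mb_per_lane,
        # Assumption: each host write is sent to both mirror halves,
        # so only half the link bandwidth is host-visible.
        "SAS links (mirrored writes)": sas_links / 2,
        # 23 mirrored pairs at a guessed 80 MB/sec per drive.
        "Drives (RAID 10 writes)": 23 * 80,
    }


def bottleneck(ceilings):
    """Return the (stage, MB/sec) pair with the lowest ceiling."""
    return min(ceilings.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for label, per_lane in (("PCIe 1.x", 250), ("PCIe 2.0", 500)):
        stage, limit = bottleneck(write_ceilings(per_lane))
        print(f"{label}: limited by {stage} at {limit:.0f} MB/sec")
```

Under these assumptions the slowest stage is the same whichever PCIe rate applies, so the PCIe guess does not change the expected write speed.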

We get around 320 MB/Sec sequential write speed and 750MB/sec in reads.

Ouch.

Did I mention I also have an MSA60 with 8 7.2k 500GB SATA drives that burst to 600MB/sec and sustain 160MB/sec writes in a RAID 10 array? Yeah, something is rotten in the state of Denmark.
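To put the gap in perspective, here is a rough comparison of the two arrays using only the write numbers quoted above. It is a sketch, not a benchmark: it simply divides sustained write speed by the number of mirrored pairs, and the helper names are my own.

```python
EXPECTED_WRITE_MB_S = 1200  # expected write speed for the MSA70 array

# Sustained sequential write speeds as measured, in MB/sec.
ARRAYS = {
    "2 x MSA70, 46 x 15k SAS": {"drives": 46, "write_mb_s": 320},
    "MSA60, 8 x 7.2k SATA": {"drives": 8, "write_mb_s": 160},
}


def write_per_mirrored_pair(write_mb_s, drives):
    """RAID 10 write speed divided across its mirrored pairs."""
    return write_mb_s / (drives // 2)


def fraction_of_expected(measured_mb_s, expected_mb_s):
    """Measured speed as a fraction of the expected speed."""
    return measured_mb_s / expected_mb_s


if __name__ == "__main__":
    for name, array in ARRAYS.items():
        per_pair = write_per_mirrored_pair(array["write_mb_s"], array["drives"])
        print(f"{name}: {per_pair:.1f} MB/sec per mirrored pair")
    share = fraction_of_expected(320, EXPECTED_WRITE_MB_S)
    print(f"MSA70 array reaches {share:.0%} of the expected write speed")
```

The 15k SAS array works out to about 14 MB/sec per mirrored pair against 40 MB/sec for the 7.2k SATA array, and to roughly a quarter of the expected write speed.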

With no other options before me I picked up the phone and called.

I went through HP’s automated phone system, which isn’t that painful at all, to get to storage support. Hold times in queue were very acceptable. A level one technician picked up the call and started the normal run of questions. It only took about two minutes to realize the L1 didn’t understand my issue. He quickly told me that they don’t fix performance issues, period. He told me to update the driver and firmware, and reboot. Of course none of that had worked the first time, but what the heck, I’ll give it the old college try. Since this is a production system, I am limited on when I can do these kinds of things. That lag makes it impractical to keep an L1 on hold for five or so hours while he waits for me to complete the assigned tasks. I let him go with the initial action plan in place and an agreement that he would follow up.

Twice I got automated emails saying the L1 had tried to call and left voicemails for me. Twice, there were no voicemails. I sent him my numbers again just to be on the safe side. Next, I was told to run the standard Array Diagnostic Utility and a separate utility that they send you to gather all the system information and logs; think PSSDiag or SQLDiag. After reviewing the logs he didn’t see anything wrong and had me update the Array Configuration Utility. I was then told they would do a deeper examination of the logs I had sent and get back to me.

Three days later I got another email saying the L1 had tried to call and left me a message. Again there was no voicemail on my cell or my desk phone. I sent a note back to the automated system only to find the case had been closed!

I called back in to the queue and gave my case number to the L1 who answered. He of course told me it was closed. He read the case notes to me: the previous L1 had logged it as a network issue and closed the case. If I had been copying files over the network and not to another local array, I could see why it had been logged that way. I asked to open a new case and to speak to a manager. I was then told the manager was in a meeting. No problem, I’ll stay on the line. After 45 minutes I was disconnected. Not one to be deterred, I called back again. The L1 that answered was professional and understanding. Again, I was put on hold while I waited for the manager to come out of his meeting. About 10 minutes later I was talking to him. He apologized and told me my issues would be addressed.

I now had a new case number and a new L1. Again, we dumped the diagnostic logs and started from the beginning. This time he saw things that weren’t right. There was a new firmware for the hard drives, a new driver for the P800, and a drive that was showing some errors. Finally, I felt like I was getting somewhere! At this point it had been ten days since I opened the previous case. We did another round of updates. A new drive was dispatched and installed. The L1 did call back and actually managed to either talk to me or leave a message. When nothing had made any improvement, he went silent. I added another note to the case requesting escalation.

That was eight days ago. At this point I have sent seven sets of diagnostic logs. Spent several hours on the phone. And worked after hours for several days. The last time I talked to my L1, the L2’s were refusing to accept the escalation. It was clearly a performance problem and they don’t cover that. The problem is, I agree. Through this whole process I have begged for additional documentation on configuration and setup options, something that would help me configure the array for maximum performance.

They do offer a higher level of support that covers performance issues, for a fee of course. This isn’t a cluster or a SAN. It is a basic setup in every way. The GUI walks you through the setup, click, click, click, monster RAID 10 array done. What would this next level of paid support tell me?

My last hope is that CDW will be able to come through with documentation or someone I can talk to. They have been very understanding and responsive through this whole ordeal.

Thirty-one days later, I’ve still got the same issue. I have now ordered enough drives to fill up the MSA60. The plan is to transfer enough data to free up one of the MSA70’s. Through trial and error, I will figure out what the optimum configuration is. Once I do, I’ll post my findings here.
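For the trial and error, something as simple as the following Python sketch could give a repeatable sequential write number for each configuration. It is only a rough harness, not a replacement for a proper disk benchmark, and the function name and parameters are my own. It assumes the test file is much larger than the controller cache, otherwise the result mostly measures cache.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(path, total_mb=256, chunk_kb=1024):
    """Write total_mb of data to path in chunk_kb pieces; return MB/sec."""
    chunk = os.urandom(chunk_kb * 1024)  # one random chunk, reused for every write
    chunks = (total_mb * 1024) // chunk_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the device
    elapsed = time.perf_counter() - start
    return (chunks * chunk_kb / 1024) / elapsed


if __name__ == "__main__":
    # Small demo size so this runs quickly; a real test needs a far larger
    # file on the array under test, not a temp directory.
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "seqwrite.bin")
        rate = sequential_write_mb_per_sec(target, total_mb=16)
        print(f"{rate:.0f} MB/sec sequential write")
```

Running it with the same file size and chunk size after each change to stripe size, cache ratio, or drive count keeps the numbers comparable from one configuration to the next.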

If any of you out there in internet-land have any suggestions I’m all ears.