Tuesday, 16 October 2012

Can restoring a SQL database bring down the entire virtual infrastructure?




Issue
A few days ago, we ran what should have been a simple database restore as part of a system migration. The restore should have completed relatively quickly, but after waiting for an hour we knew something was not right. Meanwhile, we started receiving calls from here, there and everywhere asking if there was anything wrong with the servers.

Diagnosis
We started looking at our vSphere virtual infrastructure, as 90% of our estate is virtualised, and straight away noticed read/write errors on some of our virtual servers. Anyone who has worked with VMware for a few years would think ‘storage’ almost instantly. We use an HP EVA 6400, and all our storage is presented from a single disk group of about 140 disks, all 600 GB FC-AL.

Action
We have had a couple of storage performance issues in the past, nothing major, but we still believed there were underlying problems with the storage. We decided to create a new disk group, present storage from it to the SQL clustered instance, and run the restore to the LUN presented from this new disk group. This time the restore completed in less than 10 minutes without any issues at all.

Unfortunately, the built-in monitoring capabilities of the SAN are not great, so we decided to send the logs to the SAN vendor to find out what went wrong and whether we had too much running on the 140 disks. The performance counters had to be reset before capturing the logs, so the logs were unlikely to be very useful to us, but we sent them anyway to see if there were any other issues. The results came back clean: the engineers could not find much wrong with the SAN.

Lessons learnt

  • Don’t put all your eggs in one basket: plan and test

  • The importance of having monitoring systems in place so that you can backtrack!

  • If the built-in monitoring capabilities of a product are not great, consider integrating third-party monitoring tools (there are plenty of them)

  • If you have the luxury of setting up a test environment and doing some testing, DO IT!!

  • Sometimes even the experts don’t have the right answer, so do your research (I work as a contractor and see this again and again), ask your connections, and ask questions on the relevant forums.
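Being able to backtrack matters because, as described above, the SAN's performance counters had been reset before the logs were captured, so there was no history to examine. The Python sketch that follows shows one minimal way to keep a rolling history of storage latency samples and flag the bad ones. It is an illustration under stated assumptions, not a monitoring product: the `LatencyHistory` and `Sample` names, the 20 ms alert threshold, the datastore names and the readings are all hypothetical, and in practice the samples would come from whatever tool you have (vCenter performance charts, esxtop, or a third-party monitor).

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Sample:
    taken_at: datetime
    datastore: str      # hypothetical datastore/LUN name
    latency_ms: float   # observed I/O latency in milliseconds


class LatencyHistory:
    """Keeps the most recent samples so past incidents can be reviewed later."""

    def __init__(self, max_samples: int = 10_000, alert_ms: float = 20.0):
        # deque with maxlen drops the oldest samples automatically
        self.samples = deque(maxlen=max_samples)
        # alert_ms is an assumed threshold, tune it for your own storage
        self.alert_ms = alert_ms

    def record(self, sample: Sample) -> bool:
        """Store the sample; return True if it is at or above the alert threshold."""
        self.samples.append(sample)
        return sample.latency_ms >= self.alert_ms

    def window(self, start: datetime, end: datetime) -> List[Sample]:
        """Return all retained samples taken between start and end (inclusive)."""
        return [s for s in self.samples if start <= s.taken_at <= end]

    def worst_in_window(self, start: datetime, end: datetime) -> Optional[Sample]:
        """Return the highest-latency sample in the window, or None if empty."""
        hits = self.window(start, end)
        return max(hits, key=lambda s: s.latency_ms) if hits else None


# Hypothetical readings taken one minute apart during a restore
base = datetime(2000, 1, 1, 14, 0)
history = LatencyHistory(max_samples=1000, alert_ms=20.0)
readings = [("DG1-LUN01", 4.0), ("DG1-LUN01", 85.0),
            ("DG1-LUN02", 120.0), ("DG1-LUN01", 6.0)]

alerts = []
for minute, (store, ms) in enumerate(readings):
    if history.record(Sample(base + timedelta(minutes=minute), store, ms)):
        alerts.append(store)

worst = history.worst_in_window(base, base + timedelta(minutes=30))
print(alerts, worst.datastore, worst.latency_ms)
```

Because `deque(maxlen=...)` discards the oldest entries automatically, memory stays bounded; a real setup would also need to persist the history (to a file or database) so that it survives a restart.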


1 comment:

  1. Update: There is a Performance Advisor tool available from HP which can be bought as a licensed feature, one licence per array. Command View needs to be updated to version 10, though.
