Can restoring a SQL database bring down the entire virtual infrastructure?
Issue
A few days ago, we ran a simple database restore as part of a system
migration. The restore should have completed relatively quickly, but after
waiting for an hour we knew something was not right. Meanwhile, we started
receiving calls from all directions asking whether there was anything wrong
with the servers.
Diagnosis
We started by looking at our vSphere virtual infrastructure, as 90% of our
estate is virtualised, and straight away noticed read/write errors on some of
our virtual servers. Anyone who has worked with VMware for a few years would
think ‘storage’ almost instantly. We use an HP EVA 6400 and all our storage is
presented from a single disk group; we have about 140 disks in the group, all
FCAL 600GB.
Action
We have had a couple of issues with our storage performance in the past,
nothing major, but we still believed there were some underlying issues with
the storage. We decided to create a new disk group, present storage from it to
the SQL clustered instance, and run the restore to the LUN presented from this
new disk group. This time the restore completed in less than 10 minutes
without any issues at all.
Unfortunately, the inbuilt monitoring capabilities of the SAN are not great,
so we decided to send the logs to the SAN vendor to find out what went wrong
and whether we have too much running on the 140 disks. The performance
counters had to be reset before the logs were captured, which meant the logs
would not be very useful to us, but we sent them anyway to see whether there
were any other issues. The results came back clear: the engineers could not
find much wrong with the SAN.
Lessons learnt
- Don’t put all your eggs in one basket - Plan and Test
- Importance of having monitoring systems so that you can backtrack!
- If the inbuilt monitoring capabilities of the product are not great, consider integrating third-party monitoring tools (there are a lot of them)
- If you have the luxury of setting up a test environment and doing some testing – DO IT!!
- Sometimes even the experts don’t have the right answer so do your research (I work as a contractor and see this again and again), ask your connections, and ask questions on the relevant forums.
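
To illustrate the point about being able to backtrack, here is a minimal sketch of a latency log kept from inside a guest. It is not what we ran at the time and it does not replace proper SAN-side counters; it only times a small synchronous write and appends the result to a CSV file, so there is some history to look at after an incident. The function names, the 1 MB write size and the CSV layout are all assumptions made for this example.

```python
import csv
import os
import tempfile
import time
from datetime import datetime, timezone


def time_sync_write(directory, size_bytes=1024 * 1024):
    """Write size_bytes to a temp file in directory, fsync it, return seconds taken."""
    payload = b"\0" * size_bytes
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to storage, not just the OS cache
            return time.perf_counter() - start
    finally:
        os.remove(path)  # the probe file is only needed for the measurement


def record_sample(log_path, directory):
    """Append one timestamped latency sample to a CSV log and return the latency."""
    latency = time_sync_write(directory)
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", newline="") as log:
        csv.writer(log).writerow([timestamp, directory, f"{latency:.6f}"])
    return latency


def slow_samples(log_path, threshold_seconds):
    """Return the logged rows whose latency exceeded threshold_seconds."""
    with open(log_path, newline="") as log:
        return [row for row in csv.reader(log) if float(row[2]) > threshold_seconds]
```

Calling record_sample from a scheduled task every minute or so, pointed at a directory on the LUN you care about, would build up a baseline; slow_samples can then be used afterwards to see when write latency started to climb. The numbers only show what the guest experienced, so they are a starting point for a conversation with the storage team rather than a diagnosis.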