Wednesday, January 31, 2018

SQL Server Frequent mirror failovers, think beyond DBA skills

As you are a DBA you might have observed the primary databases configured in mirror gets failed over frequently / unexpectedly which causes downtime to the users.   So how do you start troubleshoot.... probably you could see the database status and once you find this was failed over to secondary, then how do you verify as you have all connectivity in place from secondary otherwise bring it back to the primary server to make sure the application connectivity remains to the users. 

But how do you troubleshoot to fix that unexpected errors, so that it will not happen again.

The first step is to start analyzing the SQL Server logs ....  you would get lot of information from it ;

SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\MSDBLog.ldf] in database [msdb] (4). The OS file handle is 0x000000000000

 
The mirroring connection to "TCP://xxxx_mir.xxx.isyntax.net:5022" has timed out for database "xxxxx" after 10 seconds without a response. Check the service and network connections.

And check in Windows Event logs as well where you can find information for hardware & OS related.

So, one of these scenarios how do you troubleshoot further and fix the issue. 

1) You could change the timeout increased to a higher value, so that if there is no response from the IO or network you can avoid frequent failovers.
2) Analyze the IO errors working with storage admin then take up the action accordingly.
3) If you observe much load happening it requires system to be tuned.
4) See any disk related errors locally and sufficient disk space available.

In my case I have observed the server is a shared database node in VMware cluster and the SAN which is hosted the storage allocated to the database node is used for the other nodes as well.    So it looks like recently there were few more nodes added which kept more load on it, hence the IOPs is not happening with the desired rate.   

Hence SQL server is getting timed out as there is no response when it writes the data on the data file.   I worked with storage admin and move one of the other node to different VMcluster and that is connected to an another SAN together.  So reduced the IO load.

Hope this info helps you troubleshooting the mirror failover issues.
 

Tuesday, January 23, 2018

SQL Code/Procedure development suggestions

Most of you familiar writing stored procedures in SQL Server, but if you follow the development standards (in my experience)  which would help debugging the code (or) in case code modifications required it is easy to point out. 

Hope the following SQL coding rules help you while writing the SQL procedures & functions.

  • Header part of the script should be updated reflecting the current changes to the script with the version details.  Suppose it is a new procedure you could mention as initial draft/release.  Later it can be added with all subsequent changes.
  • Alignment of the script content should be proper.  Proper alignment shall help programmer to read the code easily.
  • SNAPShot isolation helps reduce the locks, If the DB isolation is Snapshot and the SNAPSHOT isolation is supposed to be used in the scripts then verify that 'SET SNAPSHOT ISOLATION' is mentioned in the script. If in doubt confirm with the team which has sent the script for review and if 'SNAPSHOT ISOLATION' is mentioned, verify that SNAPSHOT ISOLATION is enabled at the database level too
  • 'NOLOCK' can be used in the code when the diry reads are allowed which is less lock operation.
  • 'SET NOCOUNT ON' statement should be mentioned in the beginning of each procedure. If the scripts require the counts and do not make use of SET NOCOUNT ON, make sure that it uses @@ROWCOUNT server variables instead, for fetching the counts.
  • It is good to assign input parameter variables to the local variables then use it in procedure code.
  • Fixed length columns in the table should use CHAR datatype all the times.
  • TOP Statement might be required/might be present in the scripts. But verify if the TOP statement mentioned in the script is intentional/required.
  • Any/All temp tables created in the procedure should be dropped at the end of the procedure
  • SELECT statements using temp tables do not require NOLOCK as locks for the temp table maintained internally by SQL Server.
  • If any other database references are present in the script, verify that the database is present, always use if exists clause to validate these cases.
  • If LINKED Server references are present in the script, verify that the correct LINKED server are referenced and the LINKED Server is present in the appropriate environment.  Better to use If exists clause.
  • Instead of multiple SET statements, a single SELECT statement can be used to assign value to variables that avoid confusion.
  • Avoid using 'SELECT *' instead, use specific column name(s) in the SELECT statement.
  •  Avoid using table variable if data stored in the table is too high.
  • Avoid using CTE's if data stored in the corresponding CTE is too high.
  • Avoid using co-related queries. See if the same can be resolved using subqueries instead.
  • Avoid using CURSOR's if possible, try while instead.  If CURSOR is needed & used, check the CURSOR is CLOSED & DEALLOCATED before end of the procedure.
  • Trigger should not be created on any table. If it is absolutely necessary see if an alternative approach can be used to eliminate the trigger.
  • Check for Unused variable declarations in the procedures, this can be removed to have only required code.  Dead code / commented code should not be there in the procedure.
  • Proper error handling should be done in the code, explaining the error handling in comments would help review the code.  
  • Execution Plan of the procedure could help validating the logic and query optimization if needed.
  • Validating the performance of the procedure can help figure out the performance bottlenecks.  Execute the procedure using parameters supplied in test environment atleast.