Seeing orphaned transactions that we need to kill

Article
01/25/2012

Question

_{Wednesday, January 25, 2012 9:58 PM}

Of late, we are seeing some orphaned transactions on SQL that are holding locks and causing blocking,etc. What's a bit mysterious is the fact that when I run sp_who2 for this orphaned transaction is that the CPUTime and DiskIO remain static and do not change but when running a dbcc inputbuffer on that spid, we notice that after some time the statement changes, even with sp_who2 output for the spid not changing.. These happen on some occasions whereas on other situations tied to this orphan transactions, its stuck and its the same statement..

What can I do to ensure we do not encounter these orphaned transactions ? Is it some special way the code is written on the app or web servers as it makes this connection that we need to change ? Let me know if need to run anything specific to get more deeper into what's going on.

We are using SQL 2008.

All replies (7)

_{Thursday, January 26, 2012 1:20 AM ✅Answered}

Since you are on SQL Server 2008, stop using sp_who2 and make use of the DMV's or sp_WhoIsActive by Adam Machanic. You get much better information by going directly to the source.

You can use Extended Events and multiple targets to track down the root cause of this in the system.

http://sqlskills.com/blogs/jonathan/post/An-XEvent-a-Day-(11-of-31)-e28093-Targets-Week-e28093-Using-Multiple-Targets-to-Debug-Orphaned-Transactions.aspx

This is the fastest way to figure out what is leading up to the problem in a live system. I've caught the problem in a few minutes multiple times and then been able to use the information captured by Extended Events to work with developers to fix the application code.

Jonathan Kehayias | Principal Consultant, SQLSkills.com
SQL Server MVP | Microsoft Certified Master: SQL Server 2008
Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
Please click the Mark as Answer button if a post solves your problem!

_{Wednesday, January 25, 2012 10:13 PM}

How are you determining these are "orphaned"? SQL Server is very good at cleaning itself up.

Most likely these are open connections held by your application in a "connection pool". They are not orphaned, they are just waiting for the application to send the next command.

If the connection to the application is dropped, SQL Server would rollback the transaction and release the lock.

_{Wednesday, January 25, 2012 10:36 PM}

The transaction holds locks and blocks other spids. When you run an sp_who2 on this spid thats blocking others, it does not increase its CPUTime and DiskIO values that we have seen on other active transactions.. More important, sp_who2 does not even report this blocking spid when you run sp_who2 active. We believe the application does not drop the connection or if it does, it does not do so gracefully.

It does not happen all the time after we kill a few spids here and there.. and then a few weeks later, it pops up again.

_{Wednesday, January 25, 2012 11:29 PM | 1 vote}

I have viewed this scenario in a few environments that use multiple application servers using connection pooling through a middle tier. This seems to be an application issue (the ones I saw) where the application would begin a transaction, it would run, but it would never be rolled back or comitted. It took a few months and many logs later, we had the vendor finally do a code update and it fixed many of the issues.

This was particularly hard because the connections would stay open through the pooling and the application was browser based and didn't have any checking on users forcefully closing their browser or session (by either killing the process on their machine or closing their browser before clicking their "log out" button).

-Sean

_{Thursday, January 26, 2012 12:11 AM}

I second what Sean said, it is very very (very!) common to find apps that leak connections, especially from unhandled errors, and that that is what you are seeing.

It is a problem, they will build up and block stuff and kill your system, and if need be I suppose you can simply establish a rule that anything that looks orphaned for more than X minutes or hours, gets killed, if (a) it's not blocked, (b) it doesn't use X CPU or Y DiskIO in Z seconds, LastBatch < T, and SPID > 50, and maybe filters on others.

FWIW, I've never had to do that.

Josh

_{Thursday, January 26, 2012 2:10 PM}

As everyone has said, this is usually an application issue. The application is not closing the connection or terminating the transaction properly. Although you can setup monitoring to find this, you ultimately need to get the application fixed.

I have a SQL Agent job run the following stored proc every 1 hour to email the DBAs of any long running queries. It also auto kills any query without a WHERE running for 180 mins.

CREATE PROC [dbo].[usp_EmailLongQueries]
AS
SET NOCOUNT ON

DECLARE @emaildba varchar(100), @subject varchar(256), @body varchar(8000), @killcmd INT

SET @emaildba = {emailaddresses}
SET @killcmd = 0


DECLARE @servername varchar(256), @loginid varchar(256), @ntdomain varchar(100), @ntusername varchar(100),
    @hostname varchar(255), @hostproc varchar(10), @spid varchar(10), @execcont varchar(10), @waittype varchar(256),
    @waitresource varchar(255), @waittime varchar(10), @blockedby varchar(10), @starttime datetime, @runtimesecs varchar(10),
    @runtimemin varchar(10), @status varchar(20), @dbname varchar(255), @commandtype varchar(25), @sqllen int, @sqlcmd varchar(max)

DECLARE longquery CURSOR FOR
    select 
    @@SERVERNAME as ServerName,
    sproc.loginame LoginID,
    sproc.nt_domain AS NTDomain,
    sproc.nt_username AS NTUserName,
    sproc.Hostname AS HostName,
    sproc.hostprocess,
    sproc.spid AS Session_ID,
    sproc.ecid AS Execution_Context,
    sproc.lastwaittype AS Wait_Type, 
    sproc.waitresource AS Wait_Resource,
    sproc.waittime AS Wait_Time, 
    CASE WHEN sproc.blocked = 0 THEN 0 ELSE sproc.blocked END as BlockedBy,
    sproc.last_batch Started_At,
    datediff(second,sproc.last_batch,getdate()) AS Elapsed_Seconds,
    datediff(mi,sproc.last_batch,getdate()) AS Elapsed_Mins,
    sproc.status AS [Status], 
    DB_NAME(sproc.dbid) AS DBName,
    sproc.cmd AS Command,
    len(sqltext.TEXT) SQL_Length,
    SUBSTRING(sqltext.text, (sproc.stmt_start/2)+1, 
        ((CASE sproc.stmt_end
          WHEN -1 THEN DATALENGTH(sqltext.text)
         ELSE sproc.stmt_end
         END - sproc.stmt_start)/2) + 1) AS Query_SQL
    --,sproc.*
    from master.sys.sysprocesses sproc
    OUTER APPLY master.sys.dm_exec_sql_text(sproc.sql_handle) AS sqltext
    where sproc.spid <> @@SPID
    AND sproc.spid > 50
    AND sproc.cmd <> 'AWAITING COMMAND'
    AND sproc.cmd <> 'WAITFOR'
    AND datediff(mi,sproc.last_batch,getdate()) > 120  -- Running longer than 2 hours
    ORDER BY sproc.spid, sproc.ecid


OPEN longquery
FETCH NEXT FROM longquery INTO @servername, @loginid, @ntdomain, @ntusername,
        @hostname, @hostproc, @spid, @execcont, @waittype,
        @waitresource, @waittime, @blockedby, @starttime, @runtimesecs,
        @runtimemin, @status, @dbname, @commandtype, @sqllen, @sqlcmd

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @killcmd = 0

    -- Auto KILL any user process running longer than x mins
    IF @runtimemin > 900  
        SET @killcmd = 1

    -- Auto kill any user process over x Mins with no WHERE clause
    IF @runtimemin > 180 AND CHARINDEX('WHERE',@sqlcmd) = 0
        SET @killcmd = 1

    IF @killcmd = 1
    BEGIN
        SET @subject = 'KILLED: ' + @servername + ' SPID: ' + @spid + ' process running for ' + @runtimemin + ' Mins'
        DECLARE @sql varchar(1000)
        SET @sql = 'KILL ' + @spid + ';'
        EXEC (@sql)
    END
    ELSE
    BEGIN
        SET @subject = CASE WHEN @blockedby <> 0 THEN 'Blocked: ' ELSE 'Warning: ' END + @servername + ' SPID: ' + @spid + ' Runtime: ' + @runtimemin + ' Mins'
    END

    SET @body = 'Server: ' + @servername + CHAR(10) 
    SET @body = @body + 'SPID: ' + @spid + CHAR(10) 
    SET @body = @body + 'Start: ' + CONVERT(varchar(25), @starttime, 120) + ' Running for: ' + @runtimemin + ' Mins' + CHAR(10) 
    SET @body = @body + 'Username: ' + @loginid + CHAR(10) 
    SET @body = @body + 'Database: ' + @dbname + CHAR(10) 
    SET @body = @body + 'Wait: ' + RTRIM(@waittype) + ' ' + @waittime + 'ms ' + CASE WHEN @blockedby <> 0 THEN 'Blocked By SPID: ' + @blockedby ELSE '' END+ CHAR(10) 
    SET @body = @body + 'From Host: ' + RTRIM(@hostname) + ' PID: ' + @hostproc + CHAR(10) 
    SET @body = @body + CHAR(10) + 'Query: ' + LEFT(@sqlcmd,100) + CHAR(10) 
    SET @body = @body + CHAR(10) + CHAR(10) + '******* THIS IS AN UNMONITORED MAILBOX.  DO NOT REPLY TO THIS EMAIL ******* ' + CHAR(10) 

    --PRINT @subject
    --PRINT @body

    -- Send email
    EXEC msdb.dbo.sp_send_dbmail @recipients = @emaildba, @subject=@subject, @body=@body, @body_format = 'TEXT'


    FETCH NEXT FROM longquery INTO @servername, @loginid, @ntdomain, @ntusername,
        @hostname, @hostproc, @spid, @execcont, @waittype,
        @waitresource, @waittime, @blockedby, @starttime, @runtimesecs,
        @runtimemin, @status, @dbname, @commandtype, @sqllen, @sqlcmd
END
CLOSE longquery
DEALLOCATE longquery

SET NOCOUNT OFF

_{Thursday, June 23, 2016 10:21 AM}

I had to deal with a problem of this nature today and I agree that there are no quick answers to sorting this out.

Luckily I have been in my current position long enough to know how the process I was investigating works, so I identified processes outside of SQL server that were being called by the application that had to be killed by task manager.

The problem in my case came from the fact that we ended up with a chain of orphans when calling system commands e.g. dtexec, which were sat in the background because the user's connection had dropped and the process had nowhere to report back to.

Once I found the system process and killed the process tree in task manager, the spid disappeared and all the table locks were released.

Lesson to be learned, don't run xp_cmdshell if you can use a T-SQL process to do the same job.

Connect, code, and grow

Share via

Question

All replies (7)

Share via

Seeing orphaned transactions that we need to kill

Question

All replies (7)

Additional resources