
Best way to Delete the Data

Question

Wednesday, August 23, 2017 6:16 AM

Hello All,

I am facing an issue while deleting records from some tables. Here is the scenario:

There are around 34 tables, each containing about 50 million records (with one clustered index), and I want to delete around 20 million records from each table. The entire deletion process needs to complete within 3 hours.

So far I have tested these 2 approaches:

1. Using a WHILE loop (50,000 records per iteration). Issue: the whole process takes too much time.
2. Deleting the data in one go. Issue: the log file grows very large, resulting in insufficient disk space.

malik

All replies (11)

Wednesday, August 23, 2017 6:29 AM | 1 vote

Issue-The whole process is taking too much time

Mainly you only have these 2 approaches, and deleting a lot of data takes some time, depending on available memory and I/O storage performance.

A third approach is to create additional tables, copy over the data that should remain, drop the old tables, and rename the new ones to the old names; but that requires a lot of work and will also take a lot of time.
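A minimal T-SQL sketch of that copy-and-swap approach (all table and column names here are hypothetical placeholders):

```sql
-- Copy only the rows that should remain into a new table.
-- SELECT ... INTO is minimally logged under the SIMPLE or
-- BULK_LOGGED recovery model, which limits log growth.
SELECT *
INTO dbo.table1_new
FROM dbo.table1
WHERE KeepCriteria = 1;  -- hypothetical condition for rows to keep

-- Rebuild the clustered index on the new table.
CREATE UNIQUE CLUSTERED INDEX CIX_table1_new
    ON dbo.table1_new (ClusteredIndexColumn);

-- Swap the tables inside a short transaction.
BEGIN TRANSACTION;
    DROP TABLE dbo.table1;
    EXEC sp_rename 'dbo.table1_new', 'table1';
COMMIT TRANSACTION;
```

Keep in mind that SELECT ... INTO does not carry over constraints, non-clustered indexes, triggers, or permissions; those would have to be recreated by hand.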

Olaf Helper

[ Blog] [ Xing] [ MVP]


Wednesday, August 23, 2017 9:35 AM | 1 vote

Hi malik,

According to your post, it seems that you want to find the fastest way to delete a large amount of data from a table. I have done some research on which approaches are most effective and found some good articles for you. You may want to test these approaches based on the articles below.

1. Delete a Huge Amount of Data from a Table

https://social.technet.microsoft.com/wiki/contents/articles/20651.sql-server-delete-a-huge-amount-of-data-from-a-table.aspx

2. How to delete large data of table in SQL without log?

https://stackoverflow.com/questions/24213299/how-to-delete-large-data-of-table-in-sql-without-log

3. Deleting 1 millions rows in SQL Server

https://stackoverflow.com/questions/24785439/deleting-1-millions-rows-in-sql-server

4. Optimizing Delete on SQL Server

https://stackoverflow.com/questions/955435/optimizing-delete-on-sql-server

Hope these are useful to you.

Best Regards,

Will

MSDN Community Support
Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.


Wednesday, August 23, 2017 12:30 PM

If the database recovery model is set to FULL, you might try temporarily setting it to SIMPLE during the delete operation, which should cut down on some transaction-logging overhead and speed up the delete. (Note that switching to SIMPLE breaks the log backup chain, so take a full or differential backup after switching back to FULL.)
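As a sketch of that recovery-model switch, assuming a database named YourDb (a hypothetical name):

```sql
-- Temporarily switch to SIMPLE recovery to reduce log retention.
ALTER DATABASE YourDb SET RECOVERY SIMPLE;

-- ... run the delete operation here ...

-- Switch back to FULL and restart the backup chain with a full backup.
ALTER DATABASE YourDb SET RECOVERY FULL;
BACKUP DATABASE YourDb TO DISK = N'C:\Backups\YourDb.bak';  -- hypothetical path
```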

Rather than deleting all 20 million rows at once, another option is to divide the delete operation into smaller 'chunks', each in its own TRANSACTION. To give you the idea, here's some sample code that you could adapt for your purpose:

DECLARE @rowid int
SET @rowid = 100000 -- first batch boundary
WHILE @rowid < 20000000 -- end row
BEGIN
    BEGIN TRAN

    DELETE [tablename]
    WHERE id <= @rowid

    IF @@ERROR <> 0
    BEGIN
        ROLLBACK
        BREAK -- stop on error instead of silently continuing
    END
    ELSE
        COMMIT

    SET @rowid = @rowid + 100000 -- commit a tran every 100,000 rows
END -- Loop

My experience is that deleting smaller 'chunks' at a time, like 100 thousand or 500 thousand rows, actually goes faster than deleting the entire 20 million rows at once.

Reference:
BEGIN TRANSACTION (Transact-SQL)

Hope that helps,

 

Phil Streiff, MCDBA, MCITP, MCSA


Wednesday, August 23, 2017 1:44 PM

Can you share an example of your batch delete query? With only a clustered index, it would be best to delete in batches by clustered key range plus your other criteria. This will help avoid scanning the same rows many times.

Dan Guzman, Data Platform MVP, http://www.dbdelta.com


Friday, August 25, 2017 4:28 AM

Thanks for sharing the articles, but I get the same result.

malik


Friday, August 25, 2017 4:31 AM

Please find the query:

PRINT GETDATE()

DECLARE @Counter bigint
SET @Counter = 0

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION
    SET @Counter = @Counter + 1

    DELETE TOP (50000) t1
    FROM table1 t1
    INNER JOIN table2 t2
        ON t1.ClusteredIndexColumn = t2.ClusteredIndexColumn

    IF @@ROWCOUNT < 1
        BREAK
    COMMIT TRANSACTION
    PRINT '***Loop count - ' + CONVERT(VARCHAR(50), @Counter) + '   Date/time - ' + CONVERT(NVARCHAR(26), GETDATE(), 109) + '***********'

    IF @@ERROR = 1
    BEGIN
        ROLLBACK TRANSACTION
        BREAK
    END
END

PRINT GETDATE()

malik


Friday, August 25, 2017 6:36 AM

If you have any criterion that separates the data to be deleted, you can partition the table based on it.

Once your data is partitioned, you can use SWITCH to move a partition out to a different table (that said, you will need to create another table with the same schema). Once the SWITCH completes, you can truncate the switched-out table. This is one approach to consider for a faster data move.
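A minimal sketch of the SWITCH approach, assuming dbo.table1 is already partitioned on ClusteredIndexColumn and that the rows to delete fall entirely in partition 2 (all names here are hypothetical):

```sql
-- The staging table must have the same schema, clustered index,
-- and filegroup as the source partition.
CREATE TABLE dbo.table1_staging (
    ClusteredIndexColumn int NOT NULL
    -- ... remaining columns identical to dbo.table1 ...
);
CREATE UNIQUE CLUSTERED INDEX CIX_table1_staging
    ON dbo.table1_staging (ClusteredIndexColumn);

-- Switching a partition out is a metadata-only operation, so it is nearly instant.
ALTER TABLE dbo.table1
    SWITCH PARTITION 2 TO dbo.table1_staging;

-- Truncating the staging table is minimally logged.
TRUNCATE TABLE dbo.table1_staging;
```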

Please mark this reply as answer if it solved your issue or vote as helpful if it helped.
 [Blog]


Friday, August 25, 2017 11:04 AM

Even though you are deleting in batches, this script commits only after all batches are complete. There is little benefit to deleting in batches unless each batch is a separate transaction. The purpose of batching is to avoid filling the transaction log and modifying too much of the buffer cache with uncommitted data, which will slow performance.

Below is an example for an incremental (e.g. IDENTITY) integer column key you can adjust for your situation.

SET NOCOUNT ON;
BEGIN TRY
    PRINT GETDATE();
    DECLARE
          @BatchSize int = 50000
        , @Counter bigint
        , @DeletedRowCount bigint = 0
        , @MinClusteredIndexColumn int
        , @MaxClusteredIndexColumn int
        , @StartClusteredIndexColumn int
        , @EndClusteredIndexColumn int
    SET @Counter = 0;

    SELECT
          @MinClusteredIndexColumn = MIN(ClusteredIndexColumn)
        , @MaxClusteredIndexColumn = MAX(ClusteredIndexColumn)
    FROM dbo.table1;

    SET @StartClusteredIndexColumn = @MinClusteredIndexColumn;

    WHILE @StartClusteredIndexColumn <= @MaxClusteredIndexColumn
    BEGIN

        SET @EndClusteredIndexColumn = @StartClusteredIndexColumn + @BatchSize - 1;
        SET @Counter += 1

        DELETE t1
        FROM dbo.table1 AS t1
        INNER JOIN dbo.table2 AS t2
            ON t1.ClusteredIndexColumn = t2.ClusteredIndexColumn
        WHERE
            t1.ClusteredIndexColumn BETWEEN @StartClusteredIndexColumn AND @EndClusteredIndexColumn;
        SET @DeletedRowCount = @DeletedRowCount + @@ROWCOUNT;

        SET @StartClusteredIndexColumn += @BatchSize;

    END;

    PRINT '***Loop count - ' + CONVERT(VARCHAR(50),@Counter) 
        + '   Date/time - '+ CONVERT(NVARCHAR(26),GETDATE(),109) 
        + '   ' + CONVERT(VARCHAR(50),@DeletedRowCount) + '  rows deleted ***********';
    PRINT GETDATE();
END TRY
BEGIN CATCH
    THROW; --use RAISERROR instead of throw in SQL 2008R2 and earlier versions
END CATCH;
GO

Dan Guzman, Data Platform MVP, http://www.dbdelta.com


Friday, August 25, 2017 1:46 PM

Malik,

    Hi. Sadly, those are the two best options.

    To get around the large transaction log, we run transaction log backups every 15 minutes to a file that gets overwritten, as we are OK with the risk.
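A sketch of that kind of throwaway log backup (database name and path are hypothetical; overwriting the same file each time accepts the loss of point-in-time recovery):

```sql
-- WITH INIT overwrites the previous backup in the same file,
-- so the backup file does not grow run after run.
BACKUP LOG YourDb
    TO DISK = N'C:\Backups\YourDb_log.trn'
    WITH INIT;
```

Scheduled every 15 minutes (for example via a SQL Server Agent job), this keeps the transaction log from growing unbounded during the delete.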

     Can I ask why there is a three-hour window? Has anyone looked at creating a temporary index to help find the records to be deleted, or splitting the tables up onto different physical drives?

Allen


Tuesday, August 29, 2017 4:16 AM

On what basis should I partition the table? For deletion I would have to keep that data in one partition, but on what criteria should I partition it?

malik


Thursday, August 31, 2017 9:19 AM

On what basis should I partition the table? For deletion I would have to keep that data in one partition, but on what criteria should I partition it?

malik

Hi malik,

According to Latheesh's reply, I think it depends on your criteria for deleting the data. If you want to know more about partitioned tables, you may refer to this link.

Best Regards,

Will

MSDN Community Support