Wednesday, August 23, 2017 6:16 AM
Hello All,
I am facing an issue while deleting records from a set of tables. Here is the scenario:
There are around 34 tables, each containing about 50 million rows (with one clustered index), and I need to delete around 20 million rows from each table. The entire deletion process has to complete within 3 hours.
So far I have tested these 2 approaches:
- Using a WHILE loop (50,000 records per iteration). Issue: the whole process takes too much time.
- Deleting the data in one go. Issue: the log file grows very large, which results in insufficient disk space.
malik
Wednesday, August 23, 2017 6:29 AM | 1 vote
Issue: the whole process takes too much time
Mainly you have only these two approaches, and deleting a lot of data simply takes some time, depending on available memory and I/O storage performance.
A third approach is to create additional tables, copy over the data that should remain, drop the old tables, and rename the new ones to the old names; but that requires a lot of work and will also take a lot of time.
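For illustration, here is a minimal sketch of that copy-and-swap approach; dbo.table1, dbo.table1_new, and the KeepFlag predicate are hypothetical placeholders for your own table names and retention criteria:
-- Sketch only: names and the KeepFlag filter are placeholders.
SELECT *
INTO dbo.table1_new                  -- creates a new heap; recreate indexes afterwards
FROM dbo.table1
WHERE KeepFlag = 1;                  -- copy only the rows that should remain

-- Recreate the clustered index and constraints on dbo.table1_new, then swap:
BEGIN TRANSACTION;
    DROP TABLE dbo.table1;
    EXEC sp_rename 'dbo.table1_new', 'table1';
COMMIT TRANSACTION;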
Olaf Helper
Wednesday, August 23, 2017 9:35 AM | 1 vote
Hi malik,
According to your post, it seems that you want to find the fastest way to delete a large amount of data from a table. I have done some research on which way of deleting data is more effective and found some good articles for you. You may want to test the approaches described in these articles:
1. Delete a Huge Amount of Data from a Table
2. How to delete large data of table in SQL without log?
https://stackoverflow.com/questions/24213299/how-to-delete-large-data-of-table-in-sql-without-log
3. Deleting 1 millions rows in SQL Server
https://stackoverflow.com/questions/24785439/deleting-1-millions-rows-in-sql-server
4. Optimizing Delete on SQL Server
https://stackoverflow.com/questions/955435/optimizing-delete-on-sql-server
Hope these are useful to you.
Best Regards,
Will
MSDN Community Support
Wednesday, August 23, 2017 12:30 PM
If the database recovery model is set to FULL, you might try temporarily setting it to SIMPLE during the delete operation. The deletes are still logged, but the log space can be reused after each checkpoint, which keeps the log file from growing so large and can speed up the operation.
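For example (the database name and backup path are placeholders; note that switching to SIMPLE breaks the log backup chain, so take a full or differential backup after switching back to FULL):
ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE;       -- before the purge

-- ... run the batched delete here ...

ALTER DATABASE [YourDatabase] SET RECOVERY FULL;         -- restore the original model
BACKUP DATABASE [YourDatabase]                           -- restart the backup chain
TO DISK = N'D:\Backups\YourDatabase.bak';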
Another option to deleting all 20 million rows at once is to divide the delete operation into smaller 'chunks', each wrapped in its own TRANSACTION. To give you the idea, here is some sample code that you could adapt for your purpose:
DECLARE @rowid int
SET @rowid = 100                        -- start row
WHILE @rowid < 20000000                 -- end row
BEGIN
    BEGIN TRAN

    DELETE [tablename]
    WHERE id <= @rowid

    IF @@ERROR <> 0
        ROLLBACK
    ELSE
        COMMIT

    SET @rowid = @rowid + 100000        -- commit tran every 100,000 rows
END -- Loop
My experience is that deleting smaller 'chunks' at a time, like 100 thousand or 500 thousand rows, actually goes faster than deleting the entire 20 million rows all at once.
Reference:
BEGIN TRANSACTION (Transact-SQL)
Hope that helps,
Phil Streiff, MCDBA, MCITP, MCSA
Wednesday, August 23, 2017 1:44 PM
Can you share an example of your batch delete query? With only a clustered index, it would be best to delete in batches by clustered key range plus your other criteria. This will help avoid scanning the same rows many times.
Dan Guzman, Data Platform MVP, http://www.dbdelta.com
Friday, August 25, 2017 4:28 AM
Thanks for sharing the articles, but I got the same result.
malik
Friday, August 25, 2017 4:31 AM
Please find the query below:
PRINT GETDATE()

DECLARE @Counter bigint
SET @Counter = 0

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION

    SET @Counter = @Counter + 1

    DELETE TOP (50000) t1
    FROM table1 t1
    INNER JOIN table2 t2
        ON t1.ClusteredIndexColumn = t2.ClusteredIndexColumn

    IF @@ROWCOUNT < 1
        BREAK

    COMMIT TRANSACTION

    PRINT '***Loop count - ' + CONVERT(VARCHAR(50), @Counter) + ' Date/time - ' + CONVERT(NVARCHAR(26), GETDATE(), 109) + '***********'

    IF @@ERROR = 1
    BEGIN
        ROLLBACK TRANSACTION
        BREAK
    END
END

PRINT GETDATE()
malik
Friday, August 25, 2017 6:36 AM
If you have any criterion that separates the data to be deleted, you can partition the tables on it.
Once your data is partitioned, you can use SWITCH to move the partition containing the rows to delete into a different table (that said, you need to create a staging table with the same schema). Once the SWITCH completes, you can truncate the switched-out table. This is one option to consider for a much faster data removal.
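A hypothetical sketch of the SWITCH step, assuming dbo.table1 is already partitioned so that the rows to delete fall into partition 2, and dbo.table1_staging has an identical schema and indexes on the same filegroup (all names and the partition number are placeholders):
-- Metadata-only move of the old rows into the staging table.
ALTER TABLE dbo.table1
    SWITCH PARTITION 2 TO dbo.table1_staging;

-- Minimally logged cleanup of the switched-out rows.
TRUNCATE TABLE dbo.table1_staging;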
Latheesh
Friday, August 25, 2017 11:04 AM
Even though you are deleting in batches, this script commits only after all batches are complete. There is little benefit to deleting in batches unless each batch is a separate transaction. The purpose of batching is to avoid filling the transaction log and modifying too much of the buffer cache with uncommitted data, which will slow performance.
Below is an example for an incremental (e.g. IDENTITY) integer key column that you can adjust for your situation.
SET NOCOUNT ON;

BEGIN TRY

    PRINT GETDATE();

    DECLARE
          @BatchSize int = 50000
        , @Counter bigint
        , @DeletedRowCount bigint = 0
        , @MinClusteredIndexColumn int
        , @MaxClusteredIndexColumn int
        , @StartClusteredIndexColumn int
        , @EndClusteredIndexColumn int;

    SET @Counter = 0;

    SELECT
          @MinClusteredIndexColumn = MIN(ClusteredIndexColumn)
        , @MaxClusteredIndexColumn = MAX(ClusteredIndexColumn)
    FROM dbo.table1;

    SET @StartClusteredIndexColumn = @MinClusteredIndexColumn;

    WHILE @StartClusteredIndexColumn <= @MaxClusteredIndexColumn
    BEGIN

        SET @EndClusteredIndexColumn = @StartClusteredIndexColumn + @BatchSize - 1;
        SET @Counter += 1;

        DELETE t1
        FROM dbo.table1 AS t1
        INNER JOIN dbo.table2 AS t2
            ON t1.ClusteredIndexColumn = t2.ClusteredIndexColumn
        WHERE t1.ClusteredIndexColumn BETWEEN @StartClusteredIndexColumn AND @EndClusteredIndexColumn;

        SET @DeletedRowCount = @DeletedRowCount + @@ROWCOUNT;
        SET @StartClusteredIndexColumn += @BatchSize;

    END;

    PRINT '***Loop count - ' + CONVERT(VARCHAR(50), @Counter)
        + ' Date/time - ' + CONVERT(NVARCHAR(26), GETDATE(), 109)
        + ' ' + CONVERT(VARCHAR(50), @DeletedRowCount) + ' rows deleted ***********';

    PRINT GETDATE();

END TRY
BEGIN CATCH
    THROW; -- use RAISERROR instead of THROW on SQL Server 2008 R2 and earlier versions
END CATCH;
GO
Dan Guzman, Data Platform MVP, http://www.dbdelta.com
Friday, August 25, 2017 1:46 PM
Malik,
Hi. Sadly those are the two best options.
To get around the large transaction log, we run transaction log backups every 15 minutes to a file that gets overwritten, as we are OK with that risk.
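As an illustration (database name and path are placeholders), a scheduled job can run something like this every 15 minutes; WITH INIT overwrites the previous backup file, trading point-in-time recovery for disk space:
BACKUP LOG [YourDatabase]
TO DISK = N'D:\Backups\YourDatabase_log.trn'
WITH INIT;   -- overwrite the previous log backup so the file does not accumulate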
Can I ask why there is a three-hour window? Has anyone looked at creating a temporary index to help find the records to be deleted, or at splitting the tables up across different physical drives?
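For the temporary-index idea, here is a hedged sketch that assumes the rows to delete can be identified by a predicate on a hypothetical PurgeDate column:
-- Hypothetical filtered index to speed up locating the rows to delete;
-- drop it once the purge is finished.
CREATE NONCLUSTERED INDEX IX_table1_purge
    ON dbo.table1 (PurgeDate)
    WHERE PurgeDate < '20170101';

-- ... run the batched delete here ...

DROP INDEX IX_table1_purge ON dbo.table1;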
Allen
Tuesday, August 29, 2017 4:16 AM
On what basis should I partition the table? For the deletion, I would have to keep the data to be deleted in one partition, but on which criteria should I partition it?
malik
Thursday, August 31, 2017 9:19 AM
On what basis should I partition the table? For the deletion, I would have to keep the data to be deleted in one partition, but on which criteria should I partition it?
malik
Hi malik,
According to Latheesh's reply, I think it depends on the criteria you use to identify the data to delete. If you want to know more about partitioned tables, you may refer to this link.
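As a rough, hypothetical example: if the delete criteria can be expressed as a date range, the table could be partitioned on that date column (the names and boundary values below are placeholders):
-- Partition function and scheme on a hypothetical date column.
CREATE PARTITION FUNCTION pf_PurgeDate (datetime)
    AS RANGE RIGHT FOR VALUES ('20170101', '20170201');

CREATE PARTITION SCHEME ps_PurgeDate
    AS PARTITION pf_PurgeDate ALL TO ([PRIMARY]);

-- Rebuild the clustered index on ps_PurgeDate(PurgeDate); old date ranges can then
-- be switched out and truncated, as shown earlier in the thread.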
Best Regards,
Will
MSDN Community Support