Saturday, January 2, 2016 2:57 AM
Hi,
I am using the BULK INSERT command, as documented here:
https://msdn.microsoft.com/en-us/library/ms188365%28v=sql.90%29.aspx
I want the command to be executed and to IGNORE ALL errors that occur during the insert operation, and I cannot know the number of errors beforehand. In such a case, how should I set the MAXERRORS parameter?
Thanks
Saturday, January 2, 2016 10:05 AM ✅Answered
MAXERRORS = 1000000000 should make your day, unless your file is even larger.
As Latheesh points out, format errors will still abort the file. This can be circumvented by setting the parameter BATCHSIZE to a low value. And if you set it to 1, you will get all rows, even if the last row is incomplete. But there are two big caveats here:
1) This will slow down the import seriously.
2) If the conditions for the operation to be minimally logged are met, the table will take up a lot more space. (Because each row will get an extent of eight pages of its own.)
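A minimal sketch of this combined suggestion; the table name, file path and terminators are placeholders to adjust to your own data file:
-- Sketch only: table name, file path and terminators are placeholders.
BULK INSERT dbo.tb_Target
FROM 'C:\files\data.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    MAXERRORS       = 1000000000,  -- effectively "never give up because of data errors"
    BATCHSIZE       = 1            -- each row commits in its own batch, so earlier rows survive a format error (but this is slow)
);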
Saturday, January 2, 2016 3:44 AM
BULK INSERT tb_HTMLDocuments
FROM 'C:\files\htmlfile.html'
WITH (
ROWTERMINATOR = '<!--END-->'
,MAXERRORS=10
)
https://www.mssqltips.com/sqlservertip/3587/troubleshooting-common-sql-server-bulk-insert-errors/
Saturday, January 2, 2016 6:53 AM
Hi chcw, please post your statement so we can help you better.
Have you provided the MAXERRORS option to skip errors? That skips data errors, but not format errors.
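To see which rows were skipped by MAXERRORS, the ERRORFILE option can be added. A minimal sketch with placeholder names (the error file must not already exist):
-- Sketch only: table name and file paths are placeholders.
-- Rejected rows are copied to the error file, and a companion
-- <error file>.Error.Txt file describes each failure.
BULK INSERT dbo.tb_Target
FROM 'C:\files\data.csv'
WITH (
    MAXERRORS = 1000000000,
    ERRORFILE = 'C:\files\data_rejected.txt'
);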
Saturday, January 2, 2016 7:28 AM
Hi chcw, please post your statement so we can help you better.
Have you provided the MAXERRORS option to skip errors? That skips data errors, but not format errors.
Hi Latheesh,
Yes, I provided MAXERRORS. But the problem is that I don't know how many errors the data file will produce, so I don't know what value to set for MAXERRORS. I want to use a value so that the import procedure will NEVER be aborted due to errors.
Saturday, January 2, 2016 10:47 AM
MAXERRORS = 1000000000 should make your day, unless your file is even larger.
As Latheesh points out, format errors will still abort the file. This can be circumvented by setting the parameter BATCHSIZE to a low value. And if you set it to 1, you will get all rows, even if the last row is incomplete. But there are two big caveats here:
1) This will slow down the import seriously.
2) If the conditions for the operation to be minimally logged are met, the table will take up a lot more space. (Because each row will get an extent of eight pages of its own.)
How can I know whether a bulk import was aborted due to errors? By checking the error file, or are there other options?
Saturday, January 2, 2016 12:59 PM
How can I know whether a bulk import was aborted due to errors? By checking the error file, or are there other options?
Put BULK INSERT in TRY-CATCH. The CATCH handler only fires if the operation is aborted (because of a format error or MAXERRORS being exceeded). See also
http://www.sommarskog.se/error_handling/Part2.html#BULKINSERT2
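A minimal sketch of that pattern, with the same placeholder names as above:
-- Sketch only: table name, file path and options are placeholders.
BEGIN TRY
    BULK INSERT dbo.tb_Target
    FROM 'C:\files\data.csv'
    WITH (MAXERRORS = 1000000000, BATCHSIZE = 1);
END TRY
BEGIN CATCH
    -- Reached only when the load is aborted (format error or MAXERRORS exceeded).
    SELECT ERROR_NUMBER()  AS error_number,
           ERROR_MESSAGE() AS error_message;
    -- Log, alert or re-raise here as needed.
END CATCH;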
Saturday, January 2, 2016 3:14 PM
I like Erland's solution; I just want to point out another option, which I use in some cases: an INSTEAD OF trigger (a sketch follows below).
In this approach all the data comes through the trigger, which allows us to perform whatever checks we want and to log anything we need. Performance of the bulk import might be affected (how much depends on what you execute in the trigger), yet the insert is still done in batches and minimal logging can still be achieved.
* Don't forget to support multiple-row inserts in the trigger (this is a common mistake).
* Triggers are executed once for each batch, not once for each row.
* By default BULK INSERT disables triggers; you can enable them with the FIRE_TRIGGERS option.
* This approach needs space in tempdb; you may need to expand the size of tempdb.
I think that covers the most important issues :-)
I hope this is useful.
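A minimal sketch of the trigger approach; the target table, its SomeValue column, the log table and the validation rule are all hypothetical:
-- Sketch only: dbo.tb_Target, its SomeValue column, the log table and the
-- validation rule are hypothetical.
CREATE TABLE dbo.tb_Target_Rejected (SomeValue varchar(100), RejectedAt datetime DEFAULT GETDATE());
GO
CREATE TRIGGER dbo.trg_tb_Target_InsteadOfInsert
ON dbo.tb_Target
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- "inserted" holds the whole batch, so the logic must be set-based (multi-row safe).
    INSERT INTO dbo.tb_Target (SomeValue)
    SELECT SomeValue
    FROM   inserted
    WHERE  SomeValue IS NOT NULL;      -- whatever validation you need

    INSERT INTO dbo.tb_Target_Rejected (SomeValue)
    SELECT SomeValue
    FROM   inserted
    WHERE  SomeValue IS NULL;          -- log the rows that failed the check
END;
GO
-- The trigger only runs if the load asks for it:
BULK INSERT dbo.tb_Target
FROM 'C:\files\data.csv'
WITH (FIRE_TRIGGERS, MAXERRORS = 1000000000);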
Ronen Ariely
Sunday, January 3, 2016 1:27 AM
Thank you very much
Monday, January 4, 2016 7:11 AM
MAXERRORS = 1000000000 should make your day, unless your file is even larger.
As Latheesh points out, format errors will still abort the file. This can be circumvented by setting the parameter BATCHSIZE to a low value. And if you set it to 1, you will get all rows, even if the last row is incomplete. But there are two big caveats here:
1) This will slow down the import seriously.
2) If the conditions for the operation to be minimally logged are met, the table will take up a lot more space. (Because each row will get an extent of eight pages of its own.)
Hi Erland,
I've followed your advice regarding BATCHSIZE, and the test table (a heap) indeed takes more space. With BATCHSIZE set to 1, each batch is issued as an individual transaction, and it seems each of the first 8 rows got an individual page allocated (in mixed extents), while every row from the 9th onwards got an individual extent.
I didn't know this; would you mind pointing me to more details (any link or blog) on how pages/extents are allocated in bulk operations?
Thanks in advance.
Eric Zhang
TechNet Community Support
Monday, January 4, 2016 9:04 AM
When an operation is minimally logged, all that SQL Server logs is extent allocations. This reduces the amount of logging considerably. On the other hand, if the database is in bulk_logged recovery and you take a transaction log dump, it has to write the complete pages for all those extents to the log dump, not only the changes. (This is not an issue in full recovery, because full recovery precludes minimally logged operations, and in simple recovery you don't back up the transaction log at all.)
For a "normal" minimally logged operation like an index rebuild, this is not much of a problem, since the transaction encompasses the entire rebuild, and it eats lots of extents. But if you bulk load with BATCHSIZE = 1, each transaction eats an extent, since, as I said, all that is logged is the extent allocation.
I don't have any blog or link about this, but Paul Randal's blog is always a great place if you want to learn about minimally logged operations. The reason I knew this is that an MVP colleague made this mistake and was mighty upset. But when you think about it, it is quite logical.
Monday, January 4, 2016 4:21 PM
You can check how many pages are allocated to a table before and after you run the bulk load operation:
SELECT OBJECT_NAME(p.object_id) AS object_name
, i.name AS index_name
, ps.in_row_used_page_count
FROM sys.dm_db_partition_stats ps
JOIN sys.partitions p
ON ps.partition_id = p.partition_id
JOIN sys.indexes i
ON p.index_id = i.index_id
AND p.object_id = i.object_id
Ronen Ariely