Use TABLOCK to boost your INSERT INTO … SELECT performance

Something that we do frequently in an analytics application is loading data from one table to another. Personally I recommend SSIS for such tasks. But there are chances SSIS is not suitable, then we usually use INSERT INTO … SELECT.

I don’t recommend SELECT … INTO because you can’t control which file group the destination table goes to. And it is very important when I have to use table partition.

So if we need use T-SQL to move data from one place to another one. It is recommended to use TABLOCK option on the destination table. With a few other constraints, the query can be executed with minimal logging.

The few constraints are:

  • The database must be in bulk logged recovery model.
  • The destination table must be empty or without clustered index.
  • There is no non-clustered index on the destination table.

Maybe it is difficult to match the second and third constraint. But we have an alternative which is partition switch. We can partition the destination table. Load the data from source into an empty table with same schema except the index objects (usually I keep the clustered index). After load, we can do a partition switch.

Here is two sample queries:

INSERT INTO dbo.FactDevicePerformance_History
([TimeKey], [DeviceKey], [reads per sec], [read hits per sec],
[seq reads per sec], [seq read hits per sec], [writes per sec],
[write hits per sec], [Kbytes read per sec], [Kbytes written per sec])
SELECT h.* FROM AIR.dbo.FactDevicePerformance_History h
INNER JOIN AIR.dbo.DimTime t ON h.TimeKey = t.TimeKey
WHERE t.[Date] BETWEEN ‘2011-11-1’ AND ‘2011-11-30’

INSERT INTO dbo.FactDevicePerformance_History WITH (TABLOCK)
([TimeKey], [DeviceKey], [reads per sec], [read hits per sec],
[seq reads per sec], [seq read hits per sec], [writes per sec],
[write hits per sec], [Kbytes read per sec], [Kbytes written per sec])
SELECT h.* FROM AIR.dbo.FactDevicePerformance_History h
INNER JOIN AIR.dbo.DimTime t ON h.TimeKey = t.TimeKey
WHERE t.[Date] BETWEEN ‘2011-1-1’ AND ‘2011-1-31’

You can find the only difference is query hint “TABLOCK”. But first query runs 21 seconds, the second one runs only 8 seconds. First query increase the log file from 0.5MB to 672MB. The second one increase the log file from 0.5MB to 3.37MB. There are 2.3 million records moved in the query.

This is what called minimal logging since SQL Server 2005. Anyway you still need to compare these two approaches before you implement. Because with minimal logging SQL Server will force data pages to be flushed to disk before the transaction commits. So if the IO system for data file is not very fast, or the data pages affected by the operation are not sequentially arranged, you might see worse performance with minimal logging.

Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: