temporary table vs. table variable

Temporary table and table variable are two common objects used in T-SQL code. Both of these two objects can be used for keeping your temporary dataset in runtime. These two objects are very popular due to its convenience, but I saw lot of developers got some mis-understandings for these two objects.

I’ve heard some developers said these two objects did job with same methods, just with different names. In fact, they are different.

I’ve also heard some developers indicate table variable stayed in buffer, and table variable on hard disk. In fact, both of them are kept on hard disk, which was tempdb. You can monitor the data file size of tempdb to confirm on this. (Just ensure you put a WAITFOR command in your test script because table variable will be released just after the batch execution was finished.)

So what’s the difference for these two objects indeed?

First of all, table variable and temporary table all store their data in tempdb. You can query sys.tables in tempdb to make sure of this. And you can also query sys.database_files to check the data file size increase. Both of them will create a table object in tempdb, and both of them keeps data in tempdb.

You can open one query session, and execute command:

DECLARE @t table (ID int)
WHILE (1=1) BEGIN
INSERT INTO @t VALUES (CAST(RAND() * 1000 AS int))
END

Then you open another session, and execute command

USE tempdb
SELECT * FROM sys.tables
SELECT name, (size * 8)/ 1024.0 AS file_size_mb FROM sys.database_files

You will find one table like “#XXXX”. XXXX is a random number. And you will also see data file of tempdb growing.

Though both of these two objects wrote their data in tempdb, DML on table variable only generate minimal log so in previous demo you will see data file growing but log file remains it origional size.  Maybe the reason is table variable didn’t even hold a transaction. You can not image to resume an operation on one variable after you break then restore a session, right? And you will also never expect you can access a variable in another session, right? So table variable has lower overhead on logging IO and transaction lock.

Anyway if you replace the table variable by temporary table in the previous code, you will see both data file and log file growing. This is because temporary table evolves in transaction definately.

So from this point, when you need load amount of data into one temporary dataset or you need perform lot of modifications on one temporary dataset, table variable will has lower overhead on log IO.

On the contrary, table varialbe doesn’t support index and statistics but temporary table does. So if you need perform some heavy load queries which may benefits from index and statistics, like search, join or aggregate, temporary table maybe can run faster with defined index and statistics. Anyway, you still can use primary key or unique constraint to simulate an index on table variable, or use query hint to force an execution plan. But obviously, it will be very complicated to implement this in some complex environmenet and you may suffer from unstable performance.

There are also some other restrictions for these two objects:

  1. Temporary table can not be used in function, even you just want query some records out from one temporary table. This is due to a restriction from T-SQL language, which called side effect. You can find the definition of “CREATE FUNCTION” in SQL BOL, which indicate code in function body can not change the status of outside objects, like modifying a table.
  2. You can not modify schema of a table variable once your declared it.
  3. You can not pass a table variable into a dynamic T-SQL. You can only declare and use a table variable inside the dynamic T-SQL code.
  4. You can not pass a table variable to stored procedure or function. But as one new feature of SQL Server 2008, you can declare a table type parameter to achieve the similar result.

So after all, how we choose between table variable and temporary table?

Despite of application logics requirement, there are some simple rules you may consider. If you have a small temporary dataset, you’d better consider table variable first. But if you have a large temporary dataset, then consider the operation you will perform on the dataset:

  • If you need share temporary dataset between sessions, temporary table is the only choice.
  • If you need modify the dataset schema on fly, temporary table is the only choice.
  • If you need search data, aggregate or join dataset, you shall consider temporary table in most cases. Anyway, you still can try primary key and query hint on table variable.
  • Otherwise, you shall consider table variable first.
Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: