T-SQL Tips: Delete duplicated rows in a table without unique key

It is always a good practice to put a unique constraint in a table, no matter an auto-incremental primary key or a composite unique index. The reason is not only it can produce some performance benefits, but also make you easier to remove duplicated rows.

Let’s look at one sample here. You have a table with log information. The table has three columns, log_date, log_source, and log_text. The problem is developer forgot to put a unique index there, although he can put one on the columns log_date and log_source. And one day the log relay-agent middleware had some problems and pushed several log entries into the table several times when it retried to save the log into the database. (Maybe you can find some more realistic scenarios.)

Now as a dba, you need to find out those duplicate rows and delete them. You can try the undocumented pseudo column %%physloc%%, or %%lockres%% in SQL Server 2005.

For SQL Server 2005
DELETE FROM logs
WHERE %%lockres%% NOT IN
(SELECT MIN(%%lockres%%) FROM logs GROUP BY log_date, log_source)

For SQL Server 2008 or later version
DELETE FROM logs
WHERE %%physloc%% NOT IN
(SELECT MIN(%%physloc%%) FROM logs GROUP BY log_date, log_source)

Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: