Don’t relies on auto-statistics option in VLDB environment

As we known, statistics on SQL Server is quite helpful for generating an efficient execution plan. But a lot of developers never take care of statistics, because in SQL Server, there are two options, auto create statistics and auto update statistics, which we called auto-statistics option.

In most cases, auto-statistics works properly and saved us a lot of time to manage the statistics. When you perform some search, aggregate or sort operation on the table, SQL Server will automatically create some statistics on the table to help generating execution plan. Or there are already some statistics for generating execution plan, SQL Server may update those statistics marked as outdated.  There are only few triggers to mark a statistics as outdated, one is table schema was changed, and another one is 20% of the data was modified (in fact it is 20% plus 500 rows).

In SQL Server 2000, the modification is evaluated on row level, so there is only one trigger for entire table. Then Microsoft realized this algorithm will cause statistics on those quite stable columns updated too frequent. So in SQL Server 2005, Microsoft changed the detection mechanism to column level, to prevent unnecessary statistics updating when only few fields were heavily updated.

First of all, both in SQL Server 2000 and SQL Server 2005, or later versions, auto update statistics may not take effect on large tables as we expected, like on fact table. Image you got a fact table which pump in new data daily by ETL, only after 5 days, data pumps in daily will not enough to trigger the auto statistics update. Then we can image in most data warehousing scenarios, we already archived 2 years data in fact table. How long we will wait for statistics to be updated automatically again? For every 5 months!

Then in SQL Server 2008, even in those OLTP environments which mean smaller the table and if you used filtered index, things will go worse. Because filtered index still use whole column as the trigger base, so if you have a filtered index which covers only  5% of rows in the column, you need to update over 4 times on those rows covered by filtered index to trigger the auto-update statistics operation again.

So forget auto-statistics option in those large database environment, and on filtered index in any environments. Do statistics maintenance by yourself.

Advertisements
Post a comment or leave a trackback: Trackback URL.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: