Thursday, September 3, 2009

A Few Index Optimization Tips

Consider creating indexes on the column(s) frequently used in the WHERE, ORDER BY, and GROUP BY clauses.

These column(s) are the best candidates for indexing. Analyze your queries carefully so that you do not create indexes that will never be used.

Keep your indexes as narrow as possible.

Because each index takes up disk space, try to minimize the size of the index key to avoid wasting space. A smaller key reduces the number of reads required to read the index and boosts overall index performance.
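
For example, a minimal sketch illustrating both points (the Orders table, its EmployeeID column, and the index name are hypothetical):

-- Nonclustered index on a column that is frequently used in the WHERE clause;
-- the key is a single 4-byte int column, so the index stays narrow.
CREATE NONCLUSTERED INDEX IX_Orders_EmployeeID ON Orders (EmployeeID)

-- Queries like this one can now seek on the index instead of scanning the whole table:
SELECT OrderID, OrderDate
FROM Orders
WHERE EmployeeID = 5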

Drop indexes that are not used.

Because each index takes up disk space and slows the adding, deleting, and updating of rows, you should drop indexes that are not used. You can use the Index Tuning Wizard to identify indexes that are not used by your queries.
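
For example (the table and index names are hypothetical; this is the two-part syntax used by SQL Server 2000):

-- Remove an index that no query uses, to reclaim space and speed up data modification
DROP INDEX Orders.IX_Orders_Status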

Try to create indexes on columns that have integer values rather than character values.

Because integer values are usually smaller than character values (the int data type is 4 bytes, the bigint data type is 8 bytes), you can reduce the number of index pages used to store the index keys. This reduces the number of reads required to read the index and boosts overall index performance.

Limit the number of indexes if your application updates data very frequently.

Because each index takes up disk space and slows the adding, deleting, and updating of rows, you should create new indexes only after analyzing how the data is used, the types and frequencies of the queries performed, and how your queries will use the new indexes. In many cases, the speed advantages of a new index outweigh the disadvantages of the additional space used and slower row modification, but avoid redundant indexes and create them only when necessary. For a read-only table, the number of indexes can be higher. Also check that the index you are about to create does not already exist.
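
To check which indexes already exist on a table before adding a new one, you can run (the table name is hypothetical):

-- Lists every index on the table, together with its key columns
EXEC sp_helpindex 'Orders'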

Keep in mind that when you create a primary key constraint or a unique key constraint, SQL Server automatically creates an index on the column(s) participating in the constraint. If you simply pick another index name, nothing stops you from creating yet another index on the same column(s), so it is easy to end up with duplicates.
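
A hypothetical illustration of such a duplicate:

-- The PRIMARY KEY constraint already creates a unique index on CustomerID ...
CREATE TABLE Customers (
    CustomerID int NOT NULL PRIMARY KEY,
    Name       varchar(50) NOT NULL
)

-- ... so this index is redundant: it indexes the same column a second time
CREATE INDEX IX_Customers_CustomerID ON Customers (CustomerID)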

Create a clustered index instead of a nonclustered one to increase the performance of queries that return a range of values and of queries that contain the GROUP BY or ORDER BY clauses and return sorted results.

Because every table can have only one clustered index, you should choose the column(s) for this index very carefully. Try to analyze all your queries, pick the most frequently used ones, and include in the clustered index only those column(s) that gain the most performance benefit from it.
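
A minimal sketch (the Orders table and column names are hypothetical):

-- Clustered index on the column used for range filtering and sorting
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders (OrderDate)

-- Range queries like this one read the rows in clustered-index order,
-- so no extra sort step is needed for the ORDER BY
SELECT OrderID, CustomerID, OrderDate
FROM Orders
WHERE OrderDate BETWEEN '20090101' AND '20090131'
ORDER BY OrderDate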

Create nonclustered indexes to increase the performance of queries that return few rows and where the index has good selectivity.

In contrast to the clustered index, of which there can be only one per table, a table can have as many as 249 nonclustered indexes. However, you should consider nonclustered index creation as carefully as clustered index creation, because each index takes up disk space and slows data modification.

Create the clustered index on column(s) that are not updated very frequently.

Because the leaf nodes of a nonclustered index contain the clustered index key when the table has a clustered index, every time a column used in the clustered index is modified, all of the nonclustered indexes must be modified as well.

Create the clustered index on a single column that is as narrow as possible.

Because nonclustered indexes contain the clustered index key within their leaf nodes and use it to locate data rows, creating the clustered index on a single narrow column reduces not only the size of the clustered index, but the size of every nonclustered index on the table as well.

Avoid creating a clustered index based on an incrementing key.

For example, if a table has a surrogate integer primary key declared as IDENTITY and the clustered index is created on this column, then every time data is inserted into the table, the rows are added to the end of the table. When many rows are added, a "hot spot" can occur. A "hot spot" occurs when many queries try to read or write data in the same area at the same time, and it results in an I/O bottleneck.
Note. By default, SQL Server creates a clustered index for the primary key constraint. So, in this case, you should explicitly specify the NONCLUSTERED keyword to indicate that a nonclustered index is created for the primary key constraint.
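
A hypothetical example of such a declaration:

-- The identity primary key gets a nonclustered index,
-- leaving the single clustered index free for another column
CREATE TABLE Orders (
    OrderID    int IDENTITY(1, 1) NOT NULL PRIMARY KEY NONCLUSTERED,
    OrderDate  datetime NOT NULL,
    CustomerID int NOT NULL
)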

Create a clustered index for each table.

If you create a table without a clustered index, the data rows are not stored in any particular order. This structure is called a heap. Every time data is inserted into such a table, the row is added to the end of the table. When many rows are added, a "hot spot" can occur. To avoid "hot spots" and improve concurrency, you should create a clustered index for each table.

Don't create indexes on column(s) whose values have low selectivity.

For example, don't create an index on a column with many duplicate values, such as a "Sex" column (which has only "Male" and "Female" values), because in this case the disadvantages of the additional space used and slower row modification outweigh the speed advantages of the new index.

If you create a composite (multi-column) index, try to order the columns in the key so as to enhance selectivity, with the most selective columns leftmost in the key.

The order of the columns in a composite (multi-column) index is very important, and choosing it well increases the chance that the index will be used.

If you create a composite (multi-column) index, try to order the columns in the key so that the WHERE clauses of the frequently used queries match the column(s) that are leftmost in the index.

The order of the columns in a composite (multi-column) index is very important. The index will be used to evaluate a query only if the leftmost index key column is specified in the WHERE clause of the query. For example, if you create a composite index such as "Name, Age", then a query with a WHERE clause such as "WHERE Name = 'Alex'" will use the index, but a query with a WHERE clause such as "WHERE Age = 28" will not.
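
The same example as a sketch (the Persons table and index name are hypothetical):

-- Composite index with the more selective column (Name) leftmost
CREATE INDEX IX_Persons_Name_Age ON Persons (Name, Age)

-- Can seek on the index: the leftmost key column is in the WHERE clause
SELECT Name, Age FROM Persons WHERE Name = 'Alex'

-- Cannot seek on the index: only the second key column is specified
SELECT Name, Age FROM Persons WHERE Age = 28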

If you need to join several tables very frequently, consider creating indexes on the joined columns.

This can significantly improve the performance of queries against the joined tables.
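
A minimal sketch, assuming hypothetical Orders and Customers tables joined on CustomerID:

-- Index the foreign key column used in the join
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)

SELECT c.Name, o.OrderDate
FROM Customers c
JOIN Orders o ON o.CustomerID = c.CustomerID
WHERE c.Name = 'Alex'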

Consider creating a surrogate integer primary key (identity, for example).

Every table must have a primary key (a unique identifier for a row within a database table). A surrogate primary key is a field that has a unique value but no actual meaning to the record itself, so users should never see or change it. Some developers use surrogate primary keys, while others use the data fields themselves as the primary key. If a primary key consists of many data fields and is large, consider creating a surrogate integer primary key instead. This can improve the performance of your queries.

Consider creating indexes on all the columns referenced in the WHERE clauses of your most frequently used queries when those clauses contain the OR operator.

If the WHERE clause of a query contains an OR operator and any of the columns referenced in it is not indexed, a table or clustered index scan will be performed. In this case, creating indexes on all such columns can significantly improve query performance.
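
A sketch of the idea (the table, column, and index names are hypothetical):

-- Both columns in the OR predicate are indexed, so the optimizer can
-- combine two index seeks instead of scanning the whole table
CREATE INDEX IX_Customers_City  ON Customers (City)
CREATE INDEX IX_Customers_Phone ON Customers (Phone)

SELECT CustomerID, Name
FROM Customers
WHERE City = 'London' OR Phone = '555-0101'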

If your application will perform the same query over and over on the same table, consider creating a covering index including columns from this query.

A covering index is an index that includes all of the columns referenced in the query. Creating a covering index can improve performance because all the data for the query is contained within the index itself, so only the index pages, not the data pages, are used to retrieve the data. Covering indexes can speed up a query dramatically, because they save a large amount of I/O.
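
A minimal sketch, assuming a hypothetical Orders table; the composite index key contains every column the query touches, so the query can be answered from the index alone:

-- Covers the query below: CustomerID is searched on, OrderDate and Total are returned
CREATE INDEX IX_Orders_Covering ON Orders (CustomerID, OrderDate, Total)

SELECT OrderDate, Total
FROM Orders
WHERE CustomerID = 42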

Use the DBCC DBREINDEX statement to rebuild all the indexes on all the tables in your database periodically (for example, once a week on Sunday) to reduce fragmentation.

Because fragmented data can cause SQL Server to perform unnecessary data reads, and query performance against a heavily fragmented table can be very poor, you should periodically rebuild all indexes to reduce fragmentation. Try to schedule the DBCC DBREINDEX statement during CPU idle time and slow production periods.
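
For example (the table name is hypothetical; the empty index name means all indexes on the table, and 90 is the fill factor):

-- Rebuild every index on the Orders table with a fill factor of 90
DBCC DBREINDEX ('Orders', '', 90)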

Use the DBCC INDEXDEFRAG statement to defragment clustered and secondary indexes of the specified table or view.

The DBCC INDEXDEFRAG statement is a new SQL Server 2000 command that was not supported in previous versions. Unlike DBCC DBREINDEX, DBCC INDEXDEFRAG does not hold locks for long and thus does not block running queries or updates. So, try to use the DBCC INDEXDEFRAG command instead of DBCC DBREINDEX whenever possible.
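
For example (the database, table, and index names are hypothetical):

-- Defragment one index online, without long-term locks
DBCC INDEXDEFRAG (Northwind, Orders, IX_Orders_CustomerID)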

Consider using the SORT_IN_TEMPDB option when you create an index and when tempdb is on a different set of disks than the user database.

The SORT_IN_TEMPDB option is a new SQL Server 2000 feature that was not supported in previous versions. When you create an index with the SORT_IN_TEMPDB option, SQL Server uses the tempdb database, instead of the current database, to sort data during index creation. Using this option can reduce the time it takes to create an index, but it increases the amount of disk space used during the build.
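
For example (the table and index names are hypothetical; this is the SQL Server 2000 form of the option):

-- Intermediate sort runs are written to tempdb instead of the user database
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate)
WITH SORT_IN_TEMPDB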

Use the SQL Server Profiler Create Trace Wizard with "Identify Scans of Large Tables" trace to determine which tables in your database may need indexes.

This trace will show which tables are being scanned by queries instead of using an index.