create table bricks (
brick_id integer not null,
colour varchar2(10) not null,
shape varchar2(10) not null,
weight integer not null,
colour_mixed_case varchar2(10) not null,
insert_datetime date not null,
junk varchar2(1000) not null
);
exec dbms_random.seed ( 0 );
insert into bricks
with rws as (
select level x,
case ceil ( level / 250 )
when 4 then 'red'
when 1 then 'blue'
when 2 then 'green'
when 3 then 'yellow'
end colour
from dual
connect by level <= 1000
)
select rownum,
colour,
case mod ( rownum, 4 )
when 0 then 'cube'
when 1 then 'cylinder'
when 2 then 'pyramid'
when 3 then 'prism'
end shape,
round ( dbms_random.value ( 1, 10 ) ),
case mod ( rownum, 3 )
when 0 then upper ( colour )
when 1 then lower ( colour )
when 2 then initcap ( colour )
end mixed_case,
date'2020-01-01' + ( rownum / 12 ),
rpad ( chr ( mod ( rownum, 26 ) + 65 ), 1000, 'x' )
from rws;
commit;
exec dbms_stats.gather_table_stats ( null, 'bricks' ) ;
There are no indexes on the bricks table. So all queries against it will do a full table scan:
select /*+ gather_plan_statistics */count(*) from bricks;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
select /*+ gather_plan_statistics */count(*) from bricks
where colour = 'red';
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
select /*+ gather_plan_statistics */count(*) from bricks
where brick_id = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This means the database reads every block in the table (up to the high water mark) and inspects every single row. It then applies the WHERE clause and returns only the rows where it is true. For searches that return few rows, inspecting every single row is clearly a huge waste of effort.
Creating an index on columns in the WHERE clause of a query enables the database to only read rows matching the search criteria. This may reduce the amount of work needed to execute the query.
A database index stores the values from the indexed columns, along with pointers to the rows holding those values in the table. Standard indexes in Oracle Database are B-trees. Using one, the database can search the index for just the entries matching the WHERE clause. This can lead to a big reduction in work.
Use CREATE INDEX to make an index on a table. There are three key parts to this: the name of the index, the table it is on, and the column (or columns) to index.
This defines an index on the COLOUR column of BRICKS:
create index brick_colour_i
on bricks ( colour );
With the index in place, the optimizer can use it when searching for rows of a given colour. For example, this query searches the index for all "red" entries:
select /*+ gather_plan_statistics red bricks */count(*) from bricks
where colour = 'red';
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Full scanning the table did nearly 200 consistent reads. With the index in place this query did just 2 - a huge saving!
Replace /* TODO */ in the index definition below, so the query will use it:
create index brick_shape_i
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics */ count(*) from bricks
where shape = 'cylinder';
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
If you want to try creating the index with different columns, you must re-create it. Do this by dropping it first:
drop index brick_shape_i;
The *_INDEXES views return details of the indexes in a database. This query returns the indexes on the BRICKS table:
select * from user_indexes
where table_name = 'BRICKS';
An index can include up to 32 columns. The *_IND_COLUMNS views list the columns in each index. This query returns the indexes on BRICKS and their columns:
select index_name, column_name, column_position
from user_ind_columns
where table_name = 'BRICKS'
order by index_name, column_position;
This query searches BRICKS for rows with the colour red and a weight of 1:
select /*+ gather_plan_statistics */*
from bricks
where colour = 'red'
and weight = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This uses the colour index to locate the rows where the colour is red. The database reads all of these rows from the table, then filters out those with weights other than one.
You can see this from the "Predicate Information" section of the plan, which looks like:
1 - filter("WEIGHT"=1)
2 - access("COLOUR"='red')
There are two different operations: ACCESS, where the database uses the condition to locate matching entries in the index and reads only those; and FILTER, where it checks the condition against each row it has read, discarding those that fail.
So the query first goes to the index and reads only the entries where the colour is red (line 2 of the plan). For all 250 of these entries the query gets the corresponding row from the table. It then filters these rows, keeping only those where the weight is one. This drops the number of rows returned from 250 down to 14 (line 1 of the plan).
It would be more efficient to read only the rows that meet both criteria. To do this, create a multi-column index on COLOUR and WEIGHT. This enables the database to locate the 14 matching rows using the index:
create index brick_colour_weight_i
on bricks ( colour, weight ) ;
select /*+ gather_plan_statistics indexed */*
from bricks
where colour = 'red'
and weight = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
The database now only reads rows that match the search. You can see this from the fact that every line in the plan returns 14 rows. The total number of buffers for the query is also lower (30 vs 69) and there is only one predicate: an ACCESS that reads both columns:
2 - access("COLOUR"='red' AND "WEIGHT"=1)
So an index including all the columns in the WHERE clause can make queries faster. You can also add the columns in the SELECT list to the index to improve performance further.
A composite index can also enable the database to only read the index and bypass the table altogether. To do this, all the columns listed in the query must be present in an index.
For example, this query counts the number of red bricks for each weight. Both of the columns in the query are in the BRICK_COLOUR_WEIGHT_I index. So the query only scans the index:
select /*+ gather_plan_statistics */weight, count(*)
from bricks
where colour = 'red'
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Notice that BRICKS doesn't appear in the execution plan! An index that contains every column a query needs is also called a covering index.
The optimizer can also do an index only scan when you select all the columns in an index without a WHERE clause:
select /*+ gather_plan_statistics */colour, weight
from bricks
fetch first 10 rows only;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
For the optimizer to do this, at least one of the indexed columns must be NOT NULL. This is because Oracle Database excludes rows where all the indexed columns are NULL from (B-tree) indexes. If all the columns allow nulls, using the index could give incorrect results, as it would not return wholly NULL rows!
Change the COLOUR and WEIGHT columns so they allow null values:
alter table bricks
modify ( colour null, weight null );
The table can now store rows where both of these columns are NULL. If you insert such a row, it won't appear in the BRICK_COLOUR_WEIGHT_I or BRICK_COLOUR_I indexes. So the query selecting just COLOUR and WEIGHT can no longer use an index:
select /*+ gather_plan_statistics nulls */colour, weight
from bricks
fetch first 10 rows only;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This also means queries searching for null-valued colours must use a full table scan:
select /*+ gather_plan_statistics null colour */ *
from bricks
where colour is null;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
To allow IS NULL searches to use an index, there must be at least one non-null expression in the index. So you can enable the database to use BRICK_COLOUR_WEIGHT_I by making WEIGHT mandatory:
alter table bricks
modify weight not null;
select /*+ gather_plan_statistics */ *
from bricks
where colour is null;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
The optimizer can use BRICK_COLOUR_WEIGHT_I to find null-valued colours because every row is guaranteed to have a weight. For example, a row with a null colour and weight of one will have this entry in the index:
null, 1
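To see this in action, insert a hypothetical brick with no colour (BRICK_ID 1001 and the other values are made up; they just satisfy the remaining NOT NULL constraints), then re-run the IS NULL search. The new row has the entry ( null, 1 ) in BRICK_COLOUR_WEIGHT_I, so an index range scan finds it:

insert into bricks
values ( 1001, null, 'cube', 1, 'N/A', sysdate, 'x' );

select /*+ gather_plan_statistics null demo */ *
from bricks
where colour is null;

select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));

rollback;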
Make COLOUR mandatory again before continuing:
alter table bricks
modify colour not null;
The order of columns in an index can have a big impact on how effective it is. And whether the optimizer is able to use it!
The database searches columns in an index left-to-right. To be most effective, use the leading columns of the index in the WHERE clause.
For example, COLOUR is the first column in BRICK_COLOUR_WEIGHT_I. Queries looking for colour values in the WHERE clause can restrict their search of the index to only these colours. This will often appear as an INDEX RANGE SCAN in the plan.
But queries with WEIGHT in the WHERE clause and COLOUR in the SELECT list can't restrict their search to only the entries matching the weight value. Instead they check every entry in the index with an INDEX (FAST) FULL SCAN:
select /*+ gather_plan_statistics */colour, count(*)
from bricks
where weight = 1
group by colour;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
So the query reads all the entries in the index, discarding those where the WEIGHT is not one. The filter predicate for the plan confirms this:
2 - filter("WEIGHT"=1)
A full index scan is usually faster than a full table scan. But slower than an index range scan. To enable a range scan for the previous query, you need to create an index with WEIGHT as the first column:
create index bricks_weight_colour_i
on bricks ( weight, colour );
select /*+ gather_plan_statistics */colour, count(*)
from bricks
where weight = 1
group by colour;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
While covering indexes can be the fastest way to read data, creating one for every query leads to an explosion in the number of indexes you have. This increases your storage requirements and makes it harder for the optimizer to choose the best index for each query.
Keep the number of indexes you create to a minimum. Reserve covering indexes for critical queries!
Replace /* TODO */ to complete the index definition, so the query only reads the index, not the table:
create index brick_junk_colour_i
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics */colour, count(*)
from bricks
where junk = rpad ( 'A', 1000, 'x' )
group by colour;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
When you have a covering index, BRICKS should not appear in the Name column of the plan.
The BRICK_ID is a unique identifier for rows in BRICKS. But there are no constraints on this column, so it's possible to have two or more rows with the same value:
insert into bricks
values ( 1, 'red', 'cylinder', 1, 'RED', sysdate, 'stuff' );
select * from bricks
where brick_id = 1;
rollback;
To avoid this you can create a unique index. This is a special type of index that is guaranteed to have at most one entry for each set of values. To do this, add the UNIQUE keyword between CREATE and INDEX:
create unique index brick_brick_id_u
on bricks ( brick_id );
There can now be at most one row in the table for each BRICK_ID. If you try to insert an existing value, the database will throw an ORA-00001 exception:
insert into bricks
values ( 1, 'red', 'cylinder', 1, 'RED', sysdate, 'stuff' );
The optimizer can also use a unique index to do an INDEX UNIQUE SCAN when a query has equality conditions on all the index's columns:
select /*+ gather_plan_statistics brick_id */* from bricks
where brick_id = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Because there can be at most one entry in the index for each BRICK_ID, this is a highly efficient search method.
But the main value of unique indexes is improving data quality by preventing duplicate rows. A common way to do this is by creating a primary key or unique constraint.
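A unique constraint follows the same pattern. Here is a hypothetical sketch (the constraint name is made up, and INSERT_DATETIME only happens to be unique because the sample data gives every row a distinct timestamp):

alter table bricks
add constraint brick_insert_datetime_u
unique ( insert_datetime );

alter table bricks
drop constraint brick_insert_datetime_u;

Dropping the constraint also removes the unique index the database created to police it, so the table is back where it started.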
When you create a primary key or unique constraint, Oracle Database will create a unique index to police it. Unless there's an existing index with the constraint's columns at the start!
For example, you created a unique index on BRICK_ID above. So making this column the primary key uses that existing unique index:
select index_name
from user_ind_columns
where table_name = 'BRICKS'
and column_name = 'BRICK_ID';
alter table bricks
add primary key ( brick_id );
select constraint_name, index_name
from user_constraints
where constraint_type = 'P';
If you remove the primary key and index, then re-create the primary key, the database will create a new index with a name matching the constraint:
alter table bricks
drop primary key;
drop index brick_brick_id_u;
alter table bricks
add constraint brick_pk
primary key ( brick_id );
select constraint_name, index_name
from user_constraints
where constraint_type = 'P';
select index_name
from user_ind_columns
where table_name = 'BRICKS'
and column_name = 'BRICK_ID';
The database can use an existing non-unique index to police the constraint, provided the primary key columns are at the start of it. To ensure it uses the index you want when adding a primary key, name it in the USING INDEX clause.
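For example, here is a minimal sketch of the USING INDEX clause. It re-creates the situation from earlier: drop the primary key (which also removes the BRICK_PK index the constraint created), re-create the unique index, then add the constraint and point it at that index:

alter table bricks
drop primary key;

create unique index brick_brick_id_u
on bricks ( brick_id );

alter table bricks
add constraint brick_pk
primary key ( brick_id )
using index brick_brick_id_u;

If you re-run the USER_CONSTRAINTS query above, INDEX_NAME should now show BRICK_BRICK_ID_U.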
If you apply a function to a column in the WHERE clause, the database is unable to range scan regular indexes on that column. This query uppercases the colours, searching for rows with the value RED. Although there is an index on COLOUR, the optimizer uses a full table scan:
select /*+ gather_plan_statistics */ shape, count ( distinct insert_datetime )
from bricks
where upper ( colour ) = 'RED'
group by shape;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This is because the values you're searching for no longer match the values in the index. To overcome this, you can create a function-based index. Do this by calling the function in the column list of the index:
create index brick_colour_upper_i
on bricks ( upper ( colour ) );
select /*+ gather_plan_statistics */shape, count ( distinct insert_datetime ) times#
from bricks
where upper ( colour ) = 'RED'
group by shape;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
To ensure that the optimizer can use a function-based index, call the function in exactly the same way in the index and WHERE clause.
Oracle Database has optimizations which allow it to use function-based indexes when the WHERE clause uses a similar, but not identical, expression. It's best not to rely on these!
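For example, this made-up search applies INITCAP() instead of UPPER(). The expression no longer matches the one in the function-based index, so the optimizer can't use it to locate the red rows and has to scan and filter instead:

select /*+ gather_plan_statistics initcap */ shape, count ( distinct insert_datetime )
from bricks
where initcap ( colour ) = 'Red'
group by shape;

select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));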
Replace /* TODO */ in this index, so the query will use it to search for bricks inserted on 1 Jan 2020:
create index bricks_insert_date_fbi
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics 1st Jan weights */weight, count ( * )
from bricks
where trunc ( insert_datetime ) = date'2020-01-01'
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, null, 'IOSTATS LAST'));
Remember: the function in the WHERE clause must appear in the index definition!
Function-based indexes are powerful. But it's best to avoid creating them!
It's better to create indexes on the base columns. Then rewrite your queries so they don't apply functions to the indexed columns.
The DATE data type in Oracle Database includes a time component. So the previous query used TRUNC() to set the time to midnight for every date to find all the rows inserted on 1 Jan.
Instead of using TRUNC(), you can rewrite the query to search for all rows with a date greater than or equal to 1 Jan 2020. And strictly less than 2 Jan 2020. This allows you to create a regular index on INSERT_DATETIME:
create index bricks_insert_date_i
on bricks ( insert_datetime );
select /*+ gather_plan_statistics 1st Jan weights */weight, count ( * )
from bricks
where insert_datetime >= date'2020-01-01'
and insert_datetime < date'2020-01-02'
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, null, 'IOSTATS LAST'));
This is more flexible than creating an index on TRUNC ( INSERT_DATETIME ). The index BRICKS_INSERT_DATE_I stores the full datetime values. So if you write another query searching for rows with specific times, the database can use this index to read only those values:
select /*+ gather_plan_statistics 1st Jan weights */weight, count ( * )
from bricks
where insert_datetime >= date'2020-01-01'
and insert_datetime < date'2020-01-01' + interval '12' hour
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, null, 'IOSTATS LAST'));
If you create a function-based TRUNC() index, the times are no longer part of the index. So while the optimizer may be able to use it, it'll be less effective than the non-function-based index.
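To see why, compare the raw values with what a TRUNC()-based index stores. The rows inserted on 1 Jan all have different times, but the indexed expression maps every one of them to the same midnight value, so such an index can't distinguish rows within the day:

select insert_datetime, trunc ( insert_datetime ) indexed_value
from bricks
where insert_datetime >= date'2020-01-01'
and insert_datetime < date'2020-01-02'
order by insert_datetime;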
Which expressions can you replace /* TODO */ with to make an index that will improve the performance of this query?
create index brick_challenge_i
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics */ shape, count (*)
from bricks
where upper ( colour_mixed_case ) = 'RED'
and substr ( junk, 1, 1 ) = 'A'
group by shape;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Try creating several different indexes. Which enables the query to use the fewest buffers?