create table bricks (
brick_id integer not null,
colour varchar2(10) not null,
shape varchar2(10) not null,
weight integer not null,
colour_mixed_case varchar2(10) not null,
insert_datetime date not null,
junk varchar2(1000) not null
);
exec dbms_random.seed ( 0 );
insert into bricks
with rws as (
select level x,
case ceil ( level / 250 )
when 4 then 'red'
when 1 then 'blue'
when 2 then 'green'
when 3 then 'yellow'
end colour
from dual
connect by level <= 1000
)
select rownum,
colour,
case mod ( rownum, 4 )
when 0 then 'cube'
when 1 then 'cylinder'
when 2 then 'pyramid'
when 3 then 'prism'
end shape,
round ( dbms_random.value ( 1, 10 ) ),
case mod ( rownum, 3 )
when 0 then upper ( colour )
when 1 then lower ( colour )
when 2 then initcap ( colour )
end mixed_case,
date'2020-01-01' + ( rownum / 12 ),
rpad ( chr ( mod ( rownum, 26 ) + 65 ), 1000, 'x' )
from rws;
commit;
exec dbms_stats.gather_table_stats ( null, 'bricks' ) ;
There are no indexes on the bricks table. So all queries against it will do a full table scan:
select /*+ gather_plan_statistics */count(*) from bricks;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
select /*+ gather_plan_statistics */count(*) from bricks
where colour = 'red';
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
select /*+ gather_plan_statistics */count(*) from bricks
where brick_id = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This means the database reads every block in the table (up to the high water mark) and inspects every single row. It then applies the WHERE clause and returns only the rows where it is true. For searches that return few rows, inspecting every single row is clearly a huge waste of effort.
Creating an index on columns in the WHERE clause of a query enables the database to only read rows matching the search criteria. This may reduce the amount of work needed to execute the query.
A database index stores the values from the indexed columns, along with pointers to the rows holding those values in the table. Standard indexes in Oracle Database are B-trees. Using one, the database can search the index for just the entries matching the WHERE clause. This can lead to a big reduction in work.
Use CREATE INDEX to make an index on a table. There are three key parts to this: the name of the index, the table it is on, and the column (or columns) to index.
This defines an index on the COLOUR column of BRICKS:
create index brick_colour_i
on bricks ( colour );
With the index in place, the optimizer can use it when searching for rows of a given colour. For example, this query searches the index for all "red" entries:
select /*+ gather_plan_statistics red bricks */count(*) from bricks
where colour = 'red';
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Full scanning the table did nearly 200 consistent reads. With the index in place this query did just 2 - a huge saving!
Replace /* TODO */ in the index definition below, so the query will use it:
create index brick_shape_i
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics */ count(*) from bricks
where shape = 'cylinder';
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
If you want to try creating the index with different columns, you must re-create it. Do this by dropping it first:
drop index brick_shape_i;
The *_INDEXES views return details of the indexes in a database. This query returns the indexes on the BRICKS table:
select * from user_indexes
where table_name = 'BRICKS';
An index can include up to 32 columns. The *_IND_COLUMNS views list the columns in each index. This query returns the indexes on BRICKS and their columns:
select index_name, column_name, column_position
from user_ind_columns
where table_name = 'BRICKS'
order by index_name, column_position;
This query searches BRICKS for rows with the colour red and a weight of 1:
select /*+ gather_plan_statistics */*
from bricks
where colour = 'red'
and weight = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This uses the colour index to locate the rows where the colour is red. The database reads all of these rows from the table, then filters out those with weights other than one.
You can see this from the "Predicate Information" section of the plan, which looks like:
1 - filter("WEIGHT"=1)
2 - access("COLOUR"='red')
There are two different operations: ACCESS, where the database uses the condition to locate matching entries in the index and reads only those; and FILTER, where it checks the condition against each row it has read, discarding those that fail.
So the query first goes to the index and reads only the entries where the colour is red (line 2 of the plan). For all 250 of these entries the query gets the corresponding row from the table. It then filters these rows, keeping only those where the weight is one. This drops the number of rows returned from 250 down to 14 (line 1 of the plan).
It would be more efficient to read only the rows that meet both criteria. To do this, create a multi-column index on COLOUR and WEIGHT. This enables the database to locate the 14 matching rows using the index:
create index brick_colour_weight_i
on bricks ( colour, weight ) ;
select /*+ gather_plan_statistics indexed */*
from bricks
where colour = 'red'
and weight = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
The database now only reads rows that match the search. You can see this from the fact that every line in the plan returns 14 rows. The total number of buffers for the query is also lower (30 vs 69) and there is only one predicate: an ACCESS that reads both columns:
2 - access("COLOUR"='red' AND "WEIGHT"=1)
So an index including all the columns in the WHERE clause can make queries faster. You can also add the columns in the SELECT list to the index to improve performance further.
A composite index can also enable the database to only read the index and bypass the table altogether. To do this, all the columns listed in the query must be present in an index.
For example, this query counts the number of red bricks for each weight. Both of the columns in the query are in the BRICK_COLOUR_WEIGHT_I index. So the query only scans the index:
select /*+ gather_plan_statistics */weight, count(*)
from bricks
where colour = 'red'
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Notice that BRICKS doesn't appear in the execution plan! An index that contains every column a query needs is also called a covering index.
The optimizer can also do an index only scan when you select all the columns in an index without a WHERE clause:
select /*+ gather_plan_statistics */colour, weight
from bricks
fetch first 10 rows only;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
For the optimizer to do this, at least one of the indexed columns must be NOT NULL. This is because Oracle Database excludes rows where all the indexed columns are NULL from (B-tree) indexes. If all the columns allow nulls, using the index could give incorrect results, as it would not return wholly NULL rows!
Change the COLOUR and WEIGHT columns so they allow null values:
alter table bricks
modify ( colour null, weight null );
The table can now store rows where both of these columns are NULL. If you insert such a row, it won't appear in the BRICK_COLOUR_WEIGHT_I or BRICK_COLOUR_I indexes. So the query selecting just COLOUR and WEIGHT can no longer use an index:
select /*+ gather_plan_statistics nulls */colour, weight
from bricks
fetch first 10 rows only;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This also means queries searching for null-valued colours must use a full table scan:
select /*+ gather_plan_statistics null colour */ *
from bricks
where colour is null;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
To allow IS NULL searches to use an index, there must be at least one non-null expression in the index. So you can enable the database to use BRICK_COLOUR_WEIGHT_I by making WEIGHT mandatory:
alter table bricks
modify weight not null;
select /*+ gather_plan_statistics */ *
from bricks
where colour is null;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
The optimizer can use BRICK_COLOUR_WEIGHT_I to find null-valued colours because every row is guaranteed to have a weight. For example, a row with a null colour and weight of one will have this entry in the index:
null, 1
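To see this in action, insert a hypothetical brick with no colour (BRICK_ID 1001 and the other values are made up; they just satisfy the remaining NOT NULL constraints), then re-run the IS NULL search. The new row has the entry ( null, 1 ) in BRICK_COLOUR_WEIGHT_I, so an index range scan finds it:

insert into bricks
values ( 1001, null, 'cube', 1, 'N/A', sysdate, 'x' );

select /*+ gather_plan_statistics null demo */ *
from bricks
where colour is null;

select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));

rollback;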
Make COLOUR mandatory again before continuing:
alter table bricks
modify colour not null;
The order of columns in an index can have a big impact on how effective it is. And whether the optimizer is able to use it!
The database searches columns in an index left-to-right. To be most effective, use the leading columns of the index in the WHERE clause.
For example, COLOUR is the first column in BRICK_COLOUR_WEIGHT_I. Queries looking for colour values in the WHERE clause can restrict their search of the index to only these colours. This will often appear as an INDEX RANGE SCAN in the plan.
But queries with WEIGHT in the WHERE clause and COLOUR in the SELECT list can't restrict their search to only the entries matching the weight value. Instead they check every entry in the index with an INDEX (FAST) FULL SCAN:
select /*+ gather_plan_statistics */colour, count(*)
from bricks
where weight = 1
group by colour;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
So the query reads all the entries in the index, discarding those where the WEIGHT is not one. The filter predicate for the plan confirms this:
2 - filter("WEIGHT"=1)
A full index scan is usually faster than a full table scan. But slower than an index range scan. To enable a range scan for the previous query, you need to create an index with WEIGHT as the first column:
create index bricks_weight_colour_i
on bricks ( weight, colour );
select /*+ gather_plan_statistics */colour, count(*)
from bricks
where weight = 1
group by colour;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
While covering indexes can be the fastest way to read data, creating one for every query leads to an explosion in the number of indexes you have. This increases your storage requirements and makes it harder for the optimizer to choose the best index for each query.
Keep the number of indexes you create to a minimum. Reserve covering indexes for critical queries!
Replace /* TODO */ to complete the index definition, so the query only reads the index, not the table:
create index brick_junk_colour_i
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics */colour, count(*)
from bricks
where junk = rpad ( 'A', 1000, 'x' )
group by colour;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
When you have a covering index, BRICKS should not appear in the Name column of the plan.
The BRICK_ID is a unique identifier for rows in BRICKS. But there are no constraints on this column, so it's possible to have two or more rows with the same value:
insert into bricks
values ( 1, 'red', 'cylinder', 1, 'RED', sysdate, 'stuff' );
select * from bricks
where brick_id = 1;
rollback;
To avoid this you can create a unique index. This is a special type of index that is guaranteed to have at most one entry for each set of values. To do this, add the UNIQUE keyword between CREATE and INDEX:
create unique index brick_brick_id_u
on bricks ( brick_id );
There can now be at most one row in the table for each BRICK_ID. If you try to insert an existing value, the database will throw an ORA-00001 exception:
insert into bricks
values ( 1, 'red', 'cylinder', 1, 'RED', sysdate, 'stuff' );
The optimizer can also use a unique index to do an INDEX UNIQUE SCAN when a query has equality conditions on all the index's columns:
select /*+ gather_plan_statistics brick_id */* from bricks
where brick_id = 1;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Because there can be at most one entry in the index for each BRICK_ID, this is a highly efficient search method.
But the main value of unique indexes is improving data quality by preventing duplicate rows. A common way to do this is by creating a primary key or unique constraint.
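A unique constraint follows the same pattern. Here is a hypothetical sketch (the constraint name is made up, and INSERT_DATETIME only happens to be unique because the sample data gives every row a distinct timestamp):

alter table bricks
add constraint brick_insert_datetime_u
unique ( insert_datetime );

alter table bricks
drop constraint brick_insert_datetime_u;

Dropping the constraint also removes the unique index the database created to police it, so the table is back where it started.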
When you create a primary key or unique constraint, Oracle Database will create a unique index to police it. Unless there's an existing index with the constraint's columns at the start!
For example, you created a unique index on BRICK_ID above. So making this column the primary key uses that existing unique index:
select index_name
from user_ind_columns
where table_name = 'BRICKS'
and column_name = 'BRICK_ID';
alter table bricks
add primary key ( brick_id );
select constraint_name, index_name
from user_constraints
where constraint_type = 'P';
If you remove the primary key and index, then re-create the primary key, the database will create a new index with a name matching the constraint:
alter table bricks
drop primary key;
drop index brick_brick_id_u;
alter table bricks
add constraint brick_pk
primary key ( brick_id );
select constraint_name, index_name
from user_constraints
where constraint_type = 'P';
select index_name
from user_ind_columns
where table_name = 'BRICKS'
and column_name = 'BRICK_ID';
The database can use an existing non-unique index to police the constraint, provided the primary key columns are at the start of it. To ensure it uses the index you want when adding a primary key, name it in the USING INDEX clause.
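For example, here is a minimal sketch of the USING INDEX clause. It re-creates the situation from earlier: drop the primary key (which also removes the BRICK_PK index the constraint created), re-create the unique index, then add the constraint and point it at that index:

alter table bricks
drop primary key;

create unique index brick_brick_id_u
on bricks ( brick_id );

alter table bricks
add constraint brick_pk
primary key ( brick_id )
using index brick_brick_id_u;

If you re-run the USER_CONSTRAINTS query above, INDEX_NAME should now show BRICK_BRICK_ID_U.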
If you apply a function to a column in the WHERE clause, the database is unable to range scan regular indexes on that column. This query uppercases the colours, searching for rows with the value RED. Although there is an index on COLOUR, the optimizer uses a full table scan:
select /*+ gather_plan_statistics */ shape, count ( distinct insert_datetime )
from bricks
where upper ( colour ) = 'RED'
group by shape;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
This is because the values you're searching for no longer match the values in the index. To overcome this, you can create a function-based index. Do this by calling the function in the column list of the index:
create index brick_colour_upper_i
on bricks ( upper ( colour ) );
select /*+ gather_plan_statistics */shape, count ( distinct insert_datetime ) times#
from bricks
where upper ( colour ) = 'RED'
group by shape;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
To ensure that the optimizer can use a function-based index, call the function in exactly the same way in the index and WHERE clause.
Oracle Database has optimizations which allow it to use function-based indexes when the WHERE clause uses a similar, but not identical, expression. It's best not to rely on these!
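For example, this made-up search applies INITCAP() instead of UPPER(). The expression no longer matches the one in the function-based index, so the optimizer can't use it to locate the red rows and has to scan and filter instead:

select /*+ gather_plan_statistics initcap */ shape, count ( distinct insert_datetime )
from bricks
where initcap ( colour ) = 'Red'
group by shape;

select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));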
Replace /* TODO */ in this index, so the query will use it to search for bricks inserted on 1 Jan 2020:
create index bricks_insert_date_fbi
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics 1st Jan weights */weight, count ( * )
from bricks
where trunc ( insert_datetime ) = date'2020-01-01'
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, null, 'IOSTATS LAST'));
Remember: the function in the WHERE clause must appear in the index definition!
Function-based indexes are powerful. But it's best to avoid creating them!
It's better to create indexes on the base columns. Then rewrite your queries so they don't apply functions to the indexed columns.
The DATE data type in Oracle Database includes a time component. So the previous query used TRUNC() to set the time to midnight for every date to find all the rows inserted on 1 Jan.
Instead of using TRUNC(), you can rewrite the query to search for all rows with a date greater than or equal to 1 Jan 2020. And strictly less than 2 Jan 2020. This allows you to create a regular index on INSERT_DATETIME:
create index bricks_insert_date_i
on bricks ( insert_datetime );
select /*+ gather_plan_statistics 1st Jan weights */weight, count ( * )
from bricks
where insert_datetime >= date'2020-01-01'
and insert_datetime < date'2020-01-02'
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, null, 'IOSTATS LAST'));
This is more flexible than creating an index on TRUNC ( INSERT_DATETIME ). The index BRICKS_INSERT_DATE_I stores the full datetime values. So if you write another query searching for rows with specific times, the database can use this index to read only those values:
select /*+ gather_plan_statistics 1st Jan weights */weight, count ( * )
from bricks
where insert_datetime >= date'2020-01-01'
and insert_datetime < date'2020-01-01' + interval '12' hour
group by weight;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, null, 'IOSTATS LAST'));
If you create a function-based TRUNC() index, the times are no longer part of the index. So while the optimizer may be able to use it, it'll be less effective than the non-function-based index.
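To see why, compare the raw values with what a TRUNC()-based index stores. The rows inserted on 1 Jan all have different times, but the indexed expression maps every one of them to the same midnight value, so such an index can't distinguish rows within the day:

select insert_datetime, trunc ( insert_datetime ) indexed_value
from bricks
where insert_datetime >= date'2020-01-01'
and insert_datetime < date'2020-01-02'
order by insert_datetime;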
Which expressions can you replace /* TODO */ with to make an index that will improve the performance of this query?
create index brick_challenge_i
on bricks ( /* TODO */ );
select /*+ gather_plan_statistics */ shape, count (*)
from bricks
where upper ( colour_mixed_case ) = 'RED'
and substr ( junk, 1, 1 ) = 'A'
group by shape;
select *
from table(dbms_xplan.display_cursor(:LIVESQL_LAST_SQL_ID, format => 'IOSTATS LAST'));
Try creating several different indexes. Which enables the query to use the fewest buffers?