We'll start by creating a simple table called approxt. It includes a value column we'll use for calculations and we'll be aggregating the results by US state and county.
declare
ORA_00942 exception; pragma Exception_Init(ORA_00942, -00942);
begin
execute immediate 'drop table approxt';
exception when ORA_00942 then null;
end;
/
create table approxt (
id number,
volume number,
state varchar2(5),
county varchar2(30));
insert into approxt
select level id, floor(dbms_random.value()*1000) volume, 'CA' state, 'COUNTY1' county
from dual
connect by level <= 10000;
insert into approxt
select level+10000 id, floor(dbms_random.value()*1000) volume, 'CA' , 'COUNTY2' town
from dual
connect by level <= 10000;
insert into approxt
select level+20000 id, floor(dbms_random.value()*1000) volume, 'NM' , 'COUNTY3'
from dual
connect by level <= 10000;
insert into approxt
select level+30000 id, floor(dbms_random.value()*1000) volume, 'DL' , 'COUNTY4'
from dual
connect by level <= 10000;
insert into approxt
select level+40000 id, floor(dbms_random.value()*1000) volume, 'MA' , 'COUNTY5'
from dual
connect by level <= 10000;
commit;
exec dbms_stats.gather_table_stats(ownname=>null,tabname=>'approxt');
Let us recap approximate count distinct, which has been available since Oracle Database 12.1.
REM Use ALTER SESSION to ensure that COUNT DISTINCT will not be converted to APPROX COUNT DISTINCT. More on this later.
alter session set approx_for_count_distinct = FALSE ;
REM Exact count(distinct volume) calculation.
select count(distinct volume) from approxt;
REM Looking at the execution execution plan, you'll see that it includes a hash group by and sort aggregate.
explain plan for select COUNT(distinct volume) from approxt;
select * from table(dbms_xplan.display);
REM This time, use the approximate calculation using the explicit function: APPROX_COUNT_DISTINCT(volume).
select APPROX_COUNT_DISTINCT(volume) from approxt;
REM Notice that the execution plan for approximate count distinct includes a "sort aggregate approx".
REM This operation is much cheaper than the equivalent exact sort operation.
REM The performance benefits of approximate processing are, to a large extent, associated
REM with the removal of the expensive sort operation.
REM This saving becomes more significant for larger data volumes on columns with
REM large numbers of distinct values.
explain plan for select APPROX_COUNT_DISTINCT(volume) from approxt;
select * from table(dbms_xplan.display);
Oracle Database 12.2 added the ability to replace exact calculations with approximate calculations without making code changes. This can be done using a database parameter, "approx_for_count_distinct", which can be set at the session or database-level.
REM Transparently replace exact COUNT DISTINCT with APPROX COUNT DISTINCT
alter session set approx_for_count_distinct = TRUE ;
REM Approximate count distinct, even though query is "count(distinct)".
REM Check the result with the previous one.
select count(distinct volume) from approxt;
REM An now, "back to normal"...
alter session set approx_for_count_distinct = FALSE ;
REM Exact...
select count(distinct volume) from approxt;
The new approximate median calculation.
REM Exact median calculation...
select median(volume) from approxt where state = 'CA';
REM The new approximate calculation for median...
select approx_median(volume) from approxt where state = 'CA';
REM Look at the execution plan for approximate median...
explain plan for select approx_median(volume) from approxt where state = 'CA';
select * from table(dbms_xplan.display);
The new approximate percentile calculation.
"Approximate percentile" is an umbrella term for "approx_percentile" and "approx_median" because these functions are closely related to one another. A percentile of "0.5" is equivalent to a median. For the rest of this tutorial, the term "approximate percentile" can be taken to be referring to the percentile and median functions.
REM A percentile of "0.5" is equivalent to MEDIAN...
select approx_percentile(0.5) within group (order by volume)
from approxt where state = 'CA';
select approx_percentile(0.1) within group (order by volume)
from approxt where state = 'CA';
The percentile calculations offer error rate and confidence reporting.
REM MEDIAN error rate...
select approx_median(volume,'ERROR_RATE') from approxt where state = 'CA';
REM MEDIAN confidence...
select approx_median(volume,'CONFIDENCE') from approxt where state = 'CA';
REM PERCENTILE error rate...
select approx_percentile(0.1,'ERROR_RATE') within group (order by volume)
from approxt where state = 'CA';
REM PERCENTILE confidence...
select approx_percentile(0.1,'CONFIDENCE') within group (order by volume)
from approxt where state = 'CA';
Two approximate algorithms are offered for the approximate percentile functions:
REM Approximate median, NON DETERMINISTIC
select approx_median(volume), approx_median(volume,'ERROR_RATE')
from approxt where state = 'CA';
REM Approximate median, DETERMINISTIC
select approx_median(volume deterministic), approx_median(volume deterministic,'ERROR_RATE')
from approxt
where state = 'CA';
REM Approximate percentile, DETERMINISTIC
select approx_percentile(0.1 deterministic) within group (order by volume)
from approxt where state = 'CA';
Transparent replacement of exact percentile and median calculations is possible using database parameters.
The umbrella parameter approx_for_aggregation controls both approximate count distinct and the approximate percentile functions. Setting to TRUE will cause the exact count distinct and percentile calculations to be replaced with the equivalent approximate functions.
Finer control is achieved using approx_for_count_distinct (demonstrated above) and approx_for_percentile.
The parameter approx_for_percentile can be used to finely control what exact percentile functions are replaced and whether to use the deterministic or non-deterministic algorithm. The choices for this parameter are: all deterministic, percentile_disc deterministic, percentile_cont deterministic, all, percentile_disc, percentile_cont and none.
REM Set "all deterministic" so that all percentile calculations are replaced with approximate, deterministic calculation...
alter session set approx_for_percentile = 'all deterministic';
REM This query uses MEDIAN, which will be replaced with the approx deterministic calculation...
select median(volume) from approxt where state = 'CA';
REM Check the execution plan...
explain plan for select median(volume) from approxt where state = 'CA';
select * from table(dbms_xplan.display);
REM Back to the default
alter session set approx_for_percentile = 'none';
Approximate aggregations can be materialized and converted back to approximate count distinct and approximate percentiles. In this way, a single table can be used for queries that require approximate answers for aggregations sliced-and-diced over different dimensions.
declare
ORA_00942 exception; pragma Exception_Init(ORA_00942, -00942);
begin
execute immediate 'drop table agg';
exception when ORA_00942 then null;
end;
/
REM Materialize the aggregations in a table called <i>agg</i>.
REM The <i>detail</i> column stores a binary object that encodes the distinct value information
REM for interpretation by the <i>approx_percentile_agg</i> function for different aggregations.
create table agg as
select state,
county,
approx_percentile_detail(volume) detail
from approxt group by state,county;
REM Look at the non-readable content of the <i>detail</i> column...
select county, utl_raw.cast_to_raw( dbms_lob.substr( detail, 128, 1 ) ) as detail
from agg
fetch first 5 rows only;
REM Aggregation by state.
REM The table was created using an aggregation by state and
REM county but below, the "agg" function is able to calculate
REM the approximate percentile when aggregating by state alone.
REM First, here is the exact result from the base table...
select state,median(volume)
from approxt
group by state;
REM This is the approximate result generated using the intermediate <i>agg</i> table.
select state, to_approx_percentile(approx_percentile_agg(detail),0.5)
from agg
group by state;
REM This is the foundation for materialized view rewrite functionality (coming up next).
The approximate query functions can be used with materialized views for transparent query re-write.
REM Clean up...
declare
ORA_12003 exception; pragma Exception_Init(ORA_12003, -12003);
begin
execute immediate 'drop materialized view count_d_mview';
exception when ORA_12003 then null;
end;
/
REM Create a materialized view (MV) for APPROX PERCENTILE
create materialized view pctl_mview enable query rewrite as
select state, county, approx_percentile_detail(volume) detail
from approxt
group by state, county;
REM Gather stats...
exec dbms_stats.gather_table_stats(ownname=>null,tabname=>'pctl_mview');
REM The following query re-writes to the materialized view, aggregating by state.
REM Note how Approx Median can use the Approx Percentile that was defined in the MV.
select state, approx_median(volume)
from approxt
where state = 'CA'
group by state;
REM Take a look at the plan...
explain plan for
select state, approx_median(volume)
from approxt
where state = 'CA'
group by state;
SELECT * FROM TABLE(dbms_xplan.display);
Query re-write can be used with approximate count distinct too.
declare
ORA_12003 exception; pragma Exception_Init(ORA_12003, -12003);
begin
execute immediate 'drop materialized view count_d_mview';
exception when ORA_12003 then null;
end;
/
REM Create an MV for approximate count distinct
create materialized view count_d_mview enable query rewrite as
select state, county, approx_count_distinct_detail(volume) detail
from approxt
group by state, county;
REM Gather stats...
exec dbms_stats.gather_table_stats(ownname=>null,tabname=>'count_d_mview');
REM Query re-writes to MV aggregating by state and county...
select state, county, approx_count_distinct(volume)
from approxt
where state = 'CA'
group by state,county;
REM check the execution plan...
explain plan for select state, county, approx_count_distinct(volume)
from approxt
where state = 'CA'
group
by state,county;
SELECT * FROM TABLE(dbms_xplan.display);
REM Query can also re-write to the MV, aggregating by state alone...
select state, approx_count_distinct(volume)
from approxt
where state = 'CA'
group by state;
REM Check the execution plan...
explain plan for
select state, approx_count_distinct(volume)
from approxt
where state = 'CA'
group by state;
SELECT * FROM TABLE(dbms_xplan.display);
This example is very similar to the previous step, but this time it demonstrates how it is possible to re-write an exact query transparently to use an approximate calculation defined in a materialized view. This is a very powerful feature: it is possible to achieve huge performance benefits without changing existing queries.
declare
ORA_12003 exception; pragma Exception_Init(ORA_12003, -12003);
begin
execute immediate 'drop materialized view count_d_mview';
exception when ORA_12003 then null;
end;
/
REM Create an MV for approximate count distinct
create materialized view count_d_mview enable query rewrite as
select state, county, approx_count_distinct_detail(volume) detail
from approxt
group by state, county;
REM Gather stats...
exec dbms_stats.gather_table_stats(ownname=>null,tabname=>'count_d_mview');
REM Alter the session to enable approx for count distinct...
alter session set approx_for_count_distinct = TRUE ;
REM Query re-writes to MV aggregating by state and county...
select state, county, count(distinct volume)
from approxt
where state = 'CA'
group by state,county;
REM check the execution plan...
explain plan for
select state, county, count(distinct volume)
from approxt
where state = 'CA'
group by state,county;
SELECT * FROM TABLE(dbms_xplan.display);
REM Query can also re-write to the MV, aggregating by state alone...
select state, count(distinct volume)
from approxt
where state = 'CA'
group by state;
REM Check the execution plan...
explain plan for
select state, count(distinct volume)
from approxt
where state = 'CA'
group by state;
SELECT * FROM TABLE(dbms_xplan.display);
REM Back to "normal"...
alter session set approx_for_count_distinct = FALSE ;
Materialized views with approximate aggregations can be incrementally refreshed
REM Clean up...
declare
ORA_12003 exception; pragma Exception_Init(ORA_12003, -12003);
begin
execute immediate 'drop materialized view count_d_mview';
exception when ORA_12003 then null;
end;
/
declare
ORA_12002 exception; pragma Exception_Init(ORA_12002, -12002);
begin
execute immediate 'drop materialized view log on approxt';
exception when ORA_12002 then null;
end;
/
REM Create MV log...
create materialized view log on approxt
with sequence, rowid (id,volume,state,county)
including new values;
REM Create MV...
create materialized view count_d_mview
refresh fast
enable query rewrite
as
select state, county, approx_count_distinct_detail(volume) detail
from approxt
group by state, county;
REM Gather stats...
exec dbms_stats.gather_table_stats(ownname=>null,tabname=>'count_d_mview');
REM Get status of materialized view...
select mview_name, staleness
from user_mviews where mview_name='COUNT_D_MVIEW';
REM Do some dml...
insert into approxt
select level+50000 id, floor(dbms_random.value()*1000) volume, 'FL' , 'COUNTY5'
from dual
connect by level <= 100;
commit;
REM Check status of materialized view...
select mview_name, staleness
from user_mviews where mview_name='COUNT_D_MVIEW';
REM Do an incremental refresh...
begin
DBMS_MVIEW.REFRESH('count_d_mview', method => 'F');
end;
/
REM Check status of materialized view...
select mview_name, staleness
from user_mviews where mview_name='COUNT_D_MVIEW';