create table bricks (
brick_id integer,
colour varchar2(10),
shape varchar2(10),
weight integer
);
insert into bricks values ( 1, 'blue', 'cube', 1 );
insert into bricks values ( 2, 'blue', 'pyramid', 2 );
insert into bricks values ( 3, 'red', 'cube', 1 );
insert into bricks values ( 4, 'red', 'cube', 2 );
insert into bricks values ( 5, 'red', 'pyramid', 3 );
insert into bricks values ( 6, 'green', 'pyramid', 1 );
commit;
Aggregate and analytic functions both enable you to do a calculation over many rows. Aggregate functions squash the output to one row per group. For example the following counts the total rows in the table. It returns one row:
select count(*) from bricks;
Adding the over clause converts it to an analytic. This preserves the input rows. So you get all six, each with the value six:
select count(*) over () from bricks;
This allows you to see the values from all the other columns, which you can't using group by:
select b.*,
count(*) over () total_count
from bricks b;
The group by clause splits rows into groups of the same value. For example, the following get the number of rows and total weight for each colour:
select colour, count(*), sum ( weight )
from bricks
group by colour;
You can carve up the input to an analytic function like this with the partition by clause. The following returns the total weight and count of rows of each colour. It includes all the rows:
select b.*,
count(*) over (
partition by colour
) bricks_per_colour,
sum ( weight ) over (
partition by colour
) weight_per_colour
from bricks b;
Complete the following query to return the count and average weight of bricks for each shape:
select b.*,
count(*) over (
partition /* TODO */
) bricks_per_shape,
median ( weight ) over (
partition /* TODO */
) median_weight_per_shape
from bricks b
order by shape, weight, brick_id;
The updated query should give this output:
BRICK_ID COLOUR SHAPE WEIGHT BRICKS_PER_SHAPE MEDIAN_WEIGHT_PER_SHAPE 1 blue cube 1 3 1 3 red cube 1 3 1 4 red cube 2 3 1 6 green pyramid 1 3 2 2 blue pyramid 2 3 2 5 red pyramid 3 3 2
The order by clause enables you to compute running totals. For example, the following sorts the rows by brick_id. Then shows the total number of rows and sum of the weights for rows with a brick_id less than or equal to that of the current row:
select b.*,
count(*) over (
order by brick_id
) running_total,
sum ( weight ) over (
order by brick_id
) running_weight
from bricks b;
Complete the following query to get the running average weight, ordered by brick_id:
select b.brick_id, b.weight,
round ( avg ( weight ) over (
order /* TODO */
), 2 ) running_average_weight
from bricks b
order by brick_id;
The updated query should give this output:
BRICK_ID WEIGHT RUNNING_AVERAGE_WEIGHT 1 1 1 2 2 1.5 3 1 1.33 4 2 1.5 5 3 1.8 6 1 1.67
You can combine the partition by and order by clauses to get running totals within a group. For example, the following splits the rows by colour. It then gets the running count and weight of rows for each colour, sorted by brick_id:
select b.*,
count(*) over (
partition by colour
order by brick_id
) running_total,
sum ( weight ) over (
partition by colour
order by brick_id
) running_weight
from bricks b;
When you use order by, the database adds a default windowing clause of:
range between unbounded preceding and current row
This means:
Include all the rows with a value less than or equal to that of the current row.
This can lead to the function including values from rows after the current!
For example, there are several rows with the same weight. So when you sort by this, all rows with the same weight have the same running count and weight:
select b.*,
count(*) over (
order by weight
) running_total,
sum ( weight ) over (
order by weight
) running_weight
from bricks b
order by weight;
Usually this isn't what you want. Normally running totals should only include values from previous rows in the data set.
To do this, you must specify a windowing clause of
rows between unbounded preceding and current row
For example:
select b.*,
count(*) over (
order by weight
rows between unbounded preceding and current row
) running_total,
sum ( weight ) over (
order by weight
rows between unbounded preceding and current row
) running_weight
from bricks b
order by weight;
Note that this makes your results non-deterministic. Rows of the same weight could have their running totals in a different order each time you run the query.
To fix this, add columns to the order by until each set of values in the sort appears once in your results. This makes your results deterministic. Here that's the brick_id:
select b.*,
count(*) over (
order by weight, brick_id
rows between unbounded preceding and current row
) running_total,
sum ( weight ) over (
order by weight, brick_id
rows between unbounded preceding and current row
) running_weight
from bricks b
order by weight, brick_id;
As well as running totals so far, you can change the windowing clause to be a subset of the previous rows.
The following shows the total weight of:
select b.*,
sum ( weight ) over (
order by weight
rows between 1 preceding and current row
) running_row_weight,
sum ( weight ) over (
order by weight
range between 1 preceding and current row
) running_value_weight
from bricks b
order by weight, brick_id;
You can also change "current row" to rows or values after the current. Do this by specifying a number following. For example, this query computes the sliding weights including rows or values either side of the current row:
select b.*,
sum ( weight ) over (
order by weight
rows between 1 preceding and 1 following
) sliding_row_window,
sum ( weight ) over (
order by weight
range between 1 preceding and 1 following
) sliding_value_window
from bricks b
order by weight;
You can also offset the window, so it excludes the current row! You can do this either side of the current row.
For example, the following query has two counts. The first shows the number of rows with a weight one or two less than the current. The second counts those with weights greater than the current. So if the current weight = 2, the first count includes rows with the weight 0 or 1. The second rows with weight 3 or 4:
select b.*,
count (*) over (
order by weight
range between 2 preceding and 1 preceding
) count_weight_2_lower_than_current,
count (*) over (
order by weight
range between 1 following and 2 following
) count_weight_2_greater_than_current
from bricks b
order by weight;
Complete the windowing clauses to return:
select b.*,
min ( colour ) over (
order by brick_id
rows /* TODO */
) first_colour_two_prev,
count (*) over (
order by weight
range /* TODO */
) count_values_this_and_next
from bricks b
order by weight;
The output of the updated query should be:
BRICK_ID COLOUR SHAPE WEIGHT FIRST_COLOUR_TWO_PREV COUNT_VALUES_THIS_AND_NEXT 1 blue cube 1 <null> 5 3 red cube 1 blue 5 6 green pyramid 1 red 5 4 red cube 2 blue 3 2 blue pyramid 2 blue 3 5 red pyramid 3 red 1
Often you want to filter rows using the result of an aggregate. For example, to find all the colours you have two or more bricks of. You can do this with group by using the having clause:
select colour from bricks
group by colour
having count(*) >= 2;
But you can't see the other columns! One way to include these is to partition the count by colour.
But the database processes analytic functions after the where clause. So you can't use this in the where. The following raises an error:
select colour from bricks
where count(*) over ( partition by colour ) >= 2;
To resolve this, you must use the analytic in a subquery. Then filter it in the outer query. For example:
select * from (
select b.*,
count(*) over ( partition by colour ) colour_count
from bricks b
)
where colour_count >= 2;
Complete the following query to find the rows where
are both greater than four:
with totals as (
select b.*,
sum ( weight ) over (
/* TODO */
) weight_per_shape,
sum ( weight ) over (
/* TODO */
) running_weight_by_id
from bricks b
)
select * from totals
where /* TODO */
order by brick_id
The updated query should give the following output:
BRICK_ID COLOUR SHAPE WEIGHT WEIGHT_PER_SHAPE RUNNING_WEIGHT_BY_ID 5 red pyramid 3 6 9 6 green pyramid 1 6 10
You can add the over clause to aggregate functions to make them an analytic. But there are many functions which need the over clause.
Here is an overview of common analytic functions
The analytic functions rank, dense_rank and row_number all return an increasing counter, starting at one.
For example:
select brick_id, weight,
row_number() over ( order by weight ) rn,
rank() over ( order by weight ) rk,
dense_rank() over ( order by weight ) dr
from bricks;
Lag and lead enable you to get values from rows backwards and forwards in your results.
select b.*,
lag ( shape ) over ( order by brick_id ) prev_shape,
lead ( shape ) over ( order by brick_id ) next_shape
from bricks b;
You can get the first or last value in an ordered set with first_value and last_value:
select b.*,
first_value ( weight ) over (
order by brick_id
) first_weight_by_id,
last_value ( weight ) over (
order by brick_id
) last_weight_by_id
from bricks b;
Note the result of first_value stays the same. But for last_value it changes for each row. This is because the default windowing clause stops at the current row. To find the value from the last row in the data set, you change the end of the window to "unbounded following". For example:
select b.*,
first_value ( weight ) over (
order by brick_id
) first_weight_by_id,
last_value ( weight ) over (
order by brick_id
range between current row and unbounded following
) last_weight_by_id
from bricks b;
To learn more about these and other pure analytic functions in Oracle Database, check the docs.