Analytic Functions: Databases for Developers

View All Tutorials Login to Run Tutorial

Tutorial Analytic Functions: Databases for Developers
Description An introduction to analytic functions.
Tags analytics
Area SQL Analytics
Contributor Chris Saxon (Oracle)
Created Tuesday May 08, 2018
Modules 12

Prerequisite SQL

create table bricks (
  brick_id integer,
  colour   varchar2(10),
  shape    varchar2(10),
  weight   integer
);

insert into bricks values ( 1, 'blue', 'cube', 1 );
insert into bricks values ( 2, 'blue', 'pyramid', 2 );
insert into bricks values ( 3, 'red', 'cube', 1 );
insert into bricks values ( 4, 'red', 'cube', 2 );
insert into bricks values ( 5, 'red', 'pyramid', 3 );
insert into bricks values ( 6, 'green', 'pyramid', 1 );

commit;

Module 1
Introduction
Aggregate and analytic functions both enable you to do a calculation over many rows. Aggregate functions squash the output to one row per group. For example the following counts the total rows in the table. It returns one row:
```
select count(*) from bricks;
```
Adding the over clause converts it to an analytic. This preserves the input rows. So you get all six, each with the value six:
```
select count(*) over () from bricks;
```
This allows you to see the values from all the other columns, which you can't using group by:
```
select b.*, 
       count(*) over () total_count 
from   bricks b;
```
Module 2
Partition By
The group by clause splits rows into groups of the same value. For example, the following get the number of rows and total weight for each colour:
```
select colour, count(*), sum ( weight )
from   bricks
group  by colour;
```
You can carve up the input to an analytic function like this with the partition by clause. The following returns the total weight and count of rows of each colour. It includes all the rows:
```
select b.*, 
       count(*) over (
         partition by colour
       ) bricks_per_colour, 
       sum ( weight ) over (
         partition by colour
       ) weight_per_colour
from   bricks b;
```

Module 3

Try it!

Complete the following query to return the count and average weight of bricks for each shape:

select b.*, 
       count(*) over (
         partition /* TODO */
       ) bricks_per_shape, 
       median ( weight ) over (
         partition /* TODO */
       ) median_weight_per_shape
from   bricks b
order  by shape, weight, brick_id;

The updated query should give this output:

BRICK_ID   COLOUR   SHAPE     WEIGHT   BRICKS_PER_SHAPE   MEDIAN_WEIGHT_PER_SHAPE   
         1 blue     cube             1                  3                         1 
         3 red      cube             1                  3                         1 
         4 red      cube             2                  3                         1 
         6 green    pyramid          1                  3                         2 
         2 blue     pyramid          2                  3                         2 
         5 red      pyramid          3                  3                         2

Module 4
Order By
The order by clause enables you to compute running totals. For example, the following sorts the rows by brick_id. Then shows the total number of rows and sum of the weights for rows with a brick_id less than or equal to that of the current row:
```
select b.*, 
       count(*) over (
         order by brick_id
       ) running_total, 
       sum ( weight ) over (
         order by brick_id
       ) running_weight
from   bricks b;
```

Module 5

Try it!

Complete the following query to get the running average weight, ordered by brick_id:

select b.brick_id, b.weight,
       round ( avg ( weight ) over (
         order /* TODO */
       ), 2 ) running_average_weight
from   bricks b
order  by brick_id;

The updated query should give this output:

BRICK_ID   WEIGHT   RUNNING_AVERAGE_WEIGHT   
         1        1                        1 
         2        2                      1.5 
         3        1                     1.33 
         4        2                      1.5 
         5        3                      1.8 
         6        1                     1.67

Module 6
Partition By + Order By
You can combine the partition by and order by clauses to get running totals within a group. For example, the following splits the rows by colour. It then gets the running count and weight of rows for each colour, sorted by brick_id:
```
select b.*, 
       count(*) over (
         partition by colour
         order by brick_id
       ) running_total, 
       sum ( weight ) over (
         partition by colour
         order by brick_id
       ) running_weight
from   bricks b;
```

Module 7

Windowing Clause

When you use order by, the database adds a default windowing clause of:

range between unbounded preceding 
      and current row

This means:

Include all the rows with a value less than or equal to that of the current row.

This can lead to the function including values from rows after the current!

For example, there are several rows with the same weight. So when you sort by this, all rows with the same weight have the same running count and weight:

select b.*, 
       count(*) over (
         order by weight
       ) running_total, 
       sum ( weight ) over (
         order by weight
       ) running_weight
from   bricks b
order  by weight;

Usually this isn't what you want. Normally running totals should only include values from previous rows in the data set.

To do this, you must specify a windowing clause of

rows between unbounded preceding 
     and current row

For example:

select b.*, 
       count(*) over (
         order by weight
         rows between unbounded preceding and current row
       ) running_total, 
       sum ( weight ) over (
         order by weight
         rows between unbounded preceding and current row
       ) running_weight
from   bricks b
order  by weight;

Note that this makes your results non-deterministic. Rows of the same weight could have their running totals in a different order each time you run the query.

To fix this, add columns to the order by until each set of values in the sort appears once in your results. This makes your results deterministic. Here that's the brick_id:

select b.*, 
       count(*) over (
         order by weight, brick_id
         rows between unbounded preceding and current row
       ) running_total, 
       sum ( weight ) over (
         order by weight, brick_id
         rows between unbounded preceding and current row
       ) running_weight
from   bricks b
order  by weight, brick_id;

Module 8

Sliding Windows

As well as running totals so far, you can change the windowing clause to be a subset of the previous rows.

The following shows the total weight of:

The current row + the previous row
All rows with the same weight as the current + all rows with a weight one less than the current

select b.*, 
       sum ( weight ) over (
         order by weight
         rows between 1 preceding and current row
       ) running_row_weight, 
       sum ( weight ) over (
         order by weight
         range between 1 preceding and current row
       ) running_value_weight
from   bricks b
order  by weight, brick_id;

You can also change "current row" to rows or values after the current. Do this by specifying a number following. For example, this query computes the sliding weights including rows or values either side of the current row:

select b.*, 
       sum ( weight ) over (
         order by weight
         rows between 1 preceding and 1 following
       ) sliding_row_window, 
       sum ( weight ) over (
         order by weight
         range between 1 preceding and 1 following
       ) sliding_value_window
from   bricks b
order  by weight;

You can also offset the window, so it excludes the current row! You can do this either side of the current row.

For example, the following query has two counts. The first shows the number of rows with a weight one or two less than the current. The second counts those with weights greater than the current. So if the current weight = 2, the first count includes rows with the weight 0 or 1. The second rows with weight 3 or 4:

select b.*, 
       count (*) over (
         order by weight
         range between 2 preceding and 1 preceding 
       ) count_weight_2_lower_than_current, 
       count (*) over (
         order by weight
         range between 1 following and 2 following
       ) count_weight_2_greater_than_current
from   bricks b
order  by weight;

Module 9

Try It!

Complete the windowing clauses to return:

The minimum colour of the two rows before (but not including) the current row
The count of rows with the same weight as the current and one value following

select b.*, 
       min ( colour ) over (
         order by brick_id
         rows /* TODO */
       ) first_colour_two_prev, 
       count (*) over (
         order by weight
         range /* TODO */
       ) count_values_this_and_next
from   bricks b
order  by weight;

The output of the updated query should be:

BRICK_ID   COLOUR   SHAPE     WEIGHT   FIRST_COLOUR_TWO_PREV   COUNT_VALUES_THIS_AND_NEXT 
         1 blue     cube             1 <null>                                             5 
         3 red      cube             1 blue                                               5 
         6 green    pyramid          1 red                                                5 
         4 red      cube             2 blue                                               3 
         2 blue     pyramid          2 blue                                               3 
         5 red      pyramid          3 red                                                1

Module 10
Filtering Analytic Functions
Often you want to filter rows using the result of an aggregate. For example, to find all the colours you have two or more bricks of. You can do this with group by using the having clause:
```
select colour from bricks
group  by colour
having count(*) >= 2;
```
But you can't see the other columns! One way to include these is to partition the count by colour.

But the database processes analytic functions after the where clause. So you can't use this in the where. The following raises an error:
```
select colour from bricks
where  count(*) over ( partition by colour ) >= 2;
```
To resolve this, you must use the analytic in a subquery. Then filter it in the outer query. For example:
```
select * from (
  select b.*,
         count(*) over ( partition by colour ) colour_count
  from   bricks b
)
where  colour_count >= 2;
```

Module 11

Try It!

Complete the following query to find the rows where

The total weight for the shape
The running weight by brick_id

are both greater than four:

with totals as (
  select b.*,
         sum ( weight ) over ( 
           /* TODO */
         ) weight_per_shape,
         sum ( weight ) over ( 
           /* TODO */
         ) running_weight_by_id
  from   bricks b
)
  select * from totals
  where  /* TODO */
  order  by brick_id

The updated query should give the following output:

BRICK_ID   COLOUR   SHAPE     WEIGHT   WEIGHT_PER_SHAPE   RUNNING_WEIGHT_BY_ID   
         5 red      pyramid          3                  6                      9 
         6 green    pyramid          1                  6                     10

Module 12
More Analytic Functions
You can add the over clause to aggregate functions to make them an analytic. But there are many functions which need the over clause.

Here is an overview of common analytic functions

Row numbering

The analytic functions rank, dense_rank and row_number all return an increasing counter, starting at one.
- Rank - Rows with the same value in the order by have the same rank. The next row after a tie has the value N, where N is its position in the data set.
- Dense_rank - Rows with the same value in the order by have the same rank, but there are no gaps in the ranks
- Row_number - each row has a new value
For example:
```
select brick_id, weight, 
       row_number() over ( order by weight ) rn, 
       rank() over ( order by weight ) rk, 
       dense_rank() over ( order by weight ) dr
from   bricks;
```
Previous and Next Values

Lag and lead enable you to get values from rows backwards and forwards in your results.
```
select b.*,
       lag ( shape ) over ( order by brick_id ) prev_shape,
       lead ( shape ) over ( order by brick_id ) next_shape
from   bricks b;
```
First and Last Values

You can get the first or last value in an ordered set with first_value and last_value:
```
select b.*,
       first_value ( weight ) over ( 
         order by brick_id 
       ) first_weight_by_id,
       last_value ( weight ) over ( 
         order by brick_id 
       ) last_weight_by_id
from   bricks b;
```
Note the result of first_value stays the same. But for last_value it changes for each row. This is because the default windowing clause stops at the current row. To find the value from the last row in the data set, you change the end of the window to "unbounded following". For example:
```
select b.*,
       first_value ( weight ) over ( 
         order by brick_id 
       ) first_weight_by_id,
       last_value ( weight ) over ( 
         order by brick_id 
         range between current row and unbounded following
       ) last_weight_by_id
from   bricks b;
```
To learn more about these and other pure analytic functions in Oracle Database, check the docs.

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

Analytic Functions: Databases for Developers

Prerequisite SQL

Introduction

Partition By

Try it!

Order By

Try it!

Partition By + Order By

Windowing Clause

Sliding Windows

Try It!

Filtering Analytic Functions

Try It!

More Analytic Functions

Row numbering

Previous and Next Values

First and Last Values

Links

Additional Information