# Analytic Functions: Databases for Developers

• Tutorial Analytic Functions: Databases for Developers
• Description An introduction to analytic functions.
• Tags analytics
• Area SQL Analytics
• Contributor Chris Saxon (Oracle)
• Created Tuesday May 08, 2018
• Modules 12
• ## Prerequisite SQL

``````create table bricks (
brick_id integer,
colour   varchar2(10),
shape    varchar2(10),
weight   integer
);

insert into bricks values ( 1, 'blue', 'cube', 1 );
insert into bricks values ( 2, 'blue', 'pyramid', 2 );
insert into bricks values ( 3, 'red', 'cube', 1 );
insert into bricks values ( 4, 'red', 'cube', 2 );
insert into bricks values ( 5, 'red', 'pyramid', 3 );
insert into bricks values ( 6, 'green', 'pyramid', 1 );

commit;``````
• Module 1

## Introduction

Aggregate and analytic functions both enable you to do a calculation over many rows. Aggregate functions squash the output to one row per group. For example the following counts the total rows in the table. It returns one row:

``select count(*) from bricks;``

Adding the over clause converts it to an analytic. This preserves the input rows. So you get all six, each with the value six:

``select count(*) over () from bricks;``

This allows you to see the values from all the other columns, which you can't using group by:

``````select b.*,
count(*) over () total_count
from   bricks b;``````

• Module 2

## Partition By

The group by clause splits rows into groups of the same value. For example, the following get the number of rows and total weight for each colour:

``````select colour, count(*), sum ( weight )
from   bricks
group  by colour;``````

You can carve up the input to an analytic function like this with the partition by clause. The following returns the total weight and count of rows of each colour. It includes all the rows:

``````select b.*,
count(*) over (
partition by colour
) bricks_per_colour,
sum ( weight ) over (
partition by colour
) weight_per_colour
from   bricks b;``````

• Module 3

## Try it!

Complete the following query to return the count and average weight of bricks for each shape:

``````select b.*,
count(*) over (
partition /* TODO */
) bricks_per_shape,
median ( weight ) over (
partition /* TODO */
) median_weight_per_shape
from   bricks b
order  by shape, weight, brick_id;``````

The updated query should give this output:

```BRICK_ID   COLOUR   SHAPE     WEIGHT   BRICKS_PER_SHAPE   MEDIAN_WEIGHT_PER_SHAPE
1 blue     cube             1                  3                         1
3 red      cube             1                  3                         1
4 red      cube             2                  3                         1
6 green    pyramid          1                  3                         2
2 blue     pyramid          2                  3                         2
5 red      pyramid          3                  3                         2 ```
• Module 4

## Order By

The order by clause enables you to compute running totals. For example, the following sorts the rows by brick_id. Then shows the total number of rows and sum of the weights for rows with a brick_id less than or equal to that of the current row:

``````select b.*,
count(*) over (
order by brick_id
) running_total,
sum ( weight ) over (
order by brick_id
) running_weight
from   bricks b;``````

• Module 5

## Try it!

Complete the following query to get the running average weight, ordered by brick_id:

``````select b.brick_id, b.weight,
round ( avg ( weight ) over (
order /* TODO */
), 2 ) running_average_weight
from   bricks b
order  by brick_id;``````

The updated query should give this output:

```BRICK_ID   WEIGHT   RUNNING_AVERAGE_WEIGHT
1        1                        1
2        2                      1.5
3        1                     1.33
4        2                      1.5
5        3                      1.8
6        1                     1.67```
• Module 6

## Partition By + Order By

You can combine the partition by and order by clauses to get running totals within a group. For example, the following splits the rows by colour. It then gets the running count and weight of rows for each colour, sorted by brick_id:

``````select b.*,
count(*) over (
partition by colour
order by brick_id
) running_total,
sum ( weight ) over (
partition by colour
order by brick_id
) running_weight
from   bricks b;``````

• Module 7

## Windowing Clause

When you use order by, the database adds a default windowing clause of:

```range between unbounded preceding
and current row```

This means:

Include all the rows with a value less than or equal to that of the current row.

This can lead to the function including values from rows after the current!

For example, there are several rows with the same weight. So when you sort by this, all rows with the same weight have the same running count and weight:

``````select b.*,
count(*) over (
order by weight
) running_total,
sum ( weight ) over (
order by weight
) running_weight
from   bricks b
order  by weight;``````

Usually this isn't what you want. Normally running totals should only include values from previous rows in the data set.

To do this, you must specify a windowing clause of

```rows between unbounded preceding
and current row```

For example:

``````select b.*,
count(*) over (
order by weight
rows between unbounded preceding and current row
) running_total,
sum ( weight ) over (
order by weight
rows between unbounded preceding and current row
) running_weight
from   bricks b
order  by weight;``````

Note that this makes your results non-deterministic. Rows of the same weight could have their running totals in a different order each time you run the query.

To fix this, add columns to the order by until each set of values in the sort appears once in your results. This makes your results deterministic. Here that's the brick_id:

``````select b.*,
count(*) over (
order by weight, brick_id
rows between unbounded preceding and current row
) running_total,
sum ( weight ) over (
order by weight, brick_id
rows between unbounded preceding and current row
) running_weight
from   bricks b
order  by weight, brick_id;``````

• Module 8

## Sliding Windows

As well as running totals so far, you can change the windowing clause to be a subset of the previous rows.

The following shows the total weight of:

1. The current row + the previous row
2. All rows with the same weight as the current + all rows with a weight one less than the current
``````select b.*,
sum ( weight ) over (
order by weight
rows between 1 preceding and current row
) running_row_weight,
sum ( weight ) over (
order by weight
range between 1 preceding and current row
) running_value_weight
from   bricks b
order  by weight, brick_id;``````

You can also change "current row" to rows or values after the current. Do this by specifying a number following. For example, this query computes the sliding weights including rows or values either side of the current row:

``````select b.*,
sum ( weight ) over (
order by weight
rows between 1 preceding and 1 following
) sliding_row_window,
sum ( weight ) over (
order by weight
range between 1 preceding and 1 following
) sliding_value_window
from   bricks b
order  by weight;``````

You can also offset the window, so it excludes the current row! You can do this either side of the current row.

For example, the following query has two counts. The first shows the number of rows with a weight one or two less than the current. The second counts those with weights greater than the current. So if the current weight = 2, the first count includes rows with the weight 0 or 1. The second rows with weight 3 or 4:

``````select b.*,
count (*) over (
order by weight
range between 2 preceding and 1 preceding
) count_weight_2_lower_than_current,
count (*) over (
order by weight
range between 1 following and 2 following
) count_weight_2_greater_than_current
from   bricks b
order  by weight;``````

• Module 9

## Try It!

Complete the windowing clauses to return:

• The minimum colour of the two rows before (but not including) the current row
• The count of rows with the same weight as the current and one value following
``````select b.*,
min ( colour ) over (
order by brick_id
rows /* TODO */
) first_colour_two_prev,
count (*) over (
order by weight
range /* TODO */
) count_values_this_and_next
from   bricks b
order  by weight;``````

The output of the updated query should be:

```BRICK_ID   COLOUR   SHAPE     WEIGHT   FIRST_COLOUR_TWO_PREV   COUNT_VALUES_THIS_AND_NEXT
1 blue     cube             1 <null>                                             5
3 red      cube             1 blue                                               5
6 green    pyramid          1 red                                                5
4 red      cube             2 blue                                               3
2 blue     pyramid          2 blue                                               3
5 red      pyramid          3 red                                                1```
• Module 10

## Filtering Analytic Functions

Often you want to filter rows using the result of an aggregate. For example, to find all the colours you have two or more bricks of. You can do this with group by using the having clause:

``````select colour from bricks
group  by colour
having count(*) >= 2;``````

But you can't see the other columns! One way to include these is to partition the count by colour.

But the database processes analytic functions after the where clause. So you can't use this in the where. The following raises an error:

``````select colour from bricks
where  count(*) over ( partition by colour ) >= 2;``````

To resolve this, you must use the analytic in a subquery. Then filter it in the outer query. For example:

``````select * from (
select b.*,
count(*) over ( partition by colour ) colour_count
from   bricks b
)
where  colour_count >= 2;``````

• Module 11

## Try It!

Complete the following query to find the rows where

• The total weight for the shape
• The running weight by brick_id

are both greater than four:

``````with totals as (
select b.*,
sum ( weight ) over (
/* TODO */
) weight_per_shape,
sum ( weight ) over (
/* TODO */
) running_weight_by_id
from   bricks b
)
select * from totals
where  /* TODO */
order  by brick_id``````

The updated query should give the following output:

```BRICK_ID   COLOUR   SHAPE     WEIGHT   WEIGHT_PER_SHAPE   RUNNING_WEIGHT_BY_ID
5 red      pyramid          3                  6                      9
6 green    pyramid          1                  6                     10
```
• Module 12

## More Analytic Functions

You can add the over clause to aggregate functions to make them an analytic. But there are many functions which need the over clause.

Here is an overview of common analytic functions

### Row numbering

The analytic functions rank, dense_rank and row_number all return an increasing counter, starting at one.

• Rank - Rows with the same value in the order by have the same rank. The next row after a tie has the value N, where N is its position in the data set.
• Dense_rank - Rows with the same value in the order by have the same rank, but there are no gaps in the ranks
• Row_number - each row has a new value

For example:

``````select brick_id, weight,
row_number() over ( order by weight ) rn,
rank() over ( order by weight ) rk,
dense_rank() over ( order by weight ) dr
from   bricks;``````

### Previous and Next Values

Lag and lead enable you to get values from rows backwards and forwards in your results.

``````select b.*,
lag ( shape ) over ( order by brick_id ) prev_shape,
lead ( shape ) over ( order by brick_id ) next_shape
from   bricks b;``````

### First and Last Values

You can get the first or last value in an ordered set with first_value and last_value:

``````select b.*,
first_value ( weight ) over (
order by brick_id
) first_weight_by_id,
last_value ( weight ) over (
order by brick_id
) last_weight_by_id
from   bricks b;``````

Note the result of first_value stays the same. But for last_value it changes for each row. This is because the default windowing clause stops at the current row. To find the value from the last row in the data set, you change the end of the window to "unbounded following". For example:

``````select b.*,
first_value ( weight ) over (
order by brick_id
) first_weight_by_id,
last_value ( weight ) over (
order by brick_id
range between current row and unbounded following
) last_weight_by_id
from   bricks b;``````