Subqueries: Databases for Developers

View All Tutorials Login to Run Tutorial

Tutorial Subqueries: Databases for Developers
Description An introduction to using subqueries in Oracle Database
Tags with clause, in, exists
Area SQL General
Contributor Chris Saxon (Oracle)
Created Tuesday May 01, 2018
Modules 14

Prerequisite SQL

create table bricks (
  brick_id integer,
  colour   varchar2(10)
);

create table colours (
  colour_name           varchar2(10),
  minimum_bricks_needed integer
);

insert into colours values ( 'blue', 2 );
insert into colours values ( 'green', 3 );
insert into colours values ( 'red', 2 );
insert into colours values ( 'orange', 1);
insert into colours values ( 'yellow', 1 );
insert into colours values ( 'purple', 1 );

insert into bricks values ( 1, 'blue' );
insert into bricks values ( 2, 'blue' );
insert into bricks values ( 3, 'blue' );
insert into bricks values ( 4, 'green' );
insert into bricks values ( 5, 'green' );
insert into bricks values ( 6, 'red' );
insert into bricks values ( 7, 'red' );
insert into bricks values ( 8, 'red' );
insert into bricks values ( 9, null );

commit;

Module 1
Introduction
This tutorial shows you how to write subqueries. It uses the bricks and colours table. Run the queries below to see their contents:
```
select * from bricks;

select * from colours;
```
Note: This tutorial makes heavy use of group by. If you want a refresher on this, check out module 7 of Databases for Developers: Foundations
Module 2
Inline Views
An inline view replaces a table in the from clause of your query. In this query, "select * from bricks" is an inline view:
```
select * from (
  select * from bricks
)
```
You use inline views to calculate an intermediate result set. For example, you can count the number of bricks you have of each colour using:
```
select * from (
  select colour, count(*) c
  from   bricks
  group  by colour
) brick_counts
```
You can then join the result of this to the colours table. This allows you to find out which colours you have fewer bricks of than the minimum needed defined in colours:
```
select * from (
  select colour, count(*) c
  from   bricks
  group  by colour
) brick_counts
right  join colours
on     brick_counts.colour = colours.colour_name
where  nvl ( brick_counts.c, 0 ) < colours.minimum_bricks_needed
```
The outer join is necessary to return colours for which you have no bricks (yellow, purple, and orange)
Module 3
Try it!
Complete the query below, using an inline view to find the min and max brick_id for each colour of brick:
```
select * from (

)
```
Hint: you need group by colour. Use min(brick_id) to find the minimum. The query should return the following rows (the order may be different):
```
COLOUR   MIN(BRICK_ID)   MAX(BRICK_ID)   
green                  4               5 
<null>                 9               9 
red                    6               8 
blue                   1               3
```
Module 4
Nested Subqueries
Nested subqueries go in your where clause. The query filters rows in the parent tables.

For example, to find all the rows in colours where you have a matching brick, you could write:
```
select * from colours c
where  c.colour_name in (
  select b.colour from bricks b
);
```
You can also use exists in a nested subquery. The following is the same as the previous query:
```
select * from colours c
where  exists (
  select null from bricks b
  where  b.colour = c.colour_name
);
```
You can filter rows in a subquery too. To find all the colours that have at least one brick with a brick_id less than 5, write:
```
select * from colours c
where  c.colour_name in (
  select b.colour from bricks b
  where  b.brick_id < 5
);
```
Or with exists:
```
select * from colours c
where  exists (
  select null from bricks b
  where  b.colour = c.colour_name 
  and    b.brick_id < 5
);
```
Module 5
Correlated vs. Uncorrelated
A subquery is correlated when it joins to a table from the parent query. If you don't, then it's uncorrelated.

This leads to a difference between IN and EXISTS. EXISTS returns rows from the parent query, as long as the subquery finds at least one row. So the following uncorrelated EXISTS returns all the rows in colours:
```
select * from colours
where  exists (
  select null from bricks
);
```
Note: When using EXISTS, what you select in the subquery does not matter because it is only checking the existence of a row that matches the where clause (if there is one). You can "select null" or "select 1" or select an actual column.

This is because the query means:

Return all the in colours if there is at least one row in bricks

To find all the colour rows which have at least one row in bricks of the same colour, you must join in the subquery. Normally you need to correlate EXISTS subqueries with a table in the parent.
Module 6
NOT IN vs NOT EXISTS
You can do the reverse of IN & EXISTS by placing NOT in front of them. This returns you all the rows from the parent which don't have a match in the subquery.

For example to find all the rows in colours without a matching colour in bricks, you can use NOT EXISTS:
```
select * from colours c
where  not exists (
  select null from bricks b
  where  b.colour = c.colour_name
);
```
But do the same with NOT IN:
```
select * from colours c
where  c.colour_name not in (
  select b.colour from bricks b
);
```
And your query returns nothing!

This is because there is a row in bricks with a null colour. So the previous query is the same as:
```
select * from colours c
where c.colour_name not in (
  'red', 'green', 'blue', 
  'orange', 'yellow', 'purple',
  null
);
```
For the NOT IN condition to be true, comparing all its elements to the parent table must return false.

But remember that comparing anything to null gives unknown! So the whole expression is unknown and you get no data.

To resolve this, either use NOT EXISTS or add a where clause to stop the subquery returning null values:
```
select * from colours c
where c.colour_name not in (
  select b.colour from bricks b
  where  b.colour is not null
);
```

Module 7

Try it!

Complete the subquery to find all the rows in bricks with a colour where colours.minimum_bricks_needed = 2:

select * from bricks b
where  b.colour in (

);

This should return

BRICK_ID COLOUR
       1 blue
       2 blue
       3 blue
       6 red
       7 red
       8 red

Module 8
Scalar Subqueries
Scalar subqueries return one column and at most one row. You can replace a column with a scalar subquery in most cases.

For example, to return a count of the number of bricks matching each colour, you could do the following:
```
select colour_name, (
         select count(*) 
         from   bricks b
         where  b.colour = c.colour_name
         group  by b.colour
       ) brick_counts
from   colours c;
```
Note the colours with no matching bricks return null. To show zero instead, you can use NVL or coalesce. This needs to go around the whole subquery:
```
select colour_name, nvl ( (
         select count(*) 
         from   bricks b
         where  b.colour = c.colour_name
         group  by b.colour
       ), 0 ) brick_counts
from   colours c;
```
You also need to join bricks to colours in the subquery. If you don't, it will return four rows (one for each different value for colour in bricks). This leads to an ORA-01427 error:
```
select c.colour_name, (
         select count(*) 
         from   bricks b
         group  by colour
       ) brick_counts
from   colours c;
```
Usually you will correlate a scalar subquery with a parent table to give the correct answer.

You can also use scalar subqueries in your having clause. So instead of a join, you could write the query in part 1 to find those bricks you have less than the minimum needed like so:
```
select colour, count(*) c
from   bricks b
group  by colour
having count(*) < (
  select c.minimum_bricks_needed 
  from   colours c
  where  c.colour_name = b.colour
);
```

Module 9

Try it!

Complete the scalar subquery below to find the minimum brick_id for each colour:

select c.colour_name, (

       ) min_brick_id
from   colours c
where  c.colour_name is not null;

The query should return the following rows:

COLOUR_NAME   MIN_BRICK_ID   
green                      4 
red                        6 
blue                       1 
purple                <null>
yellow                <null> 
orange                <null>

Module 10
Common Table Expressions
Common table expressions (CTEs) enable you to name subqueries. You then refer to these like normal tables elsewhere in your query. This can make your SQL easier to write and understand later.

CTEs go in the with clause above the select statement. The following defines a CTE that counts how many rows of each colour there are in the bricks table:
```
with brick_colour_counts as (
  select colour, count(*) 
  from   bricks
  group  by colour
) 
  select * from brick_colour_counts ;
```
Module 11
CTEs: Reusable Subqueries
Because you access CTEs in the same way as a regular table, you can use it many times in your query. This can help if you want to use this subquery many times. For example if you want to find:
- Which colours you have more bricks of than the minimum needed
- The average number of bricks you have of each colour
You need to group the bricks by colour. Then filter the colours table where this count is greater than the minimum_bricks_needed for that colour. And compute the mean of the counts.

You can do the filtering with a nested subquery. And show the average in a scalar subquery. This looks like:
```
select c.colour_name, 
       c.minimum_bricks_needed, (
         select avg ( count(*) )
         from   bricks b
         group  by b.colour
       ) mean_bricks_per_colour
from   colours c
where  c.minimum_bricks_needed < (
  select count(*) c
  from   bricks b
  where  b.colour = c.colour_name
  group  by b.colour
);
```
Note that "group by colour" appears twice in the statement. This creates maintenance problems. If you need to change this, say to join bricks to another table, you have to do this in two places.

Using CTEs, you can do the group by once. Then refer to it in your select:
```
with brick_counts as (
  select b.colour, count(*) c
  from   bricks b
  group  by b.colour
)
  select c.colour_name, 
         c.minimum_bricks_needed, (
           select avg ( bc.c )
           from   brick_counts bc
         ) mean_bricks_per_colour
  from   colours c
  where  c.minimum_bricks_needed < (
    select bc.c
    from   brick_counts bc
    where  bc.colour = c.colour_name
  );
```
So now if you need to join bricks to a new table to get the count you only have to edit the subquery brick_counts.

Oracle Database can also optimize queries that access the same CTE many times. This can make it more efficient than regular subqueries.
Module 12
Literate SQL
Literate programming is a concept introduced by Donald Knuth. The idea is to write code that makes sense if you read it like a book: from start to finish.

Simple SQL queries follow this principle. For example the query "Get me the brick_ids of all the bricks that are red or blue" is the following statement:
```
select brick_id 
from   bricks 
where  colour in ('red', 'blue');
```
But subqueries usually break this human readable flow. For example, if you want to find which bricks you have less of than the average number of each colour, you need to:
1. Count the bricks by colour
2. Take the average of these counts
3. Return those rows where the value in step 1 is less than in step 2
Without the with clause, you need to write something like the following:
```
select colour
from   bricks
group  by colour  
having count (*) < (
  select avg ( colour_count ) 
  from   (
    select colour, count (*) colour_count
    from   bricks
    group  by colour  
  )
);
```
Step one is at the bottom of the query! Using CTEs, you can order the subqueries to match the logic above:
```
with brick_counts as ( 
  -- 1. Count the bricks by colour
  select b.colour, count(*) c
  from   bricks b
  group  by b.colour
), average_bricks_per_colour as ( 
  -- 2. Take the average of these counts
  select avg ( c ) average_count
  from   brick_counts
)
  select * from brick_counts bc  
  join   average_bricks_per_colour ac
  -- 3. Return those rows where the value in step 1 is less than in step 2
  on     bc.c < average_count; 
```
This makes it easier to figure out what a SQL statement does when you return to it later.
Module 13
Testing Subqueries
Another big advantage of the with clause is it makes your SQL easier to test and debug.

If you want to check that the brick counts from the previous query are correct, update the final from clause:
```
with brick_counts as (
  select b.colour, count(*) c
  from   bricks b
  group  by b.colour
), average_bricks_per_colour as (
  select avg ( c ) average_count
  from   brick_counts
)
  select * from brick_counts bc;
```
But with inline views you have to rip out the correct subquery, which can be hard to do!
Module 14
Try it!
Complete the following with clause to count how many rows there are in colours:
```
with colour_count as (

)
  select * from colour_count;
```
This should return
```
COUNT(*)
6
```

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

Subqueries: Databases for Developers

Prerequisite SQL

Introduction

Inline Views

Try it!

Nested Subqueries

Correlated vs. Uncorrelated

NOT IN vs NOT EXISTS

Try it!

Scalar Subqueries

Try it!

Common Table Expressions

CTEs: Reusable Subqueries

Literate SQL

Testing Subqueries

Try it!

Links

Additional Information