create table bricks (
brick_id integer,
colour varchar2(10)
);
create table colours (
colour_name varchar2(10),
minimum_bricks_needed integer
);
insert into colours values ( 'blue', 2 );
insert into colours values ( 'green', 3 );
insert into colours values ( 'red', 2 );
insert into colours values ( 'orange', 1);
insert into colours values ( 'yellow', 1 );
insert into colours values ( 'purple', 1 );
insert into bricks values ( 1, 'blue' );
insert into bricks values ( 2, 'blue' );
insert into bricks values ( 3, 'blue' );
insert into bricks values ( 4, 'green' );
insert into bricks values ( 5, 'green' );
insert into bricks values ( 6, 'red' );
insert into bricks values ( 7, 'red' );
insert into bricks values ( 8, 'red' );
insert into bricks values ( 9, null );
commit;
This tutorial shows you how to write subqueries. It uses the bricks and colours table. Run the queries below to see their contents:
select * from bricks;
select * from colours;
Note: This tutorial makes heavy use of group by. If you want a refresher on this, check out module 7 of Databases for Developers: Foundations
An inline view replaces a table in the from clause of your query. In this query, "select * from bricks" is an inline view:
select * from (
select * from bricks
)
You use inline views to calculate an intermediate result set. For example, you can count the number of bricks you have of each colour using:
select * from (
select colour, count(*) c
from bricks
group by colour
) brick_counts
You can then join the result of this to the colours table. This allows you to find out which colours you have fewer bricks of than the minimum needed defined in colours:
select * from (
select colour, count(*) c
from bricks
group by colour
) brick_counts
right join colours
on brick_counts.colour = colours.colour_name
where nvl ( brick_counts.c, 0 ) < colours.minimum_bricks_needed
The outer join is necessary to return colours for which you have no bricks (yellow, purple, and orange)
Complete the query below, using an inline view to find the min and max brick_id for each colour of brick:
select * from (
)
Hint: you need group by colour. Use min(brick_id) to find the minimum. The query should return the following rows (the order may be different):
COLOUR MIN(BRICK_ID) MAX(BRICK_ID) green 4 5 <null> 9 9 red 6 8 blue 1 3
Nested subqueries go in your where clause. The query filters rows in the parent tables.
For example, to find all the rows in colours where you have a matching brick, you could write:
select * from colours c
where c.colour_name in (
select b.colour from bricks b
);
You can also use exists in a nested subquery. The following is the same as the previous query:
select * from colours c
where exists (
select null from bricks b
where b.colour = c.colour_name
);
You can filter rows in a subquery too. To find all the colours that have at least one brick with a brick_id less than 5, write:
select * from colours c
where c.colour_name in (
select b.colour from bricks b
where b.brick_id < 5
);
Or with exists:
select * from colours c
where exists (
select null from bricks b
where b.colour = c.colour_name
and b.brick_id < 5
);
A subquery is correlated when it joins to a table from the parent query. If you don't, then it's uncorrelated.
This leads to a difference between IN and EXISTS. EXISTS returns rows from the parent query, as long as the subquery finds at least one row. So the following uncorrelated EXISTS returns all the rows in colours:
select * from colours
where exists (
select null from bricks
);
Note: When using EXISTS, what you select in the subquery does not matter because it is only checking the existence of a row that matches the where clause (if there is one). You can "select null" or "select 1" or select an actual column.
This is because the query means:
Return all the in colours if there is at least one row in bricks
To find all the colour rows which have at least one row in bricks of the same colour, you must join in the subquery. Normally you need to correlate EXISTS subqueries with a table in the parent.
You can do the reverse of IN & EXISTS by placing NOT in front of them. This returns you all the rows from the parent which don't have a match in the subquery.
For example to find all the rows in colours without a matching colour in bricks, you can use NOT EXISTS:
select * from colours c
where not exists (
select null from bricks b
where b.colour = c.colour_name
);
But do the same with NOT IN:
select * from colours c
where c.colour_name not in (
select b.colour from bricks b
);
And your query returns nothing!
This is because there is a row in bricks with a null colour. So the previous query is the same as:
select * from colours c
where c.colour_name not in (
'red', 'green', 'blue',
'orange', 'yellow', 'purple',
null
);
For the NOT IN condition to be true, comparing all its elements to the parent table must return false.
But remember that comparing anything to null gives unknown! So the whole expression is unknown and you get no data.
To resolve this, either use NOT EXISTS or add a where clause to stop the subquery returning null values:
select * from colours c
where c.colour_name not in (
select b.colour from bricks b
where b.colour is not null
);
Scalar subqueries return one column and at most one row. You can replace a column with a scalar subquery in most cases.
For example, to return a count of the number of bricks matching each colour, you could do the following:
select colour_name, (
select count(*)
from bricks b
where b.colour = c.colour_name
group by b.colour
) brick_counts
from colours c;
Note the colours with no matching bricks return null. To show zero instead, you can use NVL or coalesce. This needs to go around the whole subquery:
select colour_name, nvl ( (
select count(*)
from bricks b
where b.colour = c.colour_name
group by b.colour
), 0 ) brick_counts
from colours c;
You also need to join bricks to colours in the subquery. If you don't, it will return four rows (one for each different value for colour in bricks). This leads to an ORA-01427 error:
select c.colour_name, (
select count(*)
from bricks b
group by colour
) brick_counts
from colours c;
Usually you will correlate a scalar subquery with a parent table to give the correct answer.
You can also use scalar subqueries in your having clause. So instead of a join, you could write the query in part 1 to find those bricks you have less than the minimum needed like so:
select colour, count(*) c
from bricks b
group by colour
having count(*) < (
select c.minimum_bricks_needed
from colours c
where c.colour_name = b.colour
);
Complete the scalar subquery below to find the minimum brick_id for each colour:
select c.colour_name, (
) min_brick_id
from colours c
where c.colour_name is not null;
The query should return the following rows:
COLOUR_NAME MIN_BRICK_ID green 4 red 6 blue 1 purple <null> yellow <null> orange <null>
Common table expressions (CTEs) enable you to name subqueries. You then refer to these like normal tables elsewhere in your query. This can make your SQL easier to write and understand later.
CTEs go in the with clause above the select statement. The following defines a CTE that counts how many rows of each colour there are in the bricks table:
with brick_colour_counts as (
select colour, count(*)
from bricks
group by colour
)
select * from brick_colour_counts ;
Because you access CTEs in the same way as a regular table, you can use it many times in your query. This can help if you want to use this subquery many times. For example if you want to find:
You need to group the bricks by colour. Then filter the colours table where this count is greater than the minimum_bricks_needed for that colour. And compute the mean of the counts.
You can do the filtering with a nested subquery. And show the average in a scalar subquery. This looks like:
select c.colour_name,
c.minimum_bricks_needed, (
select avg ( count(*) )
from bricks b
group by b.colour
) mean_bricks_per_colour
from colours c
where c.minimum_bricks_needed < (
select count(*) c
from bricks b
where b.colour = c.colour_name
group by b.colour
);
Note that "group by colour" appears twice in the statement. This creates maintenance problems. If you need to change this, say to join bricks to another table, you have to do this in two places.
Using CTEs, you can do the group by once. Then refer to it in your select:
with brick_counts as (
select b.colour, count(*) c
from bricks b
group by b.colour
)
select c.colour_name,
c.minimum_bricks_needed, (
select avg ( bc.c )
from brick_counts bc
) mean_bricks_per_colour
from colours c
where c.minimum_bricks_needed < (
select bc.c
from brick_counts bc
where bc.colour = c.colour_name
);
So now if you need to join bricks to a new table to get the count you only have to edit the subquery brick_counts.
Oracle Database can also optimize queries that access the same CTE many times. This can make it more efficient than regular subqueries.
Literate programming is a concept introduced by Donald Knuth. The idea is to write code that makes sense if you read it like a book: from start to finish.
Simple SQL queries follow this principle. For example the query "Get me the brick_ids of all the bricks that are red or blue" is the following statement:
select brick_id
from bricks
where colour in ('red', 'blue');
But subqueries usually break this human readable flow. For example, if you want to find which bricks you have less of than the average number of each colour, you need to:
Without the with clause, you need to write something like the following:
select colour
from bricks
group by colour
having count (*) < (
select avg ( colour_count )
from (
select colour, count (*) colour_count
from bricks
group by colour
)
);
Step one is at the bottom of the query! Using CTEs, you can order the subqueries to match the logic above:
with brick_counts as (
-- 1. Count the bricks by colour
select b.colour, count(*) c
from bricks b
group by b.colour
), average_bricks_per_colour as (
-- 2. Take the average of these counts
select avg ( c ) average_count
from brick_counts
)
select * from brick_counts bc
join average_bricks_per_colour ac
-- 3. Return those rows where the value in step 1 is less than in step 2
on bc.c < average_count;
This makes it easier to figure out what a SQL statement does when you return to it later.
Another big advantage of the with clause is it makes your SQL easier to test and debug.
If you want to check that the brick counts from the previous query are correct, update the final from clause:
with brick_counts as (
select b.colour, count(*) c
from bricks b
group by b.colour
), average_bricks_per_colour as (
select avg ( c ) average_count
from brick_counts
)
select * from brick_counts bc;
But with inline views you have to rip out the correct subquery, which can be hard to do!