Union, Minus, and Intersect: Databases for Developers

View All Tutorials Login to Run Tutorial

Tutorial Union, Minus, and Intersect: Databases for Developers
Description An introduction to the set operators, union, minus, and intersect
Tags union, minus, intersect
Area SQL General
Contributor Chris Saxon (Oracle)
Created Sunday May 20, 2018
Modules 12

Prerequisite SQL

create table my_brick_collection (
  colour varchar2(10),
  shape  varchar2(10),
  weight integer
);

create table your_brick_collection (
  height integer,
  width  integer,
  depth  integer,
  colour varchar2(10),
  shape  varchar2(10)
);

insert into my_brick_collection values ( 'red', 'cube', 10 );
insert into my_brick_collection values ( 'blue', 'cuboid', 8 );
insert into my_brick_collection values ( 'green', 'pyramid', 20 );
insert into my_brick_collection values ( 'green', 'pyramid', 20 );
insert into my_brick_collection values ( null, 'cuboid', 20 );

insert into your_brick_collection values ( 2, 2, 2, 'red', 'cube' );
insert into your_brick_collection values ( 2, 2, 2, 'blue', 'cube' );
insert into your_brick_collection values ( 2, 2, 8, null, 'cuboid' );

commit;

Module 1
Introduction
The set operators union, intersect, and minus allow you to combine many tables into one. This tutorial will use these two tables for the queries:
```
select * from my_brick_collection;

select * from your_brick_collection;
```
Module 2
Union
The union operator combines two or more tables into a single result set. To use it, the tables must have the same number of columns with matching data types in each position.

The brick collection tables have different columns. So combining these with select * leads to an error:
```
select * from my_brick_collection
union 
select * from your_brick_collection;
```
To resolve this, select the common columns. Here that's the colour and shape. So this query returns a list of all the ( colour, shape ) values in the tables:
```
select colour, shape from my_brick_collection
union 
select colour, shape from your_brick_collection;
```
Module 3
Distinct
There are two red cube rows in the tables, one in each table. But union only displays one!

This is because union applies the distinct operator to your results. This discards duplicate rows. So your results only have one row for each set of column values.

You can use distinct by placing it after select and before the column list. This squashes out duplicates. So you only get one row for each set of values in your columns.

For example, there are two green pyramid rows in my collection. If you do a distinct *, you only get one copy of this brick:
```
select distinct * from my_brick_collection;
```
You can also use distinct on a subset of a table's columns. For example, to get one row for each shape in your collection, select "distinct shape":
```
select distinct shape from your_brick_collection;
```
Module 4
Union All
Often when combining tables you want to see all the rows. Including duplicates. Not the list of distinct values.

You can do this by sticking all after union:
```
select colour, shape from my_brick_collection
union all 
select colour, shape from your_brick_collection;
```
A standard union is the same as the following:
```
select distinct * from (
  select colour, shape from my_brick_collection
  union all 
  select colour, shape from your_brick_collection
);
```
The distinct operator adds an extra sorting step to your SQL. So in most cases you'll want to use union all. Save plain union for when you know you want to remove duplicate rows.

Module 5

Try It!

Complete this query to return a list of all the colours in the two tables. Each colour must only appear once:

select colour from my_brick_collection
/*TODO*/
select colour from your_brick_collection
order by colour;

The output of this query should be:

COLOUR   
blue     
green    
red      
<null>

Complete the following query to return a list of all the shapes in both tables. There must show one row for each row in the source tables:

select shape from my_brick_collection
/*TODO*/
select shape from your_brick_collection
order  by shape;

This query should return the following rows:

SHAPE       
cube      
cube      
cube      
cuboid    
cuboid    
cuboid    
pyramid   
pyramid

Module 6
Set Difference
The set difference operation returns all the rows in one table not in another. You can do this with not exists. For example:
```
select colour, shape from your_brick_collection ybc
where  not exists (
  select null from my_brick_collection mbc
  where  ybc.colour = mbc.colour
  and    ybc.shape = mbc.shape
);
```
If you're comparing many columns this is a lot of typing. And tricky to understand later.

But there's a bigger issue. It handles nulls incorrectly! Because null = null => unknown, it returns rows where shape or colour is null. So the null-coloured cuboid is in the output. Even though this row exists in both tables!

To fix this, test if the columns are equal or both are null:
```
select colour, shape from your_brick_collection ybc
where  not exists (
  select null from my_brick_collection mbc
  where  ( ybc.colour = mbc.colour or 
           ( ybc.colour is null and mbc.colour is null ) 
         )
  and    ( ybc.shape = mbc.shape or
           ( ybc.shape is null and mbc.shape is null ) 
         )
);
```
Module 7
Minus
Oracle Database includes an operator that implements set difference: minus

This is easier to use than not exists. All you need to do is select the relevant columns from each table with minus between them. And set operators are one of the rare cases where the database considers null values to be equal.

So the following query returns the same rows as not exists with null checking:
```
select colour, shape from your_brick_collection
minus
select colour, shape from my_brick_collection;
```
But there is another subtle difference. Like union, minus applies a distinct to the output. There are two green pyramids in my collection not in yours. But minus only returns one of these:
```
select colour, shape from my_brick_collection
minus
select colour, shape from your_brick_collection
```
To see both, you need to use not exists:
```
select colour, shape from my_brick_collection mbc
where  not exists (
  select null from your_brick_collection ybc
  where  ( ybc.colour = mbc.colour or ( ybc.colour is null and mbc.colour is null ) )
  and    ybc.shape = mbc.shape
);
```
Oracle Database 21c added the all option for minus. It also included except as a synonym for minus.

Module 8

Finding Common Values

You can find values that are in both table with exists. As with not exists, do to this correctly you need to test for null in the subquery:

select colour, shape from your_brick_collection ybc
where  exists (
  select null from my_brick_collection mbc
  where  ( ybc.colour = mbc.colour or ( ybc.colour is null and mbc.colour is null ) )
  and    ybc.shape = mbc.shape
);

Module 9
Intersect
There is also an operator to find the common values: intersect

You use this in the same way as union and minus: place it between a select from each table:
```
select colour, shape from your_brick_collection
intersect
select colour, shape from my_brick_collection;
```
As with minus, the database considers null values to be the same and applies a distinct operator to the results. Oracle Database 21c added the all clause to intersect as well as minus.
Module 10
Try It!
Complete the following query to return a list of all the shapes in my collection not in yours:
```
select shape from my_brick_collection
/*TODO*/
select shape from your_brick_collection;
```
This should return the following row:
```
SHAPE     
pyramid  
```
Complete the following query to return a list of all the colours that are in both tables:
```
select colour from my_brick_collection
/*TODO*/
select colour from your_brick_collection
order  by colour;
```
This should return the following rows:
```
COLOUR   
blue     
red      
<null>   
```
Module 11
Finding the Difference Between Two Tables (Symmetric Difference)
You can combine set operators to implement a classic use case:

Comparing two tables, returning a list of all the values that only exist in one table.

This is also known as the symmetric difference. There isn't a native operator that does this. But you can do it by:
- Finding the rows in table one not in table two with minus
- Finding the rows in table two not in table one with minus
- Combining the output of these two operations with union (all)
My collection has a blue cuboid and two green pyramids not in yours. And you have a blue cube I don't. So these should appear in the output. But when you chain the operators as described above:
```
select colour, shape from your_brick_collection
minus
select colour, shape from my_brick_collection
union all
select colour, shape from my_brick_collection
minus
select colour, shape from your_brick_collection;
```
You only get blue cuboid and green pyramid!

This is because the set operators all have the same priority in Oracle Database. To fix this and do the minuses before union, you need parentheses. Place them around the operations that should happen first:
```
select * from ( 
  select colour, shape from your_brick_collection
  minus
  select colour, shape from my_brick_collection
) union all (
  select colour, shape from my_brick_collection
  minus
  select colour, shape from your_brick_collection
);
```
Or, you could use the following method:
- Combine all the rows from the two tables with union (all)
- Find the values that exist in both tables with intersect
- Minus the second query from the first
As before, you need brackets to ensure correct order of processing. Which gives:
```
select * from ( 
  select colour, shape from your_brick_collection
  union all
  select colour, shape from my_brick_collection
) minus (
  select colour, shape from my_brick_collection
  intersect
  select colour, shape from your_brick_collection
);
```
Module 12
Symmetric Difference with Group By
There are a couple of drawbacks to both the previous methods. Firstly you have to read all the rows from both tables twice. Secondly minus and intersect return distinct values. So you only see the values only in one table. Not all the rows.

So if you get an extra red cube, you have one more than me. But the queries don't show this:
```
insert into your_brick_collection values ( 4, 4, 4, 'red', 'cube' );

select * from ( 
  select colour, shape from your_brick_collection
  minus
  select colour, shape from my_brick_collection
) union all (
  select colour, shape from my_brick_collection
  minus
  select colour, shape from your_brick_collection
);
```
Luckily there's an alternative which solves both of these issues.

Check whether each table has the same number of rows for each set of values.

You can do this by union alling the two tables together with a couple of extra columns. One to count the rows from the first table, the other for the second. By returning the values 1 or 0 you can sum these up to get the count.

To see the different rows, return those where these sums are not equal in the having clause.

This gives the following query:
```
select colour, shape, sum ( your_bricks ), sum ( my_bricks ) 
from (
  select colour, shape, 1 your_bricks, 0 my_bricks
  from   your_brick_collection
  union all
  select colour, shape, 0 your_bricks, 1 my_bricks 
  from   my_brick_collection
)
group  by colour, shape
having sum ( your_bricks ) <> sum ( my_bricks );
```
Using this method you only read the rows in each table once. So this can be notably faster than the classic way. And you can see how many rows have each set of values in each table.

This leads to another benefit of this method: it enables you to see which table has more rows. You can do this by comparing the sum of bricks for each colour and shape. Then return the name of the table that has more. You can also get the number of extra rows. Do this by finding the absolute value of the difference between these two sums.

For example:
```
select colour, shape, 
       case
         when sum ( your_bricks ) < sum ( my_bricks ) then 'ME'
         when sum ( your_bricks ) > sum ( my_bricks ) then 'YOU'
         else 'EQUAL'
       end who_has_extra,
       abs ( sum ( your_bricks ) - sum ( my_bricks ) ) how_many
from (
  select colour, shape, 1 your_bricks, 0 my_bricks
  from   your_brick_collection
  union all
  select colour, shape, 0 your_bricks, 1 my_bricks 
  from   my_brick_collection
)
group  by colour, shape;
```

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

Union, Minus, and Intersect: Databases for Developers

Prerequisite SQL

Introduction

Union

Distinct

Union All

Try It!

Set Difference

Minus

Finding Common Values

Intersect

Try It!

Finding the Difference Between Two Tables (Symmetric Difference)

Symmetric Difference with Group By

Links

Additional Information