create table my_brick_collection (
colour varchar2(10),
shape varchar2(10),
weight integer
);
create table your_brick_collection (
height integer,
width integer,
depth integer,
colour varchar2(10),
shape varchar2(10)
);
insert into my_brick_collection values ( 'red', 'cube', 10 );
insert into my_brick_collection values ( 'blue', 'cuboid', 8 );
insert into my_brick_collection values ( 'green', 'pyramid', 20 );
insert into my_brick_collection values ( 'green', 'pyramid', 20 );
insert into my_brick_collection values ( null, 'cuboid', 20 );
insert into your_brick_collection values ( 2, 2, 2, 'red', 'cube' );
insert into your_brick_collection values ( 2, 2, 2, 'blue', 'cube' );
insert into your_brick_collection values ( 2, 2, 8, null, 'cuboid' );
commit;
The union operator combines two or more tables into a single result set. To use it, the tables must have the same number of columns with matching data types in each position.
The brick collection tables have different columns. So combining these with select * leads to an error:
select * from my_brick_collection
union
select * from your_brick_collection;
To resolve this, select the common columns. Here that's the colour and shape. So this query returns a list of all the ( colour, shape ) values in the tables:
select colour, shape from my_brick_collection
union
select colour, shape from your_brick_collection;
There are two red cube rows in the tables, one in each table. But union only displays one!
This is because union applies the distinct operator to your results. This discards duplicate rows. So your results only have one row for each set of column values.
You can use distinct by placing it after select and before the column list. This squashes out duplicates. So you only get one row for each set of values in your columns.
For example, there are two green pyramid rows in my collection. If you do a distinct *, you only get one copy of this brick:
select distinct * from my_brick_collection;
You can also use distinct on a subset of a table's columns. For example, to get one row for each shape in your collection, select "distinct shape":
select distinct shape from your_brick_collection;
Often when combining tables you want to see all the rows. Including duplicates. Not the list of distinct values.
You can do this by sticking all after union:
select colour, shape from my_brick_collection
union all
select colour, shape from your_brick_collection;
A standard union is the same as the following:
select distinct * from (
select colour, shape from my_brick_collection
union all
select colour, shape from your_brick_collection
);
The distinct operator adds an extra sorting step to your SQL. So in most cases you'll want to use union all. Save plain union for when you know you want to remove duplicate rows.
Complete this query to return a list of all the colours in the two tables. Each colour must only appear once:
select colour from my_brick_collection
/*TODO*/
select colour from your_brick_collection
order by colour;
The output of this query should be:
COLOUR blue green red <null>
Complete the following query to return a list of all the shapes in both tables. There must show one row for each row in the source tables:
select shape from my_brick_collection
/*TODO*/
select shape from your_brick_collection
order by shape;
This query should return the following rows:
SHAPE cube cube cube cuboid cuboid cuboid pyramid pyramid
The set difference operation returns all the rows in one table not in another. You can do this with not exists. For example:
select colour, shape from your_brick_collection ybc
where not exists (
select null from my_brick_collection mbc
where ybc.colour = mbc.colour
and ybc.shape = mbc.shape
);
If you're comparing many columns this is a lot of typing. And tricky to understand later.
But there's a bigger issue. It handles nulls incorrectly! Because null = null => unknown, it returns rows where shape or colour is null. So the null-coloured cuboid is in the output. Even though this row exists in both tables!
To fix this, test if the columns are equal or both are null:
select colour, shape from your_brick_collection ybc
where not exists (
select null from my_brick_collection mbc
where ( ybc.colour = mbc.colour or
( ybc.colour is null and mbc.colour is null )
)
and ( ybc.shape = mbc.shape or
( ybc.shape is null and mbc.shape is null )
)
);
Oracle Database includes an operator that implements set difference: minus
This is easier to use than not exists. All you need to do is select the relevant columns from each table with minus between them. And set operators are one of the rare cases where the database considers null values to be equal.
So the following query returns the same rows as not exists with null checking:
select colour, shape from your_brick_collection
minus
select colour, shape from my_brick_collection;
But there is another subtle difference. Like union, minus applies a distinct to the output. There are two green pyramids in my collection not in yours. But minus only returns one of these:
select colour, shape from my_brick_collection
minus
select colour, shape from your_brick_collection
To see both, you need to use not exists:
select colour, shape from my_brick_collection mbc
where not exists (
select null from your_brick_collection ybc
where ( ybc.colour = mbc.colour or ( ybc.colour is null and mbc.colour is null ) )
and ybc.shape = mbc.shape
);
Oracle Database 21c added the all option for minus. It also included except as a synonym for minus.You can find values that are in both table with exists. As with not exists, do to this correctly you need to test for null in the subquery:
select colour, shape from your_brick_collection ybc
where exists (
select null from my_brick_collection mbc
where ( ybc.colour = mbc.colour or ( ybc.colour is null and mbc.colour is null ) )
and ybc.shape = mbc.shape
);
There is also an operator to find the common values: intersect
You use this in the same way as union and minus: place it between a select from each table:
select colour, shape from your_brick_collection
intersect
select colour, shape from my_brick_collection;
As with minus, the database considers null values to be the same and applies a distinct operator to the results. Oracle Database 21c added the all clause to intersect as well as minus.
Complete the following query to return a list of all the shapes in my collection not in yours:
select shape from my_brick_collection
/*TODO*/
select shape from your_brick_collection;
This should return the following row:
SHAPE pyramid
Complete the following query to return a list of all the colours that are in both tables:
select colour from my_brick_collection
/*TODO*/
select colour from your_brick_collection
order by colour;
This should return the following rows:
COLOUR blue red <null>
You can combine set operators to implement a classic use case:
Comparing two tables, returning a list of all the values that only exist in one table.
This is also known as the symmetric difference. There isn't a native operator that does this. But you can do it by:
My collection has a blue cuboid and two green pyramids not in yours. And you have a blue cube I don't. So these should appear in the output. But when you chain the operators as described above:
select colour, shape from your_brick_collection
minus
select colour, shape from my_brick_collection
union all
select colour, shape from my_brick_collection
minus
select colour, shape from your_brick_collection;
You only get blue cuboid and green pyramid!
This is because the set operators all have the same priority in Oracle Database. To fix this and do the minuses before union, you need parentheses. Place them around the operations that should happen first:
select * from (
select colour, shape from your_brick_collection
minus
select colour, shape from my_brick_collection
) union all (
select colour, shape from my_brick_collection
minus
select colour, shape from your_brick_collection
);
Or, you could use the following method:
As before, you need brackets to ensure correct order of processing. Which gives:
select * from (
select colour, shape from your_brick_collection
union all
select colour, shape from my_brick_collection
) minus (
select colour, shape from my_brick_collection
intersect
select colour, shape from your_brick_collection
);
There are a couple of drawbacks to both the previous methods. Firstly you have to read all the rows from both tables twice. Secondly minus and intersect return distinct values. So you only see the values only in one table. Not all the rows.
So if you get an extra red cube, you have one more than me. But the queries don't show this:
insert into your_brick_collection values ( 4, 4, 4, 'red', 'cube' );
select * from (
select colour, shape from your_brick_collection
minus
select colour, shape from my_brick_collection
) union all (
select colour, shape from my_brick_collection
minus
select colour, shape from your_brick_collection
);
Luckily there's an alternative which solves both of these issues.
Check whether each table has the same number of rows for each set of values.
You can do this by union alling the two tables together with a couple of extra columns. One to count the rows from the first table, the other for the second. By returning the values 1 or 0 you can sum these up to get the count.
To see the different rows, return those where these sums are not equal in the having clause.
This gives the following query:
select colour, shape, sum ( your_bricks ), sum ( my_bricks )
from (
select colour, shape, 1 your_bricks, 0 my_bricks
from your_brick_collection
union all
select colour, shape, 0 your_bricks, 1 my_bricks
from my_brick_collection
)
group by colour, shape
having sum ( your_bricks ) <> sum ( my_bricks );
Using this method you only read the rows in each table once. So this can be notably faster than the classic way. And you can see how many rows have each set of values in each table.
This leads to another benefit of this method: it enables you to see which table has more rows. You can do this by comparing the sum of bricks for each colour and shape. Then return the name of the table that has more. You can also get the number of extra rows. Do this by finding the absolute value of the difference between these two sums.
For example:
select colour, shape,
case
when sum ( your_bricks ) < sum ( my_bricks ) then 'ME'
when sum ( your_bricks ) > sum ( my_bricks ) then 'YOU'
else 'EQUAL'
end who_has_extra,
abs ( sum ( your_bricks ) - sum ( my_bricks ) ) how_many
from (
select colour, shape, 1 your_bricks, 0 my_bricks
from your_brick_collection
union all
select colour, shape, 0 your_bricks, 1 my_bricks
from my_brick_collection
)
group by colour, shape;