Introduction to MATCH_RECOGNIZE

View All Tutorials Login to Run Tutorial

Tutorial Introduction to MATCH_RECOGNIZE
Description This is a simple example that introduces the main keywords used in MATCH_RECOGNIZE. During this tutorial you will explore how to search for V-shaped patterns in a stock-ticker data set.
Tags SQL, PATTERN MATCHING
Area SQL Analytics
Contributor Keith Laker (Oracle)
Created Tuesday January 24, 2017
Modules 17

Module 1
Overview of SQL Pattern Matching
This simple tutorial provides a nice gentle introduction to the various keywords within the 12c MATCH_RECOGNIZE clause. It's broken into three parts

Part 1 covers setting up the ticker dataset

Part 2 covers running your first simple pattern matching query. This focuses on the PARTITION BY and ORDER BY clauses showing what happens if you omit these optional keywords.

Part 3 covers running a more complex pattern matching query where you are going to search for V-shaped patters within the ticker data. This section looks at the other keywords that make up the MATCH_RECOGNIZE clause such as PATTER, DEFINE, SKIP TO..., ALL ROWS, ONE ROW etc.

If you have any questions about this tutorial or the MATCH_RECOGNIZE clause then please contact me via email: keith.laker@oracle.com.

Module 2

Part 1: Create my dataset

Creating our ticker data set

First step is to setup our data table and then populate it with data

CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

BEGIN
INSERT INTO ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '17-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '01-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '20-Apr-11', 9);
INSERT INTO ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '19-Apr-11', 11);
INSERT INTO ticker VALUES('ACME', '03-Apr-11', 19);
INSERT INTO ticker VALUES('GLOBEX', '03-Apr-11', 13);
INSERT INTO ticker VALUES('OSCORP', '18-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '02-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO ticker VALUES('GLOBEX', '04-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '17-Apr-11', 14);
INSERT INTO ticker VALUES('OSCORP', '15-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '14-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '16-Apr-11', 16);
INSERT INTO ticker VALUES('ACME', '05-Apr-11', 25);
INSERT INTO ticker VALUES('GLOBEX', '05-Apr-11', 11);
INSERT INTO ticker VALUES('ACME', '06-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '06-Apr-11', 10);
INSERT INTO ticker VALUES('ACME', '07-Apr-11', 15);
INSERT INTO ticker VALUES('GLOBEX', '07-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '08-Apr-11', 8);
INSERT INTO ticker VALUES('ACME', '08-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '13-Apr-11', 11);
INSERT INTO ticker VALUES('ACME', '13-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '10-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '11-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '09-Apr-11', 24);
INSERT INTO ticker VALUES('GLOBEX', '09-Apr-11', 9);
INSERT INTO ticker VALUES('OSCORP', '12-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '10-Apr-11', 9);
INSERT INTO ticker VALUES('OSCORP', '11-Apr-11', 15);
INSERT INTO ticker VALUES('GLOBEX', '11-Apr-11', 9);
INSERT INTO ticker VALUES('OSCORP', '10-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '12-Apr-11', 15);
INSERT INTO ticker VALUES('GLOBEX', '12-Apr-11', 9);
INSERT INTO ticker VALUES('OSCORP', '09-Apr-11', 16);
INSERT INTO ticker VALUES('GLOBEX', '13-Apr-11', 10);
INSERT INTO ticker VALUES('OSCORP', '08-Apr-11', 20);
INSERT INTO ticker VALUES('ACME', '14-Apr-11', 25);
INSERT INTO ticker VALUES('GLOBEX', '14-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '07-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '06-Apr-11', 20);
INSERT INTO ticker VALUES('ACME', '15-Apr-11', 14);
INSERT INTO ticker VALUES('GLOBEX', '15-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '17-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '16-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '16-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '05-Apr-11', 17);
INSERT INTO ticker VALUES('ACME', '18-Apr-11', 24);
INSERT INTO ticker VALUES('GLOBEX', '18-Apr-11', 7);
INSERT INTO ticker VALUES('OSCORP', '04-Apr-11', 18);
INSERT INTO ticker VALUES('OSCORP', '03-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '19-Apr-11', 23);
INSERT INTO ticker VALUES('GLOBEX', '19-Apr-11', 5);
INSERT INTO ticker VALUES('OSCORP', '02-Apr-11', 22);
INSERT INTO ticker VALUES('ACME', '20-Apr-11', 22);
INSERT INTO ticker VALUES('GLOBEX', '20-Apr-11', 3);
INSERT INTO ticker VALUES('OSCORP', '01-Apr-11', 22);

commit;
END;

Module 3
Viewing my data
Quick review of our ticker data

Now let's check to see how many rows are in our dataset
```
SELECT count(*) FROM ticker;
SELECT symbol, min(tstamp), max(tstamp), count(*) FROM ticker GROUP BY symbol;
```
You should have 60 rows of data spread across three symbols (ACME, GLOBEX, OSCORP) with 20 rows of data for each ticker symbol. Our ticker data for each symbol starts on April 1 and ends on April 20.
Module 4
Part 2: Your first simple pattern
For this data set we would expect to be searching for patterns within each stock symbol and to correctly find a "pattern" we will need to have the data ordered by the timestamp for each trade. Hopefully, that is an obvious statement.

Here is your first MATCH_RECOGNIZE statement. It's relatively simple and we are going to built on it during this tutorial. Firstly, notice that we have used the ORDER BY clause to sort the data by the timestamp for each trade.

Let's just run the code and view the results
```
SELECT * FROM ticker 
MATCH_RECOGNIZE (
 ORDER BY tstamp 
 MEASURES e.tstamp as st, e.symbol as s, e.price as p 
 ONE ROW PER MATCH 
 PATTERN (e) 
 DEFINE 
 e AS price=price);
```
First observation: you should have noticed that the data is in random order when it comes to the stock ticker because there is more than one record per tstamp. This is why the results appear in a random order in terms of the stock ticker column.
Module 5
How to group your data
What we need to do is group the data by each stock ticker symbol so we need to use the PARTITION BY clause

Let's amend our example and now we do it the right way by adding a PARTITION BY clause:
```
SELECT * FROM ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp 
 MEASURES e.tstamp as st, e.symbol as s, e.price as p 
 ONE ROW PER MATCH 
 PATTERN (e) 
 DEFINE 
 e AS price=price);
```
There is more information about PARTITION BY and ORDER BY in section 14 of this tutorial
Module 6
Correct number of rows?
Now let's change our simple pattern and test to see if the timestamp in the current row is greater than the timestamp in the previous row. Which it obviously will be because we are ordering by the timestamp column. The new pattern is described within the DEFINE clause.

How many rows will be returned when you run this query?
```
SELECT * FROM ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp 
 ALL ROWS PER MATCH 
 PATTERN (e+) 
 DEFINE 
  e AS tstamp > prev(tstamp));
```
You should be expecting to see 60 rows returned because that is how many rows are in our source table. So why do we only get 57 rows returned and not 60?

This answer is because the very first record in each partition has no previous timestamp to compare it with, therefore, the first record in each partition is not matched. As we have three partitions (3 symbols: ACME, GLOBEX, OSCORP) that means we lose three rows.
Module 7
Using ALWAYS TRUE events
To correct the "missing first row" issue we need to add an always true event to our PATTERN clause, as shown below.

This query will now return all 60 rows from our dataset as expected because the timestamp for each row is greater than the timestamp in the previous row.
```
SELECT * FROM ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol ORDER BY tstamp 
 ALL ROWS PER MATCH 
 PATTERN (strt e+) 
 DEFINE 
  e AS tstamp > prev(tstamp));
```
Module 8
Checking results
How can we check the results from our MATCH_RECOGNIZE query? We can use two of the built-in measures to help us make sure that our pattern is being correctly applied to our data set. These two measures are CLASSIFIER and MATCH_NUMBER().

Let's run the code below and focus on the results just for symbol ACME by adding a WHERE clause.
```
SELECT * FROM ticker 
MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp 
MEASURES classifier() event, 
         match_number() match 
ALL ROWS PER MATCH 
PATTERN (strt e+) 
DEFINE 
 e AS tstamp > prev(tstamp))
WHERE symbol='ACME';
```
What you should see is that the first row (01-Apr-11) is mapped to our always-true event "STRT" and all the other rows in our partition are mapped to our timestamp-based event "E".
Module 9
MATCH_NUMBER and partitions
Now let's look at what happens to our MATCH_NUMBER() function within each PARTITION by removing the WHERE clause.

Let's run the code below:
```
SELECT * FROM ticker 
MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp 
MEASURES e.tstamp as t, 
         e.symbol as s, 
         e.price as p, 
         classifier() event, 
         match_number() match 
ALL ROWS PER MATCH 
PATTERN (strt e+) 
DEFINE 
 e AS tstamp > prev(tstamp));
```
What you should see is that the first row (01-Apr-11) in each partition is mapped to the "STRT" event and the MATCH_NUMBER function starts at 1 within each partition.
Module 10
Part 3: Searching for V-Shaped Patterns
Now let's build a slightly more complex example where we search our dataset for V-shaped trading patterns, i.e. where the price goes down and then the price goes up.

Below we have changed the PATTERN and DEFINE clauses to use an always-true event (so we can capture the first row in each partition) as before along with two pattern variables that will test the value of the price in the current row against the price in the previous row

Let's run the code:
```
SELECT *
FROM Ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES 
  FIRST(DOWN.tstamp) AS start_ts,
  LAST(DOWN.tstamp) AS bottom_ts,
  LAST(UP.tstamp) AS end_ts
 ONE ROW PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (STRT DOWN+ UP+)
 DEFINE
  DOWN AS price < PREV(price), 
  UP AS price > PREV(price)
) MR
ORDER BY MR.symbol, MR.start_ts;
```
You should see 6 rows returned

The following sections will explain each of the main keywords in more detail.
Module 11
What do PARTITION BY + ORDER BY do?
If you have used the analytic SQL functions (windows) then you will already be familiar with how these keywords are used and MATCH_RECOGNIZE simply refuses these same concepts.

The keywords PARTITION BY divides the data from the Ticker table into logical groups where, in this case, each group contains one stock symbol.

This clause is optional but it is highly likely that every MATCH_RECOGNIZE clause that you write will contain a PARTITION BY clause

The ORDER BY clause orders the data within each logical group. This clause is also optional but as before, it is highly likely that every MATCH_RECOGNIZE clause that you write will contain a ORDER BY clause
Module 12
What does the keyword MEASURES do?
The pattern matching clause enables you to create expressions useful in a wide range of analyses. These are presented as columns in the output table by using the MEASURES clause. This clause defines row pattern measure columns, whose value is computed by evaluating an expression related to a particular match.

In our example we are defining three measures:
1. the timestamp at the beginning of a V-shape (start_ts)
2. the timestamp at the bottom of a V-shape (bottom_ts)
3. the timestamp at the end of the a V-shape (end_ts)
The start_ts uses the FIRST() function to extract the first timestamp value from the rows matched to the DOWN pattern variable. The bottom_ts and end_ts measures use the LAST() function to extract the last timestamp values from the rows matched to the DOWN and UP pattern variables.

Module 13

How to control the output

You will sometimes want summary data about the matches and other times need more detailed information. You can control this by using one of following clauses...

ONE ROW PER MATCH
ALL ROWS PER MATCH

ONE ROW PER MATCH means that for every match found, there will be one row of output.

The alternative is to use the ALL ROWS PER MATCH keywords which provides a more verbose report and opens up the opportunity to use some of the built-in MEASURES that can be used to help you debug your pattern matching process

Compare the output of the two statements:

SELECT *
FROM Ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES 
  FIRST(down.tstamp) AS start_ts,
  LAST(DOWN.tstamp) AS bottom_ts,
  LAST(UP.tstamp) AS end_ts
 ALL ROWS PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (STRT DOWN+ UP+)
 DEFINE
  DOWN AS price < PREV(price), 
  UP AS price > PREV(price)
) MR
ORDER BY MR.symbol, MR.start_ts;

Notice that when we run the code below the number of columns returned is reduced as well as the number of rows

SELECT *
FROM Ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES 
  FIRST(down.tstamp) AS start_ts,
  LAST(DOWN.tstamp) AS bottom_ts,
  LAST(UP.tstamp) AS end_ts
 ONE ROW PER MATCH
 AFTER MATCH SKIP PAST LAST ROW
 PATTERN (STRT DOWN+ UP+)
 DEFINE
  DOWN AS price < PREV(price), 
  UP AS price > PREV(price)
) MR
ORDER BY MR.symbol, MR.start_ts;

Module 14
What does AFTER MATCH SKIP do?
The AFTER MATCH SKIP PAST LAST ROW clause means that whenever we find a match, we restart the search process for the next match at the next row after the last row mapped to the UP pattern variable. This is the default behavior.

We have a lot of control over where to restart the matching process. For this example what we want to explore is the impact on the results if we change the default behavior. Lets try AFTER MATCH SKIP TO LAST UP and see what happens
```
SELECT *
FROM Ticker 
MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES 
  FIRST(down.tstamp) AS start_ts,
  LAST(DOWN.tstamp) AS bottom_ts,
  LAST(UP.tstamp) AS end_ts
 ONE ROW PER MATCH
 AFTER MATCH SKIP TO LAST UP
 PATTERN (STRT DOWN+ UP+)
 DEFINE
  DOWN AS price < PREV(price), 
  UP AS price > PREV(price)
) MR
ORDER BY MR.symbol, MR.start_ts;
```
You will now find that more rows are returned because we are starting our search for the next match at a point where we a guaranteed to find that the price of next row is lower than the price of the previous row. The STRT variable is mapped to the last row mapped to up - in effect we have a row being double mapped.

Hopefully, it is obvious that this is an important clause and you will need to think very carefully about where you to restart searching for your patterns.
Module 15
What does the keyword PATTERN do?
The PATTERN clause defines a regular expression in similar way to the existing Oracle regular expression functions. However, within this clause we have create a completely new a highly expressive way to search for patterns.

In the above example, the PATTERN clause is defined as:

PATTERN (STRT DOWN+ UP+)

this syntax indicates that the pattern we are searching for has three pattern variables: STRT, DOWN, and UP. The plus sign (+) after DOWN and UP means that at least one row must be mapped to each of them.

The plus sign (+) is known as a "quantifier" and there is a large library of quantifiers available to help you define your own patterns. Quantifiers define the number of iterations accepted for a match. Use the link below to access the documentation to learn more about quantifiers.
Module 16
What does the keyword DEFINE do?
The DEFINE keyword gives us the conditions that must be met for a row to map to your row pattern variables STRT, DOWN, and UP. Because there is no condition for STRT, any row can be mapped to STRT. It is in effect and "always-true" event.

Why have a pattern variable with no condition?

We are using the STRT variable as a starting point for testing and anchoring our matching process. It ensures that we capture and test our pattern beginning at the very first row within each of our partition.

Both DOWN and UP take advantage of the PREV() function, which lets them compare the price in the current row to the price in the prior row. DOWN is matched when a row has a lower price than the row that preceded it, so it defines the downward (left) leg of our V-shape.

A row can be mapped to UP if the row has a higher price than the row that preceded it. 
Module 17
Summary
This simple tutorial has explained the main keywords in the MATCH_RECOGNIZE clause. There is a lot more information about these keywords in the related tutorials that I have posted on this site.

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

Introduction to MATCH_RECOGNIZE

Overview of SQL Pattern Matching

Part 1: Create my dataset

Creating our ticker data set

Viewing my data

Quick review of our ticker data

Part 2: Your first simple pattern

How to group your data

Correct number of rows?

Using ALWAYS TRUE events

Checking results

MATCH_NUMBER and partitions

Part 3: Searching for V-Shaped Patterns

What do PARTITION BY + ORDER BY do?

What does the keyword MEASURES do?

How to control the output

What does AFTER MATCH SKIP do?

What does the keyword PATTERN do?

What does the keyword DEFINE do?

Summary

Links

Additional Information