MATCH_RECOGNIZE - Using Built-In Measures

View All Tutorials Login to Run Tutorial

Tutorial MATCH_RECOGNIZE - Using Built-In Measures
Description In this tutorial we will review the two built-in measures that are part of MATCH_RECOGNIZE. These measures are designed to help you understand how your data set is mapped to the pattern that you have defined.
Tags MATCH_RECOGNZE, pattern matching
Area SQL Analytics
Contributor Keith Laker (Oracle)
Created Wednesday October 12, 2016
Modules 14

Module 1
Overview
In this tutorial we will review the two built-in measures that are part of MATCH_RECOGNIZE:
- MATCH_NMUBER()
- CLASSIFIER()
These measures are designed to help you understand how your data set is mapped to the pattern that you have defined.

These two functions will typically only be used within the MEASURE clause. Interestingly, it is possible to use both these functions within the DEFINE clause but it is doubtful, at least in my mind, as to whether this would ever be useful.
Module 2
What does MATCH_NUMBER() do?
To get you started here is a quick definition of MATCH_NUMBER()

You might have a large number of matches for your pattern inside a given row partition. How do you tell these matches apart? This is done with the MATCH_NUMBER() function. Matches within a row pattern partition are numbered sequentially starting with 1 in the order they are found. This numbering starts at 1 within each row pattern partition, because there is no linked ordering across row pattern partitions.
Module 3
What does CLASSIFIER() do?
To get you started here is a quick definition of CLASSIFIER(

Along with knowing which MATCH_NUMBER you are seeing, you may want to know which component of a pattern applies to a specific row. This is done using the CLASSIFIER() function. The classifier of a row is the pattern variable that the row is mapped to by a row pattern match. The CLASSIFIER() function returns a character string whose value is the name of the pattern variable defined within the PATTERN clause.
Module 4
Reset my tutorial environment
Just in case...

If at anytime you need to restart this tutorial then simply run the following statement to reset your environment
```
DROP TABLE ticker PURGE;
```

Module 5

Create my dataset

Creating our ticker data set

First step is to setup our data table and then populate it with data

CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

BEGIN
INSERT INTO ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO ticker VALUES('ACME', '03-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO ticker VALUES('ACME', '05-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '06-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '07-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '08-Apr-11', 20);
INSERT INTO ticker VALUES('ACME', '09-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '10-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '11-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '12-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '13-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '14-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '15-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '16-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '17-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '18-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '19-Apr-11', 23);
INSERT INTO ticker VALUES('ACME', '20-Apr-11', 22);

INSERT INTO ticker VALUES('GLOBEX', '01-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '02-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '03-Apr-11', 13);
INSERT INTO ticker VALUES('GLOBEX', '04-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '05-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '06-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '07-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '08-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '09-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '10-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '11-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '12-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '13-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '14-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '15-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '16-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '17-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '18-Apr-11', 7);
INSERT INTO ticker VALUES('GLOBEX', '19-Apr-11', 5);
INSERT INTO ticker VALUES('GLOBEX', '20-Apr-11', 3);

INSERT INTO ticker VALUES('OSCORP', '01-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '02-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '03-Apr-11', 19);
INSERT INTO ticker VALUES('OSCORP', '04-Apr-11', 18);
INSERT INTO ticker VALUES('OSCORP', '05-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '06-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '07-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '08-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '09-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '10-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '11-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '12-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '13-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '14-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '15-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '16-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '17-Apr-11', 14);
INSERT INTO ticker VALUES('OSCORP', '18-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '19-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '20-Apr-11', 9);

commit;

END;

Module 6
Viewing my data
Quick review of our ticker data

Now let's check to see how many rows are in our dataset
```
SELECT count(*) FROM ticker;
SELECT symbol, min(tstamp), max(tstamp), count(*) FROM ticker GROUP BY symbol;
```
You should have 60 rows of data spread across three symbols (ACME, GLOBEX, OSCORP) with 20 rows of data for each ticker symbol. Our ticker data for each symbol starts on April 1 and ends on April 20.

You can view the full data set using the following code:
```
SELECT * FROM ticker ORDER BY symbol, tstamp;
```

Module 7

Finding W-Patterns

First let’s start by using our normal stock ticker data set and searching for W-shaped patterns. Here is the code:

SELECT symbol, first_x, last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)), 
         W AS (price < PREV(price)),
         Z AS (price > PREV(price)))
WHERE symbol='OSCORP';

For the symbol OSCORP we have matched two W-shapes: the first starts on 03-APR-11 and ends on 08-APR-11 and the second starts on 12-APR-11 and ends on 16-APR-11.

Module 8
Getting detailed feedback
Line-by-line analysis of how MATCH_RECOGNIZE is working

Let's switch to displaying the more detailed report so that we can understand how the pattern is being applied to our ticker data set. First let's use the MATCH_NUMBER() function to see how many occurrences of our W-pattern are contained within the data set for OSCORP:
```
SELECT symbol, tstamp, price, mn, first_x, last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS mn
  ALL ROWS PER MATCH OMIT EMPTY MATCHES
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)), 
         W AS (price < PREV(price)),
         Z AS (price > PREV(price)))
WHERE symbol='OSCORP';
```
What you should spot is that the MATCH_NUMBER() column, mn, contains a contiguous set of numbers. Our first W-shape which starts on 03-Apr-11 and the second W-shape starts on 12-Apr-11
Module 9
MATCH_NUMBER and Partition Boundaries
What happens to the MATCH_NUMBER counter across partition boundaries

Using the code below we can see how MATCH_NUMBER() functions across the partition boundary, defined as the column symbol.
```
SELECT symbol, tstamp, price, mn, first_x, last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS mn
  ALL ROWS PER MATCH OMIT EMPTY MATCHES
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)), 
         W AS (price < PREV(price)),
         Z AS (price > PREV(price)));
```
Note that we have removed the WHERE clause so that you can see the how the MATCH_NUMBER() sequencing is handled across partition boundaries - i.e. you should note that the match numbering starts at 1 within each symbol-based partition. For symbol ‘ACME’ we have one matched one W-shaped pattern and within the OSCORP partition we matched two W-shapes.

This shows that the MTACH_NUMBER() functions works independently within each partition.

Module 10

Non-contiguous set of matches?

Is it possible to generate a non-contiguous set of MATCH_NUMBER()s?

Simple answer is yes - if the results contains empty matches (there is a separate tutorial on this topic if you are unsure about the meaning of this specific term)

Let's change the details of our pattern to search for zero or more matches by using the asterisk (*) quantifier

SELECT symbol, tstamp, price, mn, first_x, last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS mn
  ALL ROWS PER MATCH OMIT EMPTY MATCHES
  PATTERN (X* Y* W* Z*)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)), 
         W AS (price < PREV(price)),
         Z AS (price > PREV(price)))
WHERE symbol='OSCORP';

We can now see that our results shows a non-contiguous sequence of matches returned by the MATCH_NUMBER() function. The reason for this is that we have excluded the empty matches (generated because we are using the asterisk (*) quantifier) by using the keywords OMIT EMPTY MATCHES

If we remove these keywords then we can see the full resultset:

SELECT symbol, tstamp, price, mn, first_x, last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS mn
  ALL ROWS PER MATCH
  PATTERN (X* Y* W* Z*)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)), 
         W AS (price < PREV(price)),
         Z AS (price > PREV(price)))
WHERE symbol='OSCORP';

These unmatched rows still increments the MATCH_NUMBER() counter which is why in our previous example we saw the column mn start at the value 3.

Module 11

Using the CLASSIFIER() function

Matching each row to its respective pattern variable

The CLASSIFIER() function allows us to see which pattern variable is matched to each row

SELECT symbol, tstamp, price, mn, p_var, first_x, last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS mn,
           CLASSIFIER() AS p_var
  ALL ROWS PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price < PREV(price)),
         Y AS (price > PREV(price)), 
         W AS (price < PREV(price)),
         Z AS (price > PREV(price)))
WHERE symbol='OSCORP';

Here we can see that pattern variable X (where the price is declining) is matched to the first three rows in our result set, followed by one occurrence of Y, one occurrence of W and finally one occurrence of Z.

Module 12
CLASSIFIER with ONE ROW PER MATCH
What happens is you combine ONE ROW PER MATCH and CLASSIFIER()?

It is possible to use the CLASSIFIER() function in conjunction with the keywords ONE ROW PER MATCH, however, the results are not very useful as you an see...
```
SELECT symbol, 
  tstamp,
  first_price,
  last_price,
  match_number,
  classifier,
  first_x, 
  last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(tstamp) as tstamp,
           FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS match_number,
           CLASSIFIER() AS classifier,
           FIRST(x.price) as first_price,
           LAST(z.price) as last_price 
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price <= PREV(price)),
         Y AS (price >= PREV(price)), 
         W AS (price <= PREV(price)),
         Z AS (price >= PREV(price)))
WHERE symbol='OSCORP';
```
You should notice that the value returned by the CLASSIFIER() function is now truncated to the last variable listed in the DEFINE clause, which in this case is the pattern variable “Z”.

Therefore, when using CLASSIFIER() function to check how the pattern is being applied you should use one of the ALL ROWS PER MATCH options which allows you to see which rows are assigned to which pattern variable. The real benefit of these functions will become obvious if you look at the tutorial on greedy vs. reluctant quantifiers.

Note that even with the ONE ROW PER MATCH syntax the MATCH_NUMBER() function continues to operate as before.

Module 13

A few restrictions

There are a few restrictions on the use of these built-in measures which are related to wrapping other aggregate functions around them within the MEASURE clause.

For example, extending the example above where we are using ONE ROW PER MATCH...if we wanted to pull-out the first and last instance of the pattern variables we could try the following:

SELECT symbol, 
  tstamp,
  first_price,
  last_price,
  match_number,
  f_c,
  l_c,
  first_x, 
  last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(tstamp) as tstamp,
           FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS match_number,
           FIRST(CLASSIFIER()) AS f_c,
           LAST(CLASSIFIER()) AS l_c,
           FIRST(x.price) as first_price,
           LAST(z.price) as last_price 
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price <= PREV(price)),
         Y AS (price >= PREV(price)), 
         W AS (price <= PREV(price)),
         Z AS (price >= PREV(price)))
WHERE symbol='OSCORP';

But as you can see, by wrapping FIRST() and LAST() functions around the CLASSIFIER() function generates a SQL error:

ORA-62507: illegal use of MATCH_NUMBER or CLASSIFIER in MATCH_RECOGNIZE clause

The same is true if we use the MAX and MIN functions:

SELECT symbol, 
  tstamp,
  first_price,
  last_price,
  match_number,
  f_c,
  l_c,
  first_x, 
  last_z 
FROM ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol ORDER BY tstamp
  MEASURES FIRST(tstamp) as tstamp,
           FIRST(x.tstamp) AS first_x,
           LAST(z.tstamp) AS last_z,
           MATCH_NUMBER() AS match_number,
           MIN(CLASSIFIER()) AS f_c,
           MAX(CLASSIFIER()) AS l_c,
           FIRST(x.price) as first_price,
           LAST(z.price) as last_price 
  ONE ROW PER MATCH
  PATTERN (X+ Y+ W+ Z+)
  DEFINE X AS (price <= PREV(price)),
         Y AS (price >= PREV(price)), 
         W AS (price <= PREV(price)),
         Z AS (price >= PREV(price)))
WHERE symbol='OSCORP';

Module 14
Summary
The two built-in functions MATCH_NUMBER() and CLASSIFIER() functions are part of MATCH_RECOGNIZE. They allow you to check how your pattern is being matched to your data set. They are really useful debugging tools when working with MATCH_RECOGNIZE but don't forget that you can also use them within your result set.

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

MATCH_RECOGNIZE - Using Built-In Measures

Overview

What does MATCH_NUMBER() do?

What does CLASSIFIER() do?

Reset my tutorial environment

Just in case...

Create my dataset

Creating our ticker data set

Viewing my data

Quick review of our ticker data

Finding W-Patterns

Getting detailed feedback

Line-by-line analysis of how MATCH_RECOGNIZE is working

MATCH_NUMBER and Partition Boundaries

What happens to the MATCH_NUMBER counter across partition boundaries

Non-contiguous set of matches?

Is it possible to generate a non-contiguous set of MATCH_NUMBER()s?

Using the CLASSIFIER() function

Matching each row to its respective pattern variable

CLASSIFIER with ONE ROW PER MATCH

What happens is you combine ONE ROW PER MATCH and CLASSIFIER()?

A few restrictions

Summary

Links

Additional Information