- Tutorial MATCH_RECOGNIZE - Using Built-In Measures
- Description In this tutorial we will review the two built-in measures that are part of MATCH_RECOGNIZE. These measures are designed to help you understand how your data set is mapped to the pattern that you have defined.
- Tags MATCH_RECOGNZE, pattern matching
- Area SQL Analytics
- Contributor Keith Laker (Oracle)
- Created Wednesday October 12, 2016
- Modules 14

## Overview

In this tutorial we will review the two built-in measures that are part of MATCH_RECOGNIZE:

- MATCH_NMUBER()
- CLASSIFIER()

These measures are designed to help you understand how your data set is mapped to the pattern that you have defined.

These two functions will typically only be used within the MEASURE clause. Interestingly, it is possible to use both these functions within the DEFINE clause but it is doubtful, at least in my mind, as to whether this would ever be useful.

## What does MATCH_NUMBER() do?

To get you started here is a quick definition of MATCH_NUMBER()

You might have a large number of matches for your pattern inside a given row partition. How do you tell these matches apart? This is done with the MATCH_NUMBER() function. Matches within a row pattern partition are numbered sequentially starting with 1 in the order they are found. This numbering starts at 1 within each row pattern partition, because there is no linked ordering across row pattern partitions.

## What does CLASSIFIER() do?

To get you started here is a quick definition of CLASSIFIER(

Along with knowing which MATCH_NUMBER you are seeing, you may want to know which component of a pattern applies to a specific row. This is done using the CLASSIFIER() function. The classifier of a row is the pattern variable that the row is mapped to by a row pattern match. The CLASSIFIER() function returns a character string whose value is the name of the pattern variable defined within the PATTERN clause.

## Create my dataset

### Creating our ticker data set

First step is to setup our data table and then populate it with data

`CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER); BEGIN INSERT INTO ticker VALUES('ACME', '01-Apr-11', 12); INSERT INTO ticker VALUES('ACME', '02-Apr-11', 17); INSERT INTO ticker VALUES('ACME', '03-Apr-11', 19); INSERT INTO ticker VALUES('ACME', '04-Apr-11', 21); INSERT INTO ticker VALUES('ACME', '05-Apr-11', 25); INSERT INTO ticker VALUES('ACME', '06-Apr-11', 12); INSERT INTO ticker VALUES('ACME', '07-Apr-11', 15); INSERT INTO ticker VALUES('ACME', '08-Apr-11', 20); INSERT INTO ticker VALUES('ACME', '09-Apr-11', 24); INSERT INTO ticker VALUES('ACME', '10-Apr-11', 25); INSERT INTO ticker VALUES('ACME', '11-Apr-11', 19); INSERT INTO ticker VALUES('ACME', '12-Apr-11', 15); INSERT INTO ticker VALUES('ACME', '13-Apr-11', 25); INSERT INTO ticker VALUES('ACME', '14-Apr-11', 25); INSERT INTO ticker VALUES('ACME', '15-Apr-11', 14); INSERT INTO ticker VALUES('ACME', '16-Apr-11', 12); INSERT INTO ticker VALUES('ACME', '17-Apr-11', 14); INSERT INTO ticker VALUES('ACME', '18-Apr-11', 24); INSERT INTO ticker VALUES('ACME', '19-Apr-11', 23); INSERT INTO ticker VALUES('ACME', '20-Apr-11', 22); INSERT INTO ticker VALUES('GLOBEX', '01-Apr-11', 11); INSERT INTO ticker VALUES('GLOBEX', '02-Apr-11', 12); INSERT INTO ticker VALUES('GLOBEX', '03-Apr-11', 13); INSERT INTO ticker VALUES('GLOBEX', '04-Apr-11', 12); INSERT INTO ticker VALUES('GLOBEX', '05-Apr-11', 11); INSERT INTO ticker VALUES('GLOBEX', '06-Apr-11', 10); INSERT INTO ticker VALUES('GLOBEX', '07-Apr-11', 9); INSERT INTO ticker VALUES('GLOBEX', '08-Apr-11', 8); INSERT INTO ticker VALUES('GLOBEX', '09-Apr-11', 9); INSERT INTO ticker VALUES('GLOBEX', '10-Apr-11', 9); INSERT INTO ticker VALUES('GLOBEX', '11-Apr-11', 9); INSERT INTO ticker VALUES('GLOBEX', '12-Apr-11', 9); INSERT INTO ticker VALUES('GLOBEX', '13-Apr-11', 10); INSERT INTO ticker VALUES('GLOBEX', '14-Apr-11', 11); INSERT INTO ticker VALUES('GLOBEX', '15-Apr-11', 12); INSERT INTO ticker VALUES('GLOBEX', '16-Apr-11', 11); INSERT INTO ticker VALUES('GLOBEX', '17-Apr-11', 8); INSERT INTO ticker VALUES('GLOBEX', '18-Apr-11', 7); INSERT INTO ticker VALUES('GLOBEX', '19-Apr-11', 5); INSERT INTO ticker VALUES('GLOBEX', '20-Apr-11', 3); INSERT INTO ticker VALUES('OSCORP', '01-Apr-11', 22); INSERT INTO ticker VALUES('OSCORP', '02-Apr-11', 22); INSERT INTO ticker VALUES('OSCORP', '03-Apr-11', 19); INSERT INTO ticker VALUES('OSCORP', '04-Apr-11', 18); INSERT INTO ticker VALUES('OSCORP', '05-Apr-11', 17); INSERT INTO ticker VALUES('OSCORP', '06-Apr-11', 20); INSERT INTO ticker VALUES('OSCORP', '07-Apr-11', 17); INSERT INTO ticker VALUES('OSCORP', '08-Apr-11', 20); INSERT INTO ticker VALUES('OSCORP', '09-Apr-11', 16); INSERT INTO ticker VALUES('OSCORP', '10-Apr-11', 15); INSERT INTO ticker VALUES('OSCORP', '11-Apr-11', 15); INSERT INTO ticker VALUES('OSCORP', '12-Apr-11', 12); INSERT INTO ticker VALUES('OSCORP', '13-Apr-11', 11); INSERT INTO ticker VALUES('OSCORP', '14-Apr-11', 15); INSERT INTO ticker VALUES('OSCORP', '15-Apr-11', 12); INSERT INTO ticker VALUES('OSCORP', '16-Apr-11', 16); INSERT INTO ticker VALUES('OSCORP', '17-Apr-11', 14); INSERT INTO ticker VALUES('OSCORP', '18-Apr-11', 12); INSERT INTO ticker VALUES('OSCORP', '19-Apr-11', 11); INSERT INTO ticker VALUES('OSCORP', '20-Apr-11', 9); commit; END;`

## Viewing my data

### Quick review of our ticker data

Now let's check to see how many rows are in our dataset

`SELECT count(*) FROM ticker; SELECT symbol, min(tstamp), max(tstamp), count(*) FROM ticker GROUP BY symbol;`

You should have 60 rows of data spread across three symbols (ACME, GLOBEX, OSCORP) with 20 rows of data for each ticker symbol. Our ticker data for each symbol starts on April 1 and ends on April 20.

You can view the full data set using the following code:

`SELECT * FROM ticker ORDER BY symbol, tstamp;`

## Finding W-Patterns

First let’s start by using our normal stock ticker data set and searching for W-shaped patterns. Here is the code:

`SELECT symbol, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z ONE ROW PER MATCH PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price))) WHERE symbol='OSCORP';`

For the symbol OSCORP we have matched two W-shapes: the first starts on 03-APR-11 and ends on 08-APR-11 and the second starts on 12-APR-11 and ends on 16-APR-11.

## Getting detailed feedback

### Line-by-line analysis of how MATCH_RECOGNIZE is working

Let's switch to displaying the more detailed report so that we can understand how the pattern is being applied to our ticker data set. First let's use the MATCH_NUMBER() function to see how many occurrences of our W-pattern are contained within the data set for OSCORP:

`SELECT symbol, tstamp, price, mn, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS mn ALL ROWS PER MATCH OMIT EMPTY MATCHES PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price))) WHERE symbol='OSCORP';`

What you should spot is that the MATCH_NUMBER() column, mn, contains a contiguous set of numbers. Our first W-shape which starts on 03-Apr-11 and the second W-shape starts on 12-Apr-11

## MATCH_NUMBER and Partition Boundaries

### What happens to the MATCH_NUMBER counter across partition boundaries

Using the code below we can see how MATCH_NUMBER() functions across the partition boundary, defined as the column symbol.

`SELECT symbol, tstamp, price, mn, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS mn ALL ROWS PER MATCH OMIT EMPTY MATCHES PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price)));`

Note that we have removed the WHERE clause so that you can see the how the MATCH_NUMBER() sequencing is handled across partition boundaries - i.e. you should note that the match numbering starts at 1 within each symbol-based partition. For symbol ‘ACME’ we have one matched one W-shaped pattern and within the OSCORP partition we matched two W-shapes.

This shows that the MTACH_NUMBER() functions works independently within each partition.

## Non-contiguous set of matches?

### Is it possible to generate a non-contiguous set of MATCH_NUMBER()s?

Simple answer is yes - if the results contains

*empty matches*(there is a separate tutorial on this topic if you are unsure about the meaning of this specific term)Let's change the details of our pattern to search for zero or more matches by using the asterisk (*) quantifier

`SELECT symbol, tstamp, price, mn, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS mn ALL ROWS PER MATCH OMIT EMPTY MATCHES PATTERN (X* Y* W* Z*) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price))) WHERE symbol='OSCORP';`

We can now see that our results shows a non-contiguous sequence of matches returned by the MATCH_NUMBER() function. The reason for this is that we have excluded the empty matches (generated because we are using the asterisk (*) quantifier) by using the keywords

**OMIT EMPTY MATCHES**If we remove these keywords then we can see the full resultset:

`SELECT symbol, tstamp, price, mn, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS mn ALL ROWS PER MATCH PATTERN (X* Y* W* Z*) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price))) WHERE symbol='OSCORP';`

These unmatched rows still increments the MATCH_NUMBER() counter which is why in our previous example we saw the column mn start at the value 3.

## Using the CLASSIFIER() function

### Matching each row to its respective pattern variable

The CLASSIFIER() function allows us to see which pattern variable is matched to each row

`SELECT symbol, tstamp, price, mn, p_var, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS mn, CLASSIFIER() AS p_var ALL ROWS PER MATCH PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price < PREV(price)), Y AS (price > PREV(price)), W AS (price < PREV(price)), Z AS (price > PREV(price))) WHERE symbol='OSCORP';`

Here we can see that pattern variable X (where the price is declining) is matched to the first three rows in our result set, followed by one occurrence of Y, one occurrence of W and finally one occurrence of Z.

## CLASSIFIER with ONE ROW PER MATCH

### What happens is you combine ONE ROW PER MATCH and CLASSIFIER()?

It is possible to use the CLASSIFIER() function in conjunction with the keywords ONE ROW PER MATCH, however, the results are not very useful as you an see...

`SELECT symbol, tstamp, first_price, last_price, match_number, classifier, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) as tstamp, FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS match_number, CLASSIFIER() AS classifier, FIRST(x.price) as first_price, LAST(z.price) as last_price ONE ROW PER MATCH PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price <= PREV(price)), Y AS (price >= PREV(price)), W AS (price <= PREV(price)), Z AS (price >= PREV(price))) WHERE symbol='OSCORP';`

You should notice that the value returned by the CLASSIFIER() function is now truncated to the last variable listed in the DEFINE clause, which in this case is the pattern variable “Z”.

Therefore, when using CLASSIFIER() function to check how the pattern is being applied you should use one of the ALL ROWS PER MATCH options which allows you to see which rows are assigned to which pattern variable. The real benefit of these functions will become obvious if you look at the tutorial on greedy vs. reluctant quantifiers.

Note that even with the ONE ROW PER MATCH syntax the MATCH_NUMBER() function continues to operate as before.

## A few restrictions

There are a few restrictions on the use of these built-in measures which are related to wrapping other aggregate functions around them within the MEASURE clause.

For example, extending the example above where we are using ONE ROW PER MATCH...if we wanted to pull-out the first and last instance of the pattern variables we could try the following:

`SELECT symbol, tstamp, first_price, last_price, match_number, f_c, l_c, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) as tstamp, FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS match_number, FIRST(CLASSIFIER()) AS f_c, LAST(CLASSIFIER()) AS l_c, FIRST(x.price) as first_price, LAST(z.price) as last_price ONE ROW PER MATCH PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price <= PREV(price)), Y AS (price >= PREV(price)), W AS (price <= PREV(price)), Z AS (price >= PREV(price))) WHERE symbol='OSCORP';`

But as you can see, by wrapping FIRST() and LAST() functions around the CLASSIFIER() function generates a SQL error:

ORA-62507: illegal use of MATCH_NUMBER or CLASSIFIER in MATCH_RECOGNIZE clause

The same is true if we use the MAX and MIN functions:

`SELECT symbol, tstamp, first_price, last_price, match_number, f_c, l_c, first_x, last_z FROM ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) as tstamp, FIRST(x.tstamp) AS first_x, LAST(z.tstamp) AS last_z, MATCH_NUMBER() AS match_number, MIN(CLASSIFIER()) AS f_c, MAX(CLASSIFIER()) AS l_c, FIRST(x.price) as first_price, LAST(z.price) as last_price ONE ROW PER MATCH PATTERN (X+ Y+ W+ Z+) DEFINE X AS (price <= PREV(price)), Y AS (price >= PREV(price)), W AS (price <= PREV(price)), Z AS (price >= PREV(price))) WHERE symbol='OSCORP';`

## Summary

The two built-in functions MATCH_NUMBER() and CLASSIFIER() functions are part of MATCH_RECOGNIZE. They allow you to check how your pattern is being matched to your data set. They are really useful debugging tools when working with MATCH_RECOGNIZE but don't forget that you can also use them within your result set.