CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);
BEGIN
INSERT INTO ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO ticker VALUES('ACME', '03-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO ticker VALUES('ACME', '05-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '06-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '07-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '08-Apr-11', 20);
INSERT INTO ticker VALUES('ACME', '09-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '10-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '11-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '12-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '13-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '14-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '15-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '16-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '17-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '18-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '19-Apr-11', 23);
INSERT INTO ticker VALUES('ACME', '20-Apr-11', 22);
INSERT INTO ticker VALUES('GLOBEX', '01-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '02-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '03-Apr-11', 13);
INSERT INTO ticker VALUES('GLOBEX', '04-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '05-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '06-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '07-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '08-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '09-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '10-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '11-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '12-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '13-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '14-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '15-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '16-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '17-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '18-Apr-11', 7);
INSERT INTO ticker VALUES('GLOBEX', '19-Apr-11', 5);
INSERT INTO ticker VALUES('GLOBEX', '20-Apr-11', 3);
INSERT INTO ticker VALUES('OSCORP', '01-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '02-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '03-Apr-11', 19);
INSERT INTO ticker VALUES('OSCORP', '04-Apr-11', 18);
INSERT INTO ticker VALUES('OSCORP', '05-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '06-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '07-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '08-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '09-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '10-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '11-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '12-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '13-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '14-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '15-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '16-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '17-Apr-11', 14);
INSERT INTO ticker VALUES('OSCORP', '18-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '19-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '20-Apr-11', 9);
commit;
END;
Recognizing patterns in a sequence of rows has been a capability that was widely desired, but not really possible with SQL until now. There were many workarounds, but these were difficult to write, hard to understand, and inefficient to execute.
The aim of this tutorial is to explain how to use the AFTER MATCH SKIP syntax which contains a lot of powerful features.
We use the AFTER MATCH SKIP clause to determine the precise point to resume row pattern matching after a non-empty match is found. If you don’t supply an AFTER MATCH SKIP clause then the default is AFTER MATCH SKIP PAST LAST ROW.
Of course there are quite a few options available:
- AFTER MATCH SKIP TO NEXT ROW - Resume pattern matching at the row after the first row of the current match.
- AFTER MATCH SKIP PAST LAST ROW - Resume pattern matching at the next row after the last row of the current match.
- AFTER MATCH SKIP TO FIRST pattern_variable - Resume pattern matching at the first row that is mapped to the pattern variable.
- AFTER MATCH SKIP TO LAST pattern_variable - Resume pattern matching at the last row that is mapped to the pattern variable.
- AFTER MATCH SKIP TO pattern_variable - The same as AFTER MATCH SKIP TO LAST pattern_variable.
During this tutorial we will explore how to use some these clauses
First step is to setup our data table and then populate it with data
CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);
BEGIN
INSERT INTO ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO ticker VALUES('ACME', '03-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO ticker VALUES('ACME', '05-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '06-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '07-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '08-Apr-11', 20);
INSERT INTO ticker VALUES('ACME', '09-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '10-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '11-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '12-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '13-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '14-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '15-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '16-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '17-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '18-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '19-Apr-11', 23);
INSERT INTO ticker VALUES('ACME', '20-Apr-11', 22);
INSERT INTO ticker VALUES('GLOBEX', '01-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '02-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '03-Apr-11', 13);
INSERT INTO ticker VALUES('GLOBEX', '04-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '05-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '06-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '07-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '08-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '09-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '10-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '11-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '12-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '13-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '14-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '15-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '16-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '17-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '18-Apr-11', 7);
INSERT INTO ticker VALUES('GLOBEX', '19-Apr-11', 5);
INSERT INTO ticker VALUES('GLOBEX', '20-Apr-11', 3);
INSERT INTO ticker VALUES('OSCORP', '01-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '02-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '03-Apr-11', 19);
INSERT INTO ticker VALUES('OSCORP', '04-Apr-11', 18);
INSERT INTO ticker VALUES('OSCORP', '05-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '06-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '07-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '08-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '09-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '10-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '11-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '12-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '13-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '14-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '15-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '16-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '17-Apr-11', 14);
INSERT INTO ticker VALUES('OSCORP', '18-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '19-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '20-Apr-11', 9);
commit;
END;
Now let's check to see how many rows are in our dataset
SELECT count(*) FROM ticker;
SELECT symbol, min(tstamp), max(tstamp), count(*) FROM ticker GROUP BY symbol;
You should have 60 rows of data spread across three symbols (ACME, GLOBEX, OSCORP) with 20 rows of data for each ticker symbol. Our ticker data for each symbol starts on April 1 and ends on April 20.
Looking at the source data for the sessionization example it’s clear that as we walk through the entries in the log file to check if an entry is part of the current session or not, there is no point in stepping backwards to begin searching again once a match has been found. The sessionization tutorial is a good example of when to use this type of SKIP operation.
See here https://livesql.oracle.com/apex/livesql/file/tutorial_C863BMBWSE9GK32X19LDPO2SF.html
Note that you can set the restart point to be linked to a specific pattern variable which allows you to work with overlapping patterns - i.e. where you are searching for “shapes” within your data set such as “W” shaped patterns within our ticker data stream. But what happens if the pattern variable within the SKIP TO clause is not matched? Let’s look at the following example:
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
LAST(UP.tstamp) as end_tstamp,
MATCH_NUMBER() AS match_num,
CLASSIFIER() AS var_match
ALL ROWS PER MATCH
AFTER MATCH SKIP TO DOWN
PATTERN (STRT DOWN* UP)
DEFINE
DOWN AS DOWN.price < PREV(DOWN.price),
UP AS UP.price > PREV(UP.price)
) MR
WHERE symbol='ACME'
ORDER BY MR.symbol, MR.tstamp;
here we are stating that we need at least zero or more matches of the variable DOWN to occur and once a match has been found then we will resume the search for the next pattern at the DOWN event. With this pattern it is possible that DOWN will never get matched so the AFTER MATCH SKIP TO DOWN cannot happen even though a complete match for the pattern is found. Therefore, the compiler throws an error to let you know that this code will not work:
ORA-62514: AFTER MATCH SKIP TO variable is not bounded in the match found.
62514. 00000 - "AFTER MATCH SKIP TO variable is not bounded in the match found."
*Cause: AFTER MATCH SKIP TO variable was not bound in the match found due
to pattern operators such as |, *, ?, and so on.
*Action: Modify the query and retry the operation
As I previously stated, you might think the obvious position to start searching for the next occurrence of a pattern is the next record after the last row of the current match. But what if there are overlapping patterns where the middle of an earlier match overlaps with the start of the next match? For example if we are looking for a w-shaped pattern within our ticker data set then it is quite possible to have overlapping w-shapes where the next “W” starts within the second down phase of the previous ”W”.
Fortunately MATCH_RECOGNIZE provides great flexibility in terms of being able to specify the restart point. If we look at the source data for the ACME symbol within our ticker data set then we can see that there are overlapping W-shapes (assuming we allow for the flat-top in the middle of the 2nd w-shape by using the <= and >= tests for each pattern variable!).
Let’s use this example to explore the various AFTER MATCH SKIP TO options…starting with the default behaviour:
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp
MEASURES STRT.tstamp AS start_w,
LAST(z.tstamp) AS end_w
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW<
PATTERN (STRT x+ y+ w+ z+)
DEFINE
x AS x.price <= PREV(x.price),
y AS y.price >= PREV(y.price),
w AS w.price <= PREV(w.price),
z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, MR.start_w;
returns only one match within the ACME data set.
Now let’s change the above code sample so that after the first pattern has been found we begin searching at the row after the end of the matching process for the Y variable - i.e. row 6, 10-Apr-11.
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp
MEASURES STRT.tstamp AS start_w,
LAST(z.tstamp) AS end_w
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST Y
PATTERN (STRT x+ y+ w+ z+)
DEFINE
x AS x.price <= PREV(x.price),
y AS y.price >= PREV(y.price),
w AS w.price <= PREV(w.price),
z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol,start_w;
which now finds two w-shapes with the second W starting on 10-Apr-11 and ending on 18-Apr-11.
but what is going on under-the-covers?
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp
MEASURES STRT.tstamp AS start_w,
LAST(z.tstamp) AS end_w,
classifier() AS pv,
match_number() AS mn,
count(*) as row_count
ALL ROWS PER MATCH
AFTER MATCH SKIP TO LAST Y
PATTERN (STRT x+ y+ w+ z+)
DEFINE
x AS x.price <= PREV(x.price),
y AS y.price >= PREV(y.price),
w AS w.price <= PREV(w.price),
z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, mn, tstamp;
now shows us that the records for 10-Apr-11 to 14-Apr-11 were actually processed twice.
What about using the SKIP TO NEXT ROW syntax? How does that affect our results? It is important to remember that this will force MATCH_RECOGNIZE to resume pattern matching at the row after the first row of the current match. Using our ticker data we can see that this would actually increase the number of W-shapes to three!
In match 2 we have two occurrences of pattern variable x, therefore, once the second W-shape has been matched the search process restarts on row 12, i.e. the first row of the current match, which is row 12 mapped to STRT.
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp
MEASURES STRT.tstamp AS start_w,
LAST(z.tstamp) AS end_w
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (STRT x+ y+ w+ z+)
DEFINE
x AS x.price <= PREV(x.price),
y AS y.price >= PREV(y.price),
w AS w.price <= PREV(w.price),
z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, mr.start_w;
and if we change our code to return for this statement to show the more detailed report we can see how the pattern is actually being matched:
SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol ORDER BY tstamp
MEASURES STRT.tstamp AS start_w,
LAST(z.tstamp) AS end_w,
classifier() AS pv,
match_number() AS mn,
count(*) as row_count
ALL ROWS PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (STRT x+ y+ w+ z+)
DEFINE
x AS x.price <= PREV(x.price),
y AS y.price >= PREV(y.price),
w AS w.price <= PREV(w.price),
z AS z.price >= PREV(z.price)
) MR
WHERE symbol='ACME'
ORDER BY symbol, mn, tstamp;
Note that match two, the 2nd W-shape, starts on line 11 but we began the search for this second match on row 2, i.e. the next row after the first start variable. Similarly, the search for the third W-shape started on row 12 after the second STRT variable.
Given that our original data set for ACME only contained 20 rows you can see from this example that it is in fact possible to do a lot more pattern discovery when you start to fully exploit the power of the AFTER MATCH SKIP syntax.
The AFTER MATCH SKIP clause determines the point at which we will resume searching for the next match after a non-empty match has been found. The default for the clause is AFTER MATCH SKIP PAST LAST ROW: resume pattern matching at the next row after the last row of the current match.
In most examples of using MATCH_RECOGNIZE you will notice that the AFTER MATCH clause is not present and the developer blindly assumes that the AFTER MATCH SKIP PAST LAST ROW clause is applied. This obviously does not help the next developer who has to amend the code to fit new business requirements.
Therefore, my recommendation is that you should always clearly state where you want the matching process to start searching for the next match. Never assume the default will behaviour will be good enough!
Hopefully this tutorial has explained the ways in which you can use the AFTER MATCH SKIP… clause to ensure that you capture all of the required patterns/shapes within your data set.
It’s always a good idea to explicitly include this clause because it is very important - if you don’t want to allow for overlapping matches then clearly state this in your code by using AFTER MATCH SKIP PAST LAST ROW clause.
Don’t assume the default will kick-in and that the next developer will have time to read all your detailed documentation when making the next round of changes to the code.