In Oracle Database 12c there is a new SQL construct, MATCH_RECOGNIZE, for finding rows that match a pattern across a sequence of rows, using regular expressions.
Pattern matching using SQL is based around four logical concepts:
1. Partition and order the input data, so the pattern is visible within each sequence of rows.
2. Define the pattern of rows to search for, expressed as a regular expression over pattern variables.
3. Define the measures: the values to return for each match.
4. Define the output: a summary row per match (ONE ROW PER MATCH) or every matched row (ALL ROWS PER MATCH).
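These four concepts map directly onto the clauses of the MATCH_RECOGNIZE construct. As a sketch (the table and column names here are placeholders, not part of this tutorial's schema):

SELECT *
FROM some_table                      -- hypothetical input table
MATCH_RECOGNIZE (
  PARTITION BY grouping_col          -- 1. group the rows into sequences
  ORDER BY ordering_col              --    and order the rows within each sequence
  MEASURES                           -- 3. values to return for each match
    FIRST(a.ordering_col) AS start_of_match
  ONE ROW PER MATCH                  -- 4. one summary row per match (or ALL ROWS PER MATCH)
  PATTERN (A+ B)                     -- 2. the pattern: one or more A rows, then a B row
  DEFINE                             --    the conditions a row must meet to map to A or B
    A AS some_col < 10,
    B AS some_col >= 10
);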
In this tutorial we want to explore how to use MATCH_RECOGNIZE to find potentially fraudulent transactions.
Let's define a suspicious money transfer pattern for an account as: three or more small transfers (each less than $2,000) within a 30-day window, followed by a large transfer ($1,000,000 or more) within 10 days of the last small transfer.
When we find this pattern we need to report the following information: the account, the date of the first small transfer, the date of the last large transfer, and the amount of the large transfer.
The first step is to create the JSON table that will hold our transaction log. The log provides the time, the account/user id, the type of event (deposit, withdrawal, or transfer), and the transaction amount.
Oracle Database 12c supports storing JSON documents inside the database. Use the following code to create a table to store the transaction log, which is in JSON format:
CREATE TABLE json_transactions
(transaction_doc CLOB,
CONSTRAINT "VALID_JSON" CHECK (transaction_doc IS JSON) ENABLE
);
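The IS JSON check constraint guarantees that only well-formed JSON documents can be stored in the column. For example, an insert of a malformed document (here, one with a missing closing brace) is rejected with an ORA-02290 check constraint violation; the schema prefix in the error message will vary:

-- This insert fails: the document is not valid JSON
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-12"');
-- ORA-02290: check constraint (<schema>.VALID_JSON) violated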
The next step is to add some data to our JSON table using the normal JSON notation of key-value pairs:
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-12","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-12","user_id":"John","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-12","user_id":"John","event_id":"Deposit","trans_amount":500000}');
COMMIT;
Using the new JSON SQL dot notation, we can query our data with a simple SELECT statement:
SELECT
j.transaction_doc.time_id as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.trans_amount as amount
FROM json_transactions j;
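The simple dot notation used above is shorthand; the same projection can be written with the JSON_VALUE function, which also lets you control the return type explicitly (this is an equivalent alternative, not a step the tutorial requires):

SELECT
  JSON_VALUE(j.transaction_doc, '$.time_id')  AS time_id,
  JSON_VALUE(j.transaction_doc, '$.user_id')  AS user_id,
  JSON_VALUE(j.transaction_doc, '$.event_id') AS event_id,
  JSON_VALUE(j.transaction_doc, '$.trans_amount' RETURNING NUMBER) AS amount
FROM json_transactions j;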
Here is the list of transfer transactions that are of interest for our analysis:
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer';
The input to MATCH_RECOGNIZE is a SELECT statement that returns the JSON data in a standard relational table format:
SELECT user_id, first_t, last_t, big_amount
FROM (SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer')
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS first_t,
LAST(y.time_id) AS last_t,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
How does this statement break down?
To make the pattern "visible" within the sequence of rows, we first need to group and order the data: we partition the rows by user_id and then sort the rows within each partition by the transaction date.
The MEASURES clause lists the columns that will be returned. These can be existing columns from the input table/view and/or calculated columns. In this case we need to return the account/user id along with the amount of the large transfer, and we need to calculate the date of the first small transfer and the date of the last large transfer.
For this query we only need a summary report, so the ONE ROW PER MATCH clause is used.
We have two pattern variables, X and Y. In this example we are searching for three or more occurrences of X followed by a single occurrence of Y.
For a match on variable X we are searching for small transfers of less than $2,000, and all the small transfers must occur within a 30-day window.
For a match on variable Y we are searching for a large transfer of $1,000,000 or more. The large transfer must occur within 10 days of the last small transfer.
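Putting the two definitions side by side with the business rules they encode (this is the same DEFINE clause as in the query above, just annotated):

DEFINE
  X as (amount < 2000) AND                    -- a small transfer: less than $2,000
       LAST(time_id) - FIRST(time_id) < 30,   -- all small transfers within a 30-day window
  Y as (amount >= 1000000) AND                -- a large transfer: $1,000,000 or more
       time_id - LAST(x.time_id) < 10         -- within 10 days of the last small transfer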
We can check whether our pattern has found all the relevant matches by using two of the built-in measures: MATCH_NUMBER() and CLASSIFIER().
You might have a large number of matches for your pattern inside a given row partition. How do you tell these matches apart? This is done with the MATCH_NUMBER() function. Matches within a row pattern partition are numbered sequentially starting with 1 in the order they are found. This numbering starts at 1 within each row pattern partition, because there is no linked ordering across row pattern partitions.
Along with knowing which MATCH_NUMBER you are seeing, you may want to know which component of a pattern applies to a specific row. This is done using the CLASSIFIER() function. The classifier of a row is the pattern variable that the row is mapped to by a row pattern match. The CLASSIFIER() function returns a character string whose value is the name of the pattern variable defined within the PATTERN clause.
The last step is to change the amount of information returned from a summary to a detailed report. To do this we use ALL ROWS PER MATCH WITH UNMATCHED ROWS; there is a separate LiveSQL tutorial that covers this topic.
SELECT user_id, time_id, amount, first_t, last_t, first_small_amount, last_small_amount, big_amount, mn, classifier
FROM (SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer')
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS first_t,
LAST(y.time_id) AS last_t,
FIRST(x.amount) AS first_small_amount,
LAST(x.amount) AS last_small_amount,
y.amount AS big_amount,
match_number() AS mn,
classifier() AS classifier
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
As you can now see, we have correctly identified the three small transfers on 02-Jan, 10-Jan, and 20-Jan; the difference between 02-Jan and 20-Jan is less than 30 days. The large transfer meets our threshold of $1,000,000 or more, and the time difference between the last small transfer on 20-Jan and the large transfer on 27-Jan is less than 10 days.
Having found some fraudulent transfers, the business has now come back with additional requirements: each small transfer must go to a different account than the previous one, and the total sum of the small transfers must be less than $20,000.
The application developers have modified the JSON model to include the id of the person receiving the transfer so let's update our table with this new information:
TRUNCATE TABLE json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":100}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Bob","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-12","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-12","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-12","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-12","user_id":"John","event_id":"Deposit","trans_amount":500000}');
COMMIT;
As before, we can use the new JSON SQL notation to query our data with a simple SELECT statement:
SELECT
j.transaction_doc.time_id as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
j.transaction_doc.trans_amount as amount
FROM json_transactions j;
Here is the list of transfer transactions that are of interest for our analysis, which now includes the new column, transfer_id:
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer';
Now that we have modified our original JSON model to include the id of the person receiving the transfer, it is still possible to run the original pattern matching query, which is based on the original business rules:
SELECT user_id, first_t, last_t, big_amount
FROM (SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer')
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS first_t,
LAST(y.time_id) AS last_t,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
As we can see, our original query still runs, even though we have added new fields to our JSON data.
Let's change the definitions of the two pattern variables to include the new business requirements:
1) Consecutive small transfers must go to different accounts:
PREV(transfer_id) <> transfer_id
2) The total sum of the small transfers must be less than $20,000:
SUM(x.amount) < 20000
SELECT user_id, first_t, last_t, big_amount
FROM (SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer')
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS first_t,
LAST(y.time_id) AS last_t,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
PREV(transfer_id) <> transfer_id AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10 AND
SUM(x.amount) < 20000);
We can extract a lot more information from our pattern using some of the built-in functions. For example, we can extract data for each of the three small transfers by using the FIRST() and LAST() functions. Notice that FIRST(value, offset) returns the value offset rows after the first match, so FIRST(x.transfer_id, 1) returns the transfer_id for the second match of X.
SELECT user_id, first_t, amount_1, transfer_1, amount_2, transfer_2, amount_3, transfer_3, last_t, big_amount,transfer_4
FROM (SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-YY') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer')
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS first_t,
LAST(y.time_id) AS last_t,
FIRST(x.amount) AS amount_1,
FIRST(x.transfer_id) AS transfer_1,
FIRST(x.amount,1) AS amount_2,
FIRST(x.transfer_id,1) AS transfer_2,
LAST(x.amount) AS amount_3,
LAST(x.transfer_id) AS transfer_3,
y.amount AS big_amount,
y.transfer_id AS transfer_4
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
PREV(transfer_id) <> transfer_id AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10 AND
SUM(x.amount) < 20000);
This tutorial has explained how to use the new JSON feature to store data inside the database and query it using familiar SQL syntax.
We have designed a fraud pattern matching SQL statement based on a set of business rules and then enhanced the pattern matching SQL statement in response to changing business rules.
Finally, we used two of the built-in measures, MATCH_NUMBER() and CLASSIFIER(), to validate that our pattern matching statement is working correctly, along with the FIRST() and LAST() functions to retrieve specific data points within our pattern.