The aim of this exercise is to find suspicious money transfers within a series of accounts. We define a suspicious money transfer pattern as: three or more small transfers (each under $2,000) occurring within a 30-day window, followed by a large transfer of $1,000,000 or more made within 10 days of the last small transfer.
The pattern matching process needs to identify the following data points: the user making the transfers, the dates and amounts of the small transfers, and the date and amount of the final large transfer.
Create a new table that will hold transactions in JSON format. We can ensure the data is in the correct format by using the new IS JSON check constraint:
CREATE TABLE json_transactions
(transaction_doc CLOB,
CONSTRAINT "VALID_JSON" CHECK (transaction_doc IS JSON) ENABLE
);
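As a quick sanity check, the constraint rejects anything that is not well-formed JSON. For example, an insert like the following should fail with ORA-02290 (check constraint violated):
-- This row is not valid JSON, so the VALID_JSON constraint rejects it
INSERT INTO json_transactions VALUES ('this is not JSON');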
Add data in JSON format. Note that we could use an external table to read directly from a JSON log file, but in this case we will insert the data directly into a table because we don't have access to an external filesystem; a sketch of the external table approach is shown below for reference.
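A minimal sketch of the external table approach might look like this (the directory object json_dir and the file name transactions.json are illustrative assumptions):
CREATE TABLE json_transactions_ext
(transaction_doc CLOB)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY json_dir            -- assumed directory object pointing at the log file location
 ACCESS PARAMETERS
 (RECORDS DELIMITED BY NEWLINE         -- one JSON document per line
  FIELDS (transaction_doc CHAR(32767)))
 LOCATION ('transactions.json'))       -- assumed file name
REJECT LIMIT UNLIMITED;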
truncate table json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Deposit","trans_amount":500000}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"John","event_id":"Transfer","trans_amount":1200500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-APR-17","user_id":"John","event_id":"Transfer","trans_amount":3100400}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-17","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-MAR-17","user_id":"John","event_id":"Withdrawal","trans_amount":12000}');
COMMIT;
Now we need to view the data that's been added. The data is returned in JSON format, i.e. key-value pairs, if we use a simple 'SELECT *' statement:
SELECT *
FROM json_transactions j;
We can also view the data in a transformed state where the resultset is a normal relational set of columns and rows. This is done using the new syntax for accessing JSON data: document_ref.attribute_name
SELECT
j.transaction_doc.time_id as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.trans_amount as amount
FROM json_transactions j;
All the columns are returned as VARCHAR2!
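If you want to verify this, the DUMP function reveals the internal datatype of an expression (Typ=1 is VARCHAR2); a quick check might look like:
-- Typ=1 in the output confirms the dot-notation result is a VARCHAR2
SELECT DUMP(j.transaction_doc.trans_amount) AS amount_datatype
FROM json_transactions j
FETCH FIRST 1 ROW ONLY;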
Using SQL we can apply predicates and datatype conversions so that our query returns only the transactions that are transfers:
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-RR') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
to_number(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer'
ORDER BY 2,1;
The important point is that we can apply normal SQL operations to our JSON data.
To make our pattern matching SQL statements easier to read, we will create a view that will form the input into our pattern matching statement:
CREATE VIEW transfers_view AS
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-RR') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer';
Note that we have converted the first column to a proper DATE and the amount column to a NUMBER.
Let's create a very basic pattern matching statement where we accept as many default values as possible.
SELECT user_id
FROM transfers_view
MATCH_RECOGNIZE(
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
This code will not run! It will return an error:
ORA-00904: "USER_ID": invalid identifier
Why? Because we have not defined the columns to be returned by MATCH_RECOGNIZE. So can we simply use 'SELECT *' to return the data we need?
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
This now generates a different error:
ORA-30732: table contains no user-visible columns
Answer: No! We need to define the information to be returned by using the MEASURES clause. To keep things simple, we will return only the user_id and the amount.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
MEASURES
user_id AS user_id,
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
This query returns no rows, which does not seem correct: if we look at the source data it is clear that we have some suspicious transfers. So what is going on?
What we are missing is the ORDER BY clause, which sorts the data within the pattern matching process.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
ORDER BY time_id
MEASURES
user_id AS user_id,
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Now we see that 2 matches have been found. But is this correct?
To test whether our MATCH_RECOGNIZE statement is processing data correctly we need to add some more transactions that include ones for a different user_id:
truncate table json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"Keith","event_id":"Transfer","trans_amount":10000}');
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Deposit","trans_amount":500000}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"Keith","event_id":"Transfer","trans_amount":50000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"03-JAN-17","user_id":"Keith","event_id":"Transfer","trans_amount":12000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"John","event_id":"Transfer","trans_amount":1200500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-APR-17","user_id":"John","event_id":"Transfer","trans_amount":3100400}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"15-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"16-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"17-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-17","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-MAR-17","user_id":"John","event_id":"Withdrawal","trans_amount":12000}');
COMMIT;
Now let's re-run the pattern matching but point to a specific occurrence of user_id by referencing our pattern matching variables.
In this case we want x.user_id, i.e. the user_id of the row mapped to variable X.
You will see that the resultset changes!
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
ORDER BY time_id
MEASURES
x.user_id AS user_id,
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
We still get two rows but the user_id in the first row has changed! It's now "Keith" and not "John". So something is still not correct.
What we need to do is put each user_id in a separate block, or partition, and then process the transactions for each user_id sorted in date order. To do this we use the PARTITION BY and ORDER BY clauses. This query will now return three matches.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Can we extract any more information about each match which will allow us to determine if our pattern matching query is returning the correct information?
We can control the output using ONE ROW PER MATCH or ALL ROWS PER MATCH.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
amount AS t_amount
ALL ROWS PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Now we can see the details of the rows that are matched against the pattern we have defined.
Also note that the amount column now appears twice because using ALL ROWS PER MATCH forces MATCH_RECOGNIZE to return all the columns in the source table.
Therefore, if you add columns to your source table, the new columns will automatically appear in the resultset, which may or may not be a problem. Think very carefully about whether you want to use SELECT * FROM ... or whether you want to list the columns to be returned explicitly.
Are there any debugging tools to help us understand how the pattern is being applied? Yes: MATCH_NUMBER() and CLASSIFIER() functions.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
partition by user_id
order by time_id
MEASURES
amount AS t_amount,
match_number() AS mn,
classifier() AS cf
ALL ROWS PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
We can also see the rejected rows, i.e. the rows that were not matched, by adding some more keywords: ALL ROWS PER MATCH WITH UNMATCHED ROWS.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
partition by user_id
order by time_id
MEASURES
match_number() as mn,
classifier() as cf
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Let's clean up the column selection in our SELECT statement:
SELECT user_id,time_id, amount, mn, cf
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
match_number() AS mn,
classifier() AS cf
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Can we create a summary report that contains more information about each match?
SELECT user_id, start_small, end_small, small_amount, avg_small_amount, start_big, big_amount
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS start_small,
LAST(x.time_id) AS end_small,
MAX(x.amount) AS small_amount,
TRUNC(AVG(x.amount),2) AS avg_small_amount,
FIRST(y.time_id) AS start_big,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Now we are going to add some new business requirements to our definition of suspicious activity within an account. Two new rules need to be applied: each small transfer must go to a different recipient (transfer_id) than the previous one, and the total of the small transfers must be less than $20,000.
Let's add some transactions which contain additional information:
TRUNCATE TABLE json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Deposit","trans_amount":500000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":100}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Bob","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1200500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-FEB-17","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-FEB-17","user_id":"John","event_id":"Transfer","transfer_id":"John", "trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":3100400}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"15-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"16-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Bob","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"17-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-17","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
COMMIT;
Question: has the addition of new sets of key-value pairs broken our original queries?
SELECT * FROM json_transactions;
This still lists all our JSON transactions, and our view still works:
SELECT * FROM transfers_view;
But the view does not expose the new transfer_id key-value pair.
As we have added a new key-value pair to our JSON documents, we need to update our view to include this new information:
CREATE OR REPLACE VIEW transfers_view AS
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-RR') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer';
As before, the time_id column is converted to a DATE and the amount column to a NUMBER, and the new transfer_id attribute is now included.
Does our original pattern matching query still work?
SELECT user_id, start_small, end_small, small_amount, avg_small_amount, start_big, big_amount
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS start_small,
LAST(x.time_id) AS end_small,
MAX(x.amount) AS small_amount,
TRUNC(AVG(x.amount),2) AS avg_small_amount,
FIRST(y.time_id) AS start_big,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Yes it works!
The changes we need to make are all contained within the DEFINE clause. Firstly, we need to check that the transfer_id changes between consecutive small transactions:
PREV(transfer_id) <> transfer_id
Secondly, we need to check the total of all the small transactions:
SUM(x.amount) < 20000
The full code looks like this:
SELECT user_id, start_small, end_small, sum_small, start_big, big_amount
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS start_small,
LAST(x.time_id) AS end_small,
SUM(x.amount) AS sum_small,
FIRST(y.time_id) AS start_big,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
PREV(transfer_id) <> transfer_id AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10 AND
SUM(x.amount) < 20000);
Let's look at a new scenario where we want to track the location of credit card transactions to determine if a card has been illegally replicated. We will use the geospatial features that are part of the Oracle Database.
First let's build a JSON table to hold our sample data set
CREATE TABLE json_transactions_geo
(trans_geo_doc CLOB,
CONSTRAINT "VALID_GEO_JSON" CHECK (trans_geo_doc IS JSON) ENABLE
);
Now let's add some data
TRUNCATE TABLE json_transactions_geo;
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 08:30:23","user_id":"John","merchant_id":"Baskin Robins", "item_ref":"Ice Cream","trans_amount":10.00, "x":-71.48923,"y":42.72347,"l_x":-71.48923,"l_y":42.72347}');
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 09:35:10","user_id":"John","merchant_id":"Costco","item_ref":"Groceries","trans_amount":55.00,"x":-71.48923,"y":42.72347,"l_x":-71.48989,"l_y":42.72347}');
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 10:40:10","user_id":"John","merchant_id":"Starbucks","item_ref":"Coffee","trans_amount":5.00,"x":-71.48923,"y":42.72347,"l_x":-71.48854,"l_y":42.72347, }');
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 16:36:10","user_id":"John","merchant_id":"BestBuy","item_ref":"Apple Mac Pro","trans_amount":4150.00,"x":-91.48923,"y":42.72347,"l_x":-71.48923,"l_y":42.72347}');
COMMIT;
Let's quickly look at our credit card data. Note that at this point we simply have numeric values for our geospatial coordinates; we will convert these to geospatial datatypes later.
SELECT
TO_DATE(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS') as time_id,
j.trans_geo_doc.user_id as user_id,
j.trans_geo_doc.merchant_id as merchant_id,
j.trans_geo_doc.item_ref as item_ref,
to_number(j.trans_geo_doc.trans_amount) as trans_amount,
to_number(j.trans_geo_doc.x) as geo_x,
to_number(j.trans_geo_doc.y) as geo_y,
to_number(j.trans_geo_doc.l_x) as geo_lag_x,
to_number(j.trans_geo_doc.l_y) as geo_lag_y
FROM json_transactions_geo j;
To keep our pattern matching code simple and readable, let's create a view over our JSON data...
CREATE OR REPLACE VIEW VW_JSON_CC_TRANS AS
SELECT
TO_DATE(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS') as day_id,
TO_CHAR(TO_TIMESTAMP(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS'), 'HH24:MI:SS') as time_id,
-- hours since the previous transaction
TRUNC((TO_DATE(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS') -
TO_DATE(LAG(j.trans_geo_doc.time_id,1) OVER (ORDER BY j.trans_geo_doc.time_id), 'DD-MON-RR HH24:MI:SS')) * 24, 0) AS lag_time,
j.trans_geo_doc.user_id as user_id,
j.trans_geo_doc.merchant_id as merchant_id,
j.trans_geo_doc.item_ref as item_ref,
to_number(j.trans_geo_doc.trans_amount) as trans_amount,
to_number(j.trans_geo_doc.x) as geo_x,
to_number(j.trans_geo_doc.y) as geo_y,
to_number(j.trans_geo_doc.l_x) as geo_lag_x,
to_number(j.trans_geo_doc.l_y) as geo_lag_y
FROM json_transactions_geo j;
We can now use the SDO_GEOM.SDO_DISTANCE function to calculate the distance between transactions...
SELECT
l.day_id as day_id,
l.time_id as time_id,
l.lag_time as lag_hours,
l.user_id,
l.merchant_id,
l.item_ref,
l.trans_amount,
SDO_GEOM.SDO_DISTANCE(SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_x, l.geo_y, NULL),NULL,NULL),
SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_lag_x, l.geo_lag_y, NULL),NULL,NULL),
0.0001,'unit=km') AS DISTANCE_BETWEEN_TRANS
FROM VW_JSON_CC_TRANS l
ORDER BY l.day_id, time_id;
Your task now is to use this information in a MATCH_RECOGNIZE query that looks for three low-value transactions followed by a large transaction, where the distance between the last small transaction and the large transaction is suspiciously large.
Using the previous examples you should be able to construct this query yourself; one possible sketch is shown below for comparison once you have attempted it.
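In this sketch the thresholds (small transactions under $100, a large transaction of $1,000 or more, and a distance of more than 500 km) are illustrative assumptions rather than part of the specification; the distance is computed in an inline view so that it can be referenced as an ordinary column inside MATCH_RECOGNIZE:
SELECT *
FROM (SELECT l.day_id,
             l.user_id,
             l.merchant_id,
             l.trans_amount,
             -- distance in km between this transaction and the previous one
             SDO_GEOM.SDO_DISTANCE(
               SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_x, l.geo_y, NULL),NULL,NULL),
               SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_lag_x, l.geo_lag_y, NULL),NULL,NULL),
               0.0001,'unit=km') AS distance_km
      FROM VW_JSON_CC_TRANS l)
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY day_id
MEASURES
FIRST(x.day_id) AS start_small,
y.trans_amount AS big_amount,
y.distance_km AS suspicious_distance
ONE ROW PER MATCH
PATTERN (X{3} Y)
DEFINE
X AS (trans_amount < 100),       -- assumed threshold for a "low value" transaction
Y AS (trans_amount >= 1000) AND
(distance_km > 500));            -- assumed threshold for a suspicious distance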