The aim of this exercise is to find suspicious money transfers within a series of accounts. We define a suspicious money transfer pattern as: three or more small transfers (each under $2,000) occurring within a 30-day window, followed by a large transfer of $1,000,000 or more made within 10 days of the last small transfer.
The pattern matching process needs to identify the following data points: the user making the transfers, the dates and amounts of the small transfers, and the date and amount of the final large transfer.
Create a new table that will hold transactions in JSON format. We can ensure the data is in the correct format by using the new IS JSON check constraint:
CREATE TABLE json_transactions
(transaction_doc CLOB,
CONSTRAINT "VALID_JSON" CHECK (transaction_doc IS JSON) ENABLE
);
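As a quick sanity check, the constraint rejects anything that is not well-formed JSON. For example, an insert like the following should fail with ORA-02290 (check constraint violated):
-- This row is not valid JSON, so the VALID_JSON constraint rejects it
INSERT INTO json_transactions VALUES ('this is not JSON');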
Add data in JSON format. Note that we could use an external table to read directly from a JSON log file, but in this case we will insert the data directly into a table because we don't have access to an external filesystem; a sketch of the external table approach is shown below for reference.
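A minimal sketch of the external table approach might look like this (the directory object json_dir and the file name transactions.json are illustrative assumptions):
CREATE TABLE json_transactions_ext
(transaction_doc CLOB)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY json_dir            -- assumed directory object pointing at the log file location
 ACCESS PARAMETERS
 (RECORDS DELIMITED BY NEWLINE         -- one JSON document per line
  FIELDS (transaction_doc CHAR(32767)))
 LOCATION ('transactions.json'))       -- assumed file name
REJECT LIMIT UNLIMITED;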
truncate table json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Deposit","trans_amount":500000}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"John","event_id":"Transfer","trans_amount":1200500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-APR-17","user_id":"John","event_id":"Transfer","trans_amount":3100400}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-17","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-MAR-17","user_id":"John","event_id":"Withdrawal","trans_amount":12000}');
COMMIT;
Now we need to view the data that's been added. The data is returned in JSON format, i.e. key-value pairs, if we use a simple 'SELECT *' statement:
SELECT *
FROM json_transactions j;
We can also view the data in a transformed state where the resultset is a normal relational set of columns and rows. This is done using the new syntax for accessing JSON data: document_ref.attribute_name
SELECT
j.transaction_doc.time_id as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.trans_amount as amount
FROM json_transactions j;
All the columns are returned as VARCHAR2!
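If you want to verify this, the DUMP function reveals the internal datatype of an expression (Typ=1 is VARCHAR2); a quick check might look like:
-- Typ=1 in the output confirms the dot-notation result is a VARCHAR2
SELECT DUMP(j.transaction_doc.trans_amount) AS amount_datatype
FROM json_transactions j
FETCH FIRST 1 ROW ONLY;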
Using SQL we can apply predicates and datatype conversions so that our query returns only the transactions that are transfers:
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-RR') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
to_number(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer'
ORDER BY 2,1;
The important point is that we can apply normal SQL operations to our JSON data.
To make our pattern matching SQL statements easier to read, we will create a view that will form the input into our pattern matching statement:
CREATE VIEW transfers_view AS
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-RR') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer';
Note that we have converted the first column to a proper DATE and the amount column to a NUMBER.
Let's create a very basic pattern matching statement where we accept as many default values as possible.
SELECT user_id
FROM transfers_view
MATCH_RECOGNIZE(
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
This code will not run! It will return an error:
ORA-00904: "USER_ID": invalid identifier
Why? Because we have not defined the columns to be returned by MATCH_RECOGNIZE. So can we simply use 'SELECT *' to return the data we need?
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
This now generates a different error:
ORA-30732: table contains no user-visible columns
Answer: No! We need to define the information to be returned by using the MEASURES clause. To keep things simple, we will return only the user_id and the amount.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
MEASURES
user_id AS user_id,
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
This query returns no rows, which does not seem correct: if we look at the source data it is clear that we have some suspicious transfers. So what is going on?
What we are missing is the ORDER BY clause, which sorts the data within the pattern matching process.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
ORDER BY time_id
MEASURES
user_id AS user_id,
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Now we see that 2 matches have been found. But is this correct?
To test whether our MATCH_RECOGNIZE statement is processing data correctly we need to add some more transactions that include ones for a different user_id:
truncate table json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"Keith","event_id":"Transfer","trans_amount":10000}');
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Deposit","trans_amount":500000}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"Keith","event_id":"Transfer","trans_amount":50000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"03-JAN-17","user_id":"Keith","event_id":"Transfer","trans_amount":12000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-MAR-17","user_id":"Keith","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"John","event_id":"Transfer","trans_amount":1200500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-FEB-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-APR-17","user_id":"John","event_id":"Transfer","trans_amount":3100400}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"15-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"16-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"17-APR-17","user_id":"John","event_id":"Transfer","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-17","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-MAR-17","user_id":"John","event_id":"Withdrawal","trans_amount":12000}');
COMMIT;
Now let's re-run the pattern matching but point to a specific occurrence of user_id by referencing our pattern matching variables.
In this case we want x.user_id, i.e. the user_id of the row mapped to variable X.
You will see that the resultset changes!
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
ORDER BY time_id
MEASURES
x.user_id AS user_id,
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
We still get two rows but the user_id in the first row has changed! It's now "Keith" and not "John". So something is still not correct.
What we need to do is put each user_id in a separate block, or partition, and then process the transactions for each user_id sorted in date order. To do this we use the PARTITION BY and ORDER BY clauses. This query will now return three matches.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
amount AS amount
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Can we extract any more information about each match which will allow us to determine if our pattern matching query is returning the correct information?
We can control the output using ONE ROW PER MATCH or ALL ROWS PER MATCH.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
amount AS t_amount
ALL ROWS PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Now we can see the details of the rows that are matched against the pattern we have defined.
Also note that the amount column now appears twice because using ALL ROWS PER MATCH forces MATCH_RECOGNIZE to return all the columns in the source table.
Therefore, if you add columns to your source table, the new columns will automatically appear in the resultset, which may or may not be a problem. Think very carefully about whether you want to use SELECT * FROM ... or whether you want to list the columns to be returned explicitly.
Are there any debugging tools to help us understand how the pattern is being applied? Yes: MATCH_NUMBER() and CLASSIFIER() functions.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
partition by user_id
order by time_id
MEASURES
amount AS t_amount,
match_number() AS mn,
classifier() AS cf
ALL ROWS PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
We can also see the rejected rows, i.e. the rows that were not matched, by adding some more keywords: ALL ROWS PER MATCH WITH UNMATCHED ROWS.
SELECT *
FROM transfers_view
MATCH_RECOGNIZE(
partition by user_id
order by time_id
MEASURES
match_number() as mn,
classifier() as cf
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (X{3,} Y)
DEFINE
X as (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y as (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Let's clean up the column selection in our SELECT statement:
SELECT user_id,time_id, amount, mn, cf
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
match_number() AS mn,
classifier() AS cf
ALL ROWS PER MATCH WITH UNMATCHED ROWS
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Can we create a summary report that contains more information about each match?
SELECT user_id, start_small, end_small, small_amount, avg_small_amount, start_big, big_amount
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS start_small,
LAST(x.time_id) AS end_small,
MAX(x.amount) AS small_amount,
TRUNC(AVG(x.amount),2) AS avg_small_amount,
FIRST(y.time_id) AS start_big,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Now we are going to add some new business requirements to our definition of suspicious activity within an account. Two new rules need to be applied: each small transfer must go to a different recipient (transfer_id) than the previous one, and the total of the small transfers must be less than $20,000.
Let's add some transactions which contain additional information:
TRUNCATE TABLE json_transactions;
INSERT INTO json_transactions VALUES ('{"time_id":"01-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"25-JAN-17","user_id":"John","event_id":"Deposit","trans_amount":1200000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Deposit","trans_amount":500000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":100}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Bob","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-JAN-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1000000}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-FEB-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-MAR-17","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1200500}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-FEB-17","user_id":"John","event_id":"Transfer","transfer_id":"Allen","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-FEB-17","user_id":"John","event_id":"Transfer","transfer_id":"John", "trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"02-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1000}');
INSERT INTO json_transactions VALUES ('{"time_id":"27-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":3100400}');
INSERT INTO json_transactions VALUES ('{"time_id":"20-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1200}');
INSERT INTO json_transactions VALUES ('{"time_id":"10-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"15-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Tim","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"16-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"Bob","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"17-APR-17","user_id":"John","event_id":"Transfer","transfer_id":"John","trans_amount":1500}');
INSERT INTO json_transactions VALUES ('{"time_id":"05-JAN-17","user_id":"John","event_id":"Withdrawal","trans_amount":2000}');
COMMIT;
Question: has the addition of new sets of key-value pairs broken our original queries?
SELECT * FROM json_transactions;
This still lists all our JSON transactions, and our view still works:
SELECT * FROM transfers_view;
But the view does not expose the new transfer_id key-value pair.
As we have added a new key-value pair to our JSON documents, we need to update our view to include this new information:
CREATE OR REPLACE VIEW transfers_view AS
SELECT
TO_DATE(j.transaction_doc.time_id, 'DD-MON-RR') as time_id,
j.transaction_doc.user_id as user_id,
j.transaction_doc.event_id as event_id,
j.transaction_doc.transfer_id as transfer_id,
TO_NUMBER(j.transaction_doc.trans_amount) as amount
FROM json_transactions j
WHERE j.transaction_doc.event_id = 'Transfer';
As before, the time_id column is converted to a DATE and the amount column to a NUMBER, and the new transfer_id attribute is now included.
Does our original pattern matching query still work?
SELECT user_id, start_small, end_small, small_amount, avg_small_amount, start_big, big_amount
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS start_small,
LAST(x.time_id) AS end_small,
MAX(x.amount) AS small_amount,
TRUNC(AVG(x.amount),2) AS avg_small_amount,
FIRST(y.time_id) AS start_big,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10);
Yes it works!
The changes we need to make are all contained within the DEFINE clause. Firstly, we need to check that the transfer_id changes between consecutive small transactions:
PREV(transfer_id) <> transfer_id
Secondly, we need to check the total of all the small transactions:
SUM(x.amount) < 20000
The full code looks like this:
SELECT user_id, start_small, end_small, sum_small, start_big, big_amount
FROM transfers_view
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY time_id
MEASURES
FIRST(x.time_id) AS start_small,
LAST(x.time_id) AS end_small,
SUM(x.amount) AS sum_small,
FIRST(y.time_id) AS start_big,
y.amount AS big_amount
ONE ROW PER MATCH
PATTERN (X{3,} Y)
DEFINE
X AS (amount < 2000) AND
PREV(transfer_id) <> transfer_id AND
LAST(time_id) - FIRST(time_id) < 30,
Y AS (amount >= 1000000) AND
time_id - LAST(x.time_id) < 10 AND
SUM(x.amount) < 20000);
Let's look at a new scenario where we want to track the location of credit card transactions to determine if a card has been illegally replicated. We will use the geospatial features that are part of the Oracle Database.
First let's build a JSON table to hold our sample data set
CREATE TABLE json_transactions_geo
(trans_geo_doc CLOB,
CONSTRAINT "VALID_GEO_JSON" CHECK (trans_geo_doc IS JSON) ENABLE
);
Now let's add some data
TRUNCATE TABLE json_transactions_geo;
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 08:30:23","user_id":"John","merchant_id":"Baskin Robins", "item_ref":"Ice Cream","trans_amount":10.00, "x":-71.48923,"y":42.72347,"l_x":-71.48923,"l_y":42.72347}');
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 09:35:10","user_id":"John","merchant_id":"Costco","item_ref":"Groceries","trans_amount":55.00,"x":-71.48923,"y":42.72347,"l_x":-71.48989,"l_y":42.72347}');
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 10:40:10","user_id":"John","merchant_id":"Starbucks","item_ref":"Coffee","trans_amount":5.00,"x":-71.48923,"y":42.72347,"l_x":-71.48854,"l_y":42.72347, }');
INSERT INTO json_transactions_geo VALUES ('{"time_id":"25-JAN-17 16:36:10","user_id":"John","merchant_id":"BestBuy","item_ref":"Apple Mac Pro","trans_amount":4150.00,"x":-91.48923,"y":42.72347,"l_x":-71.48923,"l_y":42.72347}');
COMMIT;
Let's quickly look at our credit card data. Note that at this point we simply have numeric values for our geospatial coordinates; we will convert these to geospatial datatypes later.
SELECT
TO_DATE(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS') as time_id,
j.trans_geo_doc.user_id as user_id,
j.trans_geo_doc.merchant_id as merchant_id,
j.trans_geo_doc.item_ref as item_ref,
to_number(j.trans_geo_doc.trans_amount) as trans_amount,
to_number(j.trans_geo_doc.x) as geo_x,
to_number(j.trans_geo_doc.y) as geo_y,
to_number(j.trans_geo_doc.l_x) as geo_lag_x,
to_number(j.trans_geo_doc.l_y) as geo_lag_y
FROM json_transactions_geo j;
To keep our pattern matching code simple and readable, let's create a view over our JSON data...
CREATE OR REPLACE VIEW VW_JSON_CC_TRANS AS
SELECT
TO_DATE(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS') as day_id,
TO_CHAR(TO_TIMESTAMP(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS'), 'HH24:MI:SS') as time_id,
-- hours since the previous transaction
TRUNC((TO_DATE(j.trans_geo_doc.time_id, 'DD-MON-RR HH24:MI:SS') -
TO_DATE(LAG(j.trans_geo_doc.time_id,1) OVER (ORDER BY j.trans_geo_doc.time_id), 'DD-MON-RR HH24:MI:SS')) * 24, 0) AS lag_time,
j.trans_geo_doc.user_id as user_id,
j.trans_geo_doc.merchant_id as merchant_id,
j.trans_geo_doc.item_ref as item_ref,
to_number(j.trans_geo_doc.trans_amount) as trans_amount,
to_number(j.trans_geo_doc.x) as geo_x,
to_number(j.trans_geo_doc.y) as geo_y,
to_number(j.trans_geo_doc.l_x) as geo_lag_x,
to_number(j.trans_geo_doc.l_y) as geo_lag_y
FROM json_transactions_geo j;
We can now use the SDO_GEOM.SDO_DISTANCE function to calculate the distance between transactions...
SELECT
l.day_id as day_id,
l.time_id as time_id,
l.lag_time as lag_hours,
l.user_id,
l.merchant_id,
l.item_ref,
l.trans_amount,
SDO_GEOM.SDO_DISTANCE(SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_x, l.geo_y, NULL),NULL,NULL),
SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_lag_x, l.geo_lag_y, NULL),NULL,NULL),
0.0001,'unit=km') AS DISTANCE_BETWEEN_TRANS
FROM VW_JSON_CC_TRANS l
ORDER BY l.day_id, time_id;
Your task now is to use this information in a MATCH_RECOGNIZE query that looks for three low-value transactions followed by a large transaction, where the distance between the last small transaction and the large transaction is suspiciously large.
Using the previous examples you should be able to construct this query yourself; one possible sketch is shown below for comparison once you have attempted it.
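In this sketch the thresholds (small transactions under $100, a large transaction of $1,000 or more, and a distance of more than 500 km) are illustrative assumptions rather than part of the specification; the distance is computed in an inline view so that it can be referenced as an ordinary column inside MATCH_RECOGNIZE:
SELECT *
FROM (SELECT l.day_id,
             l.user_id,
             l.merchant_id,
             l.trans_amount,
             -- distance in km between this transaction and the previous one
             SDO_GEOM.SDO_DISTANCE(
               SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_x, l.geo_y, NULL),NULL,NULL),
               SDO_GEOMETRY(2001,8307,SDO_POINT_TYPE(l.geo_lag_x, l.geo_lag_y, NULL),NULL,NULL),
               0.0001,'unit=km') AS distance_km
      FROM VW_JSON_CC_TRANS l)
MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY day_id
MEASURES
FIRST(x.day_id) AS start_small,
y.trans_amount AS big_amount,
y.distance_km AS suspicious_distance
ONE ROW PER MATCH
PATTERN (X{3} Y)
DEFINE
X AS (trans_amount < 100),       -- assumed threshold for a "low value" transaction
Y AS (trans_amount >= 1000) AND
(distance_km > 500));            -- assumed threshold for a suspicious distance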