Using Faceted Navigation

View All Tutorials Login to Run Tutorial

Tutorial Using Faceted Navigation
Description Shows how to use the XML-based Result Set Interface to do queries with summary information - or facets - and how to search using those facets. Applies to Oracle Database 11.2 or later.
Tags text facets faceted navigation xml RSI RSD
Area Oracle Text
Contributor Roger Ford (Oracle)
Created Thursday August 22, 2019
Modules 5

Module 1
Introduction
Faceted navigation uses the XML-based Result Set Interface (RSI) to run Oracle Text queries, rather than the typical SQL "select ... from ... where ..." query API.
Why would we want to do that? Consider what you might want to return as the result of a text query:
- A page full of hits - say the first 10 hits, ordered by score. Each hit would consist of some summary information (name, title, date maybe) and perhaps a snippet - a small portion of text with the search terms highlighted.
- The total count of hits, so you can say "Hits 1-10 of 625" or similar
- Summary information about categories, or "facets". For example "100 hits from Southern region, 525 hits from Northern region". These facets could then later be used to drill-down into the results - hence "faceted navigation".
This type of query is difficult or impossible from a single SQL query. Just getting a count of hits requires you to fetch the entire result set back, or do a separate "select count(*)" query as well.
Hence the Result Set Interface. You provide a Result Set Descriptor (RSD), which is a block of XML specifying what you want to get back (but not the search term - that's provided separately), and the RSI returns to you a Result Set, which is a block of XML describing the top-N hits and all the summary information.
This tutorial walks you through using the Result Set Interface on some short, sample data.

Module 2

Creating a table and index for use with faceted navigation

We're going to create a table which has some structured information (our "facets") and a text field. All but one row will contain the word 'record' which we'll later use as a full-text search term.

create table docs (
   source     varchar2(20), 
   recordtype varchar2(20),
   region     varchar2(20),
   text       clob );

And insert some data into that table:

insert into docs values (
   'Internal', 'Customer Info', 'South',      'This is record 1');
insert into docs values (
   'Internal', 'Customer Info', 'North',      'This record is record 2.');
insert into docs values (
   'External', 'Customer Info', 'South-East', 'This is record 3');
insert into docs values (
   'External', 'Customer Info', 'South-East', 'This is record 4');
insert into docs values (
   'External', 'Office Doc',    'North-West', 'This is record 5');
insert into docs values (
   'External', 'Office Doc',    'North-East', 'This is record 6');
insert into docs values (
   'External', 'Office Doc',    'North-East', 'No r-word in this one');

Now the setup necessary for the index. Facets must be defined as SDATA (Structured DATA) fields. They can either be embedded in the text, or they can be separate FILTER BY columns in the table as we're going to do here.

-- only needed if re-running: exec ctx_ddl.drop_section_group  ('mysg')
exec ctx_ddl.create_section_group('mysg', 'HTML_SECTION_GROUP')
exec ctx_ddl.add_sdata_column    ('mysg', 'source', 'source')
exec ctx_ddl.add_sdata_column    ('mysg', 'recordtype', 'recordtype')
exec ctx_ddl.add_sdata_column    ('mysg', 'region', 'region')

Now we'll create the actual index. Notice that the SDATA columns specified above MUST be specified as FILTER BY columns for SDATA fields to be created from them. The section group created above is specified in the PARAMETERS clause.

create index docsindex on docs(text) indextype is ctxsys.context
filter by source, recordtype, region
parameters ('section group mysg');

We now have an index with SDATA sections defined for our three facets, and we're ready to move onto the next module - using the Result Set Interface to query the table and fetch back facets.

Module 3
Querying facets using the Result Set Interface
As mentioned earlier, the Result Set Interface is an alternative to normal SQL queries in Oracle Text.
What we commonly want as the result of a text query is:
- A "page full" of results - say the top 10 hits ordered by score
- A count of the total number of hits that would result from the query
- Summary information, such as how many results (from the full set, not just our page full) fall into certain categories, or facets.
These types of queries are almost impossible to do as a single query in SQL without fetching every single hit to get the count and build the summary info.
So instead we provide the XML-based Result Set Interface. The developer provides us with Result Set Descriptor (RSD) which specifies all the information they want returned. We process the RSD and a query string, and return a Result Set (RS). Both the RSD and the RS are written in XML.
The application can then process the XML in order to display the results to the end-user. They can do that using native XML capabilities of the application's language, or by using SQL-XML capabilities built into Oracle Database, which extract strings or relational tables (for repeated info) from the XML text.
For this demo, we're going to use a temporary table to store the RSD and RS. In most situations, you would use a suitable variables in your programming language to store them, but in LiveSQL it's easier to use a temporary table.
So let's create a suitable table:
```
create table temp_rs (rsd clob, rs_text xmltype);
```
Rather than using xmltype, we could use a CLOB for the RSD as well, but then we'd need to cast it to xmltype using xmltype(clobname) each time we invoked an XML-specific operation on it.
Now let's construct our Result Set Descriptor:
(Note: only do this once. If you need to change or rerun it, delete from temp_rs first)
```
begin
  insert into temp_rs (rsd) values ('
<ctx_result_set_descriptor>
  <count />
  <hitlist start_hit_num="1" end_hit_num="2" order="score desc">
    <score />
    <rowid />
    <sdata name="source" />
  </hitlist>
  <group sdata="source"> <count exact="true"/> </group>
  <group sdata="recordtype"> <count exact="true"/> </group>
  <group sdata="region"> <count exact="true"/> </group>
</ctx_result_set_descriptor>
');
end;
/
```
Some explanation is needed there.
- <count> tells us we want the total hit count returned.
- <hitlist> specifies how many results we want, how they should be sorted, and what information (rowids and/or SDATA columns) should be returned for each result.
- <group> specified a facet and what should be returned for that facet.
  The count exact="true" part tells us that the facet count should be exact - otherwise we get an estimated value which does not account for any deleted records, nor for any SDATA clauses used in the query. That's important for the next part.
For this tutorial, we're only considering single-valued facets. For example REGION might contain NORTH, SOUTH, EAST and WEST. In 19c and later it's possible to use range buckets for numeric or date values (for example price less than $10, price $10 - $20, etc) but we're not going to cover that here.
Before we go any further, let's list our RSD.
```
select rsd from temp_rs;
```
(In SQL*Plus you would want to SET LONG 50000 and do select xmltype(rsd)..." in order to format the XML nicely).
Make sure there's one and only one row returned. If you have more than one, future parts of the tutorial will fail. If that's the case, drop the table, and go back to the start of this section (section 2).
Now having specified our Result Set Descriptor, we can run a query using it. We do this using the PL/SQL procedure CTX_QUER.RESULT_SET. We need to specify the index to search, the search term or query string, the Resul Set Descritor as an input, and a variable for the Result Set output. Our query is for the word "RECORD" which should return six of the seven rows in the table.
So this is the actual query procedure:
```
declare
   ResSetDesc clob;
   ResultSet  clob;
begin

   -- fetch the RSD from our temporary table (assumes only we are using the temp table and it was cleared before running the module)

   select rsd into ResSetDesc from temp_rs;

   -- and run our query using the RSD
   
   ctx_query.result_set(
      index_name => 'docsindex',
      query      => 'record',
      result_set_descriptor => ResSetDesc,
      result_set => ResultSet );

   -- ResSetOut now contains XML result set: write it to temporary table

   update temp_rs set rs_text = xmltype(ResultSet);

end;
/
```
After running that, you'll only see "Statement processed". But the Result Set output has been written to our temporary table. Let's check it. We'll use getClobVal to make sure it's formatted nicely:
```
select xmltype.getclobval(rs_text) from temp_rs;
```
Take a few minutes to look through that, and compare it to what we requested in our Result Set Descriptor earlier. Remember we only asked for the top 2 results in our hitlist section.
In particular note the group counts towards the end. We can see that from our entire result set (not just the top 2 hits) there were 4 records with SOURCE=External and 2 records with SOURCE=Internal.
These are our facet counts.
Module 4
Processing the Result Set XML
This section shows how you can use the SQL-XML capabilities of Oracle Database to process that block of XML back into SQL-like constructs such as VARCHAR2 values and relational tables.
First of all, we'll use the XMLTable table function to get the hitlist rows.
Let's take a look at the hitlist section of the Result Set output:
```
  <hitlist>
    <hit>
      <score>6</score>
      <rowid>ABLKUlAAkAAAAQcAAB</rowid>
      <sdata name="SOURCE">Internal</sdata>
    </hit>
    <hit>
      <score>3</score>
      <rowid>ABLKUlAAkAAAAQcAAA</rowid>
      <sdata name="SOURCE">Internal</sdata>
    </hit>
  </hitlist>
```
We want to fetch that info back as a relational table with two rows in it. We can do that using the XMLTable table function and passing it our Result Set from the temporary table.
XMLTable requires us to specify the XML path to each data item, and how that should appear as column in the table function (its name and datatype). Then since we've got ROWIDs from the original table DOCS, we might as well join our table function with the DOCS table so we can fetch back the original text, or any metadata columns in the original table which are not returned by SDATA values:
```
select score, d.text, rso.source from 
  xmltable ( '/ctx_result_set/hitlist/hit' 
    PASSING ( select rs_text from temp_rs )
    COLUMNS
      xrowid varchar2(18)  PATH 'rowid',
      score number         PATH 'score',
      source varchar2(128) PATH 'sdata[@name="SOURCE"]' ) rso,
  docs d
  where d.rowid = rso.xrowid
  order by score desc;
```
There you go - our XML hitlist has now become a SQL table.
Getting the total hit count
Getting the total hit count is easier - there's only a single value so we don't need XMLTable, we can use ExtractValue, passing it the path of the hitcount:
```
select extractvalue( (select rs_text from temp_rs ), '/ctx_result_set/count' ) as "Total Hits" from dual;
```
And finally we can get the facets and facet counts. We have three facet types, SOURCE, RECORDTYPE and REGION. Each of those have a section in the XML. For SOURCE, it looks like:
```
  <groups sdata="SOURCE">
    <group value="External">
      <count>4</count>
    </group>
    <group value="Internal">
      <count>2</count>
    </group>
  </groups>
```
It's another repeated field, so we need another XMLTable. And we need the same for each facet type:
```
select * from
  xmltable( '/ctx_result_set/groups[@sdata="SOURCE"]/group' 
    PASSING ( ( select rs_text from temp_rs ) )
    COLUMNS 
      source      VARCHAR2(30) PATH '@value',
      facet_count NUMBER       PATH 'count/text()') as rso;

select * from
  xmltable( '/ctx_result_set/groups[@sdata="RECORDTYPE"]/group' 
    PASSING ( ( select rs_text from temp_rs ) )
    COLUMNS 
      recordtype      VARCHAR2(30) PATH '@value',
      facet_count NUMBER       PATH 'count/text()') as rso;

select * from
  xmltable( '/ctx_result_set/groups[@sdata="REGION"]/group' 
    PASSING ( ( select rs_text from temp_rs ) )
    COLUMNS 
      region      VARCHAR2(30) PATH '@value',
      facet_count NUMBER       PATH 'count/text()') as rso;
```
And that's it.
We've shown how we can use the XML-based Result Set Interface to fetch back a hitlist and summary information in a single operation, and how to manipulate that XML back into relational form.
Module 5
"Drilling Down" with Facets - Facet Filtering
Once we've presented the results to our users, complete with facets, we want them to be able to pick facets of interest, and add them to the query.
The actual mechanism for displaying and allowing the selection of facets is beyond this tutorial, but let's assume there's some sort of checkbox system, and the user has indicated that they want to see only documents where the SOURCE facet has value "Internal".
We then want to re-run the query, but with SOURCE="Internal". We do that by adding an SDATA clause to the query parameter of ctx_query.result_set. We don't have to modify the Result Set Descriptor atat all. So our query procedure is now:
```
declare
   ResSetDesc clob;
   ResultSet  clob;
begin

   -- fetch the RSD from our temporary table (assumes only we are using the temp table and it was cleared before running the module)

   select rsd into ResSetDesc from temp_rs;

   -- and run our query using the RSD
   
   ctx_query.result_set(
      index_name => 'docsindex',
      query      => 'record AND SDATA(source="Internal")',
      result_set_descriptor => ResSetDesc,
      result_set => ResultSet );

   -- ResSetOut now contains XML result set: write it to temporary table

   update temp_rs set rs_text = xmltype(ResultSet);

end;
/
```
Notice that the 'query' parameter now includes an SDATA clause to restrict the result set. Also note that case is important in SDATA - it's an exact match, not the equivalent of a match in the CONTAINS operator.
Once again we can list the output from our query with:
```
select xmltype.getclobval(rs_text) from temp_rs;
```
We can see that both the records selected have "Internal" for the source, and that the facets for source now show only
```
  <groups sdata="SOURCE">
    <group value="Internal">
      <count>2</count>
    </group>
  </groups>
```
What if we want to select multiple facets? Easy enough, we can use AND and OR with the SDATA operators (though be careful to use parentheses properly!)
```
declare
   ResSetDesc clob;
   ResultSet  clob;
begin

   -- fetch the RSD from our temporary table (assumes only we are using the temp table and it was cleared before running the module)

   select rsd into ResSetDesc from temp_rs;

   -- and run our query using the RSD
   
   ctx_query.result_set(
      index_name => 'docsindex',
      query      => 'record AND ( ( SDATA(source="Internal") OR SDATA(source="External") ) AND SDATA(region="South-East") )',
      result_set_descriptor => ResSetDesc,
      result_set => ResultSet );

   -- ResSetOut now contains XML result set: write it to temporary table

   update temp_rs set rs_text = xmltype(ResultSet);

end;
/
select xmltype.getclobval(rs_text) from temp_rs;
```
So that query looked for External or Internal sources, but only from South-East region.
That's all for now - please try these techniques out with your own data.
Feedback on, or corrections to these tutorials are welcomed. Hit the "Feedback" button at the top of the page, or contact the author, Roger Ford

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

Using Faceted Navigation

Introduction

Creating a table and index for use with faceted navigation

Querying facets using the Result Set Interface

Processing the Result Set XML

"Drilling Down" with Facets - Facet Filtering

Additional Information