
Oracle: how to do full text searches on an XMLType?

I have an app that stores XML in an Oracle table as XMLType, and I want to run full-text searches on that data. The Oracle documentation, in Full-Text Search Over XML Data, recommends using the contains SQL function, which requires the data to be indexed with a context index. The trouble is that context indexes appear to be asynchronous, which doesn't fit my use case: I need to be able to search data right after it is added.

Can I make that index somehow synchronous? If not, what other technique should I use to do full text searches on an XMLType?

avernet asked Jun 14 '11 01:06



2 Answers

It can't be made transactional (i.e. it won't update the index so that the change is visible to a subsequent statement within the transaction). The best you can do is make it update on commit (SYNC ON COMMIT), as in:

create index your_table_x
    on your_table(your_column)
    indextype is ctxsys.context
    parameters ('sync (on commit)');

Text indexes are complex things, and I'd be surprised if you could achieve a transactional / ACID-compliant text index (that is, one where transaction A inserts documents and sees them in the index, while they stay invisible to transaction B until A commits).
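To illustrate what update-on-commit means in practice, here is a minimal sketch that assumes the index above already exists on your_table(your_column) and that your_column is the XMLType column. The sample document and the search term 'gaga' are made up for illustration.

```sql
-- Insert a document; the sample XML content is purely illustrative.
insert into your_table (your_column)
values (xmltype('<doc><title>gaga</title></doc>'));

-- With sync (on commit), the context index is brought up to date
-- as part of the commit, not by the insert itself.
commit;

-- After the commit, the new document should be found by contains.
select count(*)
from your_table
where contains(your_column, 'gaga') > 0;
```

Running the same query before the commit, in the same transaction, would not be expected to return the new row, which is the limitation described above.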

Gary Myers answered Nov 15 '22 08:11


  1. You could update the index at a regular interval, in a cron-like way. At worst, you can update the index after every update to the table on which it is built, by calling sync_index. For instance:

    EXEC CTX_DDL.SYNC_INDEX('your_index');

     I am not a big fan of this technique because of the complexity it introduces. In addition to the cron-like aspect, you have to deal with index fragmentation, which might require you to do full updates from time to time. Update: instead of updating the index at a regular interval, you can update it on commit, as suggested by Gary, which is really what you're looking for.

  2. You can do a simple text search on the XML document, as if you were pressing Ctrl-F with the XML open in a text editor. In many cases this doesn't give the expected result, since users usually don't want matches on a string that happens to appear in an element name, attribute name, or namespace. But if this method works for you, go for it: it is simple and fairly fast. For instance:

    select count(*) from your_table d
    where lower(d.your_column.getClobVal()) like '%gaga%';
    
  3. Use existsNode() in a where clause, as in the example below. There are two potential issues with this. First, without proper indexes, this is slower than method #2, by a factor of about 2 in my testing, and I am not sure how to create an index on unstructured data that this query would use. Second, the search is case-sensitive, which is often not what you want, and you can't just call XPath's lower-case(), as Oracle only supports XPath 1.0.

    select * from your_table 
    where existsNode(your_column, '//text()[contains(., "gaga")]') = 1;
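One possible workaround for the case-sensitivity issue in method #3 is XPath 1.0's translate() function, which can fold ASCII upper-case letters to lower case before the comparison. The sketch below assumes Oracle's XPath evaluator accepts translate() inside existsNode(), which I have not verified here. It only handles the 26 unaccented letters, and the search term must be given in lower case.

```sql
-- Case-insensitive variant of the existsNode() query (ASCII letters only).
-- translate() maps each upper-case letter to its lower-case counterpart.
select * from your_table
where existsNode(
        your_column,
        '//text()[contains(translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), "gaga")]'
      ) = 1;
```

This does nothing for the performance concern: the expression is still evaluated against every text node of every document.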
    
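On the index fragmentation mentioned in option 1: Oracle Text provides CTX_DDL.OPTIMIZE_INDEX for defragmenting a context index. A minimal sketch, assuming the index is named your_index as in the sync example, and that running it periodically (for instance from a scheduled job) is acceptable:

```sql
-- Compact the index and remove data for deleted rows.
-- 'FULL' is one of the documented optimization levels; 'FAST' is a lighter option.
begin
  ctx_ddl.optimize_index('your_index', 'FULL');
end;
/
```

How often this is needed depends on how frequently the index is synced; very frequent syncs, including sync on commit, tend to fragment it faster.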
avernet answered Nov 15 '22 08:11