Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql CITEXT Datatype Issue JPA Hibernate

I'm having difficulty using the CITEXT datatype in PostgreSQL using JPA and Hibernate. CITEXT is supposed to provide a case insensitive text datatype but when used with JPA/Hibernate it doesn't behave in a case insensitive manner. Has anyone else had this issue or know a way around it? I have seen some mention (but very, very little) about a JDBC issue, but that went back at least a year and wasn't very clear.

I have a 'nickname' column defined as citext in postgres 9.1. I just did a test to see if it could find a row using a named query as such:

create table test(
    nickname citext
)

@NamedQuery(name = "Person.findByNickname", 
            query = "SELECT p 
                     FROM Person p 
                     WHERE p.nickname = :nickname")

Insert a nickname into the DB:

insert into test values('testNick')

Then run this code:

String nickname = "testNick";

Query q = em.createNamedQuery("Person.findByNickname");
q.setParameter("nickname", nickname);
if (q.getResultList().isEmpty()) {
    return (false);
}
return (true);

This returns 'true' (i.e. there is already a 'testNick' in the database).

If I make this assignment

String nickname = "testnick"; //(lower case 'N') 

and run it again it returns 'false'.

Since the column is CITEXT, it should return 'true' again. i.e. case insensitive text.

Using JPA and Hibernate. Anyone have any thoughts?

In the meantime I've changed the column back to varchar and created a functional index for lowercase. And I have to create a native query now to search using database functions. Would like to find out if there is a way I can not have to do this to maintain the database abstraction.

Regards.

like image 501
Bill Rosmus Avatar asked Aug 20 '12 02:08

Bill Rosmus


2 Answers

citext provides case-insensitive operators for use within the database, with other citext values.

What's happening

At a guess, your JPA implementation is explicitly specifying the type of the parameter as text when it creates the parameterized statement. citext doesn't define a citext = text operator, so PostgreSQL casts the citext to text and uses the case-sensitive text = text operator. Effectively, comparing citext to text is case sensitive.

Here's what I think is happening. Given the dummy data:

regress=# CREATE EXTENSION citext;
regress=# CREATE TABLE citest ( x citext );
regress=# INSERT INTO citest(x) VALUES ('FRED'), ('FrEd');
regress=# SELECT * FROM citest;
  x   
------
 FRED
 FrEd
(2 rows)

... a comparison of citext to an unknown string literal will be interpreted as citext=citext and be done case-insensitively:

regress=# SELECT * FROM citest WHERE x = 'FRED';
  x   
------
 FRED
 FrEd
(2 rows)

... but a comparison between citext and an explicitly text typed literal will convert the citext argument to text using citext's implicit cast to text, then do a text=text case sensitive comparison:

regress=# SELECT * FROM citest WHERE x = 'FRED'::text;
  x   
------
 FRED
(1 row)

Or rather, what Hibernate is doing will be closer to:

regress=# PREPARE blah(text) AS SELECT * FROM citest WHERE x = $1;
PREPARE
regress=# EXECUTE blah('FRED');
  x   
------
 FRED
(1 row)

where the type is specified as text when binding a parameter, since Hibernate "knows" that Strings are text.

In other words, you need to get Hibernate to, via PgJDBC, explicitly specify the citext datatype as the parameter type to your query, resulting in something like:

regress=# PREPARE blah(citext) AS SELECT * FROM citest WHERE x = $1;
PREPARE
regress=# EXECUTE blah('FRED');
  x   
------
 FRED
 FrEd
(2 rows)

Note the explicit citext type parameter to the prepared statement. That will be ... interesting ... to do, especially since PgJDBC doesn't know anything about the citext type. You'd have to write a custom data type handler for Hibernate that uses PgJDBC's setObject; even then you'll have operator consistency issues between Java and Pg (see below).

IMO you'll be much better off using traditional case sensitive types and lower(), ILIKE, etc.

It's also possible that Hibernate is relying on what PgJDBC tells it about column case sensitivity. At least as of 9.2-devel PgJDBC doesn't know anything about the citext type, so it'll always say "yup, that's case sensitive" when asked.

Tracing

It's hard to be sure that's what's happening without seeing the actual queries run by JPA. Try setting log_statement = 'all' in postgresql.conf. Then SIGHUP the postmaster, use pg_ctl reload, or restart Pg to cause the change to take effect.

Re-run your test and examine the logs. Test the queries you see in psql to observe the results. If you're unsure what's going on, update your question with them. If you update also include your Hibernate version and your PgJDBC version.

It's also possible that Hibernate is relying on what PgJDBC tells it about column case sensitivity. At least as of 9.2-devel PgJDBC doesn't know anything about the citext type, so it'll always say "yup, that's case sensitive" when asked.

Operator consistency difficulties

WARNING: The citext type cannot affect how Hibernate works with the text once it's come out of the database. It won't have any effect on the String.equals method, for example. You would need to tell Hibernate you want it to treat the text as case insensitive. Otherwise if you have a text or varchar primary/foreign key you can get situations where the Hibernate asks for the key "FRED", it gets "FrEd" back, and is quite confused because the DB returned a key that isn't equal - according to Hibernate - to the one it was asked for. Similar oddities will occur if you include citext-backed strings in equals and hashCode implementations in your entities.

Unfortunately JPA doesn't seem to specify annotation attributes in the @Column mapping for whether the column is case sensitive or not. Java doesn't have the concept of a case-insensitive string data type anyway, so it wouldn't do tons of good even if JPA did specify it.

You'll probably avoid confusing Hibernate too badly so long as you don't use citext for keys or include citext values in equals and hashCode.

like image 105
Craig Ringer Avatar answered Oct 12 '22 12:10

Craig Ringer


I am answering for the sake of future readers. The problem is that JDBC automatically casts String parameters to varchar, thus forcing comparison to be case-sensitive. This behavior can be changed by setting JDBC connection parameter "stringtype" to "unspecified".

If you are using JPA, put the following in your data source configuration:

<datasource jndi-name="java:jboss/datasources/testDS"
    pool-name="test" enabled="true"
    use-java-context="true" spy="true">
    <connection-url>jdbc:postgresql://localhost:5432/postgres</connection-url>
    <driver>postgresql</driver>
    <connection-property name="stringtype">unspecified</connection-property>
    <security>
        <user-name>postgres</user-name>
        <password>******</password>
    </security>
</datasource>
like image 37
Nikša Baldun Avatar answered Oct 12 '22 13:10

Nikša Baldun