Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Distinct on Postgresql JSON data column

Trying to do distinct on a mode with rails.

2.1.1 :450 > u.profiles.select("profiles.*").distinct


Profile Load (0.9ms)  SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" WHERE "integrations"."user_id" = $1  [["user_id", 2]]
PG::UndefinedFunction: ERROR:  could not identify an equality operator for type json
LINE 1: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integ...
                        ^
: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" WHERE "integrations"."user_id" = $1
ActiveRecord::StatementInvalid: PG::UndefinedFunction: ERROR:  could not identify an equality operator for type json
LINE 1: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integ...
                        ^
: SELECT DISTINCT profiles.* FROM "profiles" INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" WHERE "integrations"."user_id" = $1
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/rack-mini-profiler-0.9.1/lib/patches/sql_patches.rb:109:in `prepare'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/rack-mini-profiler-0.9.1/lib/patches/sql_patches.rb:109:in `prepare'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:834:in `prepare_statement'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:795:in `exec_cache'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql/database_statements.rb:139:in `block in exec_query'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract_adapter.rb:442:in `block in log'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activesupport-4.0.4/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract_adapter.rb:437:in `log'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql/database_statements.rb:137:in `exec_query'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/postgresql_adapter.rb:908:in `select'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/connection_adapters/abstract/query_cache.rb:63:in `select_all'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/querying.rb:36:in `find_by_sql'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:585:in `exec_queries'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/association_relation.rb:15:in `exec_queries'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:471:in `load'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:220:in `to_a'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/activerecord-4.0.4/lib/active_record/relation.rb:573:in `inspect'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands/console.rb:90:in `start'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands/console.rb:9:in `start'
    from /Users/mmahalwy/.rvm/gems/ruby-2.1.1/gems/railties-4.0.4/lib/rails/commands.rb:62:in `<top (required)>'
    from bin/rails:4:in `require'
    from bin/rails:4:in `<main>'2.1.1 :451 > 

Getting an error PG::UndefinedFunction: ERROR: could not identify an equality operator for type json

Converting to Hstore is not an option for me in this case. Any work arounds?

like image 414
Mohamed El Mahallawy Avatar asked May 07 '14 05:05

Mohamed El Mahallawy


People also ask

How do I get unique values from a column in PostgreSQL?

Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.

How do I use distinct in one column in SQL?

Adding the DISTINCT keyword to a SELECT query causes it to return only unique values for the specified column list so that duplicate rows are removed from the result set. Since DISTINCT operates on all of the fields in SELECT's column list, it can't be applied to an individual field that are part of a larger group.

Can we use distinct for more than one column?

The DISTINCT clause is used in the SELECT statement to remove duplicate rows from a result set. The DISTINCT clause keeps one row for each group of duplicates. The DISTINCT clause can be applied to one or more columns in the select list of the SELECT statement.

What is distinct on in PostgreSQL?

PostgreSQL also provides on an expression as DISTINCT ON that is used with the SELECT statement to remove duplicates from a query set result just like the DISTINCT clause.In addition to that it also keeps the “first row” of each row of duplicates in the query set result.


2 Answers

The reason behind this, is that in PostgreSQL (up to 9.3) there is no equality operator defined for json (i.e. val1::json = val2::json will always throw this exception) -- in 9.4 there will be one for the jsonb type.

One workaround is, you can cast your json field to text. But that won't cover all json equalitions. f.ex. {"a":1,"b":2} should be equal to {"b":2,"a":1}, but won't be equal if casted to text.

Another workaround is (if you have a primary key for that table -- which should be) you can use the DISTINCT ON (<expressions>) form:

u.profiles.select("DISTINCT ON (profiles.id) profiles.*")

Note: One known caveat for DISTINCT ON:

The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.

like image 199
pozs Avatar answered Oct 22 '22 02:10

pozs


Sorry I'm late on this answer, but it might help others.

As I understand your query, you're only getting possible duplicates on profiles because of the many-to-many join to integrations (which you're using to determine which profiles to access).

Because of that, you can use a new GROUP BY feature as of 9.1:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

So in your case, you could get Ruby to create the query (sorry, I don't know the Ruby syntax you're using)...

SELECT profiles.* 
FROM "profiles" 
  INNER JOIN "integration_profiles" ON "profiles"."id" = "integration_profiles"."profile_id" 
  INNER JOIN "integrations" ON "integration_profiles"."integration_id" = "integrations"."id" 
WHERE "integrations"."user_id" = $1
GROUP BY "profiles"."id"

I only removed the DISTINCT from your SELECT clause and added the GROUP BY.

By referring ONLY to the id in the GROUP BY, you take advantage of that new feature because all the remaining profiles columns are "functionally dependent" on that id primary key.

Somehow, wonderfully that avoids the need for Postgres to do equality checks on the dependent columns (ie your json column in this case).

The DISTINCT ON solution is also great, and clearly sufficient in your case, but you can't use aggregate functions like array_agg with it. You CAN with this GROUP BY approach. Happy days! :)

like image 10
poshest Avatar answered Oct 22 '22 01:10

poshest