Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL calculate item frequency using multiple / dependent columns?

Tags:

sql

oracle

I'm completely new to SQL, and have read StackOverflow posts on SQL to try and figure this out, and other sources and unable to do this in SQL. Here goes...

I have a table of 3 columns and thousands of rows, with data for first 2 columns. The third column is currently empty and I need to populate the third column based on data already in the first and second columns.

Say I have states in the first column and fruit entries in the second column. I need to write an SQL statement(s) that calculates the number of different states where each fruit comes from, and then inserts this popularity number into the third column for every row. A popularity number of 1 in that row means that fruit only comes from one state, a popularity number of 4 means the fruit comes from 4 states. So my table is currently like:

state     fruit     popularity

hawaii    apple     
hawaii    apple     
hawaii    banana       
hawaii    kiwi      
hawaii    kiwi      
hawaii    mango        
florida   apple      
florida   apple        
florida   apple        
florida   orange      
michigan  apple     
michigan  apple     
michigan  apricot   
michigan  orange    
michigan  pear      
michigan  pear      
michigan  pear      
texas     apple     
texas     banana    
texas     banana    
texas     banana    
texas     grape     

And I need to figure out how to calculate and then update the third column, named popularity, which is the number of states that exports that fruit. The goal is to produce (sorry bad pun) the table below, where based on above table, "apple" appears in all 4 states, oranges and banana appear in 2 states, and kiwi, mango, pear, and grape only appear in 1 state, hence their corresponding popularity numbers.

state     fruit     popularity

hawaii    apple     4
hawaii    apple     4
hawaii    banana    2   
hawaii    kiwi      1
hawaii    kiwi      1
hawaii    mango     1   
florida   apple     4 
florida   apple     4   
florida   apple     4   
florida   orange    2  
michigan  apple     4
michigan  apple     4
michigan  apricot   1
michigan  orange    2
michigan  pear      1
michigan  pear      1
michigan  pear      1
texas     apple     4
texas     banana    2
texas     banana    2
texas     banana    2
texas     grape     1

My small programmer brain says to try and figure out a way to loop through the data in some kind of script, but reading up a little on SQL and databases, it seems like you don't write long and slow looping scripts in SQL. I'm not even sure if you can? but instead that there are better/faster ways to do this in SQL.

Anyone know how to, in SQL statement(s), calculate and update the third column for each row, which is here called popularity and corresponds to the number of states that each fruit comes from? Thanks for reading, very grateful for any help.

So far I have tried these SQL statements below, which output but don't quite get me what I need:

--outputs those fruits appearing multiple times in the table
SELECT fruit, COUNT(*)
  FROM table 
 GROUP BY fruit
HAVING COUNT(*) > 1
 ORDER BY COUNT(*) DESC

--outputs those fruits appearing only once in the table
SELECT fruit, COUNT(*)
  FROM table 
 GROUP BY fruit
HAVING COUNT(*) = 1

--outputs list of unique fruits in the table
SELECT COUNT (DISTINCT(fruit))
  FROM table
like image 852
Tim West Avatar asked Aug 01 '12 14:08

Tim West


People also ask

How do you check if something appears more than once in SQL?

To find duplicate values in SQL, you must first define your criteria for duplicates and then write the query to support the search. In order to see how many of these names appear more often than others, you could add an additional ORDER BY statement to the end of the query and order by DESC.

Can you group by multiple columns in SQL?

We can group the resultset in SQL on multiple column values. When we define the grouping criteria on more than one column, all the records having the same value for the columns defined in the group by clause are collectively represented using a single record in the query output.


2 Answers

If you want to simply update your table with the priority it would look like:

update my_table x
   set popularity = ( select count(distinct state) 
                        from my_table
                       where fruit = x.fruit )

If you want to select the data then you can use an analytic query:

select state, fruit
     , count(distinct state) over ( partition by fruit ) as popularity
  from my_table

This provides the number of distinct states, per fruit.

like image 114
Ben Avatar answered Oct 13 '22 11:10

Ben


I ran this and got (what I think) is what you want:

WITH t
  AS (SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'kiwi' as fruit FROM dual
      UNION ALL
      SELECT 'hawaii' as STATE, 'mango' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'florida' as STATE, 'orange' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'apricot' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'orange' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
      UNION ALL
      SELECT 'michigan' as STATE, 'pear' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'apple' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'banana' as fruit FROM dual
      UNION ALL
      SELECT 'texas' as STATE, 'grape' as fruit FROM dual)
SELECT state,
       fruit,
       count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
  FROM t;

Returned

florida     apple   4
florida     apple   4
florida     apple   4
hawaii      apple   4
hawaii      apple   4
michigan    apple   4
michigan    apple   4
texas       apple   4
michigan    apricot 1
hawaii      banana  2
texas       banana  2
texas       banana  2
texas       banana  2
texas       grape   1
hawaii      kiwi    1
hawaii      kiwi    1
hawaii      mango   1
florida     orange  2
michigan    orange  2
michigan    pear    1
michigan    pear    1

Obviously, you'd only need to run:

SELECT state,
       fruit,
       count(DISTINCT state) OVER (PARTITION BY fruit) AS popularity
  FROM table_name;

Hope it helps...

like image 28
Ollie Avatar answered Oct 13 '22 11:10

Ollie