SQL for computing h-score (h-index)

Tags: algorithm, sql, indexing, scoring

According to Wikipedia:

A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each.

Imagine we have SCIENTISTS, PAPERS and CITATIONS tables, with a 1-n relation between SCIENTISTS and PAPERS and a 1-n relation between PAPERS and CITATIONS. How can I write a SQL statement that computes the h-score for each scientist in the SCIENTISTS table?

To show the research effort I have made so far, here is a query that computes the number of citations for each paper:

SELECT COUNT(CITATIONS.id) AS citations_count
FROM PAPERS
LEFT OUTER JOIN CITATIONS ON (PAPERS.id = CITATIONS.paper_id)
GROUP BY PAPERS.id
ORDER BY citations_count DESC;

asked Sep 13 '13 12:09

mnowotka

1 Answer

What the h-value does is count the citations in two ways. Let's say a scientist's papers have the following citation counts:

10
 8
 5
 5
 2
 1

Let's add, for each count, the number of papers that have that many or more citations, and the difference between the two:

citations    papers with at least that many    difference
10           1                                  9
 8           2                                  6
 5           4                                  1
 5           4                                  1
 2           5                                 -3
 1           6                                 -5

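The number you want is where this difference would be 0, that is, where the citation count equals the number of papers with at least that many citations. In this case, the number is 4: four papers (10, 8, 5, 5) have at least 4 citations each.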
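As a quick sanity check of this example outside the database, here is a minimal Python sketch. It tabulates, for each citation count, how many papers are cited at least that often, and then computes the h-index straight from the definition. The function names `h_index` and `at_least_table` are made up for illustration and are not part of the SQL solution.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, citations in enumerate(ranked, start=1):
        if citations >= position:
            h = position  # the first `position` papers all have >= position citations
        else:
            break  # counts only get smaller from here on
    return h


def at_least_table(citation_counts):
    """Pair each count with the number of papers cited at least that often."""
    ranked = sorted(citation_counts, reverse=True)
    rows = []
    for citations in ranked:
        at_least = sum(1 for other in ranked if other >= citations)
        rows.append((citations, at_least, citations - at_least))
    return rows


counts = [10, 8, 5, 5, 2, 1]
for row in at_least_table(counts):
    print(row)
print("h-index:", h_index(counts))  # prints "h-index: 4"
```

The difference column changes sign between the rows for 5 and 2 citations, which is where the h-index of 4 sits.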

The value 4 does not appear in the original data, which makes the calculation harder: you need to generate a numbers table of candidate values to test.

The following does this using SQL Server syntax for generating a table with 100 numbers:

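-- Assumes PAPERS.scientistid references SCIENTISTS.id; adjust to your schema.
with numbers as (
      -- candidate h values 1..100 (within SQL Server's default recursion limit)
      select 1 as n
      union all
      select n+1
      from numbers
      where n < 100
     ),
     numcitations as (
      -- number of citations per paper
      SELECT p.scientistid, p.id, COUNT(c.id) AS citations_count
      FROM PAPERS p LEFT OUTER JOIN
           CITATIONS c
           ON p.id = c.paper_id
      GROUP BY p.scientistid, p.id
     ),
     hcalc as (
      -- for each scientist and each n: how many papers have at least n citations
      select s.scientistid, numbers.n,
             (select count(*)
              from numcitations nc
              where nc.scientistid = s.scientistid and
                    nc.citations_count >= numbers.n
             ) as hval
      from numbers cross join
           (select id as scientistid from SCIENTISTS) s
     )
-- The h-index is the largest n for which at least n papers have n or more citations.
-- Testing hval = n alone misses cases such as five papers with 3 citations each.
select scientistid, max(n) as h_index
from hcalc
where hval >= n
group by scientistid;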
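To try the numbers-table approach without a SQL Server instance, here is a rough sketch that runs an equivalent query against an in-memory SQLite database from Python. Several things are assumptions on my part: the column names (`scientists.id`, `papers.scientistid`, `citations.paper_id`), the helper `add_scientist`, and the sample data. SQLite spells the recursive CTE `WITH RECURSIVE`, and the final step takes the largest `n` with `hval >= n` so that tied citation counts are handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scientists (id INTEGER PRIMARY KEY);
    CREATE TABLE papers (id INTEGER PRIMARY KEY, scientistid INTEGER);
    CREATE TABLE citations (id INTEGER PRIMARY KEY, paper_id INTEGER);
""")


def add_scientist(conn, scientist_id, citation_counts):
    """Hypothetical helper: one paper per entry, with that many citation rows."""
    conn.execute("INSERT INTO scientists (id) VALUES (?)", (scientist_id,))
    for count in citation_counts:
        cur = conn.execute(
            "INSERT INTO papers (scientistid) VALUES (?)", (scientist_id,)
        )
        conn.executemany(
            "INSERT INTO citations (paper_id) VALUES (?)",
            [(cur.lastrowid,)] * count,
        )


H_INDEX_SQL = """
WITH RECURSIVE numbers(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 100
),
numcitations AS (
    SELECT p.scientistid, p.id, COUNT(c.id) AS citations_count
    FROM papers p LEFT OUTER JOIN citations c ON p.id = c.paper_id
    GROUP BY p.scientistid, p.id
),
hcalc AS (
    SELECT s.id AS scientistid, numbers.n AS n,
           (SELECT COUNT(*)
            FROM numcitations nc
            WHERE nc.scientistid = s.id
              AND nc.citations_count >= numbers.n) AS hval
    FROM numbers CROSS JOIN scientists s
)
SELECT scientistid, MAX(n) AS h_index
FROM hcalc
WHERE hval >= n
GROUP BY scientistid
ORDER BY scientistid
"""

add_scientist(conn, 1, [10, 8, 5, 5, 2, 1])  # the worked example
add_scientist(conn, 2, [3, 3, 3, 3, 3])      # all counts tied

rows = conn.execute(H_INDEX_SQL).fetchall()
print(rows)  # [(1, 4), (2, 3)]
```

Note the cap of 100 on the numbers table: an h-index above 100 would need a longer sequence.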

EDIT:

There is a way to do this without using the numbers table. Rank each scientist's papers by citation count in descending order; the h-score is then the count of papers whose number of citations is greater than or equal to their ranking. This is much easier to calculate:

select scientistid, count(*) as h_index
from (SELECT p.scientistid, p.id, COUNT(c.id) AS citations_count,
             -- position of each paper within its scientist, most cited first;
             -- row_number() gives tied papers distinct positions, unlike rank()
             row_number() over (partition by p.scientistid order by count(c.id) desc) as ranking
      FROM PAPERS p LEFT OUTER JOIN
           CITATIONS c
           ON p.id = c.paper_id
      GROUP BY p.scientistid, p.id
     ) t
where ranking <= citations_count
group by scientistid;
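The ranking idea can be checked the same way. Below is a small sketch using Python's built-in `sqlite3` module, assuming an SQLite build with window functions (3.25 or later). The column names, the helper `add_papers` and the sample data are invented for the test. It uses `ROW_NUMBER()` partitioned by scientist only, so that tied papers occupy distinct positions, and it computes the per-paper counts in a separate CTE for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE papers (id INTEGER PRIMARY KEY, scientistid INTEGER);
    CREATE TABLE citations (id INTEGER PRIMARY KEY, paper_id INTEGER);
""")


def add_papers(conn, scientist_id, citation_counts):
    """Hypothetical helper: one paper per entry, with that many citation rows."""
    for count in citation_counts:
        cur = conn.execute(
            "INSERT INTO papers (scientistid) VALUES (?)", (scientist_id,)
        )
        conn.executemany(
            "INSERT INTO citations (paper_id) VALUES (?)",
            [(cur.lastrowid,)] * count,
        )


RANKING_SQL = """
WITH numcitations AS (
    SELECT p.scientistid, p.id, COUNT(c.id) AS citations_count
    FROM papers p LEFT OUTER JOIN citations c ON p.id = c.paper_id
    GROUP BY p.scientistid, p.id
),
ranked AS (
    SELECT scientistid, citations_count,
           ROW_NUMBER() OVER (PARTITION BY scientistid
                              ORDER BY citations_count DESC) AS ranking
    FROM numcitations
)
SELECT scientistid, COUNT(*) AS h_index
FROM ranked
WHERE ranking <= citations_count
GROUP BY scientistid
ORDER BY scientistid
"""

add_papers(conn, 1, [10, 8, 5, 5, 2, 1])  # the worked example
add_papers(conn, 2, [3, 3, 3, 3, 3])      # all counts tied
add_papers(conn, 3, [0, 0])               # no cited papers

h_by_scientist = conn.execute(RANKING_SQL).fetchall()
print(h_by_scientist)  # [(1, 4), (2, 3)]
```

Scientist 3 is missing from the output because no paper passes the filter. If an explicit 0 is wanted for such scientists, one option is to left join the result back to the SCIENTISTS table.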

answered Sep 19 '22 15:09

Gordon Linoff
