According to wikipedia:
A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each.
Imagine we have SCIENTISTS, PAPERS, CITATIONS tables with 1-n relation between SCIENTISTS and PAPERS and 1-n relation between PAPERS and CITATION TABLES. How to write a SQL statement that would compute h-score for each scientist in SCIENTISTS table?
To present some research effort I did here is a SQL computing number of citations for each paper:
SELECT COUNT(CITATIONS.id) AS citations_count
FROM PAPERS
LEFT OUTER JOIN CITATIONS ON (PAPERS.id = CITATIONS.paper_id)
GROUP BY PAPERS.id
ORDER BY citations_count DESC;
The h-index is a measure of the number of publications published (productivity), as well as how often they are cited. h-index = the number of publications with a citation number greater than or equal to h. For example, 15 publications cited 15 times or more, is a h-index of 15.
To manually calculate your h-index, organize articles in descending order, based on the number of times they have been cited. In the below example, an author has 8 papers that have been cited 33, 30, 20, 15, 7, 6, 5 and 4 times. This tells us that the author's h-index is 6.
The h-index can be calculated automatically in Web of Science and Scopus or manually in other databases that provide citation information (e.g. SciFinder, PsychINFO, Google Scholar). The index is based on a list of publications ranked in descending order by the number of citations these publications received.
The h-index reflects both the number of publications and the number of citations per publication. For example a scientist with an h-index of 20 has 20 papers cited at least 20 times. The i10-index is the number of articles with at least 10 citations.
What the h-value is doing is counting the citations in two ways. Let's say a scientist has the following citation counts:
10
8
5
5
2
1
Let's the number that have that many or more citations, and the difference between the two:
10 1 9
8 2 6
5 3 2
5 3 2
2 5 -3
1 6 -5
The number you want is where this is 0. In this case, the number is 4.
The fact that the number is 4 makes this hard, because it is not in the original data. That makes the calculation harder, because you need to generate a numbers table.
The following does this using SQL Server syntax for generating a table with 100 numbers:
with numbers as (
select 1 as n
union all
select n+1
from numbers
where n < 100
),
numcitations as (
SELECT p.scientistid, p.id, COUNT(c.id) AS citations_count
FROM PAPERS p LEFT OUTER JOIN
CITATIONS c
ON p.id = c.paper_id
GROUP BY p.scientist, p.id
),
hcalc as (
select scientistid, numbers.n,
(select count(*)
from numcitations nc
where nc.scientistid = s.scientistid and
nc.citations_count >= numbers.n
) as hval
from numbers cross join
(select scientistid from scientist) s
)
select *
from hcalc
where hval = n;
EDIT:
There is a way to do this without using the numbers table. The h-score is the count of cases where the number of citations is greater than or equal to the citation count. This is much easier to calculate:
select scientistid, count(*)
from (SELECT p.scientistid, p.id, COUNT(c.id) AS citations_count,
rank() over (partition by p.scientistid, p.id order by count(c.id) desc) as ranking
FROM PAPERS p LEFT OUTER JOIN
CITATIONS c
ON p.id = c.paper_id
GROUP BY p.scientist, p.id
) t
where ranking <= citations_count
group by scientistid;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With