
Pull the citations for a paper from Google Scholar using R

Using Google Scholar and R, I'd like to find out who is citing a particular paper.

The existing packages (like scholar) are oriented towards H-index analyses: statistics on a researcher.

I want to give a target paper as input. An example URL would be:

https://scholar.google.co.uk/scholar?oi=bibs&hl=en&cites=12939847369066114508

Then R should scrape the citation pages for that paper (Google Scholar paginates them), returning an array of the papers that cite the target (up to 500 or more citations). We'd then search for keywords in the titles, tabulate journals and citing authors, etc.

Any clues as to how to do that? Or is it down to literally scraping each page (which I can do by copy and paste for one-off operations)?

Seems like this should be a generally useful function for things like seeding systematic reviews as well, so someone adding this to a package might well increase their H :-)

tim asked Mar 13 '15 10:03




1 Answer

Alternatively, you could use a third-party solution like SerpApi. It's a paid API with a free trial. We handle proxies, solve captchas, and parse the structured data on the results page for you.

Example Python code (client libraries are available for other languages as well):

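# Requires the SerpApi Python client: pip install google-search-results
from serpapi import GoogleSearch

params = {
  "api_key": "secret_api_key",      # replace with your own key
  "engine": "google_scholar",
  "hl": "en",
  "cites": "12939847369066114508"   # ID taken from the cites= parameter of the target URL
}

search = GoogleSearch(params)
results = search.get_dict()         # parsed JSON response as a Python dict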
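Since the question asks for every citing paper and Google Scholar paginates the list, the single request above would need to be repeated with an offset. The following is a minimal sketch of that loop, not a definitive implementation. It assumes the API accepts `start` (offset) and `num` (page size) parameters and returns the papers under an `organic_results` key; check the documentation before relying on those names. The `collect_citing_papers` function and its `fetch_page` argument are hypothetical helpers introduced here so the loop can be tried without a network call.

```python
from typing import Callable, Dict, List


def collect_citing_papers(
    fetch_page: Callable[[Dict], Dict],
    cites_id: str,
    api_key: str,
    page_size: int = 20,
    max_results: int = 500,
) -> List[Dict]:
    """Walk the paginated "cited by" results and return them as one list.

    fetch_page takes a params dict and returns the parsed response dict.
    """
    papers: List[Dict] = []
    start = 0
    while len(papers) < max_results:
        params = {
            "api_key": api_key,
            "engine": "google_scholar",
            "hl": "en",
            "cites": cites_id,
            "start": start,    # assumed name of the offset parameter
            "num": page_size,  # assumed name of the page-size parameter
        }
        page = fetch_page(params)
        batch = page.get("organic_results", [])  # assumed result key
        if not batch:
            break  # an empty page means there is nothing left to fetch
        papers.extend(batch)
        start += page_size
    return papers[:max_results]
```

With the client shown above, a call might look like `collect_citing_papers(lambda p: GoogleSearch(p).get_dict(), "12939847369066114508", "secret_api_key")`. Each page is a separate billed request, so `max_results` doubles as a cost cap.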

Example JSON output (one result entry; the remaining entries are elided):

{
  "position": 1,
  "title": "Lavaan: An R package for structural equation modeling and more. Version 0.5–12 (BETA)",
  "result_id": "HYlMgouq9VcJ",
  "type": "Pdf",
  "link": "https://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf",
  "snippet": "Abstract In this document, we illustrate the use of lavaan by providing several examples. If you are new to lavaan, this is the first document to read … 3.1 Entering the model syntax as a string literal … 3.2 Reading the model syntax from an external file …",
  "publication_info": {
    "summary": "Y Rosseel - Journal of statistical software, 2012 - users.ugent.be",
    "authors": [
      {
        "name": "Y Rosseel",
        "link": "https://scholar.google.com/citations?user=0R_YqcMAAAAJ&hl=en&oi=sra",
        "serpapi_scholar_link": "https://serpapi.com/search.json?author_id=0R_YqcMAAAAJ&engine=google_scholar_author&hl=en",
        "author_id": "0R_YqcMAAAAJ"
      }
    ]
  },
  "resources": [
    {
      "title": "ugent.be",
      "file_format": "PDF",
      "link": "https://users.ugent.be/~yrosseel/lavaan/lavaanIntroduction.pdf"
    }
  ],
  "inline_links": {
    "serpapi_cite_link": "https://serpapi.com/search.json?engine=google_scholar_cite&q=HYlMgouq9VcJ",
    "cited_by": {
      "total": 10913,
      "link": "https://scholar.google.com/scholar?cites=6338159566757071133&as_sdt=2005&sciodt=0,5&hl=en",
      "cites_id": "6338159566757071133",
      "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=2005&cites=6338159566757071133&engine=google_scholar&hl=en"
    },
    "related_pages_link": "https://scholar.google.com/scholar?q=related:HYlMgouq9VcJ:scholar.google.com/&scioq=&hl=en&as_sdt=2005&sciodt=0,5",
    "versions": {
      "total": 27,
      "link": "https://scholar.google.com/scholar?cluster=6338159566757071133&hl=en&as_sdt=2005&sciodt=0,5",
      "cluster_id": "6338159566757071133",
      "serpapi_scholar_link": "https://serpapi.com/search.json?as_sdt=2005&cluster=6338159566757071133&engine=google_scholar&hl=en"
    },
    "cached_page_link": "https://scholar.googleusercontent.com/scholar?q=cache:HYlMgouq9VcJ:scholar.google.com/&hl=en&as_sdt=2005&sciodt=0,5"
  }
},
...
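Records shaped like the one above already carry what the question wants to tabulate: titles for keyword searches, and a `publication_info.summary` string holding authors, venue and year. Here is a rough sketch of that tabulation step. It assumes the summary always follows the "authors - venue, year - host" pattern seen in this single example, which may not hold for every result. `split_summary` and `tabulate` are hypothetical helper names, not part of any library.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def split_summary(summary: str) -> Dict:
    """Split an 'authors - venue, year - host' summary into its parts."""
    parts = [p.strip() for p in summary.split(" - ")]
    authors = [a.strip() for a in parts[0].split(",") if a.strip()]
    venue, year = "", None
    if len(parts) > 1:
        # A trailing four-digit year, optionally preceded by "venue, ".
        match = re.search(r"(?:^|,\s*)((?:19|20)\d{2})$", parts[1])
        if match:
            year = int(match.group(1))
            venue = parts[1][:match.start()].strip()
        else:
            venue = parts[1]
    return {"authors": authors, "venue": venue, "year": year}


def tabulate(
    papers: Iterable[Dict], keywords: List[str]
) -> Tuple[Counter, Counter, Dict[str, List[str]]]:
    """Count venues and authors, and list titles containing each keyword."""
    journals: Counter = Counter()
    authors: Counter = Counter()
    hits: Dict[str, List[str]] = {k: [] for k in keywords}
    for paper in papers:
        summary = paper.get("publication_info", {}).get("summary", "")
        info = split_summary(summary)
        if info["venue"]:
            journals[info["venue"].lower()] += 1  # lower-case to merge variants
        authors.update(info["authors"])
        title = paper.get("title", "")
        for k in keywords:
            if k.lower() in title.lower():  # case-insensitive substring match
                hits[k].append(title)
    return journals, authors, hits
```

`journals.most_common(10)` and `authors.most_common(10)` then give the top venues and citing authors. Google Scholar abbreviates author names and often truncates long venue names, so the counts are approximate.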

Check out the SerpApi documentation for more details.

Disclaimer: I work at SerpApi.

Milos Djurdjevic answered Sep 27 '22 21:09
