
Python module for searching patent databases, e.g. USPTO or EPO

For my work I have to find potential customers in biomedical research and industry.

I wrote some pretty handy programs using the Biopython module, which has a nice interface for searching NCBI. I have also used the clinical_trials module to search clinicaltrials.gov.

I now want to search patent databases, like EPO or USPTO, but I haven't been able to find even the slightest trace of a Python module. But maybe I'm missing something obvious?

Since Google has a patent search option, I was wondering if there might be a Python module for searching Google which could be adapted to search only patents?

asked Feb 22 '13 15:02 by Misconstruction



1 Answer

You can parse at least the USPTO data using any XML parsing tool, such as the lxml Python module.
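
As a rough sketch of what that parsing can look like, the snippet below pulls the title and assignee out of a small inline sample. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs without extra installs (lxml.etree offers a largely compatible interface). The element names (us-patent-grant, invention-title, orgname) and the sample record are assumptions modeled on the USPTO grant format, so check them against the files you actually download.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample; real USPTO records are far larger
# and the element names here are assumptions to verify against real data.
SAMPLE_XML = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <invention-title>Example programmable logic device</invention-title>
    <assignees>
      <assignee>
        <addressbook><orgname>Example Corp</orgname></addressbook>
      </assignee>
    </assignees>
  </us-bibliographic-data-grant>
</us-patent-grant>"""


def extract_summary(xml_text):
    """Return the title and assignee organisation names of one record."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//invention-title", default="")
    assignees = [el.text for el in root.iter("orgname") if el.text]
    return {"title": title.strip(), "assignees": assignees}


summary = extract_summary(SAMPLE_XML)
print(summary)
```

For potential-customer hunting, the assignee names are probably the field to filter on first.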

Gabe Fierro wrote a great paper on doing just this: Extracting and Formatting Patent Data from USPTO XML (no paywall).

Gabe also took part in some useful discussion of this approach on a Google group.

Finally, if you know what you're looking for and have plenty of disk space, you can also download the bulk data and process it locally. The USPTO offers bulk downloads.
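
One practical wrinkle with local processing, sketched below under an assumption: a bulk file may hold many XML documents concatenated one after another, each starting with its own <?xml declaration, so an XML parser will not accept the file as a whole. If that holds for the files you download, you can split the file into individual records first and parse each one separately. The helper name split_documents and the two-record sample are made up for illustration.

```python
import io


def split_documents(lines):
    """Yield each XML document from an iterable of text lines.

    Assumes every document starts with its own '<?xml' declaration line.
    """
    current = []
    for line in lines:
        if line.startswith("<?xml") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


# Hypothetical two-record stand-in for a bulk file opened with open(path).
fake_bulk_file = io.StringIO(
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><invention-title>First</invention-title></us-patent-grant>\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><invention-title>Second</invention-title></us-patent-grant>\n"
)

documents = list(split_documents(fake_bulk_file))
print(len(documents))
```

Because the function is a generator that reads line by line, it should cope with files too large to hold in memory at once.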

Any more specific questions please let me know! I've trod some of this ground before :)

Also, the Google Patent Search API is deprecated, but you can now run the same searches through the main Google search API using URL tags (I don't have them handy, but you can find them by running a search on Google Patents, which is answered by google.com, and looking at the resulting URL).

UPDATE: At home now. The flag that makes the Google Custom Search API do patent searching is &tbm=pts. Note that signing up for a Google Custom Search Engine and getting an API key for it is hugely beneficial for patent searching, because the JSON it delivers has a nice data structure with patent-specific fields.

Example Code:

import json
import time
import urllib.parse

import requests

# Placeholders: sign up for the Google Custom Search Engine API to get both values
access_token = "YOUR_API_KEY"
cse_id = "YOUR_CSE_ID"

# Build the URL
start = 1
search_text = '+(inassignee:"Altera" | "Owner name: Altera") site:www.google.com/patents/'
# &tbm=pts sets you on the patent search
url = ('https://www.googleapis.com/customsearch/v1?key=' + access_token
       + '&cx=' + cse_id + '&start=' + str(start)
       + '&num=10&tbm=pts&q=' + urllib.parse.quote(search_text))

response = requests.get(url)

# Save the JSON response to a timestamped text file
with open('Sample_patent_data' + str(int(time.time())) + '.txt', 'w') as f:
    f.write(json.dumps(response.json(), indent=4))

This will (once you add the free API access info) grab the first ten patents owned by Altera (as an example) and save the resulting JSON to a text file. Pull up your favorite web JSON editor and take a look at the JSON file. In particular I recommend looking at the entries in ['items'] and, within each entry, the ['pagemap'] sub-structure. Just by parsing this JSON you can get titles, thumbnails, snippets, links, even citations (when relevant).
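
To give an idea of what that parsing might look like, here is a minimal sketch that walks the 'items' list and collects the title, link, snippet and the names of the 'pagemap' sections of each hit. The sample response is a made-up fragment (including the pagemap section names), and summarize_items is a hypothetical helper, so treat the exact fields as assumptions to check against your own saved file.

```python
def summarize_items(result):
    """Return title, link, snippet and pagemap section names per search hit."""
    rows = []
    for item in result.get("items", []):
        pagemap = item.get("pagemap", {})
        rows.append(
            {
                "title": item.get("title", ""),
                "link": item.get("link", ""),
                "snippet": item.get("snippet", ""),
                "pagemap_keys": sorted(pagemap),
            }
        )
    return rows


# Hypothetical response fragment, shaped like the saved JSON file.
sample_result = {
    "items": [
        {
            "title": "Example patent title",
            "link": "https://www.google.com/patents/US0000000",
            "snippet": "An example snippet...",
            "pagemap": {"metatags": [{}], "cse_thumbnail": [{}]},
        }
    ]
}

rows = summarize_items(sample_result)
for row in rows:
    print(row["title"], row["link"], row["pagemap_keys"])
```

To run it on real data, load the saved text file with json.load and pass the resulting dict to summarize_items in place of sample_result.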

answered Nov 15 '22 17:11 by Ezekiel Kruglick