How can I search through Stack Overflow questions from a script?

Given a string of keywords, such as "Python best practices", I would like to obtain the first 10 Stack Overflow questions that contain those keywords, sorted by relevance (?), say from a Python script. My goal is to end up with a list of tuples (title, URL).

How can I accomplish this? Would you consider querying Google instead? (How would you do it from Python?)

738

asked Oct 13 '08 05:10

Federico A. Ramponi

2 Answers

>>> from urllib import urlencode
>>> params = urlencode({'q': 'python best practices', 'sort': 'relevance'})
>>> params
'q=python+best+practices&sort=relevance'
>>> from urllib2 import urlopen
>>> html = urlopen("http://stackoverflow.com/search?%s" % params).read()
>>> import re
>>> links = re.findall(r'<h3><a href="([^"]*)" class="answer-title">([^<]*)</a></h3>', html)
>>> links
[('/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150', 'What are the best RSS feeds for programmers/developers?'), ('/questions/3088/best-ways-to-teach-a-beginner-to-program#13185', 'Best ways to teach a beginner to program?'), ('/questions/13678/textual-versus-graphical-programming-languages#13886', 'Textual versus Graphical Programming Languages'), ('/questions/58968/what-defines-pythonian-or-pythonic#59877', 'What defines &#8220;pythonian&#8221; or &#8220;pythonic&#8221;?'), ('/questions/592/cxoracle-how-do-i-access-oracle-from-python#62392', 'cx_Oracle - How do I access Oracle from Python? '), ('/questions/7170/recommendation-for-straight-forward-python-frameworks#83608', 'Recommendation for straight-forward python frameworks'), ('/questions/100732/why-is-if-not-someobj-better-than-if-someobj-none-in-python#100903', 'Why is if not someobj: better than if someobj == None: in Python?'), ('/questions/132734/presentations-on-switching-from-perl-to-python#134006', 'Presentations on switching from Perl to Python'), ('/questions/136977/after-c-python-or-java#138442', 'After C++ - Python or Java?')]
>>> from urlparse import urljoin
>>> links = [(urljoin('http://stackoverflow.com/', url), title) for url,title in links]
>>> links
[('http://stackoverflow.com/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150', 'What are the best RSS feeds for programmers/developers?'), ('http://stackoverflow.com/questions/3088/best-ways-to-teach-a-beginner-to-program#13185', 'Best ways to teach a beginner to program?'), ('http://stackoverflow.com/questions/13678/textual-versus-graphical-programming-languages#13886', 'Textual versus Graphical Programming Languages'), ('http://stackoverflow.com/questions/58968/what-defines-pythonian-or-pythonic#59877', 'What defines &#8220;pythonian&#8221; or &#8220;pythonic&#8221;?'), ('http://stackoverflow.com/questions/592/cxoracle-how-do-i-access-oracle-from-python#62392', 'cx_Oracle - How do I access Oracle from Python? '), ('http://stackoverflow.com/questions/7170/recommendation-for-straight-forward-python-frameworks#83608', 'Recommendation for straight-forward python frameworks'), ('http://stackoverflow.com/questions/100732/why-is-if-not-someobj-better-than-if-someobj-none-in-python#100903', 'Why is if not someobj: better than if someobj == None: in Python?'), ('http://stackoverflow.com/questions/132734/presentations-on-switching-from-perl-to-python#134006', 'Presentations on switching from Perl to Python'), ('http://stackoverflow.com/questions/136977/after-c-python-or-java#138442', 'After C++ - Python or Java?')]

Converting this to a function should be trivial.

EDIT: Heck, I'll do it...

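import re
import urllib, urllib2, urlparse  # Python 2 standard-library modules

def get_stackoverflow(query):
    """Return a list of (URL, title) tuples for the search results of `query`."""
    params = urllib.urlencode({'q': query, 'sort': 'relevance'})
    html = urllib2.urlopen("http://stackoverflow.com/search?%s" % params).read()
    # Each hit is an <h3> wrapping a link with class "answer-title".
    links = re.findall(r'<h3><a href="([^"]*)" class="answer-title">([^<]*)</a></h3>', html)
    # The hrefs are site-relative, so resolve them against the site root.
    return [(urlparse.urljoin('http://stackoverflow.com/', url), title) for url, title in links]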
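
The function above relies on the Python 2 modules urllib2 and urlparse. A rough Python 3 port might look like the sketch below. The helper names build_search_url and extract_links, the BASE_URL constant, and the User-Agent string are illustrative assumptions, and the regex only matches the markup quoted in this thread, so it may find nothing if the page layout differs.

```python
import re
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://stackoverflow.com/"
# Same pattern as above; it only matches the markup quoted in this thread.
LINK_RE = re.compile(r'<h3><a href="([^"]*)" class="answer-title">([^<]*)</a></h3>')


def build_search_url(query):
    """Build the search URL for `query`, sorted by relevance."""
    params = urlencode({"q": query, "sort": "relevance"})
    return urljoin(BASE_URL, "search") + "?" + params


def extract_links(html):
    """Return (absolute URL, title) tuples found in a results page."""
    return [(urljoin(BASE_URL, url), title) for url, title in LINK_RE.findall(html)]


def get_stackoverflow(query):
    """Fetch the results page and extract its links (needs network access)."""
    # The User-Agent value is an arbitrary placeholder, not a required string.
    request = Request(build_search_url(query), headers={"User-Agent": "so-search-sketch"})
    with urlopen(request) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

Keeping the URL building and the parsing in separate functions means both can be checked without touching the network; only get_stackoverflow performs a request.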

157

answered Nov 15 '22 02:11

itsadok

Since Stack Overflow already has this feature, you just need to fetch the search results page and scrape the information you need. Here is the URL for a search sorted by relevance:

https://stackoverflow.com/search?q=python+best+practices&sort=relevance

If you View Source, you'll see that the information you need for each question is on a line like this:

<h3><a href="/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150" class="answer-title">What are the best RSS feeds for programmers/developers?</a></h3>

So you should be able to get the first ten by doing a regex search for a string of that form.
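
As a minimal sketch of that regex step in Python 3, assuming the page source is already available as a string: the function name first_results and the limit parameter are made up for illustration, and the pattern is modelled only on the sample line above. It returns (title, URL) tuples, decodes HTML entities in the titles, and stops after the first ten matches.

```python
import html
import re
from itertools import islice
from urllib.parse import urljoin

# Pattern modelled on the sample line above; other markup will not match.
RESULT_RE = re.compile(r'<h3><a href="([^"]*)" class="answer-title">([^<]*)</a></h3>')


def first_results(page_source, limit=10):
    """Return up to `limit` (title, URL) tuples from a results page."""
    results = []
    # islice stops the scan once `limit` matches have been consumed.
    for match in islice(RESULT_RE.finditer(page_source), limit):
        href, title = match.groups()
        # Decode entities such as &#8220; and drop stray whitespace.
        clean_title = html.unescape(title).strip()
        results.append((clean_title, urljoin("https://stackoverflow.com/", href)))
    return results
```

For anything beyond a quick script, an HTML parser is usually more robust than a regex, since small changes to the markup will silently produce an empty list here.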

answered Nov 15 '22 04:11

Paige Ruten

Tags: python, search, scripting, stackexchange-api
