Heroku and Web scraping

Tags: ruby, web-services, heroku, sinatra, api

I have a Nokogiri web scraper that writes to a database, and I'm trying to deploy it to Heroku. I also have a Sinatra application as the frontend, which I want to read from that database. I'm new to Heroku and web development, and I don't know the best way to handle something like this.

Do I have to place the scraper script that writes to the database under a Sinatra route (like mywebsite.com/scraper) and just make the URL so obscure that no one visits it? In the end, I'd like the Sinatra part to be a REST API that reads from the database.

Thanks for all input

asked Jul 12 '13 00:07

John Lamburger

1 Answer

There are two approaches you can take.

The first is to use one-off dynos: run the scraper from the console with heroku run YOURCMD. Just make sure the scraper writes to the database rather than to disk, because a dyno's filesystem is ephemeral and anything saved there is lost when the dyno stops.

More information: https://devcenter.heroku.com/articles/one-off-dynos
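As a rough sketch of the one-off approach, the script below shows one way a scraper could be organised so that everything it produces goes to the database. It assumes the database address arrives as a URL of the kind Heroku puts in the DATABASE_URL config var. The names database_settings and run_scraper, the example URL, and the idea of passing the parsing and storing steps in as callables are all illustrative assumptions, not something from the original setup.

```ruby
require "uri"

# Hypothetical helper: split a database URL (such as the value of
# DATABASE_URL) into the pieces a database client usually needs.
def database_settings(url)
  uri = URI.parse(url)
  {
    adapter:  uri.scheme,
    host:     uri.host,
    port:     uri.port,
    username: uri.user,
    password: uri.password,
    database: uri.path.delete_prefix("/")
  }
end

# Hypothetical entry point: run the scrape, then hand each row to the store.
# `scrape` and `store` are callables, so the Nokogiri parsing and the
# database client can live elsewhere. Nothing here touches the dyno's disk.
def run_scraper(scrape:, store:)
  rows = scrape.call
  rows.each { |row| store.call(row) }
  rows.size
end
```

If this lived in a file such as scraper.rb (a hypothetical name) that calls run_scraper with your own parsing and saving code, the command passed to heroku run would be something like ruby scraper.rb.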

The second is to separate the scraper from the web process: the web process handles normal UI interaction, and a dedicated scraper process does the scraping, which the web process can spawn or talk to. If you take this route, it's up to you how to protect it from the rest of the world (authentication, URL obfuscation, etc.).

More information: https://devcenter.heroku.com/articles/background-jobs-queueing
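For the protection part, a shared secret is usually a safer bet than an obscure URL alone. Here is a minimal sketch of such a check, assuming the secret is stored in a config var. The helper name scraper_token_valid?, the SCRAPER_TOKEN variable, the X-Scraper-Token header, and the /scraper route are all made up for illustration.

```ruby
require "digest"

# Hypothetical helper: true only when the token sent with a request matches
# the secret the app was configured with.
def scraper_token_valid?(provided, expected)
  return false if provided.nil? || expected.nil? || expected.empty?

  # Hash both values so they have equal length, then compare every byte
  # so the check does not stop at the first mismatch.
  a = Digest::SHA256.hexdigest(provided)
  b = Digest::SHA256.hexdigest(expected)
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# In a Sinatra app this might guard the route that starts a scrape
# (the route, header name, and SCRAPER_TOKEN variable are assumptions):
#
#   post "/scraper" do
#     token = request.env["HTTP_X_SCRAPER_TOKEN"]
#     halt 401 unless scraper_token_valid?(token, ENV["SCRAPER_TOKEN"])
#     # hand the work to the scraper process here
#   end
```

Returning false when the expected secret is missing or empty means a forgotten config var locks the route instead of opening it to everyone.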

answered Sep 20 '22 11:09

XLII
