
How to set up RSelenium for R?

Tags:

r

rselenium

"everything was better back then"...

Since Firefox 49 (?), using the RSelenium package is no longer straightforward. I have searched the whole internet for a SIMPLE how-to manual for setting up RSelenium but did not find anything relevant and up to date.

Can someone provide me, and everyone else out there who has no clue, with a simple how-to manual? Like:

  1. download XY
  2. open AB

so I can run code like the following

    require(RSelenium)

    remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4444L,
                          browserName = "firefox")
    remDr$open()
DataAdventurer asked Feb 26 '17 12:02



1 Answer

  1. Download Docker at https://www.docker.com/products/docker-desktop

  2. Run docker pull selenium/standalone-chrome-debug in a terminal (or cmd on Windows)

  3. In Docker Desktop's Dashboard, go to the "Images" tab on the left. You should see the image you just pulled listed there. Click Run.

  4. A popup will appear. There, click on "Optional Settings".

  5. Type 4445 in the Ports field. Click the "plus" sign and type 5901 in the new Ports input that appears. After that, click Run.

  6. Now, if you click on the Containers / Apps tab on the left, the new container should be listed there as running.

  7. In R's console, run:

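    install.packages("RSelenium")  # only needed once
    library(RSelenium)
    
    # Connect to the Selenium server running in the Docker container
    remDr <- remoteDriver(
            remoteServerAddr = "localhost",
            port = 4445L,            # first port entered in step 5
            browserName = "chrome"
    )
    
    remDr$open()  # start a browser session
    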

Every time you want to use RSelenium, you will first need to start the Docker container as you did in steps 3 to 5 above.
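
Steps 3 to 5 can also be done from a terminal instead of the Dashboard. Here is a minimal sketch, assuming that inside the container Selenium listens on port 4444 and VNC on port 5900 (the host ports 4445 and 5901 are the ones from step 5). The `selenium_cmd` helper is a made-up name for illustration; it only prints the command so you can check it before running anything.

```shell
# Sketch: terminal equivalent of the Dashboard settings in steps 3 to 5.
# Assumption: inside the container, Selenium listens on 4444 and VNC on 5900.
IMAGE="selenium/standalone-chrome-debug"
SELENIUM_PORT=4445   # host port that remoteDriver() connects to
VNC_PORT=5901        # host port for a VNC viewer

# selenium_cmd is a hypothetical helper: it prints the docker command
# instead of running it, so it can be inspected first.
selenium_cmd() {
  echo "docker run -d -p ${SELENIUM_PORT}:4444 -p ${VNC_PORT}:5900 ${IMAGE}"
}

selenium_cmd
```

To actually start the container, paste the printed command into the terminal. Afterwards, `docker ps` should list it, and `docker stop` followed by the container ID shuts it down.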

The steps above also let you use VNC to watch what happens and debug. If you need to learn a bit about it, go to https://www.realvnc.com/pt/connect/download/viewer/. More details are beyond the scope of this topic.

This should get you to the point where you can follow RSelenium's basic usage vignette: https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html

You should also read up on the security implications of exposed ports and how to handle them. These videos from R Consortium may help you out from here on: https://www.youtube.com/watch?v=OxbvFiYxEzI and https://www.youtube.com/watch?v=JcIeWiljQG4
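
As one example of limiting exposure, Docker lets you prefix a published port with a host IP so the port is bound only to that interface. This is a sketch under assumptions, not a full security setup: it assumes the container ports 4444 (Selenium) and 5900 (VNC), and `local_only_cmd` is a made-up helper name that only prints the command.

```shell
# Sketch: publish the Selenium and VNC ports on the loopback interface only,
# so other machines on the network cannot reach them.
# Assumption: inside the container, Selenium listens on 4444 and VNC on 5900.
IMAGE="selenium/standalone-chrome-debug"
BIND_ADDR="127.0.0.1"

# local_only_cmd is a hypothetical helper: it prints the docker command
# instead of running it.
local_only_cmd() {
  echo "docker run -d -p ${BIND_ADDR}:4445:4444 -p ${BIND_ADDR}:5901:5900 ${IMAGE}"
}

local_only_cmd
```

With this binding, R on the same machine can still connect to localhost on port 4445.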

I hope this helps you as it would have helped me some time ago.

Rômulo Barros answered Oct 12 '22 23:10