
Filling in and submitting a search form with rvest in R

Tags: r, rvest

I am learning how to fill in and submit forms with rvest in R, and I got stuck trying to search for the ggplot tag on Stack Overflow. This is my code:

url<-"https://stackoverflow.com/questions"

(session<-html_session("https://stackoverflow.com/questions"))

(form<-html_form(session)[[2]])
(filled_form<-set_values(form, tagQuery = "ggplot"))
searched<-submit_form(session, filled_form)

I get this error:

Submitting with '<unnamed>'
Error in parse_url(url) : length(url) == 1 is not TRUE

Following this question (rvest error on form submission), I tried several things to solve this, without success:

filled_form$fields[[13]]$name<-"submit"
filled_form$fields[[14]]$name<-"submit"
filled_form$fields[[13]]$type<-"button"
filled_form$fields[[14]]$type<-"button"

Any help would be appreciated.

Asked by Laura, Jan 25 '21 14:01


1 Answer

The search form is html_form(session)[[1]], not [[2]], and its field is named q.
This form has no submit button:

<form> 'search' (GET /search)
  <input text> 'q': 

Adding a fake submit button to the form seems to work as a workaround:

<form> 'search' (GET /search)
  <input text> 'q': 
  <input submit> '': 

This gives the following code:

library(rvest)
url<-"https://stackoverflow.com/questions"
(session<-html_session("https://stackoverflow.com/questions"))
(form<-html_form(session)[[1]])

fake_submit_button <- list(name = NULL,
                           type = "submit",
                           value = NULL,
                           checked = NULL,
                           disabled = NULL,
                           readonly = NULL,
                           required = FALSE)
attr(fake_submit_button, "class") <- "input"

form[["fields"]][["submit"]] <- fake_submit_button
(filled_form<-set_values(form, q = "ggplot"))


searched<-submit_form(session, filled_form)
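The fake-button step above can be wrapped in a small helper so it is reusable for other button-less forms. This is only a sketch: add_fake_submit is a hypothetical name, and mock_form is a plain list standing in for the object returned by html_form(), so that the example runs without rvest or a network connection. The field layout is copied from the code above.

```r
# Hypothetical helper: append a dummy submit button to a form-like list.
# The field layout mirrors the fake_submit_button built in the answer.
add_fake_submit <- function(form) {
  button <- structure(
    list(name = NULL,
         type = "submit",
         value = NULL,
         checked = NULL,
         disabled = NULL,
         readonly = NULL,
         required = FALSE),
    class = "input"
  )
  form[["fields"]][["submit"]] <- button
  form
}

# Mock form (an assumption, not a real rvest object): one text field 'q'.
mock_form <- list(
  name = "search",
  method = "GET",
  url = "/search",
  fields = list(
    q = structure(list(name = "q", type = "text", value = ""), class = "input")
  )
)

patched <- add_fake_submit(mock_form)
names(patched$fields)
```

With a real form, the call would be form <- add_fake_submit(html_form(session)[[1]]) before set_values().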

The problem is that the response is a captcha page:

searched$url
[1] "https://stackoverflow.com/nocaptcha?s=7291e7e6-9b8b-4b5f-bd1c-0f6890c23573"


You won't be able to handle this with rvest, but after manually clicking through the captcha you get the query you're looking for:

https://stackoverflow.com/search?q=ggplot

It is probably much easier to use the approach from my other answer, where search holds the query string:

read_html(paste0('https://stackoverflow.com/search?tab=newest&q=',search))
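One detail worth handling when building the URL by hand is escaping: a query containing spaces or brackets needs to be percent-encoded. A minimal sketch, assuming a hypothetical helper name build_search_url and using utils::URLencode from base R:

```r
# Hypothetical helper: build the Stack Overflow search URL for a query string.
# URLencode(reserved = TRUE) percent-encodes spaces, brackets, '&', etc.
build_search_url <- function(search, tab = "newest") {
  paste0("https://stackoverflow.com/search?tab=", tab,
         "&q=", utils::URLencode(search, reserved = TRUE))
}

build_search_url("ggplot")
# [1] "https://stackoverflow.com/search?tab=newest&q=ggplot"

build_search_url("[r] rvest form")
# [1] "https://stackoverflow.com/search?tab=newest&q=%5Br%5D%20rvest%20form"
```

The result can then be passed to read_html(), for example read_html(build_search_url("ggplot")). The captcha redirect described above may still occur.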
Answered by Waldi, Nov 10 '22 13:11