
Submit POST form when rvest doesn't recognize submit button

Tags: r, httr, rvest

I would like to submit the following form (the form appears after you click on the link "Kliknite na ..."): http://www1.biznet.hr/HgkWeb/do/extlogon

I have to enter one parameter, named "OIB", and submit the form by clicking "Trazi".

Here is my code:

library(httr)
library(rvest)

sess <- html_session("http://www1.biznet.hr/HgkWeb/do/extlogon")
search_page <- sess %>%
  follow_link(1)
form <- html_form(search_page)[[6]]
fill_form <- set_values(form, 'clanica.cla_oib' = '94989605030')
firma_i <- submit_form(search_page, fill_form, submit = 'submit')

The last line produces an error:

Error: Unknown submission name 'submit'. Possible values: clanica.asTextDatumGasenjaTo, clanica.asTextUdr_id

I don't understand why rvest recognizes these two parameters as submit buttons when neither has "submit" as its name or type. And why doesn't rvest recognize the "Trazi" button as a submit parameter? In short, how do I change the filled form so that it can be submitted?

Mislav asked Aug 24 '18 17:08



1 Answer

The problem is that some of the inputs lack a type attribute, and rvest does not check for this properly.

To illustrate the problem:

library(httr)
library(rvest)
#> Loading required package: xml2

sess <- html_session("http://www1.biznet.hr/HgkWeb/do/extlogon")
search_page <- sess %>%
  follow_link(1)
#> Navigating to /HgkWeb/do/extlogon;jsessionid=88295900F3F932C85A25BB18F326BE28
form <- html_form(search_page)[[6]]
fill_form <- set_values(form, 'clanica.cla_oib' = '94989605030')

Some of the fields do not have the type attribute:

sapply(fill_form$fields, function(x) '['(x, 'type'))
#> $clanica.limitSearchToActiveCompany.type
#> [1] "radio"
#> 
#> $clanica.limitSearchToActiveCompany.type
#> [1] "radio"
#> 
#> $joinBy.useInnerJoin.type
#> [1] "checkbox"
#> 
#> $nazivTvrtke.type
#> [1] "text"
#> 
#> $nazivZapocinjeSaPredanomVrijednoscu.type
#> [1] "checkbox"
#> 
#> $clanica.cla_jmbp.type
#> [1] "text"
#> 
#> $clanica.cla_mbs.type
#> [1] "text"
#> 
#> $clanica.cla_oib.type
#> [1] "text"
#> 
#> $asTextKomoraId.NA
#> NULL
#> 
#> $clanica.asTextOpc_id.NA
#> NULL
#> 
#> $clanica.cla_opcina.type
#> [1] "hidden"
#> 
#> $clanica.asTextNas_id.NA
#> NULL
#> 
#> $clanica.cla_naselje.type
#> [1] "hidden"
#> 
#> $clanica.pos_id.NA
#> NULL
#> 
#> $clanica.postaNaziv.type
#> [1] "hidden"
#> 
#> $clanica.cla_ulica.type
#> [1] "text"
#> 
#> $clanica.asTextDatumUpisaFrom.type
#> [1] "text"
#> 
#> $clanica.asTextDatumUpisaTo.type
#> [1] "text"
#> 
#> $clanica.asTextDatumGasenjaFrom.type
#> [1] "text"
#> 
#> $clanica.asTextDatumGasenjaTo.type
#> [1] "text"
#> 
#> $clanica.asTextUdr_id.NA
#> NULL
#> 
#> $clanica.asTextVel_id.NA
#> NULL
#> 
#> $nkd2007.type
#> [1] "text"
#> 
#> $nkd2007PretrazivanjePoGlavnojDjelatnosti.type
#> [1] "radio"
#> 
#> $nkd2007PretrazivanjePoGlavnojDjelatnosti.type
#> [1] "radio"
#> 
#> $submit.type
#> [1] "submit"
#> 
#> $org.apache.struts.taglib.html.CANCEL.type
#> [1] "submit"
#> 
#> $orderBy.order1.NA
#> NULL
#> 
#> $orderBy.order2.NA
#> NULL
#> 
#> $limit.type
#> [1] "text"
#> 
#> $searchForRowCount.type
#> [1] "checkbox"
#> 
#> $joinBy.gfiGodina.NA
#> NULL
#> 
#> $joinBy.gfiBrojZaposlenihFrom.type
#> [1] "text"
#> 
#> $joinBy.gfiBrojZaposlenihTo.type
#> [1] "text"
#> 
#> $joinBy.gfiUkupniPrihodFrom.type
#> [1] "text"
#> 
#> $joinBy.gfiUkupniPrihodTo.type
#> [1] "text"
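
A more compact way to see only the affected fields is to filter on a missing type. This is a minimal sketch on toy data: the `fields` list and its names are hypothetical and only mimic the shape of `fill_form$fields`.

```r
# Toy stand-in for fill_form$fields (hypothetical names, not the real form)
fields <- list(
  oib    = list(name = "oib", type = "text"),
  komora = list(name = "komora"),                  # parsed without a type
  submit = list(name = "submit", type = "submit")
)

# Names of the fields whose type is missing
has_no_type  <- vapply(fields, function(x) is.null(x$type), logical(1))
missing_type <- names(fields)[has_no_type]
print(missing_type)
#> [1] "komora"
```

On the real form, the same two lines should work with `fill_form$fields` in place of `fields`.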

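This breaks the internal function submit_request(), specifically the Filter() call in it: for a field without a type, the submit test returns a zero-length result instead of FALSE, so the results no longer line up with the fields and Filter() picks the wrong ones.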
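
The effect can be reproduced without the website. The sketch below uses a toy `fields` list (hypothetical names) and an unguarded predicate, `is_submit_unguarded`, which is an assumption about the shape of the check inside rvest and not a copy of its source.

```r
# Toy stand-in for form$fields; field "b" has no type
fields <- list(
  a  = list(name = "a",  type = "text"),
  b  = list(name = "b"),
  c  = list(name = "c",  type = "text"),
  go = list(name = "go", type = "submit")
)

# Unguarded test: tolower(NULL) is character(0), so the result for "b" is logical(0)
is_submit_unguarded <- function(x) tolower(x$type) %in% c("submit", "image", "button")

flags <- lapply(fields, is_submit_unguarded)
print(lengths(flags))
#>  a  b  c go
#>  1  0  1  1

# Filter() unlists the results, which drops the empty one for "b";
# the single TRUE then sits at position 3 instead of 4, so "c" is returned
picked <- names(Filter(is_submit_unguarded, fields))
print(picked)
#> [1] "c"
```

The same shift would explain the error in the question: every field without a type that precedes the real submit button moves the match one position earlier, so unrelated fields are reported as the possible submit values.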


It's referenced here, and a fix is proposed in this PR, but it hasn't been merged since Jul 2016, so don't hold your breath.

The fix in the PR basically checks whether a type attribute is present:

  # form.R, row 280
  is_submit <- function(x) 'type' %in% names(x) &&
                           tolower(x$type) %in% c("submit", "image", "button")
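
As a quick check of why the guard helps, the sketch below applies a predicate of the same form to a toy `fields` list (hypothetical names, not the real form). Because `&&` short-circuits, the predicate always returns exactly one TRUE or FALSE per field, so Filter() stays aligned.

```r
# Toy stand-in for form$fields; field "b" has no type
fields <- list(
  a  = list(name = "a",  type = "text"),
  b  = list(name = "b"),
  c  = list(name = "c",  type = "text"),
  go = list(name = "go", type = "submit")
)

# Guarded test: the right-hand side is only evaluated when a type exists
is_submit <- function(x) "type" %in% names(x) &&
                         tolower(x$type) %in% c("submit", "image", "button")

# One logical value per field, including the one without a type
flags <- vapply(fields, is_submit, logical(1))

picked <- names(Filter(is_submit, fields))
print(picked)
#> [1] "go"
```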

For a quick fix, you can patch the data you already have, replacing each NULL type with a placeholder type that is not a submit type:

fill_form$fields <- lapply(fill_form$fields, function(x) {
  # give fields without a type a harmless placeholder so the submit test returns FALSE
  if (is.null(x$type)) x$type <- 'text'
  x
})


firma_i <- submit_form(search_page, fill_form, submit = 'submit')
firma_i
#> <session> http://www1.biznet.hr/HgkWeb/do/fullSearchPost
#>   Status: 200
#>   Type:   text/html;charset=UTF-8
#>   Size:   4366

Created on 2018-08-27 by the reprex package (v0.2.0).

GGamba answered Sep 19 '22 18:09