I am experimenting with "rvest". Getting the data with "read_html" works fine.
library(rvest)
# suppressMessages(library(dplyr))
library(stringr)
library(XML)
# get house data
houseurl <- "http://boekhoff.de/immobilien/gepflegtes-zweifamilienhaus-in-ellwuerden/"
house <- read_html(houseurl)
house
I have some problems processing the data. My problems are commented in the source.
## eliminating <br>-tags in address
# using the following commands causes an error in "html_nodes" later on
str_extract_all(house,"<br>") ## show all linebreaks
# replacing <br> with whitespace " ",
house <- str_replace_all(house,"<br>", " ")
Now I read out the details, but that does not seem to work:
houseattribut <- house %>%
html_nodes(css = "div.col-2 li p.data-left") %>%
html_text(trim=TRUE)
# shows "Error in UseMethod("xml_find_all") : ... "
# but all attributes are shown on screen
houseattribut
Without replacing the "br" tags manually it works, but "html_text" runs the strings together:
housedetails <- house %>%
html_nodes(css = "div.col-2 li p.data-right") %>%
html_text()
housedetails
# the same error shows "Error in UseMethod("xml_find_all") : ... "
# but all details are shown on screen
housedetails[4]
# in the source there is: "Ellwürder Straße 17<br>26954 Nordenham"
# there should be a whitespace in place of the <br>-tag
Any hints what I'm doing wrong?
The problem is that after read_html, house is an xml_document, but after str_replace_all it becomes a character vector (chr). So when you try to select nodes again, it is no longer an xml_document, and that gives you the error.
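You can see the type change on a small example without touching the real site. This is only a sketch: the HTML snippet and the names `doc`, `txt`, `err` and `doc2` are made up for illustration, and I call as.character() explicitly so the conversion is visible.

```r
library(rvest)
library(stringr)

# Invented sample markup that mimics the address paragraph on the page
html <- "<html><body><p class='data-right'>Example Street 17<br>12345 Sampletown</p></body></html>"

doc <- read_html(html)
class(doc)  # an xml_document

# Serialising to text and replacing <br> yields a plain character vector
txt <- str_replace_all(as.character(doc), "<br>", " ")
class(txt)  # "character"

# Selecting nodes on a character vector fails; keep the message instead of stopping
err <- tryCatch(
  html_nodes(txt, css = "p.data-right"),
  error = function(e) conditionMessage(e)
)

# Parsing the modified text again gives back a document that html_nodes() accepts
doc2 <- read_html(txt)
address <- html_text(html_nodes(doc2, css = "p.data-right"))
address
```

Here `err` holds the error message from the failed selection, and `address` comes from the re-parsed document.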
You need to convert it back to an xml_document, or apply the replacement node by node.
Something like this:
house <- read_html(str_replace_all(house,"<br>", " "))
Full code:
library(rvest)
#> Loading required package: xml2
library(stringr)
houseurl <- "http://boekhoff.de/immobilien/gepflegtes-zweifamilienhaus-in-ellwuerden/"
house <- read_html(houseurl)
house <- read_html(str_replace_all(house,"<br>", " "))
housedetails <- house %>%
html_nodes(css = "div.col-2 li p.data-right") %>%
html_text()
housedetails[4]
#> [1] "Ellwürder Straße 17 26954 Nordenham"
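For the other option, doing the replacement node by node, a minimal sketch with xml2 could look like the following. It never leaves the xml_document, so html_nodes keeps working. The sample markup and the variable names are invented, and putting a `span` that holds a single space next to each `br` is just one way to do it, not the only one.

```r
library(rvest)
library(xml2)

# Invented sample markup that mimics the address paragraph on the page
doc <- read_html(
  "<html><body><p class='data-right'>Example Street 17<br>12345 Sampletown</p></body></html>"
)

# Find every <br> node in the parsed document
br_nodes <- xml_find_all(doc, ".//br")

# Add a <span> containing one space as a sibling after each <br> ...
xml_add_sibling(br_nodes, "span", " ")

# ... then drop the <br> nodes themselves (this modifies doc in place)
xml_remove(xml_find_all(doc, ".//br"))

address <- html_text(html_nodes(doc, css = "p.data-right"))
address
```

If your rvest version is 1.0.0 or later, html_text2() is also worth a look: it turns `<br>` into a line break by itself, so the two parts of the address are no longer glued together.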