Webscraping instruction for an R user

Tags:

r

web-scraping

I am a statistician/data scientist, R user, runner, and a beginner in the realm of webscraping.

I recently completed a race in Tampa, FL and the results are posted online. I would like to use some web scraping methods in R to pull this data for some fun analytics.

My experience with webscraping in R is very limited. Thus far I have depended on using "SelectorGadget" in a Google Chrome browser to identify the tags of the data I am trying to fetch.

For this one I am not having any luck. Any guidance on getting started would be appreciated.

URL: https://results2.xacte.com/#/e/2534/placings

Some of the code I have tried:

library(rvest)
library(dplyr)

link = "https://results2.xacte.com/#/e/2573/placings"
page = read_html(link)

test = page %>% html_nodes("#place_1st .md-ink-ripple") %>% html_text

test returns: character(0)

If anyone can guide me in a way to pull some data into an object or dataframe it would be greatly appreciated. I am hoping to pull name, time, pace, etc. from this page. Any code to get me started would be enough for me to start to work to pull more data and analyze.

Thank You

Omar123456789 asked Nov 14 '25 22:11


1 Answer

This is not a good page to learn scraping on. The page is rendered with JavaScript, so basic rvest (read_html()) will not work. Look at rvest's read_html_live() function, which returns a LiveHTML object.

Alternatively, if you open the browser's developer tools and watch the Network tab, you should see a request named "agegroup" that contains the information you are looking for. If you copy that file's link, you can download the file, clean it up, and convert it from JSON to a data frame.

library(readr)     # read_file()
library(jsonlite)  # fromJSON()

# download the file
download.file("https://results.xacte.com/json/agegroup?categoryId=9828&eventId=2534&limit=250&offset=0&subeventId=6322&callback=angular.callbacks._2", "runner.txt")

# read the data back in
data <- read_file("runner.txt")

# remove the JavaScript (JSONP) wrapper at the beginning and end
data1 <- sub("angular\\.callbacks\\._2\\(", "", data)
data1 <- sub("\\);?\\s*$", "", data1)

# convert the JSON to a data frame
parsed <- fromJSON(data1)
df <- parsed$aaData

# number of records
parsed$iTotalRecords

In the above script I modified the limit from a default of 25 names to 250. It looks like there were over 3500 participants so you have room to increase that number if you want everyone. There are many columns with the various split times etc.
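The limit and offset parameters in that URL suggest the endpoint can be paged. Here is a minimal sketch of building one URL per page, assuming offset is a zero-based row index and that the other query parameters (including the callback name) can stay fixed. Neither assumption is confirmed by the site. page_urls() is a hypothetical helper, not part of any package, and it only builds URLs without downloading anything.

```r
# Hypothetical helper: build one URL per page of results.
# Assumes `offset` is a zero-based row index (not confirmed by the site).
page_urls <- function(total, limit = 250) {
  base <- "https://results.xacte.com/json/agegroup"
  offsets <- seq(0, total - 1, by = limit)
  sprintf(
    "%s?categoryId=9828&eventId=2534&limit=%d&offset=%d&subeventId=6322&callback=angular.callbacks._2",
    base, as.integer(limit), as.integer(offsets)
  )
}

# 3500 rows at 250 per page gives 14 URLs (offsets 0, 250, ..., 3250)
urls <- page_urls(total = 3500, limit = 250)
length(urls)
```

Each URL could then be passed to download.file() in a loop and the resulting pieces combined, ideally with a short pause between requests.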

Note the times are in raw form, so you will have to convert from milliseconds to minutes.
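A minimal sketch of that conversion, assuming the time columns hold elapsed milliseconds (possibly stored as character strings). ms_to_minutes() and ms_to_hms() are hypothetical helper names, not functions from any package.

```r
# Hypothetical helpers for elapsed times stored in milliseconds.

# Decimal minutes, e.g. for computing pace or plotting
ms_to_minutes <- function(ms) as.numeric(ms) / 60000

# Clock-style H:MM:SS string, rounded to the nearest second
ms_to_hms <- function(ms) {
  total_sec <- round(as.numeric(ms) / 1000)
  sprintf("%d:%02d:%02d",
          as.integer(total_sec %/% 3600),
          as.integer((total_sec %% 3600) %/% 60),
          as.integer(total_sec %% 60))
}

ms_to_minutes(5400000)  # 90
ms_to_hms(5400000)      # "1:30:00"
```

Both helpers are vectorized, so they can be applied to a whole column at once.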

Hope this helps.

Dave2e answered Nov 19 '25 13:11


