After importing a table from Wikipedia, I have a list of values of the following form:
> tbl[2:6]
$`Internet
Explorer`
[1] "30.71%" "30.78%" "31.23%" "32.08%" "32.70%" "32.85%" "32.04%" "32.31%" "32.12%" "34.07%" "34.81%"
[12] "35.75%" "37.45%" "38.65%" "40.63%" "40.18%" "41.66%" "41.89%" "42.45%" "43.58%" "43.87%" "44.52%"
$Chrome
[1] "36.52%" "36.42%" "35.72%" "34.77%" "34.21%" "33.59%" "33.81%" "32.76%" "32.43%" "31.23%" "30.87%"
[12] "29.84%" "28.40%" "27.27%" "25.69%" "25.00%" "23.61%" "23.16%" "22.14%" "20.65%" "19.36%" "18.29%"
I am trying to get rid of the percentage signs in order to convert the data to numeric form.
Is there a quicker way to clean this data than my current lapply() approach? My current code is:
data <- lapply(tbl[2:6], FUN = function(x) as.numeric(gsub("%", "", x)))
The data eventually become a data frame, but I could not get gsub() to work properly across all elements of a data frame. Is there a way to apply gsub() to each element of a data frame?
The code for the project is online, with results. Thanks in advance!
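To try the lapply() approach from the question without downloading the Wikipedia page, here is a minimal sketch on a hypothetical two-column stand-in for tbl[2:6] (values copied from the output above). The final as.data.frame() step is an assumption about how the cleaned list might be combined, not taken from the project code.

```r
# Hypothetical stand-in for tbl[2:6]: a named list of character vectors,
# including the embedded newline in the "Internet\nExplorer" name
tbl_small <- list(
  "Internet\nExplorer" = c("30.71%", "30.78%", "31.23%"),
  Chrome = c("36.52%", "36.42%", "35.72%")
)
# The question's approach: strip "%" from each list element, then convert
data <- lapply(tbl_small, FUN = function(x) as.numeric(gsub("%", "", x)))
# as.data.frame() runs make.names() on the names, so the newline becomes "."
df_small <- as.data.frame(data)
print(df_small)
```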
Well, I think you could do it the following way, but I don't know if it is better or cleaner than yours:
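df <- data.frame(tbl)  # the list of columns becomes a data frame
# as.matrix() lets a single gsub() call reach every non-date cell at once
df[,-1] <- as.numeric(gsub("%", "", as.matrix(df[,-1])))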
Which gives:
R> head(df)
            Date Internet.Explorer Chrome Firefox Safari Opera Mobile
1   January 2013             30.71  36.52   21.42   8.29  1.19  14.13
2  December 2012             30.78  36.42   21.89   7.92  1.26  14.55
3  November 2012             31.23  35.72   22.37   7.83  1.39  13.08
4   October 2012             32.08  34.77   22.32   7.81  1.63  12.30
5 September 2012             32.70  34.21   22.40   7.70  1.61  12.03
6    August 2012             32.85  33.59   22.85   7.39  1.63  11.78
R> sapply(df, class)
             Date Internet.Explorer            Chrome           Firefox
         "factor"         "numeric"         "numeric"         "numeric"
           Safari             Opera            Mobile
        "numeric"         "numeric"         "numeric"
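The same as.matrix() trick can be checked on a small, hypothetical data frame (a made-up subset of the columns, values copied from the output above). This is a sketch for seeing the behaviour, not the answerer's code: gsub() keeps the matrix shape, as.numeric() flattens it column by column, and the data frame assignment refills the columns in that same column-major order.

```r
# Hypothetical toy data frame standing in for data.frame(tbl): a text Date
# column followed by percentage strings
df <- data.frame(
  Date = c("January 2013", "December 2012"),
  Chrome = c("36.52%", "36.42%"),
  Safari = c("8.29%", "7.92%"),
  stringsAsFactors = FALSE
)
# as.matrix() turns the non-date columns into one character matrix, so a single
# gsub() call reaches every cell; the numeric vector is refilled column by column
df[, -1] <- as.numeric(gsub("%", "", as.matrix(df[, -1])))
sapply(df, class)
```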
Like juba, I'm not sure whether this way is "better or cleaner", but to act on all elements of a data frame you can use apply():
library(XML)  # provides readHTMLTable()
# start with a data frame, not a list
url <- "http://en.wikipedia.org/wiki/Usage_share_of_web_browsers"
# Get the eleventh table on the page.
tbl <- readHTMLTable(url, which = 11, stringsAsFactors = FALSE)
# use apply() on the non-date columns; it returns a numeric matrix whose
# columns are written back into the data frame
tbl[, 2:7] <- apply(tbl[, 2:7], 2, function(x) as.numeric(gsub("%", "", x)))
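An alternative not taken from either answer is to assign the result of lapply() back into a column subset. A minimal sketch, assuming a hypothetical two-month toy table and a hypothetical pct_cols index: because lapply() returns a list, tbl[pct_cols] <- ... replaces those columns without a round trip through a matrix, and sub(fixed = TRUE) treats "%" as a literal string rather than a regular expression.

```r
# Hypothetical toy table standing in for the one returned by readHTMLTable()
tbl <- data.frame(
  Date = c("January 2013", "December 2012"),
  Chrome = c("36.52%", "36.42%"),
  Opera = c("1.19%", "1.26%"),
  stringsAsFactors = FALSE
)
pct_cols <- 2:3  # hypothetical: the percentage columns of this toy table
# lapply() works column by column and returns a list, so assigning it back
# into tbl[pct_cols] keeps the data frame structure and the Date column intact
tbl[pct_cols] <- lapply(tbl[pct_cols],
                        function(x) as.numeric(sub("%", "", x, fixed = TRUE)))
str(tbl)
```

With the real table, pct_cols would be the same 2:7 used in the apply() call.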