I'm munging data, specifically, I've opened this pdf http://pubs.acs.org/doi/suppl/10.1021/ja105035r/suppl_file/ja105035r_si_001.pdf and scraped the data from table s4,
1a 1b 1a 1b
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a
9(R) 4.16 3.89 4.12b 4.18b
9(S) 4.16 3.92 4.12b 4.18b
10 1.19 0.91 1.21 1.25
pasted it into notepad and saved it as a txt file.
s4 <- read.table("s4.txt", header=TRUE, stringsAsFactors=FALSE)
gives,
X1a X1b X1a.1 X1b.1
1 5.27 4.76 5.09 4.75
2 2.47 2.74 2.77 2.80
4 1.14 1.38 1.12 1.02
6 7.43 7.35 7.22-7.35a 7.25-7.36a
7 7.38 7.34 7.22-7.35a 7.25-7.36a
8 7.23 7.20 7.22-7.35a 7.25-7.36a
in order to use the data I need to change it all to numeric and remove the letters, thanks to this link R regex gsub separate letters and numbers I can use the following code,
gsub("([[:alpha:]])","",s4[,3])
I can get rid of the extraneous letters.
What I want to do now, and the point of the question, is to change the ranges,
"7.22-7.35" "7.22-7.35" "7.22-7.35"
with their means,
"7.29"
Could I use gsub for this? (or would I need to strsplit across the hyphen, combine into a vector and return the mean?).
You need a single regex in strsplit
for this task (removing letters and splitting):
s4[] <- lapply(s4, function(x) {
if (is.numeric(x)) x
else sapply(strsplit(as.character(x), "-|[[:alpha:]]"),
function(y) mean(as.numeric(y)))
})
The result:
> s4
X1a X1b X1a.1 X1b.1
1 5.27 4.76 5.090 4.750
2 2.47 2.74 2.770 2.800
4 1.14 1.38 1.120 1.020
6 7.43 7.35 7.285 7.305
7 7.38 7.34 7.285 7.305
8 7.23 7.20 7.285 7.305
Here's an approach that seems to work right on the sample data:
df[] <- lapply(df, function(col){
col <- gsub("([[:alpha:]])","", col)
col <- ifelse(grepl("-", col), mean(as.numeric(unlist(strsplit(col[grepl("-", col)], "-")))), col)
as.numeric(col)
})
> df
# X1a X1b X1a.1 X1b.1
#1 5.27 4.76 5.090 4.750
#2 2.47 2.74 2.770 2.800
#4 1.14 1.38 1.120 1.020
#6 7.43 7.35 7.285 7.305
#7 7.38 7.34 7.285 7.305
#8 7.23 7.20 7.285 7.305
Disclaimer: It only works right if the ranges in each column are all the same (as in the sample data)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With