I have a data frame (df) that is imported from the web. I am interested in the following column (colname) of df. The elements of colname are stored as a "factor". A sample from df, which also includes "NA"s, is shown below:
colname
57 +0.10
55
NA
57,5 +2.00
56,5 +0.50
56,5
58
I would like to split colname on "+" and get 3 numeric columns. The desired output is:
colname1 colname2 total
57.00 0.10 57.10
55.00 0.00 55.00
NA NA NA
57.50 2.00 59.50
56.50 0.50 57.00
56.50 0.00 56.50
58.00 0.00 58.00
The result should also be a data frame, with all three columns numeric. However, I am stuck on this problem. Whatever I do, I can't get the desired result. The errors are mainly caused by the "NA"s and the "factor" data type. I would be very glad for any help. Thanks a lot.
I would replace the "," with "." using sub (read.table/read.csv have a dec option as well). Using cSplit from splitstackshape, split the column into two by specifying the sep as "+". The output will be a data.table. Create the "Total" column using rowSums. If you want to return NA for rows that are all NAs, that is possible (one option is shown in the 2nd solution).
df$colname <- sub(',', '.', df$colname)
library(splitstackshape)
dt <- cSplit(df, 'colname', '+')
dt[, Total:=rowSums(.SD,na.rm=TRUE)][]
Or using base R, split the column ("colname") using strsplit. This assumes "colname" is already "character" with decimal points, as it is after the sub step above. The output will be a "list". Convert the "character" elements to "numeric", pad with NAs so that all list elements have the same length, and rbind them (df2 <- do.call(...)). Create the "Total" column with rowSums, then change it to NA for the rows that are NA in both columns.
lst <- lapply(strsplit(df$colname, '[+]'), as.numeric)
df2 <- do.call(rbind.data.frame,
lapply(lst, `length<-`, max(sapply(lst, length))))
names(df2) <- paste0('colname', 1:2)
df2$Total <- (NA^!rowSums(!is.na(df2)))*rowSums(df2, na.rm=TRUE)
df2
# colname1 colname2 Total
#1 57.0 0.1 57.1
#2 55.0 NA 55.0
#3 NA NA NA
#4 57.5 2.0 59.5
#5 56.5 0.5 57.0
#6 56.5 NA 56.5
#7 58.0 NA 58.0
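The NA^!rowSums(...) expression is terse, so here is a minimal sketch of what it does, split into steps. The data frame x and the names n_present, mask and total are hypothetical, chosen only for illustration. The trick relies on NA^0 being 1 and NA^1 being NA in R.

```r
# Toy data: hypothetical stand-in for df2 before the Total column is added
x <- data.frame(colname1 = c(57, 55, NA), colname2 = c(0.1, NA, NA))

# Number of non-missing values in each row
n_present <- rowSums(!is.na(x))

# !n_present is TRUE only where the count is 0 (the all-NA rows).
# NA^FALSE is NA^0, which is 1; NA^TRUE is NA^1, which is NA.
mask <- NA^!n_present

# Multiplying by the mask keeps ordinary row sums and turns all-NA rows into NA
total <- mask * rowSums(x, na.rm = TRUE)
```

Without the mask, rowSums(x, na.rm = TRUE) would report 0 for a row in which every value is missing.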
Or, in this case, eval(parse(...)) could also be used, which avoids the step of changing 0 to NA:
df2$Total <- unname(sapply(df$colname,
function(x) eval(parse(text=x))))
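To see why this works, note that each cleaned string is itself a valid R expression. The following is a small sketch on a hypothetical character vector x (not the original df), assuming the decimal commas have already been replaced.

```r
# Hypothetical character vector shaped like the cleaned column
x <- c("57 +0.10", "55", NA, "57.5 +2.00")

# Each string is parsed as R code and evaluated, so "57 +0.10" is computed as 57 + 0.10.
# unname() drops the names that sapply() attaches when given a character vector.
total <- unname(sapply(x, function(s) eval(parse(text = s))))
```

Because eval(parse()) runs whatever code the strings contain, this approach is only reasonable when the input is trusted, which may not hold for data scraped from the web.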
If you need to replace the NA with 0 in "colname2":
df2$colname2[with(df2, is.na(colname2) & !is.na(colname1))] <- 0
df2
# colname1 colname2 Total
#1 57.0 0.1 57.1
#2 55.0 0.0 55.0
#3 NA NA NA
#4 57.5 2.0 59.5
#5 56.5 0.5 57.0
#6 56.5 0.0 56.5
#7 58.0 0.0 58.0
The sample data used above:
df <- structure(list(colname = structure(c(4L, 1L, NA, 5L, 3L, 2L,
6L), .Label = c("55", "56,5", "56,5 +0.50", "57 +0.10", "57,5 +2.00",
"58"), class = "factor")), .Names = "colname", row.names = c(NA,
-7L), class = "data.frame")
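For reference, the base R steps can also be combined into one self-contained sketch that starts from a factor column. This is a variant under stated assumptions, not the exact code above: the four-row df and the names x, parts and out are hypothetical, and the two pieces are picked out by index instead of padding and rbind.

```r
# Small hypothetical factor column mimicking the sample data
df <- data.frame(colname = factor(c("57 +0.10", "55", NA, "57,5 +2.00")))

# Factor -> character, and decimal comma -> decimal point
x <- sub(",", ".", as.character(df$colname), fixed = TRUE)

# Split on the literal "+" and convert each piece to numeric
parts <- lapply(strsplit(x, "+", fixed = TRUE), as.numeric)

# First and second piece of each row; a missing piece comes back as NA
colname1 <- sapply(parts, `[`, 1)
colname2 <- sapply(parts, `[`, 2)

# A missing second piece counts as 0 only when the first piece is present
colname2[is.na(colname2) & !is.na(colname1)] <- 0

# Rows that were NA in the input stay NA in all three columns
out <- data.frame(colname1, colname2, total = colname1 + colname2)
```

Calling as.character() first matters because strsplit() only accepts character input and would stop with an error on a factor.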