
Loop over strings in R

Tags: loops, r

I'd like to know what is wrong with my code rather than just get a solution. I wish to loop over some strings. My data is as follows:

id    source    transaction

 1     a > b       6 > 0
 2     J > k       5
 3     b > c       4 > 0

I have a list and wish to go over it, find the rows whose 'source' contains each element, and compute the average of 'transaction' for that element.

mylist <- c ("a", "b") 

So my desired output for the elements in the list is

source  avg
a        6 
b        2      

I do not know how to loop over the list and send the results to a csv file. I tried this:

mylist <- c( "a", "b" )

for(i in mylist)
{

  KeepData <- df [grepl(i, df$source), ]
   KeepData <- cSplit(KeepData, "transaction", ">", "long")

  avg<- mean(KeepData$transactions)
  result <- list(i,avg )

  write.table(result ,file="C:/Users.csv", append=TRUE,sep=",",col.names=FALSE,row.names=FALSE)

}

But it gives me an "NA" result with the following warning:

Warning messages:
1: In mean.default(KeepData$transactions) :
  argument is not numeric or logical: returning NA
2: In mean.default(KeepData$transactions) :
  argument is not numeric or logical: returning NA

MFR asked Sep 05 '25 02:09


1 Answer

We can use cSplit (from splitstackshape) to split the 'source' column and reshape the dataset to 'long' format. Because the result is a data.table, we can then filter the rows in 'i' (source %in% mylist) and, grouped by 'source', get the mean of 'transaction'.

library(splitstackshape)
cSplit(df1, "source", " > ", "long")[source %in% mylist, .(avg = mean(transaction)), source]
#   source avg
#1:      a   6
#2:      b   5

Another option is separate_rows from tidyr to convert to 'long' format, followed by the dplyr methods to filter, group by 'source' and summarise.

library(tidyr)
library(dplyr)
separate_rows(df1, source) %>%
        filter(source %in% mylist) %>%
        group_by(source) %>% 
        summarise(avg  = mean(transaction))

Update

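For the new dataset ('df2'), where 'transaction' also holds " > "-separated values, we need to split both columns to 'long' format together, so that each piece of 'source' stays paired with its own piece of 'transaction', and then get the mean of 'transaction' grouped by 'source'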

cSplit(df2, 2:3,  " > ", "long")[source %in% mylist, .(avg = mean(transaction)), source]
#   source avg
#1:      a   6
#2:      b   2
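For readers without splitstackshape, the same parallel split can be sketched in base R. This is only an illustration under stated assumptions, not what cSplit does internally: the helper name 'pad', the intermediate object 'long' and the temporary output file are hypothetical, and shorter pieces are padded with NA so that both columns stay aligned row by row.

```r
# Same data as 'df2' in the data section of this answer.
df2 <- data.frame(
  id = 1:3,
  source = c("a > b", "J > k", "b > c"),
  transaction = c("6 > 0", "5", "4 > 0"),
  stringsAsFactors = FALSE
)
mylist <- c("a", "b")

# Split both columns on ">" (allowing optional surrounding spaces).
src <- strsplit(df2$source, "\\s*>\\s*")
trn <- strsplit(df2$transaction, "\\s*>\\s*")

# Pad the shorter side of each row with NA so the pieces stay paired.
n <- pmax(lengths(src), lengths(trn))
pad <- function(x, len) { length(x) <- len; x }  # hypothetical helper

# One row per (source piece, transaction piece) pair.
long <- data.frame(
  id = rep(df2$id, n),
  source = unlist(Map(pad, src, n)),
  transaction = as.numeric(unlist(Map(pad, trn, n))),
  stringsAsFactors = FALSE
)

# Keep only the requested elements and average per 'source'.
keep <- long[long$source %in% mylist, ]
res <- aggregate(transaction ~ source, data = keep, FUN = mean)
names(res)[2] <- "avg"
print(res)
#   source avg
# 1      a   6
# 2      b   2

# Write all results in a single call instead of appending inside a loop.
out_file <- tempfile(fileext = ".csv")  # assumed location, replace as needed
write.csv(res, out_file, row.names = FALSE)
```

Writing the whole summary once avoids repeated append = TRUE calls, which keep adding rows to the same file every time the script is rerun.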

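As for what is wrong with the original loop: the column is named 'transaction', but the code calls mean(KeepData$transactions). That column does not exist, so `$` returns NULL and mean() returns NA with the warning shown. In addition, only 'transaction' was split, so each value was no longer paired with its own 'source' element. The for loop can be modified to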

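for(i in mylist) {
   # split 'source' and 'transaction' together so the pieces stay paired
   KeepData <-  cSplit(df2, 2:3,  ">", "long")
   # keep the rows whose 'source' matches the current element
   KeepData <- KeepData[grepl(i, source)]
   # note the column name: 'transaction', not 'transactions'
   avg<- mean(KeepData$transaction)
   result <- list(i,avg )
   print(result)
   write.table(result ,file="C:/Users.csv", 
             append=TRUE,sep=",",col.names=FALSE,row.names=FALSE)
}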
#[[1]]
#[1] "a"

#[[2]]
#[1] 6

#[[1]]
#[1] "b"

#[[2]]
#[1] 2
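The NA in the question can be reproduced in isolation with base R only. The sketch below assumes a plain data.frame named 'df' with the same columns as the question; 'missing_col' and 'na_result' are illustrative names, and the warning is suppressed here only so the snippet runs quietly.

```r
# Same columns as in the question.
df <- data.frame(
  id = 1:3,
  source = c("a > b", "J > k", "b > c"),
  transaction = c("6 > 0", "5", "4 > 0"),
  stringsAsFactors = FALSE
)

# The column is 'transaction'. 'transactions' is not a prefix of any
# column name, so `$` finds nothing and returns NULL.
missing_col <- df$transactions
print(is.null(missing_col))

# mean() of NULL is NA, with the warning
# "argument is not numeric or logical: returning NA".
na_result <- suppressWarnings(mean(missing_col))
print(na_result)

# A cheap guard that fails early if a column name is misspelled.
stopifnot("transaction" %in% names(df))
```

A check like the last line, placed before the loop, turns a silent NA into an immediate error that names the problem.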

data

df1 <- structure(list(id = 1:3, source = c("a > b", "J > k", "b > c"
 ), transaction = c(6L, 5L, 4L)), .Names = c("id", "source", "transaction"
), class = "data.frame", row.names = c(NA, -3L))


df2 <- structure(list(id = 1:3, source = c("a > b", "J > k", "b > c"
), transaction = c("6 > 0", "5", "4 > 0")), .Names = c("id", 
"source", "transaction"), class = "data.frame", row.names = c(NA, 
-3L))
akrun answered Sep 07 '25 20:09


