
Return a data frame from function

I have the following function:

Myfunc<- function(directory, MyFiles, id = 1:332) {
# uncomment the 3 lines below for testing
#directory<-"local"
#id=c(2, 4)
#MyFiles<-c("f2.csv","f4.csv")
idd<-id

df2 <- data.frame()

for(i in 1:length(idd)) {
  EmptyVector <- read.csv(MyFiles[i])  
  comp_cases[i]<-sum(complete.cases(EmptyVector))
  print(comp_cases[[i]])
  id=idd[i]
  ret2=comp_cases[[i]]
  df2<-rbind(df2,data.frame(id,ret2))
 }
print(df2)
return(df2)
}

This works when I run it in R by selecting the code inside the function and commenting out the return. The print statement then gives me a nice data frame like this:

> df2
 id ret2
1 2  994
2 4  7112

However, when I return the data frame df2 from the function, I only get the first row and all other values are ignored. The code inside the function works for the various inputs I have tried (opening multiple files in various combinations), but returning the data frame does not. Can someone help, please? Thanks a lot in advance.

user3127034 asked Jun 13 '14 19:06



2 Answers

This one is actually pretty easy to get around by changing scope.

The issue is that you create the initial dataframe as a local variable and then just swap out its rows, so you wind up with only the first and last results in the dataframe.

When I write a for loop in R and want to append the results of successive queries etc. to some initial dataframe, I do this:

accumulate_results <- function(items, do_something) {
  # items is whatever you want to iterate over (1:10, a given list, etc.);
  # do_something() is your own function returning a dataframe per item.
  # Create the initial dataframe from the first iteration and use the
  # global assignment ('<<-'), not '<-' or '='.
  main_dataframe <<- do_something(items[1])

  # Append the result of every remaining iteration to the initial dataframe.
  for (i in seq_along(items)[-1]) {
    next_dataframe <- do_something(items[i])

    main_dataframe <<- rbind(main_dataframe, next_dataframe)
  }
}

The scoping will allow each iteration to create a dataframe that you can append to the original without losing any of the iterations in between the first and the last.
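
To make the scoping difference concrete, here is a minimal sketch of how '<-' and '<<-' behave inside a function. The names counter, bump_local and bump_global are made up for this illustration and do not come from the question.

```r
# A variable defined outside any function.
counter <- 0

bump_local <- function() {
  # '<-' creates a new local variable; the outer counter is untouched.
  counter <- counter + 1
  counter
}

bump_global <- function() {
  # '<<-' looks for counter in the enclosing environments and updates it there.
  counter <<- counter + 1
  counter
}

bump_local()
after_local <- counter    # outer value after the local assignment

bump_global()
after_global <- counter   # outer value after the '<<-' assignment

print(c(after_local, after_global))
```

Only the second call changes the outer variable. If '<<-' finds no existing variable of that name in any enclosing environment, it creates one in the global environment, which is why a dataframe assigned that way is still there after the function has finished.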

uevencodebro answered Nov 12 '22 18:11


If I understand you correctly, you are trying to create a dataframe with the number of complete cases for each id. Supposing your files are named with the id numbers as you specified (e.g. f2.csv), you can simplify your function as follows:

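myfunc <- function(directory, id = 1:332) {
  # y collects the number of complete cases, one entry per id
  y <- vector()
  for(i in seq_along(id)){
    x <- id
    # read the file for id[i] and count its rows without missing values
    y <- c(y, sum(complete.cases(
      read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))))))
  }
  df <- data.frame(x, y)
  colnames(df) <- c("id","ret2")
  return(df)
}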

You can call this function like this:

myfunc("name-of-your-directory",25:87)

Here is an explanation of the above code. You have to break your problem down into steps:

  1. You need a vector of the ids; that's done by x <- id.
  2. For each id you want the number of complete cases. In order to get that, you have to read the file first. That's done by read.csv(as.character(paste0(directory,"/","f",id[i],".csv"))). To get the number of complete cases for that file, you have to wrap the read.csv code inside sum and complete.cases.
  3. Now you want to add that number to a vector. Therefore you need an empty vector (y <- vector()) to which you can add the number of complete cases from step 2. That's done by wrapping the code from step 2 inside y <- c(y, "code step 2"). With this you add the number of complete cases for each id to the vector y.
  4. The final step is to combine these two vectors into a dataframe with df <- data.frame(x, y) and assign some meaningful column names.

By including steps 1, 2 and 3 (except the y <- vector() part) in a for-loop, you can iterate over the specified ids. The empty vector has to be created with y <- vector() before the for-loop, so that the for-loop can add values to y.
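
The same steps can also be written without an explicit loop. The following is only a sketch under the same assumption that the files are named f<id>.csv; the function name count_complete and the two small temporary files are invented here so the example can be run on its own.

```r
# Sketch only: count_complete and the f<id>.csv naming are assumptions.
count_complete <- function(directory, id = 1:332) {
  # Build every file path up front; file.path() inserts the separator.
  files <- file.path(directory, paste0("f", id, ".csv"))
  # One integer per file: the number of rows without any missing value.
  ret2 <- vapply(files,
                 function(f) sum(complete.cases(read.csv(f))),
                 integer(1), USE.NAMES = FALSE)
  # Combine the ids and the counts into the data frame to return.
  data.frame(id = id, ret2 = ret2)
}

# Self-contained demo with two small temporary files.
demo_dir <- tempfile("demo")
dir.create(demo_dir)
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, 6)),
          file.path(demo_dir, "f2.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2), b = c(NA, NA)),
          file.path(demo_dir, "f4.csv"), row.names = FALSE)

result <- count_complete(demo_dir, id = c(2, 4))
print(result)
```

The value of the last expression in the function, the data frame, is what the function returns, so the caller gets one row per id.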

Jaap answered Nov 12 '22 19:11