How can I create a function using variables in a data frame?

I'm sure this is a basic question (sorry). I'm trying to create a function that uses several variables I have stored in a data frame. The function looks like this:

mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o) {

  Coag <- (0.032690 + 0.090289 * Cond_in + 0.003229 * Flow_in - 0.021980 * pH_in - 0.037486 * pH_out +
             0.016031 * Turb_in - 0.026006 * nm250_i + 0.093138 * nm400_i - 0.397858 * nm250_o - 0.109392 * nm400_o) / 0.167304

  return(Coag)
}

m4_turb <- mlr_turb(dataset)  

The problem appears when I call the function on a data frame whose columns have the same names as the arguments. It doesn't detect my variables and shows this message:

Error in mlr_turb(dataset) : 
  argument "Flow_in" is missing, with no default

But the data frame does contain Flow_in, along with all the other variables.

I think I have misplaced or left out some instruction in the function that would let it take the variables from the dataset. I have searched a lot but have not found an answer.

asked Apr 14 '20 at 13:04 by Mireia Plà


2 Answers

No dumb questions!

I think you're looking for do.call. It calls a function with its arguments taken from a list, and because a data frame is a list of columns, each column is passed as the argument with the same name. Here's a really simple example.

# a simple function that takes x, y and z as arguments 
myFun <- function(x, y, z){
  result <- (x + y)/z
  return(result)
}

# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
                     y=(1:5)*pi,
                     z=(11:15))

# unpack the values into the function using do.call
do.call('myFun', myData)

Output:

[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
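
Applied to the question's situation, the same idea might look like the sketch below. The data frame `dataset` and its values are made up for illustration, and `mlr_small` is a hypothetical, shortened stand-in for `mlr_turb`. One caveat worth knowing: `do.call` stops with an "unused argument" error if the data frame contains columns the function does not accept, so the sketch first keeps only the columns named in the function's formals.

```r
# Hypothetical stand-in for mlr_turb with fewer arguments (illustration only)
mlr_small <- function(Cond_in, Flow_in, pH_in) {
  (0.032690 + 0.090289 * Cond_in + 0.003229 * Flow_in - 0.021980 * pH_in) / 0.167304
}

# Made-up data; the extra column "site" is not an argument of mlr_small
dataset <- data.frame(
  site    = c("a", "b", "c"),
  Cond_in = c(1.0, 2.0, 3.0),
  Flow_in = c(10, 20, 30),
  pH_in   = c(7.0, 7.2, 7.4)
)

# Keep only the columns whose names match the function's arguments
arg_names <- names(formals(mlr_small))
coag <- do.call(mlr_small, dataset[arg_names])

# Equivalent call with each column passed explicitly
coag_explicit <- mlr_small(dataset$Cond_in, dataset$Flow_in, dataset$pH_in)
```

Selecting columns by `names(formals(...))` keeps the call working when the data frame carries identifiers or other columns the model does not use.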

answered Sep 23 '22 at 16:09 by Muon


You have run into a standard problem in R, related to standard evaluation (SE) versus non-standard evaluation (NSE). If you want more detail, you can have a look at this blog post I wrote.

I think the most convenient way to write a function that uses variables from a data frame is to pass the variable names as arguments of the function.

Let's take @Muon's example again.

# a simple function that takes x, y and z as arguments 
myFun <- function(x, y, z){
  result <- (x + y)/z
  return(result)
}

The question is where R should find the values behind the names x, y and z. Inside a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (here the global environment), and then in the attached packages.
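
The lookup order described above can be seen in a small sketch. The names `scale_factor`, `scaled` and `shadowed` are hypothetical and only serve the illustration.

```r
scale_factor <- 10  # defined outside the functions (the global environment in a script)

scaled <- function(x) {
  # x is found among the function's own arguments;
  # scale_factor is not, so R looks in the enclosing environment
  x * scale_factor
}

shadowed <- function(x, scale_factor = 2) {
  # here the argument hides the outer variable of the same name
  x * scale_factor
}

res_outer  <- scaled(1:3)    # uses the outer scale_factor (10)
res_shadow <- shadowed(1:3)  # uses the argument default (2)
```

Column names of a data frame are not part of this search, which is why a function cannot see `Flow_in` just because a data frame containing it was passed as the first argument.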

myFun expects vectors. If you give it a column name instead, you get an error. So what if you want to pass a column name? You must tell R that the name should be looked up in the data frame. For instance, you can do something like this:

myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
  # df[[name]] extracts a single column as a vector
  result <- (df[[col1]] + df[[col2]]) / df[[col3]]
  return(result)
}
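
As a usage sketch, the column-name approach could be called as below. The function is restated under the hypothetical name `col_fun` so the block runs on its own, and the data frames are made up. The last lines show `with()`, a base R alternative that evaluates an expression using the data frame's columns as variables, so the original vector-based function can be kept unchanged.

```r
# Column-name version, restated so the block is self-contained
col_fun <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

my_data <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15)

# The defaults match the column names, so only the data frame is needed
res_default <- col_fun(my_data)

# The same computation on columns stored under other names
other <- data.frame(a = 1:5, b = (1:5) * pi, c = 11:15)
res_named <- col_fun(other, col1 = "a", col2 = "b", col3 = "c")

# Alternative: keep a vector-based function and let with() look up the columns
vec_fun <- function(x, y, z) (x + y) / z
res_with <- with(my_data, vec_fun(x, y, z))
```

Passing names as strings keeps the function explicit about which columns it reads, while `with()` is shorter for interactive work.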

You can go much further in this direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at this package.

answered Sep 24 '22 at 16:09 by linog