For loop for dataframes in R

Tags:

dataframe

r

I am trying to write a decumulation function with a for loop in R, because the financial information provided by the company is accumulated across months for each concept: the January figure covers only January, the February figure is the sum of January and February, the March figure is the sum of January, February and March, and so on.

For example, let's say that I have the following data frame:

Concepts <- c("Concept1", "Concept2", "Concept3")
January <- c(5,10,16)
February <- c(9,14,20)
March <- c(16,20,23)

df <- data.frame(Concepts, January, February, March)

This gives me the following data frame:

Concepts  January  February  March
Concept1    5         9        16 
Concept2    10        14       20
Concept3    16        20       23 

What I need to achieve is the following data frame (notice that February is the difference between February and January, and March is the difference between March and February):

Concepts  January  February  March
Concept1    5         4        7 
Concept2    10        4        6
Concept3    16        4        3

To get the second data frame, I first created an empty data frame with the same number of rows as df. Then, in a for loop, I cbind the first two columns of the data frame (because they do not need any manipulation) and, using the index, add the remaining ones after calculating the difference. In code:

df <- data.frame(Concepts, January, February, March)
df2 <- data.frame(matrix(nrow=nrow(df),ncol=ncol(df))) #Empty Dataframe with the same number  of rows

for(i in 1:ncol(df)) {
  if(i == 1){
    df2 <- cbind(df2, df[ , i])
  } else if (i == 2){
    df2 <- cbind(df2, df[, i])
  } else {
    diference <- df[,i] - df[,i-1]
    df2 <- cbind(df2,diference)
  }
}

I get the following error:

error in [.data.table(df, , i) : j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.

I would love to receive a correction to my code or some alternative that allows me to calculate the above for a dataframe of many years.

Sergio Chavez Villa asked Oct 27 '22 09:10


1 Answer

First note that if you apply the base function diff to each row of the month columns, the result has one month fewer and comes back transposed (months as rows, concepts as columns).

apply(df[-1], 1, diff)
#         [,1] [,2] [,3]
#February    4    4    4
#March       7    6    3

So transpose it to get the right orientation.

t(apply(df[-1], 1, diff))
#     February March
#[1,]        4     7
#[2,]        4     6
#[3,]        4     3

Then cbind it with the first two columns. Since the first argument is a data.frame (a column subset of df), cbind dispatches to cbind.data.frame and the result is also a data.frame.

cbind(df[1:2], t(apply(df[-1], 1, diff)))
#  Concepts January February March
#1 Concept1       5        4     7
#2 Concept2      10        4     6
#3 Concept3      16        4     3
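
If you would rather keep the for loop from the question, the following is a minimal sketch of one way it could be corrected, not the only way. The error message in the question comes from data.table's `[` method, which suggests that df was actually a data.table at that point; that is an assumption based on the message alone. Converting with as.data.frame() makes df[, i] select the i-th column in either case. The sketch also starts from a copy of df instead of an NA-filled data frame, so no cbind is needed and the column names are preserved.

```r
# Example data from the question (values are cumulative per month)
Concepts <- c("Concept1", "Concept2", "Concept3")
January <- c(5, 10, 16)
February <- c(9, 14, 20)
March <- c(16, 20, 23)
df <- data.frame(Concepts, January, February, March)

# Assumption: df may really be a data.table; after this conversion,
# df[, i] is plain base R column indexing either way.
df <- as.data.frame(df)

# Start from a copy so the label column and the first month carry over as-is.
df2 <- df

# Columns 1 and 2 need no change. From column 3 on, subtract the previous
# cumulative column of the ORIGINAL df (df2 is being overwritten as we go).
for (i in seq_len(ncol(df))) {
  if (i > 2) {
    df2[, i] <- df[, i] - df[, i - 1]
  }
}

df2
#  Concepts January February March
#1 Concept1       5        4     7
#2 Concept2      10        4     6
#3 Concept3      16        4     3
```

Because the loop runs over seq_len(ncol(df)), it should work unchanged for a data frame with more month columns, as long as the first column holds the labels and the remaining columns are in chronological order.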
Rui Barradas answered Dec 22 '22 01:12