Create a new data frame column by picking a value from other columns according to an index column

Tags: dataframe, r

Here is (a small part of) a data frame "df" with:

11 variables "v1" to "v11"

and an index column "indx" (with 1 <= indx <= 11).

"indx" was obtained in a previous step from another data frame and was then merged into "df":

> df
    v1 v2  v3  v4  v5 v6  v7 v8 v9 v10 v11 indx
1  223  0  95 605  95  0   0  0  0 189   0   10
2   32  0   0  32   0 26   0  0  0  32   0    6
3    0  0 127  95  64 32   0  0  0 350   0   10
4  141  0 188   0 361  0   0  0  0 145   0    3
5   32  0 183   0 127  0   0  0  0 246   0    3
6   67  0 562   0   0  0   0  0  0 173   0    3
7   64  0 898   0   6  0   0  0  0   0   0    3
8    0  0  16   0  32  0   0  0  0  55   0   10
9    0  0 165   0   0  0 312  0  0 190   0   10
10   0  0 210   0   0  0 190  0  0  11   0    7

I need to build a new column "vsel" whose value is "v(indx)"

(that is, for the first row: vsel=189 because indx=10 and v10=189).

I successfully obtained this result by using a "for" loop:

> df
    v1 v2  v3  v4  v5 v6  v7 v8 v9 v10 v11 indx vsel
1  223  0  95 605  95  0   0  0  0 189   0   10  189
2   32  0   0  32   0 26   0  0  0  32   0    6   26
3    0  0 127  95  64 32   0  0  0 350   0   10  350
4  141  0 188   0 361  0   0  0  0 145   0    3  188
5   32  0 183   0 127  0   0  0  0 246   0    3  183
6   67  0 562   0   0  0   0  0  0 173   0    3  562
7   64  0 898   0   6  0   0  0  0   0   0    3  898
8    0  0  16   0  32  0   0  0  0  55   0   10   55
9    0  0 165   0   0  0 312  0  0 190   0   10  190
10   0  0 210   0   0  0 190  0  0  11   0    7  190

The code is:

df$vsel <- NA
# For each row, copy the value found in the column whose position is indx
for (i in seq_len(nrow(df)))
{
  ind <- df$indx[i]
  df[i, "vsel"] <- df[i, ind]
}

... I would like to avoid this loop, as it is rather slow when the data frame is big.

There is probably a faster, more idiomatic R way:

maybe with apply(df, 1, ...)?

or ddply?

Thanks for any help.

asked Aug 03 '12 13:08 by Phil


2 Answers

Matrix indexing to the rescue! R has a way of doing exactly what you are describing. It is simple and powerful but surprisingly little-known.

df$vsel <- df[cbind(1:nrow(df), df$indx)]
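
To see why the one-liner above works, here is a minimal sketch on the first three rows of the question's data. Indexing with a two-column matrix treats each row of that matrix as a (row, column) pair. The name `vals` and the explicit `as.matrix()` step are additions for illustration, not part of the original answer; restricting to the value columns is a precaution, on the assumption that the real data frame might also hold non-numeric columns, which would turn the whole matrix into character.

```r
# First three rows of the question's data frame
df <- data.frame(
  v1 = c(223, 32, 0),    v2 = c(0, 0, 0),   v3 = c(95, 0, 127),
  v4 = c(605, 32, 95),   v5 = c(95, 0, 64), v6 = c(0, 26, 32),
  v7 = c(0, 0, 0),       v8 = c(0, 0, 0),   v9 = c(0, 0, 0),
  v10 = c(189, 32, 350), v11 = c(0, 0, 0),
  indx = c(10, 6, 10)
)

# Keep only the value columns, so indx is a position within v1..v11
vals <- as.matrix(df[, paste0("v", 1:11)])

# Each row of the index matrix is one (row, column) pair to extract
df$vsel <- vals[cbind(seq_len(nrow(df)), df$indx)]

df$vsel
# 189 26 350
```

The extraction is a single vectorised operation, so there is no per-row loop at the R level.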
answered Oct 02 '22 19:10 by Aaron left Stack Overflow


You can do it like this:

# Return the value in row i from the column whose position is df$indx[i]
f <- function(i) df[i, df$indx[i]]
temp <- sapply(seq_len(nrow(df)), f)
df <- cbind(df, vsel = temp)
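
The same row-wise idea can also be written with `vapply()`, which stops with an error if any row fails to return exactly one numeric value. This is only a sketch: the two-row, three-variable data frame and the helper name `pick` are made up for illustration, and it assumes every value column is numeric.

```r
# Cut-down, hypothetical data: three value columns plus the index column
df <- data.frame(v1 = c(223, 32), v2 = c(0, 26), v3 = c(95, 0), indx = c(3, 2))

# For row i, pick the column whose position is stored in indx
pick <- function(i) df[i, df$indx[i]]

# numeric(1) declares the expected shape of each result
df$vsel <- vapply(seq_len(nrow(df)), pick, numeric(1))

df$vsel
# 95 26
```

This still calls an R function once per row, so it addresses type safety rather than speed.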
answered Oct 02 '22 17:10 by Pop