Efficiently transform multiple columns of a data frame

Tags:

r

I have a data frame, and I want to apply a transformation (say, taking the log) to every column whose name matches a certain pattern. In the example below, I want to take the log of X.1 and X.2, but not of Y or Z.1.

df <- data.frame(
  Y = sample(0:1, 10, replace = TRUE),
  X.1 = sample(1:10),
  X.2 = sample(1:10),
  Z.1 = sample(151:160)
)

# option 1, won't work for dozens of fields
df$X.1 <- log(df$X.1)
df$X.2 <- log(df$X.2)

Is there a good, efficient way to do this when the data frame is several gigabytes?

chmullig asked Jul 03 '13 16:07


2 Answers

If the function accepts a data.frame and returns one (as log does), apply it to the selected columns directly:

cols <- c("X.1","X.2")
df[cols] <- log(df[cols])

Otherwise, use lapply or a loop over the columns. Both are slower than the direct assignment above, so use them only when you must.

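# Alternative 1: lapply over the selected columns
df[cols] <- lapply(df[cols], function(x) c(NA, diff(x)))

# Alternative 2: explicit loop; df[[col]] extracts the column as a vector
for (col in cols) {
  df[[col]] <- c(NA, diff(df[[col]]))
}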
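
With dozens of fields, the cols vector does not have to be typed out by hand. A minimal sketch, assuming the columns to transform are exactly those whose names start with "X." (the regular expression is my assumption, based on the example in the question):

```r
set.seed(1)
df <- data.frame(
  Y = sample(0:1, 10, replace = TRUE),
  X.1 = sample(1:10),
  X.2 = sample(1:10),
  Z.1 = sample(151:160)
)

# Build the column list from a name pattern instead of hard-coding it.
# "^X\\." matches names beginning with a literal "X."
cols <- grep("^X\\.", names(df), value = TRUE)

# Transform only the matched columns; Y and Z.1 are left untouched.
df[cols] <- log(df[cols])
```

Adjust the pattern to whatever naming scheme the real data uses; grep with value = TRUE returns the matching names, so the rest of the code is unchanged.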
Joshua Ulrich answered Sep 20 '22 04:09


vars <- c("X.1", "X.2")

df[vars] <- lapply(df[vars], log)
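
As a quick sanity check, the lapply form and the direct data.frame form should produce the same result for a function like log. A small sketch on made-up data (the data frame and the names a and b are illustrative only):

```r
# Small illustrative data frame; the values are arbitrary
df <- data.frame(X.1 = 1:10, X.2 = 10:1, Z.1 = 151:160)
vars <- c("X.1", "X.2")

# Direct form: log() applied to the sub-data-frame
a <- df
a[vars] <- log(a[vars])

# lapply form: log() applied column by column
b <- df
b[vars] <- lapply(b[vars], log)

# Compare the two results
same <- isTRUE(all.equal(a, b))
```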
Hong Ooi answered Sep 20 '22 04:09