
R lapply(): Change all columns within all data frames in a list to numeric, then convert all values to percentages

Tags:

r

lapply

Question:

I am a little stumped as to how I can batch process as.numeric() (or any other function for that matter) for columns in a list of data frames.

I understand that I can view specific data frames, or columns within them, by using:

> my.list[[1]] 
# or columns within this data frame using:
> my.list[[1]][1]

My trouble comes when I try to do the same inside an lapply() call to change all of the data from integer to numeric.

# Example of what I am trying to do
> my.list[[each data frame in list]][each column in data frame] <- 
as.numeric(my.list[[each data frame in list]][each column in data frame])

If you can assist me in any way, or know of any resources that can help me out I would appreciate it.

Background:

My data frames are structured like the example below, with 5 habitat types and the area of each habitat that an individual species' home range covers:

# Example data
spp.1.data <- data.frame(Habitat.A = c(100, 45, 0, 9, 0),
                         Habitat.B = c(0, 0, 203, 45, 89),
                         Habitat.C = c(80, 22, 8, 9, 20),
                         Habitat.D = c(8, 59, 77, 83, 69),
                         Habitat.E = c(23, 15, 99, 0, 10))

I have multiple data frames with the above structure which I have assigned to a list object:

all.spp.data <- list(spp.1.data, spp.2.data, spp.1.data...n)

I am then trying to coerce every column in all of the data frames with as.numeric() so that I can create data frames of % habitat use, i.e.:

# data, which is now numeric as per Phil's code ;)

data.numeric <- lapply(all.spp.data, function(x) {
  x[] <- lapply(x, as.numeric)
  x
})

> head(data.numeric[[1]])
  Habitat.A Habitat.B Habitat.C Habitat.D Habitat.E
1       100         0        80         8        23
2        45         0        22        59        15
3         0       203         8        77        99
4         9        45         9        83         0
5         0        89        20        69        10
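
One way to confirm that the conversion worked is to inspect the class of every column in every data frame. The sketch below is only an illustration: it assumes the data were originally stored as integers (the import step is not shown in the question), and the names `spp.1.int`, `int.list` and `num.list` are made up.

```r
# Hypothetical stand-in for the real list: example values stored as integers
spp.1.int <- data.frame(Habitat.A = c(100L, 45L, 0L, 9L, 0L),
                        Habitat.B = c(0L, 0L, 203L, 45L, 89L))
int.list <- list(spp.1.int, spp.1.int)

# Convert every column of every data frame to numeric
num.list <- lapply(int.list, function(x) {
  x[] <- lapply(x, as.numeric)
  x
})

# Column classes before and after, one character vector per data frame
classes.before <- lapply(int.list, function(x) sapply(x, class))
classes.after  <- lapply(num.list, function(x) sapply(x, class))

print(classes.before[[1]])
print(classes.after[[1]])
```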

EDIT: I would like to sum every row, in all data frames

# Add a column at the end of each data frame, populated by rowSums()

f <- function(i) {
  data.numeric[[i]]$Sums <- rowSums(data.numeric[[i]])
  data.numeric[[i]]
}

data.numeric.SUM <- lapply(seq_along(data.numeric), f)
head(data.numeric.SUM[[1]])

  Habitat.A Habitat.B Habitat.C Habitat.D Habitat.E Sums
1       100         0        80         8        23  211
2        45         0        22        59        15  141
3         0       203         8        77        99  387
4         9        45         9        83         0  146
5         0        89        20        69        10  188
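
The same column can probably be added without indexing into the list by position, since lapply() already hands each data frame to the function. A minimal sketch using the example data; the name `with.sums` is made up for illustration:

```r
spp.1.data <- data.frame(Habitat.A = c(100, 45, 0, 9, 0),
                         Habitat.B = c(0, 0, 203, 45, 89),
                         Habitat.C = c(80, 22, 8, 9, 20),
                         Habitat.D = c(8, 59, 77, 83, 69),
                         Habitat.E = c(23, 15, 99, 0, 10))
data.numeric <- list(spp.1.data, spp.1.data)

# Each data frame arrives as x; add the row totals and return the modified copy
with.sums <- lapply(data.numeric, function(x) {
  x$Sums <- rowSums(x)
  x
})

print(with.sums[[1]])
```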

EDIT: This is the code I used to convert values within the data frames to % habitat used

# Used Phil's logic to convert all numbers into percentages

data.numeric.SUM.perc <- lapply(data.numeric.SUM, function(x) {
  x[] <- (x[] / x[, 6]) * 100  # column 6 is the Sums column
  x
})

  Perc.Habitat.A Perc.Habitat.B Perc.Habitat.C Perc.Habitat.D Perc.Habitat.E Sums
1             47              0             38              4             11  100
2             32              0             16             42             11  100
3              0             52              2             20             26  100
4              6             31              6             57              0  100
5              0             47             11             37              5  100

This is still not the most condensed way to do this, but it did the trick for me.
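
A more condensed variant, offered as a sketch rather than the method actually used above: dividing a data frame by rowSums() of itself recycles the totals down each column, so every value is divided by its own row total. This skips the intermediate Sums column. The helper name `to.percent` and the result name `all.spp.perc` are hypothetical, and rows that sum to zero would produce NaN.

```r
spp.1.data <- data.frame(Habitat.A = c(100, 45, 0, 9, 0),
                         Habitat.B = c(0, 0, 203, 45, 89),
                         Habitat.C = c(80, 22, 8, 9, 20),
                         Habitat.D = c(8, 59, 77, 83, 69),
                         Habitat.E = c(23, 15, 99, 0, 10))
all.spp.data <- list(spp.1.data, spp.1.data)

# Hypothetical helper: numeric conversion and row-wise percentages in one step
to.percent <- function(x) {
  x[] <- lapply(x, as.numeric)
  x / rowSums(x) * 100
}

all.spp.perc <- lapply(all.spp.data, to.percent)
print(round(all.spp.perc[[1]], 1))
```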

Thank you, Phil, Val and Leo P, for helping with this problem.

CarlaBirdy asked Jun 14 '17 10:06



2 Answers

I'd do this a bit more explicitly:

all.spp.data <- lapply(all.spp.data, function(x) {
  x[] <- lapply(x, as.numeric)
  x
})

As a personal preference, this clearly conveys to me that I'm looping over each column in a data frame, and looping over each data frame in a list.
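
The empty brackets in `x[] <-` are what keep each element a data frame: the assignment replaces the columns in place and retains the names and row structure, whereas a bare lapply(x, as.numeric) returns a plain list of vectors. A small sketch with a made-up data frame `df`:

```r
df <- data.frame(a = 1:3, b = 4:6)        # made-up integer data

as.list.result <- lapply(df, as.numeric)  # plain list, no longer a data frame

kept <- df
kept[] <- lapply(kept, as.numeric)        # still a data frame, columns now numeric

print(class(as.list.result))
print(class(kept))
print(sapply(kept, class))
```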

Phil answered Sep 23 '22 13:09


If you really want to do it all with lapply, here's a way to go:

lapply(all.spp.data, function(x) do.call(cbind, lapply(seq_len(ncol(x)), function(y) as.numeric(x[, y]))))

This uses a nested lapply call. The outer one passes each data frame in as x. The inner one passes each column index of x in as y, so each column can be referenced as x[, y].

Since everything is split up into single vectors, I'm calling do.call(cbind, ...) to bind them back together into a matrix. If you prefer, you can wrap the result in data.frame() to get back to the original type.
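
A sketch of that last step, iterating over column indices with seq_len(ncol(x)). Because the inner lapply() returns an unnamed list, the matrix built by cbind comes back without the original column names, so they are restored by hand here. The helper name `to.numeric.df` is hypothetical, and the small integer data frame is made up so that the row and column counts differ.

```r
# Made-up example with 3 rows and 2 columns, stored as integers
spp.1.data <- data.frame(Habitat.A = c(100L, 45L, 0L),
                         Habitat.B = c(0L, 0L, 203L))
all.spp.data <- list(spp.1.data, spp.1.data)

# Hypothetical helper: convert column by column, then rebuild a data frame
to.numeric.df <- function(x) {
  cols <- lapply(seq_len(ncol(x)), function(y) as.numeric(x[, y]))
  out <- data.frame(do.call(cbind, cols))
  names(out) <- names(x)  # restore the names lost when binding an unnamed list
  out
}

result <- lapply(all.spp.data, to.numeric.df)
str(result[[1]])
```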

Val answered Sep 22 '22 13:09