I have a data frame df that has many columns and, say, 100 rows. How do I take all the values (factor levels) from the columns named "alpha", "gamma" and "zeta" and store all 300 of them in a single vector?
You have an accepted answer, but here's what I think is happening: you have a combination of factor and character columns. In that case, unlist doesn't work directly, but if the columns were all factor or all character, there would be no problem:
Some sample data:
mydf <- data.frame(A = LETTERS[1:3], B = LETTERS[4:6], C = LETTERS[7:9],
                   D = LETTERS[10:12], E = LETTERS[13:15])
# (all factor columns, assuming the pre-R 4.0 default stringsAsFactors = TRUE)
df <- mydf
df$E <- as.character(df$E)  # now a mix of factor and character columns
colsOfInterest <- c("A", "B", "E")

# All factor columns: unlist returns a single factor with the combined levels
unlist(mydf[colsOfInterest], use.names = FALSE)
# [1] A B C D E F M N O
# Levels: A B C D E F M N O

# Mixed factor/character columns: the factors are coerced to their integer
# codes, so the result is wrong
unlist(df[colsOfInterest], use.names = FALSE)
# [1] "1" "2" "3" "1" "2" "3" "M" "N" "O"

# Converting everything to character first gives the correct values
unlist(lapply(df[colsOfInterest], as.character), use.names = FALSE)
# [1] "A" "B" "C" "D" "E" "F" "M" "N" "O"
For a problem at the scale described here, the benchmarks show that converting to character first and using unlist (fun2() below) is actually the fastest approach if you don't care about retaining factors. Note that the result of fun1() won't be correct if some columns are factors and some are characters. Here's a benchmark on a 100-row data.frame:
library(microbenchmark)
microbenchmark(fun1(), fun2(), fun3())
# Unit: microseconds
# expr min lq median uq max neval
# fun1() 572.606 587.3595 595.4845 606.175 3439.055 100
# fun2() 327.570 334.6265 341.2550 350.449 3443.758 100
# fun3() 1037.020 1055.6215 1064.1745 1086.197 3929.981 100
Of course, we're talking microseconds here, but the relative ranking holds as the data scales up too.
For reference, here's what was used for benchmarking. Change nRow and nCol if you want to test a different sized data.frame or extract a different number of columns.
nRow <- 100
nCol <- 30
set.seed(1)
# A nRow-by-nCol data.frame of random letters
mydf <- data.frame(matrix(sample(LETTERS, nRow*nCol, replace = TRUE), nrow = nRow))
# Pick a random subset (up to 70%) of the columns to extract
colsOfInterest <- sample(nCol, sample(nCol*.7, 1))
length(colsOfInterest)
# [1] 17

library(microbenchmark)
# fun1: plain unlist; fun2: convert to character first; fun3: go via a matrix
fun1 <- function() unlist(mydf[colsOfInterest], use.names = FALSE)
fun2 <- function() unlist(lapply(mydf[colsOfInterest], as.character), use.names = FALSE)
fun3 <- function() as.vector(as.matrix(mydf[colsOfInterest]))
microbenchmark(fun1(), fun2(), fun3())
I've found that converting to a matrix first makes getting at the level values a bit easier:
as.vector(as.matrix(df[,c("alpha", "gamma", "zeta")]))
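As a quick check using the mydf sample data from the other answer (its columns are named A through E rather than "alpha", "gamma" and "zeta"): as.matrix coerces the factor columns to character, so the level values come through intact.

as.vector(as.matrix(mydf[, c("A", "B", "E")]))
# [1] "A" "B" "C" "D" "E" "F" "M" "N" "O"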
Of course, you could have just set stringsAsFactors = FALSE when you read the data in initially.
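For example, assuming the data came from a CSV file (the file name here is just a placeholder):

df <- read.csv("yourdata.csv", stringsAsFactors = FALSE)
# All columns are character, so unlist works directly
unlist(df[c("alpha", "gamma", "zeta")], use.names = FALSE)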