
Extracting specific columns from a data frame

Tags: dataframe, r, r-faq

I have an R data frame with 6 columns, and I want to create a new dataframe that only has three of the columns.

Assuming my data frame is df, and I want to extract columns A, B, and E, this is the only command I can figure out:

 data.frame(df$A,df$B,df$E) 

Is there a more compact way of doing this?

Aren Cambre asked Apr 10 '12 02:04




1 Answer

You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.

 # data for reproducible example
 # (and to avoid confusion from trying to subset `stats::df`)
 df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
 # subset
 df[c("A","B","E")]

Note there's no comma (i.e. it's not df[,c("A","B","E")]). That's because the comma form collapses to a vector when only one column is selected: df[,"A"] returns a vector, not a data frame. But df["A"] will always return a data frame.

 str(df["A"])
 ## 'data.frame':    1 obs. of  1 variable:
 ## $ A: int 1
 str(df[,"A"])  # vector
 ##  int 1
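
If the comma form is needed anyway (for example, when rows are being filtered in the same call), base R's `drop = FALSE` argument should keep the result a data frame. A minimal sketch, reusing the one-row example data; the name `one_col` is only for illustration:

```r
# Same one-row example data as above
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

# With the comma form, drop = FALSE keeps the data frame structure
one_col <- df[, "A", drop = FALSE]
is.data.frame(one_col)

# Without it, the default (drop = TRUE) collapses a single column to a vector
is.data.frame(df[, "A"])
```

The no-comma form avoids having to remember this argument at all, which is why it is the simpler default.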

Thanks to David Dorchies for pointing out that df[,"A"] returns a vector instead of a data.frame, and to Antoine Fabri for suggesting a better alternative (above) to my original solution (below).

 # subset (original solution--not recommended)
 df[,c("A","B","E")]  # returns a data.frame
 df[,"A"]             # returns a vector
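
The preference for column-name vectors when programming can be sketched with a small wrapper. `pick_columns` is a hypothetical helper, not a base R or package function; it assumes the caller passes column names as a character vector and wants a clear error when a name is missing:

```r
# Hypothetical helper (not part of base R): select columns by name,
# failing early if any requested name is not in the data frame.
pick_columns <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Unknown column(s): ", paste(missing_cols, collapse = ", "))
  }
  data[cols]  # no comma, so the result stays a data frame
}

df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])

wanted <- c("A", "B", "E")       # column names held in a variable
three <- pick_columns(df, wanted)
one <- pick_columns(df, "A")     # a single column is still a data frame
```

Because the column names are ordinary strings, they can be built, stored, and passed around like any other value, which is harder to do with approaches that treat column names as object names.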
Joshua Ulrich answered Oct 21 '22 09:10