I have an R data frame with 6 columns, and I want to create a new dataframe that only has three of the columns.
Assuming my data frame is df
, and I want to extract columns A
, B
, and E
, this is the only command I can figure out:
data.frame(df$A,df$B,df$E)
Is there a more compact way of doing this?
Selecting columns based on their name This is the most basic way to select a single column from a dataframe, just put the string name of the column in brackets. Returns a pandas series. Passing a list in the brackets lets you select multiple columns at the same time.
You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()
), especially when programming in functions, packages, or applications.
# data for reproducible example # (and to avoid confusion from trying to subset `stats::df`) df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5]) # subset df[c("A","B","E")]
Note there's no comma (i.e. it's not df[,c("A","B","C")]
). That's because df[,"A"]
returns a vector, not a data frame. But df["A"]
will always return a data frame.
str(df["A"]) ## 'data.frame': 1 obs. of 1 variable: ## $ A: int 1 str(df[,"A"]) # vector ## int 1
Thanks to David Dorchies for pointing out that df[,"A"]
returns a vector instead of a data.frame, and to Antoine Fabri for suggesting a better alternative (above) to my original solution (below).
# subset (original solution--not recommended) df[,c("A","B","E")] # returns a data.frame df[,"A"] # returns a vector
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With