Suppose you have a data.frame like this:
x <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20])
How would you select only those columns in x that are numeric?
We can use dplyr's select_if() function to get the numeric columns by passing it the data frame and the is.numeric() function, which checks whether each column is numeric.
Selecting columns based on their name: in pandas, the most basic way to select a single column from a DataFrame is to put the column name, as a string, in brackets. This returns a pandas Series. Passing a list in the brackets lets you select multiple columns at the same time and returns a DataFrame.
To slice columns in pandas, the syntax is df.loc[:, start:stop:step], where start is the name of the first column to take, stop is the name of the last column to take (label slices with .loc include the stop column), and step is the number of positions to advance after each extraction; for example, a step of 2 selects alternate columns.
EDIT: updated to avoid use of ill-advised sapply.
Since a data frame is a list we can use the list-apply functions:
nums <- unlist(lapply(x, is.numeric))
Then use standard subsetting:
x[ , nums]
## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
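As a minimal sketch of why sapply is discouraged here, the base R version below uses vapply(), which declares the expected result type up front, and adds drop = FALSE so the result stays a data frame. The names numeric_only and numeric_only_filter are introduced only for illustration and are not part of the original answer.

```r
# Same example data frame as in the question.
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# vapply() states that each call must return exactly one logical value,
# so the result is always a logical vector (sapply() can return a list
# when the data frame has no columns).
nums <- vapply(x, is.numeric, logical(1))

# drop = FALSE keeps a data frame even if only one column is numeric.
numeric_only <- x[, nums, drop = FALSE]
names(numeric_only)
#> [1] "v1" "v2" "v3"

# Filter() applies the predicate to every column and keeps the matches.
numeric_only_filter <- Filter(is.numeric, x)
names(numeric_only_filter)
#> [1] "v1" "v2" "v3"
```

Without drop = FALSE, x[ , nums] collapses to a plain vector whenever exactly one column is numeric, which can surprise code that expects a data frame.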
For more idiomatic modern R, I'd now recommend
x[ , purrr::map_lgl(x, is.numeric)]
Less code, less dependent on R's particular quirks, more straightforward, and robust to use on database-backed tibbles:
dplyr::select_if(x, is.numeric)
Newer versions of dplyr (1.0.0 and later) also support the following syntax:
x %>% dplyr::select(where(is.numeric))
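A short, self-contained sketch of the where() syntax, assuming dplyr 1.0.0 or later is installed. The names numeric_cols and non_numeric_cols are hypothetical and used only for illustration.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides where()

# Same example data frame as in the question.
x <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20, v4 = letters[1:20])

# where() wraps a predicate that is applied to each column;
# columns for which it returns TRUE are kept.
numeric_cols <- x %>% select(where(is.numeric))
names(numeric_cols)
#> [1] "v1" "v2" "v3"

# Negating the selection keeps everything that is not numeric.
non_numeric_cols <- x %>% select(!where(is.numeric))
names(non_numeric_cols)
#> [1] "v4"
```

Because where() is an ordinary selection helper, it can be combined with other helpers such as starts_with() inside the same select() call.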
The dplyr package's select_if() function is an elegant solution:
library("dplyr")
select_if(x, is.numeric)