I have a data frame which contains both numeric and non-numeric columns, say
df <- data.frame(v1=1:20,v2=1:20,v3=1:20,v4=letters[1:20],v5=letters[1:20])
To select only the non-numeric columns I would use
fixCol <- !sapply(df,is.numeric)
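A quick check of what this produces may help. sapply() applies is.numeric() to each column and simplifies the result to a named logical vector (not a list), so each TRUE/FALSE carries its column name. A minimal sketch using the example df above:

```r
df <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20,
                 v4 = letters[1:20], v5 = letters[1:20])

# One TRUE/FALSE per column, named after the columns:
# v1, v2 and v3 are FALSE (numeric); v4 and v5 are TRUE (non-numeric)
fixCol <- !sapply(df, is.numeric)
print(fixCol)
str(fixCol)  # shows a named logical vector, not a list
```

Because the names are kept, individual elements can later be addressed by column name rather than by position.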
But now I also want to include one specific numeric column, say v2. My data frame is very large and its column order changes, so I cannot index it by position; I need to use the name 'v2'. I tried
fixCol$v2 = TRUE
but that gives me the warning: In fixCol$v2 = TRUE : Coercing LHS to a list
which turns fixCol into a list, so I can no longer use it to subset my original data frame:
df[,fixCol]
gives: Error in .subset(x, j) : invalid subscript type 'list'
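The warning comes from using $ on an atomic vector: $<- is meant for lists, so R coerces fixCol to a list before assigning. Assigning by name with [<- instead keeps fixCol a logical vector, which can still be used as a column index. A minimal sketch, assuming the same example df:

```r
df <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20,
                 v4 = letters[1:20], v5 = letters[1:20])
fixCol <- !sapply(df, is.numeric)

# `[<-` with a character index updates the element named "v2" and keeps
# fixCol an atomic logical vector (unlike `$<-`, which coerces to a list)
fixCol["v2"] <- TRUE

# The logical index now works; the result holds v2, v4 and v5
sub <- df[, fixCol]
print(names(sub))
```

One caveat: if the name is misspelled, [<- silently appends a new element instead of failing, so the index no longer has one entry per column and the subsetting step will error.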
Ultimately, my goal is to scale all numeric columns of my data frame except this one column, using something like this:
scaleCol = !fixCol
df_scaled = cbind(df[,fixCol], sapply(df[,scaleCol],scale))
How can I best do this?
We can use an OR condition (|) to build a single logical index, TRUE for every non-numeric column and for the column named 'v2', and then subset the columns of 'df' with it.
df1 <- df[!sapply(df, is.numeric)|names(df)=='v2']
head(df1,2)
# v2 v4 v5
#1 1 a a
#2 2 b b
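The same index can drive the scaling step from the question. The sketch below is one possible approach, not part of the answer above: instead of the question's cbind(), it writes the scaled columns back into a copy of df, which keeps the original column order and avoids the single-column drop problem of df[, ...]. The as.vector() call is an editorial choice to strip the "scaled:center"/"scaled:scale" attributes that scale() attaches.

```r
df <- data.frame(v1 = 1:20, v2 = 1:20, v3 = 1:20,
                 v4 = letters[1:20], v5 = letters[1:20])

# Columns to leave untouched: every non-numeric column plus v2, by name
fixCol <- !sapply(df, is.numeric) | names(df) == "v2"
scaleCol <- !fixCol

# Scale the remaining numeric columns (here v1 and v3) in place
df_scaled <- df
df_scaled[scaleCol] <- lapply(df[scaleCol], function(x) as.vector(scale(x)))

head(df_scaled, 2)
```

List-style indexing (df[scaleCol] rather than df[, scaleCol]) always returns a data frame, so this also works when only one column needs scaling.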