I've got a data frame like this one:
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K
I want to remove all the columns that contain nothing but the same value, i.e. K, so my result would look like this:
1 1 1 1
2 1 2 1
3 8 3 1
4 8 2 1
1 1 1 1
2 1 2 1
I tried iterating over the columns in a for loop, but I didn't get anywhere. Any ideas?
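To select only the columns that contain more than one distinct value, regardless of type:
# count the distinct values in each column
uniquelength <- sapply(d, function(x) length(unique(x)))
# keep only the columns with more than one distinct value
d <- subset(d, select = uniquelength > 1)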
(Oops, Roman's question is right: this could knock out your column 5 as well, since it also holds a single repeated value, 1.)
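A quick check on the question's data shows the problem. This sketch assumes the data is read with read.table(), as in the last answer, so the columns are named V1 to V7. Column V5 is all 1s, so it also has only one distinct value and is dropped along with the K columns.

```r
# Example data from the question; read.table() names the columns V1..V7
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

# Number of distinct values per column: 4, 2, 3, 1, 1, 1, 1 for V1..V7
uniquelength <- sapply(d, function(x) length(unique(x)))
print(uniquelength)

# Keeping only columns with more than one distinct value drops V5 too
kept <- subset(d, select = uniquelength > 1)
print(names(kept))
```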
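Maybe this instead, which drops a constant column only if it is non-numeric (edit: thanks to comments!):
# treat character columns like factors: since R 4.0.0, read.table() no longer
# converts strings to factors by default
isfac <- sapply(d, function(x) is.factor(x) || is.character(x))
d <- subset(d, select = !isfac | uniquelength > 1)
or
d <- d[, !isfac | uniquelength > 1, drop = FALSE]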
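Here is a self-contained sketch of that variant on the question's data. Because the check accepts both factor and character columns, it should behave the same whether read.table() returns the K columns as factors (the default before R 4.0.0) or as character (the default since). The variable name result is just for illustration.

```r
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

# Number of distinct values per column
uniquelength <- sapply(d, function(x) length(unique(x)))
# TRUE for non-numeric columns (factor or character)
isfac <- sapply(d, function(x) is.factor(x) || is.character(x))
# Drop a column only when it is non-numeric AND constant, so V5 survives
result <- d[, !isfac | uniquelength > 1, drop = FALSE]
print(result)
```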
Here's a solution that removes any replicated columns, i.e. columns identical to at least one other column (including, e.g., pairs of replicated character, numeric, or factor columns). That's how I read the OP's question, and even if it's a misreading, it seems like an interesting question as well.
df <- read.table(text="
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")
# Run duplicated() in 'both directions', since it does **not** flag
# the first occurrence of a repeated column as a duplicate.
repdCols <- duplicated(as.list(df), fromLast = FALSE) |
  duplicated(as.list(df), fromLast = TRUE)
repdCols
# [1] FALSE FALSE FALSE TRUE FALSE TRUE TRUE
df[!repdCols]
# V1 V2 V3 V5
# 1 1 1 1 1
# 2 2 1 2 1
# 3 3 8 3 1
# 4 4 8 2 1
# 5 1 1 1 1
# 6 2 1 2 1
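If the goal is literally to drop the columns whose every entry is K (the other reading of the question), a more direct sketch tests each column against that value. The target "K" is taken from the example data; this is an alternative to the answers above, not what they do.

```r
df <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

# TRUE for columns in which every entry equals "K"; numeric columns are
# coerced to character for the comparison, so they never match
allK <- sapply(df, function(x) all(x == "K"))
result <- df[!allK]
print(result)
```

Unlike the uniqueness-based approaches, this keeps V5 without any type check, because its constant value is 1 rather than K.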