How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?
> df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
> df
animal level length
1 goat five 10
2 horse one 20
3 horse three 30
4 two 30 horse
5 five horse eight
... so row 4 and 5 have the wrong order. Any output that would allow me to identify that 'horse' has shifted to the level
column in row 5 and to the length
column in row 4 is good. Maybe:
> magic_function(df, 'horse')
col row
'animal', 2
'animal', 3
'length', 4
'level', 5
Here's what I want to use this for: I have a very large data frame (around 60 columns, 20.000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways that order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.
Using “contains” to Find a Substring in a Pandas DataFrame The contains method in Pandas allows you to search a column for a specific substring. The contains method returns boolean values for the Series with True for if the original Series value contains the substring and False if not.
Pandas str. find() method is used to search a substring in each string present in a series. If the string is found, it returns the lowest index of its occurrence. If string is not found, it will return -1.
To get the nth row in a Pandas DataFrame, we can use the iloc() method. For example, df. iloc[4] will return the 5th row because row numbers start from 0.
Method 1: Use isin() function In this scenario, the isin() function check the pandas column containing the string present in the list and return the column values when present, otherwise it will not select the dataframe columns.
What about:
which(df == "horse", arr.ind = TRUE)
# row col
# [1,] 2 1
# [2,] 3 1
# [3,] 5 2
# [4,] 4 3
Another way around:
l <- sapply(colnames(df), function(x) grep("horse", df[,x]))
$animal
[1] 2 3
$level
[1] 5
$length
[1] 4
If you want the output to be matrix:
sapply(l,'[',1:max(lengths(l)))
animal level length
[1,] 2 5 4
[2,] 3 NA NA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With