Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find string in data.frame

How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?

> df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
> df
  animal level length
1   goat  five     10
2  horse   one     20
3  horse three     30
4    two    30  horse
5   five horse  eight

... so row 4 and 5 have the wrong order. Any output that would allow me to identify that 'horse' has shifted to the level column in row 5 and to the length column in row 4 is good. Maybe:

> magic_function(df, 'horse')
col       row
'animal', 2
'animal', 3
'length', 4
'level',  5

Here's what I want to use this for: I have a very large data frame (around 60 columns, 20.000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways that order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.

like image 256
Jonas Lindeløv Avatar asked Sep 12 '16 12:09

Jonas Lindeløv


People also ask

How do you find a string in a DataFrame?

Using “contains” to Find a Substring in a Pandas DataFrame The contains method in Pandas allows you to search a column for a specific substring. The contains method returns boolean values for the Series with True for if the original Series value contains the substring and False if not.

How do I find a particular string in pandas?

Pandas str. find() method is used to search a substring in each string present in a series. If the string is found, it returns the lowest index of its occurrence. If string is not found, it will return -1.

How do I find a row in a data frame?

To get the nth row in a Pandas DataFrame, we can use the iloc() method. For example, df. iloc[4] will return the 5th row because row numbers start from 0.

How do I check if a string is in pandas column?

Method 1: Use isin() function In this scenario, the isin() function check the pandas column containing the string present in the list and return the column values when present, otherwise it will not select the dataframe columns.


2 Answers

What about:

which(df == "horse", arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   5   2
# [4,]   4   3
like image 161
thothal Avatar answered Oct 14 '22 01:10

thothal


Another way around:

l <- sapply(colnames(df), function(x) grep("horse", df[,x]))

$animal
[1] 2 3

$level
[1] 5

$length
[1] 4

If you want the output to be matrix:

sapply(l,'[',1:max(lengths(l)))

     animal level length
[1,]      2     5      4
[2,]      3    NA     NA
like image 26
989 Avatar answered Oct 14 '22 00:10

989