
Partial animal string matching in R

I have a data frame:

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger",
                     "black panther", "short cat", "red bird",
                     "short bird stuffed", "big eagle", "bad sparrow",
                     "dog fish", "head dog", "brown yorkie",
                     "lab short bulldog"), label=1:14)

I'd like to search the name column. If any of the words "cat", "lion", "tiger", or "panther" appears, I want to assign the string "feline" to the corresponding row of a new column, species.

If any of the words "bird", "eagle", or "sparrow" appears, I want to assign the string "avian" to that row of species.

If any of the words "dog", "yorkie", or "bulldog" appears, I want to assign the string "canine" to that row of species.

Ideally, I'd store the keywords in a list or something similar that I can keep at the beginning of the script. As new variants show up in the name column, it would be nice to have one place to update what qualifies as feline, avian, or canine.

This question is almost answered here (How to create new column in dataframe based on partial string matching other column in R), but it doesn't address the multiple name twist that is present in this problem.

testname123 asked Apr 08 '14 22:04





1 Answer

There may be a more elegant solution than this, but you could use grep with | to specify alternative matches.

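# "|" means "or" in the regular expression, so any listed word counts as a match.
# The first assignment creates the species column; unmatched rows stay NA.
d[grep("cat|lion|tiger|panther", d$name), "species"] <- "feline"
d[grep("bird|eagle|sparrow", d$name), "species"] <- "avian"
d[grep("dog|yorkie", d$name), "species"] <- "canine"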

I've assumed you meant "avian", and left out "bulldog" since it contains "dog".

You might want to add ignore.case = TRUE to the grep.

output:

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
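
The question also asks for the keywords to live in one place at the top of the script. One possible way to do that, as a minimal sketch rather than part of the original answer, is to keep a named list and build each pattern from it. The name species_patterns is hypothetical, and the sketch assumes plain keywords with no regular-expression special characters. It also applies the ignore.case = TRUE suggestion, using grepl, the logical counterpart of grep.

```r
# Hypothetical lookup list kept at the top of the script:
# one entry per species, keywords as a character vector.
species_patterns <- list(
  feline = c("cat", "lion", "tiger", "panther"),
  avian  = c("bird", "eagle", "sparrow"),
  canine = c("dog", "yorkie", "bulldog")
)

# Same data as in the question.
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14, stringsAsFactors = FALSE)

# Start with NA so rows that match nothing are easy to spot.
d$species <- NA_character_

for (sp in names(species_patterns)) {
  # Join the keywords with "|" so the pattern matches any one of them.
  pattern <- paste(species_patterns[[sp]], collapse = "|")
  hits <- grepl(pattern, d$name, ignore.case = TRUE)
  d$species[hits] <- sp
}
```

With this layout, adding a new variant means editing only the list. If a name matches keywords from more than one species, the entry that comes later in the list overwrites the earlier one, so the order of the list matters.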
ping answered Sep 23 '22 12:09
