 

Complete partially filled-in columns based on established relationships between columns

Tags:

r

I'm looking to complete two columns that depend on each other but are only partially filled in.

 title <- c("Mrs", "Ms", "", "Ms", "Mr", "Mr", "")
 gender <- c("female", "", "male", "female", "", "Male", "female")

 df <- data.frame(title, gender, stringsAsFactors = FALSE)

 df

   title gender
 1   Mrs female
 2    Ms       
 3         male
 4    Ms female
 5    Mr       
 6    Mr   Male
 7       female

In this example, if title is Mrs or Ms, gender should be filled in as female, and if title is Mr, gender should be filled in as male. Conversely, if only gender is filled in, title should be Ms for female and Mr for male.
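One way to write these rules down in R is as two named lookup vectors. This is a sketch, not taken from the original thread: it assumes Mrs, Ms and Mr are the only titles, and it lowercases gender first so that "Male" is treated like "male" (which rewrites that cell). Any title or gender not in the tables comes back as NA.

```r
# Example data from the question
title  <- c("Mrs", "Ms", "", "Ms", "Mr", "Mr", "")
gender <- c("female", "", "male", "female", "", "Male", "female")
df <- data.frame(title, gender, stringsAsFactors = FALSE)

# The rules from the question as lookup tables (assumed complete:
# any other title or gender would be looked up as NA)
title_to_gender <- c(Mrs = "female", Ms = "female", Mr = "male")
gender_to_title <- c(female = "Ms", male = "Mr")

# Normalise case first so that "Male" matches the "male" key
df$gender <- tolower(df$gender)

# Only the title is known: look up the gender
fill_gender <- df$gender == "" & df$title != ""
df$gender[fill_gender] <- unname(title_to_gender[df$title[fill_gender]])

# Only the gender is known: look up the title
fill_title <- df$title == "" & df$gender != ""
df$title[fill_title] <- unname(gender_to_title[df$gender[fill_title]])

df
```

Compared with a regex test, lookup tables keep the mapping explicit and turn unexpected values into NA instead of silently defaulting to one category.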

Going further, how could you complete a partially filled table without having to spell out the relationships beforehand? Consider the example below:

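c1 <- rep(letters[1:12], 4)   # a to l, repeated 4 times (48 rows)
c2 <- rep(letters[13:24], 4)  # m to x: a pairs with m, b with n, and so on (i with u)
df <- data.frame(c1, c2, stringsAsFactors = FALSE)

#blank out 8 randomly chosen entries in each column
set.seed(1)  # optional: makes the random blanks reproducible
df$c1[sample(nrow(df), 8)] <- ""
df$c2[sample(nrow(df), 8)] <- ""
df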

Here we know that the letters come in fixed pairs (for example, i and u), but some values are missing: in some rows one column is filled in and the other is empty, and some rows may have both columns empty. How would I fill in the missing value in rows where only one column is filled in?
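One possible approach (a sketch, not taken from the original thread) is to learn the pairs from rows where both columns are filled in and use them as a two-way lookup table. `fill_pairs()` is a hypothetical helper name. It assumes each letter has exactly one partner and that each pair appears complete at least once; a value never seen in a complete row comes back as NA, and rows with both columns empty are left as they are. If the data contained conflicting pairs, the lookup would silently use the first one it finds.

```r
# Hypothetical helper: fill one column from the other using the pairs
# observed in rows where both columns are present
fill_pairs <- function(df, a = "c1", b = "c2") {
  both  <- df[[a]] != "" & df[[b]] != ""
  pairs <- unique(df[both, c(a, b)])

  # Two-way lookup tables built from the observed pairs
  a_to_b <- setNames(pairs[[b]], pairs[[a]])
  b_to_a <- setNames(pairs[[a]], pairs[[b]])

  only_a <- df[[a]] != "" & df[[b]] == ""
  only_b <- df[[a]] == "" & df[[b]] != ""

  # Values never seen in a complete row come back as NA
  df[[b]][only_a] <- unname(a_to_b[df[[a]][only_a]])
  df[[a]][only_b] <- unname(b_to_a[df[[b]][only_b]])
  df
}

# Example data as in the question (seed added for reproducibility)
set.seed(1)
c1 <- rep(letters[1:12], 4)
c2 <- rep(letters[13:24], 4)
df <- data.frame(c1, c2, stringsAsFactors = FALSE)
df$c1[sample(nrow(df), 8)] <- ""
df$c2[sample(nrow(df), 8)] <- ""

filled <- fill_pairs(df)
filled
```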

(I know I'm supposed to show what I've tried, but I'm stumped and couldn't find anything.)

user3389288 asked May 25 '14 03:05



1 Answer

I think this is what you want for the title/gender example:

#Find rows with no title but a known gender
noTitle = which(df$title=="" & df$gender!="")
#And fill them in based on the gender ("female"/"Female" -> Ms, otherwise Mr)
df$title[noTitle] = ifelse(grepl("[Ff]",df$gender[noTitle]), "Ms", "Mr")
#Do the same for gender ("Mrs"/"Ms" contain an s -> female, "Mr" -> male)
noGender = which(df$gender=="" & df$title!="")
df$gender[noGender] = ifelse(grepl("[Ss]",df$title[noGender]), "female", "male")

If both could be empty, you would need to check for that case first and handle those rows separately, since neither column tells you anything about the other; something like:

#Find rows where both are empty
Neither = which(df$title=="" & df$gender=="")
##Do something here
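Putting the two parts of the answer together, the full procedure might look like the sketch below: check for the both-empty rows first, then apply the regex rules to the rest. `fill_title_gender()` is a hypothetical name, and marking both-empty rows as NA is an assumed choice; dropping them or flagging them for manual review would work just as well. Because `which()` ignores NA, those rows are skipped by the later steps.

```r
# Hypothetical wrapper: handle the both-empty case first, then fill the rest
fill_title_gender <- function(df) {
  # Neither column is known: nothing can be inferred, so mark as NA
  neither <- df$title == "" & df$gender == ""
  df$title[neither] <- NA
  df$gender[neither] <- NA

  # Title missing: "female"/"Female" -> Ms, anything else -> Mr
  no_title <- which(df$title == "")
  df$title[no_title] <- ifelse(grepl("[Ff]", df$gender[no_title]), "Ms", "Mr")

  # Gender missing: "Mrs"/"Ms" (contain an s) -> female, "Mr" -> male
  no_gender <- which(df$gender == "")
  df$gender[no_gender] <- ifelse(grepl("[Ss]", df$title[no_gender]), "female", "male")
  df
}

# Small made-up example: row 3 has neither value
df <- data.frame(title  = c("Mrs", "", "", "Mr"),
                 gender = c("female", "male", "", ""),
                 stringsAsFactors = FALSE)
out <- fill_title_gender(df)
out
```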
ThatGuy answered Oct 08 '22 22:10
