I'm looking to complete two columns that depend on each other; however, both are only partially filled.
title <- c("Mrs", "Ms", "", "Ms", "Mr", "Mr", "")
gender <- c("female", "", "male", "female", "", "Male", "female")
df <- as.data.frame(cbind(title, gender))
df
  title gender
1   Mrs female
2    Ms
3         male
4    Ms female
5    Mr
6    Mr   Male
7       female
In this example, we know that if title=Mrs or Ms, then gender should be filled in with female, and if title=Mr, then gender should be filled in as male. On the flip side, if only gender is filled in as female, then title should be Ms, and for male, title should be Mr.
To add to this, how would you complete a partially filled table without having to establish the relationships beforehand? Refer to the example below:
c1 <- paste(rep(letters[1:12], 4))
c2 <- paste(rep(letters[13:24], 4))
df <- as.data.frame(cbind(c1, c2), stringsAsFactors=FALSE)
#replacing 8 strings in each column
df[sample(nrow(df), 8),]$c1 <- ""
df[sample(nrow(df), 8),]$c2 <- ""
df
For this we know that two letters (for example, i and u) are paired. However, some of the data values are missing, so that one column or the other is empty in some rows. How would I fill in the missing values in this example?
(I know I'm supposed to show how I've tried to do this, but I'm stumped and couldn't find anything)
I think this is what you want:
#Find those where there is no title
noTitle = which(df$title=="")
#And fill them in based on the gender
df$title[noTitle] = ifelse(grepl("[Ff]",df$gender[noTitle]), "Ms", "Mr")
#Do the same for gender
noGender = which(df$gender=="")
df$gender[noGender] = ifelse(grepl("[Ss]",df$title[noGender]), "female", "male")
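Put together with the data from the question, the two steps above can be run end to end as in the sketch below. One assumption of mine: the data frame is built with data.frame(..., stringsAsFactors = FALSE) rather than as.data.frame(cbind(...)), so that both columns are plain character vectors in any R version.

```r
# Data from the question
title  <- c("Mrs", "Ms", "", "Ms", "Mr", "Mr", "")
gender <- c("female", "", "male", "female", "", "Male", "female")
# Assumption: build the frame directly so both columns stay character
df <- data.frame(title, gender, stringsAsFactors = FALSE)

# Rows with no title: derive the title from gender ("female" contains an f)
noTitle <- which(df$title == "")
df$title[noTitle] <- ifelse(grepl("[Ff]", df$gender[noTitle]), "Ms", "Mr")

# Rows with no gender: derive the gender from title ("Mrs" and "Ms" contain an s)
noGender <- which(df$gender == "")
df$gender[noGender] <- ifelse(grepl("[Ss]", df$title[noGender]), "female", "male")

df
```

Note that values already present are left untouched, so the capitalised "Male" in row 6 stays as it is.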
If both were empty in the same row, you would need a check at the start to find those rows and handle them as appropriate; something like:
#Find where both empty
Neither = intersect( which(df$title==""), which(df$gender=="") )
##Do something here
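For the second, more general example, one possible approach is to learn the pairs from the rows where both columns are present and then look up the missing partner with match(). The following is only a sketch under two assumptions: each value in c1 is paired with exactly one value in c2, and every pair appears complete in at least one row. I have also swapped the question's sample() calls for fixed row indices so the result is reproducible.

```r
# Rebuild the question's second example
df <- data.frame(c1 = rep(letters[1:12], 4),
                 c2 = rep(letters[13:24], 4),
                 stringsAsFactors = FALSE)
# Assumption: fixed rows instead of sample(), 8 blanks per column
df$c1[seq(1, 36, by = 5)] <- ""
df$c2[seq(3, 38, by = 5)] <- ""

# Learn the pairs from rows where both columns are filled in
complete <- df$c1 != "" & df$c2 != ""
pairs <- unique(df[complete, c("c1", "c2")])

# Rows where exactly one side is missing
fillC1 <- df$c1 == "" & df$c2 != ""
fillC2 <- df$c2 == "" & df$c1 != ""

# Look up the missing partner in the learned pairs
df$c1[fillC1] <- pairs$c1[match(df$c2[fillC1], pairs$c2)]
df$c2[fillC2] <- pairs$c2[match(df$c1[fillC2], pairs$c1)]

df
```

Rows where both columns are empty cannot be recovered this way and are left as they are. If a pair never appears complete in any row, match() returns NA for it, so those cells end up as NA.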