I have criminal sentencing data with a text variable containing phrases like "2 months jail", "14 months prison", and "12 months community supervision". I would like to run a logistic regression to estimate the odds that a particular defendant is sent to prison or jail rather than released to community supervision. So I want to create a binary variable that is 1 for someone sent to "jail"/"prison" and 0 for those sent to another program.
I have tried library(qdap) without any luck. I have also tried ifelse(df$text %in% "jail", "1", "0"), but it flags only 1 observation when I know there are several thousand.
Small data sample:
data <- data.frame(caseid = c(1, 2, 3), text = c("went to prison", "went to jail", "released"))
  caseid           text
1      1 went to prison
2      2   went to jail
3      3       released
I am trying to create a binary variable, sentenced, to use as the outcome in a logistic regression, like this:
  caseid           text sentenced
1      1 went to prison         1
2      2   went to jail         1
3      3       released         0
Thank you for any help you can offer!
You can do the following in base R:
transform(data, sentenced = +grepl("(jail|prison)", text))
#  caseid           text sentenced
#1      1 went to prison         1
#2      2   went to jail         1
#3      3       released         0
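Explanation: the pattern "(jail|prison)" matches any string that contains "jail" or "prison", so grepl returns TRUE for those rows and FALSE for the rest. The unary + operator then converts that logical vector to integers (1 and 0).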
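The ifelse(... %in% "jail") attempt in the question finds almost nothing because %in% compares whole strings, so it is only TRUE when the entire value is exactly "jail". The sketch below contrasts the two approaches. It reuses the sample data and adds two hypothetical rows that are not in the original (a mixed-case entry and a missing value), on the assumption that real sentencing text may contain both; ignore.case = TRUE and the NA handling are optional extras, not part of the answer above.

```r
# Sample data from the question, plus two hypothetical rows (mixed case, NA)
data <- data.frame(
  caseid = 1:5,
  text = c("went to prison", "went to jail", "released", "14 months Prison", NA)
)

# %in% tests whole-string equality, so the substring "jail" never matches here
data$text %in% "jail"

# grepl() searches inside each string; ignore.case = TRUE also catches "Prison"
hit <- grepl("jail|prison", data$text, ignore.case = TRUE)

# grepl() returns FALSE for NA input, so restore NA where the text is missing
data$sentenced <- ifelse(is.na(data$text), NA_integer_, as.integer(hit))
data
```

Keeping missing text as NA rather than 0 means those cases are dropped by a later model fit instead of being counted as "not sentenced"; whether that is the right choice depends on how the data were recorded.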