I’m trying to write an ifelse statement (a plain if statement would be fine too) with unite. If the number of characters in the FPLACE column equals four, it should create a new column called “PLACE_ID” uniting FIPS and FPLACE (in that order, no separator). Otherwise (all the other values have five characters), it should unite FIPS_ST and FPLACE and put the result in the same “PLACE_ID” column.
For the past hour, I’ve been trying some form of this:
ifelse(nchar(dat$FPLACE, type = "chars")==4,
dat%>%unite(PLACE_ID, FIPS, FPLACE, sep = "", remove = FALSE),
dat%>%unite(PLACE_ID, FIPS_ST, FPLACE, sep = "", remove = TRUE))
and starting simpler:
nchar(dat$FPLACE, type = "chars")==4
which works, but then I try the code below and something goes wrong.
if(dat$FPLACE==nchar(4)){
print(
dat%>%unite(PLACE_ID, FIPS, FPLACE, sep = "", remove = FALSE))
}
Ideally, I could just use piping, but even this doesn’t work:
dat%>%nchar(.$FPLACE, type = "chars")==4
And I think there is something both fundamental and important here that’s hidden in my continual confusion. Why will dat%>%filter(variable=="something") work, but dat%>%nchar(.$variable)==4 won’t? I've also never figured out when you have to use the .$ vs when you don't. What's the rhyme and reason?
Thanks much!!!
dput here:
structure(list(X1 = c(1, 2, 3), FSTATE = c("(01) Alabama", "(01) Alabama",
"(01) Alabama"), FCOUNTY = c(1, 1, 1), FPLACE = c(3220, 62328,
62328), FIPS_ST = c("01", "01", "01"), FIPS_COUNTY = c("001",
"001", "001"), FIPS = c("01001", "01001", "01001"), ORI9 = c("AL0040200",
"AL0040100", "AL0040300"), ORI7 = c("AL00402", "AL00401", "-1"
), NAME = c("AUTAUGAVILLE POLICE DEPARTMENT", "PRATTVILLE POLICE DEPARTMENT",
"PRATTVILLE FIRE DEPT ARSON INVESTIGATION BRANCH")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(X1 = structure(list(), class = c("collector_double",
"collector")), FSTATE = structure(list(), class = c("collector_character",
"collector")), FCOUNTY = structure(list(), class = c("collector_double",
"collector")), FPLACE = structure(list(), class = c("collector_double",
"collector")), FIPS_ST = structure(list(), class = c("collector_character",
"collector")), FIPS_COUNTY = structure(list(), class = c("collector_character",
"collector")), FIPS = structure(list(), class = c("collector_character",
"collector")), ORI9 = structure(list(), class = c("collector_character",
"collector")), ORI7 = structure(list(), class = c("collector_character",
"collector")), NAME = structure(list(), class = c("collector_character",
"collector")), UA = structure(list(), class = c("collector_double",
"collector")), STATENAME = structure(list(), class = c("collector_character",
"collector")), COUNTYNAME = structure(list(), class = c("collector_character",
"collector")), UANAME = structure(list(), class = c("collector_character",
"collector")), PARTOF = structure(list(), class = c("collector_character",
"collector")), AGCYTYPE = structure(list(), class = c("collector_character",
"collector")), SUBTYPE1 = structure(list(), class = c("collector_character",
"collector")), SUBTYPE2 = structure(list(), class = c("collector_character",
"collector")), GOVID = structure(list(), class = c("collector_double",
"collector")), LG_NAME = structure(list(), class = c("collector_character",
"collector")), ADDRESS_NAME = structure(list(), class = c("collector_character",
"collector")), ADDRESS_STR1 = structure(list(), class = c("collector_character",
"collector")), ADDRESS_STR2 = structure(list(), class = c("collector_character",
"collector")), ADDRESS_CITY = structure(list(), class = c("collector_character",
"collector")), ADDRESS_STATE = structure(list(), class = c("collector_character",
"collector")), ADDRESS_ZIP = structure(list(), class = c("collector_double",
"collector")), REPORT_FLAG = structure(list(), class = c("collector_character",
"collector")), CSLLEA08_ID = structure(list(), class = c("collector_double",
"collector")), LEMAS_ID = structure(list(), class = c("collector_character",
"collector")), U_STATENO = structure(list(), class = c("collector_character",
"collector")), U_CNTY = structure(list(), class = c("collector_double",
"collector")), U_POPGRP = structure(list(), class = c("collector_character",
"collector")), U_TPOP = structure(list(), class = c("collector_double",
"collector")), LG_POPULATION = structure(list(), class = c("collector_double",
"collector")), CSLLEA_SUB = structure(list(), class = c("collector_character",
"collector")), COMMENT = structure(list(), class = c("collector_character",
"collector")), INTPTLAT = structure(list(), class = c("collector_double",
"collector")), INTPTLONG = structure(list(), class = c("collector_double",
"collector")), CONGDIST1 = structure(list(), class = c("collector_character",
"collector")), CONGDIST2_18 = structure(list(), class = c("collector_character",
"collector")), DISTNAME = structure(list(), class = c("collector_character",
"collector")), SOURCE_CSLLEA2008 = structure(list(), class = c("collector_double",
"collector")), SOURCE_UCR2010 = structure(list(), class = c("collector_double",
"collector")), SOURCE_UCR2011 = structure(list(), class = c("collector_double",
"collector")), SOURCE_UCR2012 = structure(list(), class = c("collector_double",
"collector")), SOURCE_NCIC2012 = structure(list(), class = c("collector_double",
"collector")), SOURCE_VENDOR = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
I'm not sure unite can be used that way, but you can do the following:
library(tidyverse)
dat %>%
  mutate(PLACE_ID = ifelse(nchar(FPLACE, type = "chars") == 4,
                           paste0(FIPS, FPLACE),
                           paste0(FIPS_ST, FPLACE)))
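As a rough check, here is a minimal sketch of the same idea on a cut-down copy of your data. The small `dat` below is an assumption: it keeps only the three columns the code uses, with the values from your dput. It also shows why the `if` attempt misbehaves: `if` expects a single TRUE/FALSE, while `ifelse` tests every row, and `nchar(4)` is simply 1.

```r
library(dplyr)

# Cut-down stand-in for the question's data (only the columns used below).
dat <- data.frame(
  FPLACE  = c(3220, 62328, 62328),
  FIPS_ST = c("01", "01", "01"),
  FIPS    = c("01001", "01001", "01001"),
  stringsAsFactors = FALSE
)

# nchar(4) counts the characters in "4", so it is 1. That means
# dat$FPLACE == nchar(4) compares FPLACE with 1, which is not the intent.
# The row-by-row test you want is this logical vector:
is_four <- nchar(dat$FPLACE) == 4   # TRUE FALSE FALSE

# ifelse() is vectorised, so it picks a value for each row in turn.
out <- dat %>%
  mutate(PLACE_ID = ifelse(nchar(FPLACE) == 4,
                           paste0(FIPS, FPLACE),
                           paste0(FIPS_ST, FPLACE)))

out$PLACE_ID
# "010013220" "0162328" "0162328"
```

Note that FPLACE is numeric here, so nchar and paste0 work on its character form. If the place codes are meant to keep leading zeros, it may be safer to read that column in as character in the first place.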
Regarding your question about filtering: inside a dplyr pipe, filter keeps the rows for which a logical condition is TRUE, so the expression you give it must return a vector of logical values. For example:
dat %>% filter(nchar(FPLACE, type = "chars")==4)
This works, because nchar(dat$FPLACE, type = "chars")==4 returns a vector of logical values. You just need to place it inside filter and remove dat$, since the data frame is already passed into filter by the pipe and you don't need to (and should not) reference the data frame name explicitly.
More generally, you don't need to (and should not) use data.frame.name$ when referring to column names of a data frame in a dplyr pipe (i.e., when using functions like filter, mutate, group_by, and summarise). Just use the bare column names.
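As for why dat%>%nchar(.$FPLACE, type = "chars")==4 fails while filter works, a short sketch may help. The pipe passes its left-hand side as the first argument of the function on the right, so nchar receives the whole data frame as its first argument, not the column. The small `dat` below is a stand-in with only the FPLACE column, and the variable names are just for illustration.

```r
library(dplyr)  # also makes %>% available

dat <- data.frame(FPLACE = c(3220, 62328, 62328))

# x %>% f(y) is evaluated as f(x, y).
# filter() expects a data frame first and looks up bare column names in it:
four_rows <- dat %>% filter(nchar(FPLACE) == 4)

# nchar() expects a vector first, so pipe the column, not the data frame.
# %>% binds more tightly than ==, so this reads (dat$FPLACE %>% nchar()) == 4.
flag_a <- dat$FPLACE %>% nchar() == 4

# pull() extracts one column as a vector in the middle of a pipe.
flag_b <- dat %>% pull(FPLACE) %>% nchar() == 4

# Wrapping the right-hand side in braces stops the pipe from inserting the
# first argument; `.` then stands for the whole data frame, so .$FPLACE works.
flag_c <- dat %>% { nchar(.$FPLACE) == 4 }
```

So the rule of thumb is: dplyr verbs take the data frame and let you use bare column names, while ordinary functions such as nchar know nothing about data frames, so you either hand them the column itself or use `.` inside braces to refer to the piped object.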
For example, see what happens if you do the following with the built-in mtcars data frame:
mtcars %>%
  group_by(cyl) %>%
  summarise(mean1 = mean(mtcars$mpg),
            mean2 = mean(mpg))
    cyl mean1 mean2
  <dbl> <dbl> <dbl>
1     4  20.1  26.7
2     6  20.1  19.7
3     8  20.1  15.1
mean(mtcars$mpg)
[1] 20.09062
To calculate mean1, we used mtcars$mpg instead of the bare column name. This reaches outside the context of the pipe (the "environment" of the pipe, in programming parlance) to the copy of mtcars in the global environment, rather than using the grouped version of mtcars that was passed into summarise by the pipe. Thus, we get the overall mean in every row rather than the grouped means we wanted.
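One more way to see the difference is to count how many values each form hands to the function. This is only a sketch, and the column names n_bare and n_dollar are made up for the illustration.

```r
library(dplyr)

counts <- mtcars %>%
  group_by(cyl) %>%
  summarise(n_bare   = length(mpg),         # values in the current group only
            n_dollar = length(mtcars$mpg))  # values in the whole data frame

counts$n_bare    # 11 7 14
counts$n_dollar  # 32 32 32
```

The bare name sees only the rows of the current group, while mtcars$mpg always sees all 32 rows, whatever the grouping.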