Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fill rows downward for a column with NAs (with R base or data.table)

I want to use the Census' county-adjacency data, but am was stuck on getting it into a nice form. The data comes in four columns: first county, first code, second county, second code. The first county column does not repeat, and instead takes value "" the way I have it read in now:

                     c1   cd1                    c2   cd2
1   Alamance County, NC 37001   Alamance County, NC 37001
2                          NA    Caswell County, NC 37033
3                          NA    Chatham County, NC 37037
4                          NA   Guilford County, NC 37081
5                          NA     Orange County, NC 37135
6                          NA   Randolph County, NC 37151
7                          NA Rockingham County, NC 37157
8  Alexander County, NC 37003  Alexander County, NC 37003
9                          NA   Caldwell County, NC 37027
10                         NA    Catawba County, NC 37035
11                         NA    Iredell County, NC 37097
12                         NA     Wilkes County, NC 37193
13 Alleghany County, NC 37005  Alleghany County, NC 37005
14                         NA       Ashe County, NC 37009
15                         NA      Surry County, NC 37171
16                         NA     Wilkes County, NC 37193
17                         NA    Grayson County, VA 51077
18     Anson County, NC 37007      Anson County, NC 37007
19                         NA Montgomery County, NC 37123
20                         NA   Richmond County, NC 37153

I happen to only be interested in the North Carolina part of the data found at that link, part of which is what you see above:

#
nc_cc <- structure(list(c1 = c("Alamance County, NC", "", "", "", "", "", "", "Alexander County, NC", "", "", "", "", "Alleghany County, NC", "", "", "", "", "Anson County, NC", "", ""), cd1 = c(37001L, NA, NA, NA, NA, NA, NA, 37003L, NA, NA, NA, NA, 37005L, NA, NA, NA, NA, 37007L, NA, NA), c2 = c("Alamance County, NC", "Caswell County, NC", "Chatham County, NC", "Guilford County, NC", "Orange County, NC", "Randolph County, NC", "Rockingham County, NC", "Alexander County, NC", "Caldwell County, NC", "Catawba County, NC", "Iredell County, NC", "Wilkes County, NC", "Alleghany County, NC", "Ashe County, NC", "Surry County, NC", "Wilkes County, NC", "Grayson County, VA", "Anson County, NC", "Montgomery County, NC", "Richmond County, NC" ), cd2 = c(37001L, 37033L, 37037L, 37081L, 37135L, 37151L, 37157L, 37003L, 37027L, 37035L, 37097L, 37193L, 37005L, 37009L, 37171L, 37193L, 51077L, 37007L, 37123L, 37153L)), .Names = c("c1", "cd1", "c2", "cd2"), row.names = c(NA, 20L), class = "data.frame")
#

I want a clean adjacency association out of this (and the county names are superfluous), so my desired output could take a variety of forms: a data.frame, a list, ...

The crude solution I've come up with (after a lot of thought) is this:

require(data.table)
DT <- data.table(nc_cc)
DT[,list(cd1=cd1[1],cd2),by=cumsum(!is.na(cd1))][,list(cd1,cd2)]

giving

      cd1   cd2
 1: 37001 37001
 2: 37001 37033
 3: 37001 37037
 4: 37001 37081
 5: 37001 37135
 6: 37001 37151
 7: 37001 37157
 8: 37003 37003
 9: 37003 37027
10: 37003 37035
11: 37003 37097
12: 37003 37193
13: 37005 37005
14: 37005 37009
15: 37005 37171
16: 37005 37193
17: 37005 51077
18: 37007 37007
19: 37007 37123
20: 37007 37153

I've tagged this with data.table because I used it in my solution above, and I suspect there's something nice that can be done with roll. Really, I've never understood the documentation for roll, so I'm hoping to learn something here... So: can this be done a better way?

EDIT: This question is asking the same thing, so I revise my question to: "Is there a better way to do this using data.table or base R (since I am averse to installing more packages)?"

like image 223
Frank Avatar asked Dec 20 '22 00:12

Frank


1 Answers

A pretty standard way of doing this is:

library(data.table)
dt = data.table(nc_cc)

dt[, cd1 := cd1[1], by = cumsum(!is.na(cd1))]
like image 55
eddi Avatar answered Apr 30 '23 22:04

eddi