I've searched Stack Overflow extensively for a solution but have yet to find one that works for me. I have a data frame that looks something like this:
id time latitude longitude
A 11:10 381746.0 6008345
A 11:11 381726.2 6008294
B 10:56 381703.0 6008214
B 10:57 381679.7 6008134
C 4:30 381654.4 6008083
C 4:31 381629.2 6008033
I would like to insert a new row at the end of each id group. In this row, 'id' and 'time' should be the same as in the previous observation, and latitude and longitude should be '394681.4' and '6017550' (the end location shared by all ids). The desired result:
id time latitude longitude
A 11:10 381746.0 6008345
A 11:11 381726.2 6008294
A 11:11 394681.4 6017550
B 10:56 381703.0 6008214
B 10:57 381679.7 6008134
B 10:57 394681.4 6017550
C 4:30 381654.4 6008083
C 4:31 381629.2 6008033
C 4:31 394681.4 6017550
Can anyone think of a solution? dplyr or data.table solutions preferred.
A base R solution using the split-apply-combine concept.
do.call(rbind, lapply(split(df, df$id),
                      function(x) rbind(x,
                                        within(x[nrow(x), ], {latitude <- 394681.4; longitude <- 6017550}))))
which returns
id time latitude longitude
A.1 A 11:10 381746.0 6008345
A.2 A 11:11 381726.2 6008294
A.21 A 11:11 394681.4 6017550
B.3 B 10:56 381703.0 6008214
B.4 B 10:57 381679.7 6008134
B.41 B 10:57 394681.4 6017550
C.5 C 4:30 381654.4 6008083
C.6 C 4:31 381629.2 6008033
C.61 C 4:31 394681.4 6017550
split breaks the data.frame into a list of data.frames, lapply rbinds the final row to each data.frame, and do.call rbinds the resulting list of data.frames. The final row of each data.frame is produced using within, which returns a modified version of the data.frame it is given. nrow is used to select the final row. Referencing @akrun's answer, x[nrow(x),] could be replaced with tail(x, 1).
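For reference, here is a minimal, self-contained sketch of the tail(x, 1) variant mentioned above. The sample data is rebuilt by hand from the question (storing 'time' as character is an assumption), and add_end_row is a hypothetical helper name introduced only for this illustration.

```r
# Sample data rebuilt from the question; column types are an assumption
df <- data.frame(
  id = c("A", "A", "B", "B", "C", "C"),
  time = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# Hypothetical helper: copy the group's last row, overwrite the
# coordinates with the fixed end location, and append it to the group
add_end_row <- function(x) {
  last <- tail(x, 1)
  last$latitude <- 394681.4
  last$longitude <- 6017550
  rbind(x, last)
}

# split -> apply -> combine
res <- do.call(rbind, lapply(split(df, df$id), add_end_row))
rownames(res) <- NULL  # drop the "A.1", "A.21" style row names
res
```

Resetting the row names is optional; it only replaces the composite names that rbind generates with a plain sequence.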
We can do this with data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', get the last row with tail, assign the 'latitude' and 'longitude' with the new values, rbind with the original dataset and order by 'id'.
library(data.table)
rbind(setDT(df1),
      df1[, tail(.SD, 1), by = id][, c("latitude", "longitude") := .(394681.4, 6017550)]
)[order(id)]
# id time latitude longitude
#1: A 11:10 381746.0 6008345
#2: A 11:11 381726.2 6008294
#3: A 11:11 394681.4 6017550
#4: B 10:56 381703.0 6008214
#5: B 10:57 381679.7 6008134
#6: B 10:57 394681.4 6017550
#7: C 4:30 381654.4 6008083
#8: C 4:31 381629.2 6008033
#9: C 4:31 394681.4 6017550
Or using dplyr, with similar methodology
library(dplyr)
df1 %>%
  group_by(id) %>%
  summarise(time = last(time)) %>%
  mutate(latitude = 394681.4, longitude = 6017550) %>%
  bind_rows(df1, .) %>%
  arrange(id)
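The summarise step in the dplyr pipeline above keeps only 'id' and 'time', so any additional columns in a wider data frame would come back as NA in the new rows. A possible variant, sketched here under the assumption that dplyr 1.0.0 or later is installed (for slice_tail), copies the whole last row instead. The sample data is rebuilt from the question, and end_rows is a name introduced only for this illustration. Like the original, it relies on arrange keeping tied rows in their existing order.

```r
library(dplyr)

# Sample data rebuilt from the question; column types are an assumption
df1 <- data.frame(
  id = c("A", "A", "B", "B", "C", "C"),
  time = c("11:10", "11:11", "10:56", "10:57", "4:30", "4:31"),
  latitude = c(381746.0, 381726.2, 381703.0, 381679.7, 381654.4, 381629.2),
  longitude = c(6008345, 6008294, 6008214, 6008134, 6008083, 6008033),
  stringsAsFactors = FALSE
)

# Take the last row of each id and overwrite its coordinates
end_rows <- df1 %>%
  group_by(id) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  mutate(latitude = 394681.4, longitude = 6017550)

# Append the new rows, then sort so each one follows its own group
res <- bind_rows(df1, end_rows) %>%
  arrange(id)
res
```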