I have the below-mentioned data frame in R, which I fetch from a database using RMySQL at an interval of 3 hours.
Query:
SELECT Form_No, Date, Name, Age FROM Test WHERE DATE(Date) >= '2020-12-01';
DF:
Form_No Date                Name Age
POS-1   2020-12-01 10:16:12 Alex  29
POS-2   2020-12-01 10:18:34 Mike  31
POS-3   2020-12-01 14:16:22 Paul  21
POS-4   2020-12-01 18:33:14 John  27
POS-5   2020-12-01 20:13:34 Cohe  45
Obviously, every 3 hours when I run the script, some additional entries may have been created in the data frame, and when I use the below-mentioned code to upload the data to a Google Sheet, it overwrites the previous information with the new data.
The problem is that reviewers of the information may have recorded observations against some of the existing rows, so overwriting them loses that work.
The code I am using is:
library(dplyr)
library(stringr)
library(RMySQL)
library(googlesheets)

connection <- dbConnect(MySQL(), user = 'User', password = 'Password',
                        dbname = 'Db name',
                        host = 'host info')
Query <- paste0("SELECT Form_No, Date, Name, Age FROM Test WHERE DATE(Date) >= '2020-12-01';")
DF <- dbGetQuery(connection, Query)

setwd("/home/path")
write.csv(x = DF, file = "cust_data.csv", row.names = FALSE)

as <- gs_title("TargetSheet")
gs_upload("cust_data.csv", sheet_title = "TargetSheet", overwrite = TRUE)
I want to keep the sequence of Form_No and every other detail the same while uploading each new batch of information.
For example, if I ran the query at 06:00 AM, there were five entries as shown in DF, and all of them were uploaded to the Google Sheet; when I run the script again at 09:00 AM, the query may well return some more entries along with those five.
I now need to append only those extra entries to the Google Sheet, skipping the five entries that were already uploaded last time.
Suppose you've saved your query results into a Google Sheet named "DatabaseOutput". You can read the data with googlesheets4::read_sheet. To ensure you get the correct column types every time, I recommend supplying the col_types argument; it is very important that read_sheet returns the same column types as your SQL query does. You can find more information about the col_types specification on the readr documentation page.
library(googledrive)
library(googlesheets4)

# col_types "cTci" = character, datetime, character, integer
OldSheet <- read_sheet(drive_get("DatabaseOutput"),
                       col_types = "cTci")
OldSheet
## A tibble: 3 x 4
#  Form_No Date                Name    Age
#  <chr>   <dttm>              <chr> <int>
#1 POS-1   2020-12-01 10:16:12 Alex     29
#2 POS-2   2020-12-01 10:18:34 Mike     31
#3 POS-3   2020-12-01 14:16:22 Paul     21
Your SQL query and the read from Google Sheets may disagree on exactly what time it is (for example, at sub-second precision), so I recommend flooring the Date column to whole seconds with lubridate:
library(dplyr)
library(lubridate)

# Truncate both Date columns to whole seconds so they compare exactly
DF <- as_tibble(DF) %>%
  mutate(Date = floor_date(as_datetime(Date), unit = "second"))
OldSheet <- OldSheet %>%
  mutate(Date = floor_date(Date, unit = "second"))
Now we can find the rows which are not in the old data with dplyr::anti_join. If this doesn't work, double-check that the column types are all the same (see the type-checking sketch after the output below).
NewRows <- anti_join(DF, OldSheet)
NewRows
## A tibble: 2 x 4
#  Form_No Date                Name    Age
#  <chr>   <dttm>              <chr> <int>
#1 POS-4   2020-12-01 18:33:14 John     27
#2 POS-5   2020-12-01 20:13:34 Cohe     45
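If anti_join errors or returns rows you didn't expect, the usual culprit is a type mismatch between the two data frames. A minimal way to check, using base sapply and dplyr's glimpse (nothing beyond what is already loaded here):

# Column types of both data frames, side by side; every pair must match
sapply(DF, class)
sapply(OldSheet, class)

# Or inspect both structures at a glance
glimpse(DF)
glimpse(OldSheet)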
Now you can append the new rows to the Google Sheet with sheet_append:
sheet_append(drive_get("DatabaseOutput"),NewRows)
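Putting the pieces together, the whole refresh can be wrapped in one function that you call every 3 hours after fetching DF from MySQL. This is a sketch under the assumptions above: the sheet is named "DatabaseOutput", its columns match the col_types string "cTci", and DF comes straight from dbGetQuery:

library(googledrive)
library(googlesheets4)
library(dplyr)
library(lubridate)

append_new_rows <- function(DF, sheet_name = "DatabaseOutput") {
  ss <- drive_get(sheet_name)
  # Read what is already on the sheet, forcing the same column types as DF
  OldSheet <- read_sheet(ss, col_types = "cTci")
  # Truncate both Date columns to whole seconds so they compare exactly
  DF <- as_tibble(DF) %>%
    mutate(Date = floor_date(as_datetime(Date), unit = "second"))
  OldSheet <- OldSheet %>%
    mutate(Date = floor_date(Date, unit = "second"))
  # Keep only the rows not already on the sheet, then append them
  NewRows <- anti_join(DF, OldSheet, by = c("Form_No", "Date", "Name", "Age"))
  if (nrow(NewRows) > 0) sheet_append(ss, NewRows)
  invisible(NewRows)
}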
Edited to add: I understand your question better now; you just want to append the new entries, because someone else with access to your Google Sheet may have reviewed and made edits to previously uploaded entries. Most of my answer remains the same as before.
Try using the googlesheets4 package instead of googlesheets. This is what worked for me:
ss <- gs4_create("TargetSheet")       # create the spreadsheet once
DF <- read.csv("cust_data.csv")
lastRow <- nrow(DF)                   # remember how many rows have been uploaded
sheet_write(DF, ss = ss, sheet = "TargetSheet")

### 3 hours later and beyond (same R session, so lastRow is still in memory) ###
DF <- read.csv("cust_data.csv")
if (nrow(DF) > lastRow) {             # guard against a batch with no new rows
  newEntries <- DF[(lastRow + 1):nrow(DF), ]
  sheet_append(ss = ss, data = newEntries, sheet = "TargetSheet")
}
lastRow <- nrow(DF)
So for any additional entries, just keep re-running the lines after the "3 hours later" comment every three hours. Another potential solution is to base each new query on the last known position from your previous upload, but I think this pure-R solution should do just fine.
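For completeness, that query-side variant would look something like the sketch below. It is untested against your schema: it assumes the connection object from the question is still open, and it reads the most recent Date back from the sheet to build an incremental WHERE clause:

library(googlesheets4)

ss <- gs4_find("TargetSheet")  # locate the spreadsheet by name
uploaded <- read_sheet(ss, sheet = "TargetSheet", col_types = "cTci")

# Only fetch rows newer than the latest Date already on the sheet
last_seen <- max(uploaded$Date)
Query <- sprintf("SELECT Form_No, Date, Name, Age FROM Test WHERE Date > '%s';",
                 format(last_seen, "%Y-%m-%d %H:%M:%S"))
newEntries <- dbGetQuery(connection, Query)
if (nrow(newEntries) > 0) {
  sheet_append(ss = ss, data = newEntries, sheet = "TargetSheet")
}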