Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to append new data in googlesheet

I have a below mentioned dataframe in R which I'm fetching from database using RMySQL on interval of 3 hours.

Query:

Select Form_No, Date, Name, Age from Test where Date(Date)>='2020-12-01';

DF:

Form_No      Date                  Name            Age
POS-1        2020-12-01 10:16:12   Alex            29
POS-2        2020-12-01 10:18:34   Mike            31
POS-3        2020-12-01 14:16:22   Paul            21
POS-4        2020-12-01 18:33:14   John            27
POS-5        2020-12-01 20:13:34   Cohe            45

It is oblivious that after every 3 hours when I run the script there are some additional entries might have created in dataframe and when I use the below-mentioned code to upload the data in googlesheet it override the previous information and update the new data.

The Problem is that for each rows there are some observations might have captured by the reviewer of the information.

The Code I am using is:

library(dplyr)
library(stringr)
library(RMySQL)
library(googlesheets)

connection<-dbConnect(MySQL(), user='User', password='Password', 
                 dbname='Db name',
                 host='host info')

Query<- paste0("Select Form_No, Date, Name, Age from Test where Date(Date)>='2020-12-01';")
DF <- dbGetQuery(connection,Query)

setwd("/home/path")
write.csv(x = DF, file = "cust_data.csv", row.names = FALSE)

as<-gs_title("TargetSheet")
gs_upload("cust_data.csv", sheet_title = "TargetSheet", overwrite = TRUE)

I want to keep the sequence of Form_No and every details same while uploading the new batch of information.

For Example, If I had run the query at 06:00 AM and there were five entry as shown in DF and all of them uploaded to google sheet, now if I run the script at 09:00 am then in my query there is possibility that along with those five entry there are some more entries.

I need to append those extra entries in google sheet now except for the five entries which are already uploaded last time.

like image 524
Sophia Wilson Avatar asked Dec 06 '20 13:12

Sophia Wilson


People also ask

How do I update multiple cells in Google Sheets?

To select more than one row in the data view, click one row, then hold the Control (Windows) or Command (Mac) key and select each of the other rows you wish to edit or remove. To select a continuous list, click one row, then hold the Shift key and click the last row.


2 Answers

Suppose you've saved your query into a Google Sheet named "DatabaseOutput": enter image description here

You can read the data with googlesheets4::read_sheet. To insure you get the correct column types every time, I recommend supplying the col_types = argument. It is very important that read_sheet returns the same column types as your SQL query returns. You can find more information about the argument on the readr documentation page.

library(googledrive)
library(googlesheets4)
OldSheet <- read_sheet(drive_get("DatabaseOutput"),
                       col_types = "cTci")
OldSheet
## A tibble: 3 x 4
#  Form_No Date                Name    Age
#  <chr>   <dttm>              <chr> <dbl>
#1 POS-1   2020-12-01 10:16:12 Alex     29
#2 POS-2   2020-12-01 10:18:34 Mike     31
#3 POS-3   2020-12-01 14:16:22 Paul     21

Your SQL query and read from Google Sheets may disagree on exactly what time it is, so I recommend flooring the Date column with lubridate:

library(dplyr)
library(lubridate)
DF <- as_tibble(DF) %>% 
  mutate(Date = as_datetime(floor(seconds(Date))))
OldSheet <- OldSheet %>%
  mutate(Date = as_datetime(floor(seconds(Date))))

Now we can find rows which are not in the old data with dplyr::anti_join. If this doesn't work, double check that the column types are all the same.

NewRows <- anti_join(DF,OldSheet)
NewRows
## A tibble: 2 x 4
#  Form_No Date                Name    Age
#  <chr>   <dttm>              <chr> <int>
#1 POS-4   2020-12-01 18:33:14 John     27
#2 POS-5   2020-12-01 20:13:34 Cohe     45

Now you can append the rows to the Google Sheet with sheet_append:

sheet_append(drive_get("DatabaseOutput"),NewRows)

enter image description here

like image 189
Ian Campbell Avatar answered Oct 19 '22 00:10

Ian Campbell


*Edited to add: I understand your question better now; you just want to append the new entries because someone else with access to your google sheet may have reviewed and made edits to previously uploaded entries. Most of my answer will remain the same as before.

Try using the googlesheets4 package instead of googlesheets.

This is what worked for me.

as = gs4_create("TargetSheet")
DF = as.data.frame(read.csv('cust_data.csv'))
lastRow = nrow(DF)
sheet_write(DF, ss = as, sheet = "TargetSheet")

### 3 Hours later and beyond ###

DF = as.data.frame(read.csv('cust_data.csv'))
newEntries = DF[(lastRow + 1):nrow(DF), ]
lastRow = nrow(DF)
sheet_append(ss = as, data = newEntries, sheet = "TargetSheet")

So for any additional entries, just keep running the last 4 lines of code every three hours. Another potential solution is basing each new query off the last known position from your previous upload, but I think my pure R solution should do just fine.

like image 3
Econundrums Avatar answered Oct 19 '22 00:10

Econundrums