Create Vector of Currently Opened URLs in Firefox Using R




I would like to use R to extract all URLs that are currently opened in a web browser. Consider the following example:

  • I have opened the Firefox browser.
  • In Firefox, I have opened the URLs https://www.google.de/ and https://www.amazon.com/.

How could I extract these two URLs from within R, to get the following output?

my_urls <- c("https://www.google.de/", "https://www.amazon.com/")
### [1] "https://www.google.de/"  "https://www.amazon.com/"

After some research, I suspect that this may be possible with the RSelenium package, but unfortunately I couldn't figure out the appropriate R code.

2 Answers

Here is one way to do this (shown on Windows, but the same idea applies to other platforms).

Firefox stores this information in a JSON recovery file in the user's profile directory. Extracting it would be straightforward, except that Firefox saves the file using a custom variant of LZ4 compression. I couldn't find a way to decompress the file with Firefox itself without creating a potential security issue, so I rely on a third-party tool instead: dejsonlz4, available on GitHub (https://github.com/avih/dejsonlz4). Once you've downloaded and extracted the tool, you can run the following. Keep in mind that there may be a small delay between opening or closing a tab and that change being written to the recovery file.


# Packages: jsonlite provides read_json(); purrr provides pluck(), keep(),
# map_df() and the %>% pipe (map_df() also needs dplyr to be installed)
library(jsonlite)
library(purrr)

# Filepaths
recovery_filepath <- "C:/Users/{NAME}/appdata/Roaming/Mozilla/Firefox/Profiles/{PROFILE}/sessionstore-backups/recovery.jsonlz4"
filepath_to_tool <- "C:/Tools/dejsonlz4.exe"
output_file <- "rcvry.json"
output_filepath <- file.path(dirname(recovery_filepath), output_file)

# Uncompress recovery file
invisible(system(paste(filepath_to_tool, recovery_filepath, output_filepath)))

# Read uncompressed file
recovery_info <- read_json(output_filepath)

# Extract open tab information (expected result 2 pages):
# for each tab, keep url/title of the history entry the tab currently shows
recovery_info %>%
  pluck("windows", 1, "tabs") %>%
  map_df( ~ map_df(pluck(.x, "entries"),
                   ~ keep(.x, names(.) %in% c("url", "title")))[pluck(.x, "index"), ])

# A tibble: 2 x 2
  url                                                      title                                            
  <chr>                                                    <chr>                                            
1 https://stackoverflow.com/questions/61104900/create-vec~ webbrowser control - Create Vector of Currently ~
2 https://github.com/avih/dejsonlz4                        GitHub - avih/dejsonlz4: Decompress Mozilla Fire~
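
Since the question asks for a plain character vector rather than a tibble, the same structure can also be walked with base R only. The following is a minimal sketch, not part of the original answer: the helper name extract_open_urls and the mock_session data are made up for illustration, and it assumes the parsed file has the windows / tabs / entries / index layout used above.

```r
# Hypothetical helper: collect the URL currently shown in every tab,
# across all windows of a parsed session list.
extract_open_urls <- function(session) {
  urls <- character(0)
  for (win in session$windows) {
    for (tab in win$tabs) {
      entries <- tab$entries
      idx <- tab$index  # 1-based position of the current page in the tab history
      # Skip tabs without history or with an out-of-range index
      if (length(entries) == 0 || is.null(idx) || idx < 1 || idx > length(entries)) next
      urls <- c(urls, entries[[idx]]$url)
    }
  }
  urls
}

# Mock data mimicking the assumed structure (not real Firefox output)
mock_session <- list(windows = list(list(tabs = list(
  list(index = 2, entries = list(
    list(url = "https://example.org/", title = "Example"),
    list(url = "https://www.google.de/", title = "Google"))),
  list(index = 1, entries = list(
    list(url = "https://www.amazon.com/", title = "Amazon")))
))))

my_urls <- extract_open_urls(mock_session)
print(my_urls)
```

With the real data, extract_open_urls(recovery_info) should give the vector requested in the question, and unlike the pipeline above it also covers tabs in windows other than the first.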

You can do this using the RSQLite package.

Get the path of your Firefox profile.

Go to %APPDATA%\Mozilla\Firefox\Profiles\ in File Explorer. You will see the folder of your Firefox profile.

Open the folder and copy the location of the profile folder.

Set db to the copied location with 'places.sqlite' appended. Once this is set, you don't have to change it again.

db <- 'C:\\Users\\{user}\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\{profilefolder}\\places.sqlite'
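
If you would rather not copy the path by hand, the profile folder can probably be located from R as well. This is only a sketch under assumptions: the helper name find_profile_file is hypothetical, and it assumes the default Windows location under %APPDATA% mentioned above.

```r
# Hypothetical helper: find a file by exact name below a Firefox "Profiles" directory.
# If several profiles contain it, the most recently modified copy comes first.
find_profile_file <- function(profiles_dir, file_name) {
  hits <- list.files(profiles_dir, recursive = TRUE, full.names = TRUE)
  hits <- hits[basename(hits) == file_name]
  hits[order(file.info(hits)$mtime, decreasing = TRUE)]
}

# Assumed default location on Windows; adjust for other platforms
profiles_dir <- file.path(Sys.getenv("APPDATA"), "Mozilla", "Firefox", "Profiles")
db_candidates <- find_profile_file(profiles_dir, "places.sqlite")
print(db_candidates)
```

If db_candidates is not empty, db_candidates[1] could be used as db. An empty result means nothing was found under the assumed directory.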

Then, proceed with the following:


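library(DBI)  # provides dbConnect(), dbListTables() and dbGetQuery()

# Connect to the Firefox places database
con <- dbConnect(drv = RSQLite::SQLite(), dbname = db)
tables <- dbListTables(con)

# moz_places holds one row per known URL
dt <- dbGetQuery(con, 'select * from moz_places')

# Keep URLs that have been visited at least once
urls <- dt$url[dt$visit_count > 0]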


[1] "https://duckduckgo.com/"
[2] "http://linkedin.com/"
[3] "https://linkedin.com/"
[4] "https://www.linkedin.com/"
[5] "https://www.sciencedirect.com/"
[6] "http://stackexchange.com/"
[7] "https://stackexchange.com/"


If you want the browsing history for the current day only, use this:

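dt <- dbGetQuery(con, 'select * from moz_places')

# last_visit_date is stored in microseconds since the Unix epoch
dt$last_visit_date <- as.Date(as.POSIXct(dt$last_visit_date / 1000000, origin = "1970-01-01"))

# which() drops rows whose last_visit_date is NA
urls <- dt$url[which(dt$visit_count > 0 & dt$last_visit_date == Sys.Date())]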
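
To make the timestamp conversion easier to check in isolation, here is a small worked example. It is a sketch only: the helper name prtime_to_date and the sample value are made up for illustration, and it assumes last_visit_date holds microseconds since 1970-01-01 UTC.

```r
# Hypothetical helper: convert Firefox timestamps
# (microseconds since the Unix epoch) to a Date in UTC.
prtime_to_date <- function(x) {
  as.Date(as.POSIXct(x / 1e6, origin = "1970-01-01", tz = "UTC"), tz = "UTC")
}

# 1586304000 seconds after the epoch is midnight UTC on 8 April 2020
example_ts <- 1586304000000000
print(prtime_to_date(example_ts))
# [1] "2020-04-08"
```

Note that Sys.Date() uses the local time zone, so a comparison against UTC dates may be off by one day for visits close to midnight.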
