Create Vector of Currently Opened URLs in Firefox Using R

Tags: r, firefox

I would like to use R to extract all URLs that are currently open in a web browser. Consider the following example:

I have opened the Firefox browser.
In the Firefox browser, I have opened the URLs https://www.google.de/ and https://www.amazon.com/.

How could I extract these two URLs from within R, to get the following output?

my_urls <- c("https://www.google.de/", "https://www.amazon.com/")
my_urls
### [1] "https://www.google.de/"  "https://www.amazon.com/"

After some research, I suspect that this may be possible with the RSelenium package, but unfortunately I couldn't figure out the appropriate R code.


asked Apr 08 '20 15:04

Joachim Schork

2 Answers

Here is one way to do this (shown on Windows, but the same idea applies to other platforms).

Firefox stores this information in a JSON recovery file in the user's profile directory. Extracting it would be straightforward, except that Firefox compresses the file with a custom variant of LZ4. I couldn't find a way to decompress the file with Firefox itself without creating a potential security issue, so I rely on a third-party tool instead: dejsonlz4, available on GitHub (https://github.com/avih/dejsonlz4). Once you've downloaded and extracted the tool, you can run the following. Keep in mind that there may be a small delay between opening or closing a tab and that change being written to the recovery file.

library(jsonlite)
library(dplyr)
library(purrr)

# File paths: replace {NAME} and {PROFILE} with your own values
recovery_filepath <- "C:/Users/{NAME}/appdata/Roaming/Mozilla/Firefox/Profiles/{PROFILE}/sessionstore-backups/recovery.jsonlz4"
filepath_to_tool <- "C:/Tools/dejsonlz4.exe"
output_filepath <- file.path(dirname(recovery_filepath), "rcvry.json")

# Decompress the recovery file (shQuote() protects paths that contain spaces)
invisible(system2(filepath_to_tool, shQuote(c(recovery_filepath, output_filepath))))

# Read the decompressed file
recovery_info <- read_json(output_filepath)

# Extract open tab information from the first window (expected result: 2 pages).
# Each tab keeps its history in "entries"; "index" marks the entry currently shown.
recovery_info %>%
  pluck("windows", 1, "tabs") %>%
  map_df( ~ map_df(pluck(.x, "entries"),
                   ~ keep(.x, names(.) %in% c("url", "title")))[pluck(.x, "index"), ])

# A tibble: 2 x 2
  url                                                      title                                            
  <chr>                                                    <chr>                                            
1 https://stackoverflow.com/questions/61104900/create-vec~ webbrowser control - Create Vector of Currently ~
2 https://github.com/avih/dejsonlz4                        GitHub - avih/dejsonlz4: Decompress Mozilla Fire~
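
The pipeline above looks only at the first window and returns a tibble. As a minimal sketch, assuming the decompressed JSON keeps the same windows / tabs / entries / index layout, the base R helper below walks every window and returns a plain character vector, which is the shape asked for in the question. The name current_tab_urls is hypothetical, and demo_session is hand-made data, not a real recovery file.

```r
# Hypothetical helper: collect the URL currently shown in every tab of every
# window from a parsed Firefox session list (e.g. the result of read_json()).
current_tab_urls <- function(session) {
  urls <- character(0)
  for (win in session$windows) {
    for (tab in win$tabs) {
      entries <- tab$entries
      if (length(entries) == 0) next
      # "index" is the 1-based position of the entry on display; fall back to
      # the last entry if it is missing or out of range.
      idx <- tab$index
      if (is.null(idx) || idx < 1 || idx > length(entries)) idx <- length(entries)
      urls <- c(urls, entries[[idx]]$url)
    }
  }
  urls
}

# Small hand-made session for illustration only (assumed structure)
demo_session <- list(windows = list(
  list(tabs = list(
    list(index = 2,
         entries = list(list(url = "https://example.org/"),
                        list(url = "https://www.google.de/"))),
    list(index = 1,
         entries = list(list(url = "https://www.amazon.com/")))
  ))
))

my_urls <- current_tab_urls(demo_session)
print(my_urls)
```

With a real file, one would pass the list returned by read_json() in place of demo_session.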


answered Oct 23 '22 07:10

Ritchie Sacramento

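You can do it using the RSQLite package. Note that this reads Firefox's browsing history, so it returns every visited URL rather than only the tabs that are currently open.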

Get the path of your Firefox profile.

Go to %APPDATA%\Mozilla\Firefox\Profiles\ in your explorer. You will see the folder of your Firefox profile.


Open the folder and copy the location of the profile folder.

Set db to the copied location, adding 'places.sqlite' at the end. Once this is set, you don't have to change the db name next time.

db <- 'C:\\Users\\{user}\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\{profilefolder}\\places.sqlite'
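
The manual steps for locating the profile folder can also be sketched in R. The helper below is a hypothetical illustration, not part of any package: it assumes the default Windows location under %APPDATA% described above and lists every profile folder that contains a places.sqlite file. On other platforms the profiles directory would need to be adjusted.

```r
# Hypothetical helper: list every copy of `filename` found one level below
# `profiles_dir` (Firefox keeps one sub-folder per profile).
find_profile_files <- function(profiles_dir, filename = "places.sqlite") {
  profiles <- list.dirs(profiles_dir, full.names = TRUE, recursive = FALSE)
  candidates <- file.path(profiles, filename)
  candidates[file.exists(candidates)]
}

# Assumed default location on Windows; returns character(0) if nothing is found.
profiles_dir <- file.path(Sys.getenv("APPDATA"), "Mozilla", "Firefox", "Profiles")
db_candidates <- find_profile_files(profiles_dir)
print(db_candidates)
```

If more than one path comes back, there are several profiles and the right one has to be chosen by hand.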

Then, proceed with the following:

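library(RSQLite)

con <- dbConnect(drv = RSQLite::SQLite(), dbname = db)
tables <- dbListTables(con)  # optional: list the tables in the database

dt <- dbGetQuery(con, 'select * from moz_places')

# Keep only URLs that have been visited at least once
urls <- dt$url[dt$visit_count > 0]
urls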

Output:

[1] "https://duckduckgo.com/"                                        
[1] "http://linkedin.com/"                                           
[2] "https://linkedin.com/"                                          
[3] "https://www.linkedin.com/"                                      
[4] "https://www.sciencedirect.com/"                                 
[5] "http://stackexchange.com/"                                      
[6] "https://stackexchange.com/"
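
One practical caveat: while Firefox is running it may hold a lock on places.sqlite, in which case the query can fail with a "database is locked" error. A possible workaround, sketched below under that assumption, is to query a temporary copy and to filter in SQL instead of loading the whole table. The function name read_visited_urls is hypothetical, and very recent visits may be missing from the copy if Firefox has not yet written them to the main file.

```r
library(DBI)

# Hypothetical helper: read visited URLs from a temporary copy of places.sqlite
# so that the file Firefox is using is never opened directly.
read_visited_urls <- function(db_path) {
  tmp <- tempfile(fileext = ".sqlite")
  if (!file.copy(db_path, tmp)) stop("Could not copy ", db_path)
  on.exit(unlink(tmp), add = TRUE)

  con <- dbConnect(RSQLite::SQLite(), dbname = tmp)
  # after = FALSE makes the disconnect run before the temporary file is deleted
  on.exit(dbDisconnect(con), add = TRUE, after = FALSE)

  dbGetQuery(
    con,
    "SELECT url, visit_count, last_visit_date
       FROM moz_places
      WHERE visit_count > 0"
  )
}

# Usage, with `db` set to the path of places.sqlite as described above:
# visited <- read_visited_urls(db)
# visited$url
```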

Edit:

If you want the browsing history for the present day, use this:

dt <- dbGetQuery(con, 'select * from moz_places')

# last_visit_date is stored in microseconds since 1970-01-01
dt$last_visit_date <- as.Date(as.POSIXct(dt$last_visit_date / 1000000,
                                         origin = "1970-01-01"))
# which() drops rows whose last_visit_date is missing
urls <- dt$url[which(dt$visit_count > 0 & dt$last_visit_date == Sys.Date())]
urls
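
The date conversion is the step most likely to go subtly wrong, because last_visit_date holds microseconds since 1970-01-01 and the resulting calendar day depends on the time zone used. As a small sketch (prtime_to_date is a hypothetical name, and UTC as the default is an assumption rather than something Firefox prescribes), the conversion can be isolated and checked on a known value:

```r
# Hypothetical helper: convert Firefox timestamps (microseconds since the Unix
# epoch) to Date. The `tz` argument decides which calendar day a visit falls on.
prtime_to_date <- function(prtime, tz = "UTC") {
  as.Date(as.POSIXct(prtime / 1e6, origin = "1970-01-01", tz = "UTC"), tz = tz)
}

# 1586358240 seconds after the epoch is 2020-04-08 15:04 UTC
example_date <- prtime_to_date(1586358240000000)
print(example_date)
```

To compare against Sys.Date(), which uses the local date, one would pass a time zone name such as "Europe/Berlin" as tz so that both sides refer to the same calendar day.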

answered Oct 23 '22 05:10

Mohanasundaram
