Create Vector of Currently Opened URLs in Firefox Using R

Tags: r, firefox

I would like to use R to extract all URLs that are currently opened in a web browser. Consider the following example:

  • I have opened the Firefox browser.
  • In Firefox, I have opened the URLs https://www.google.de/ and https://www.amazon.com/.

How could I extract these two URLs from within R, to get the following output?

my_urls <- c("https://www.google.de/", "https://www.amazon.com/")
my_urls
### [1] "https://www.google.de/"  "https://www.amazon.com/"

After some research, I suspect that this may be possible with the RSelenium package, but unfortunately I couldn't figure out the appropriate R code.

Joachim Schork asked Apr 08 '20 15:04


2 Answers

Here is one way you can do this (shown on Windows, but the same idea applies to other platforms).

Firefox stores this information in a JSON recovery file in the user's profile directory. Extracting it would be straightforward, except that Firefox saves the file with a custom variant of lz4 compression. I couldn't find a way to uncompress this file using Firefox itself without causing a potential security issue, so I rely on a third-party tool instead: dejsonlz4, available on GitHub (avih/dejsonlz4). Once you've downloaded and extracted the tool, you can run the following. Keep in mind that there may be a small delay between opening or closing a tab and this information being written to the recovery file.

library(jsonlite)
library(dplyr)
library(purrr)

# File paths (replace {NAME} and {PROFILE} with your own values)
recovery_filepath <- "C:/Users/{NAME}/appdata/Roaming/Mozilla/Firefox/Profiles/{PROFILE}/sessionstore-backups/recovery.jsonlz4"
filepath_to_tool <- "C:/Tools/dejsonlz4.exe"
output_filepath <- file.path(dirname(recovery_filepath), "rcvry.json")

# Uncompress the recovery file; shQuote() protects paths that contain spaces
invisible(system2(filepath_to_tool,
                  args = shQuote(c(recovery_filepath, output_filepath))))

# Read the uncompressed file
recovery_info <- read_json(output_filepath)

# Extract open tab information for the first window (expected result: 2 pages).
# Each tab keeps its navigation history in "entries"; "index" selects the
# entry that is currently shown in that tab.
recovery_info %>%
  pluck("windows", 1, "tabs") %>%
  map_df( ~ map_df(pluck(.x, "entries"),
                   ~ keep(.x, names(.) %in% c("url", "title")))[pluck(.x, "index"), ])

# A tibble: 2 x 2
  url                                                      title                                            
  <chr>                                                    <chr>                                            
1 https://stackoverflow.com/questions/61104900/create-vec~ webbrowser control - Create Vector of Currently ~
2 https://github.com/avih/dejsonlz4                        GitHub - avih/dejsonlz4: Decompress Mozilla Fire~
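
The code above only looks at the first window and returns a tibble. As a minimal sketch, assuming the parsed recovery file has the windows -> tabs -> entries layout used above, the following base R helper collects the current URL of every tab in every window into a character vector, which is the shape the question asks for. The name extract_open_urls and the miniature demo_session are hypothetical and only serve as illustration.

```r
# Collect the URL currently shown in every tab of every window.
# `session` is the list returned by jsonlite::read_json() on the
# uncompressed recovery file (assumed layout: windows -> tabs -> entries).
extract_open_urls <- function(session) {
  urls <- character(0)
  for (window in session$windows) {
    for (tab in window$tabs) {
      entries <- tab$entries
      index <- tab$index
      # Skip tabs without history or with an out-of-range index
      if (length(entries) == 0 || is.null(index) ||
          index < 1 || index > length(entries)) next
      urls <- c(urls, entries[[index]]$url)
    }
  }
  urls
}

# Hypothetical miniature session, for illustration only
demo_session <- list(windows = list(
  list(tabs = list(
    list(index = 2, entries = list(list(url = "https://example.org/start"),
                                   list(url = "https://www.google.de/"))),
    list(index = 1, entries = list(list(url = "https://www.amazon.com/")))
  ))
))

my_urls <- extract_open_urls(demo_session)
my_urls
```

With real data, you would pass recovery_info instead of demo_session.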
Ritchie Sacramento answered Oct 23 '22 07:10


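You can do it using the RSQLite package. Note that this reads Firefox's browsing history, so it returns visited URLs rather than only the tabs that are currently open.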

Get the path of your Firefox profile.

Go to %APPDATA%\Mozilla\Firefox\Profiles\ in File Explorer. You will see the folder of your Firefox profile.

Open the folder and copy its location.

Set db to the copied location with 'places.sqlite' appended. Once this is set, you don't have to change the path next time.

db <- 'C:\\Users\\{user}\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\{profilefolder}\\places.sqlite'
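
If you would rather not copy the path by hand, the lookup can be sketched in base R. This is only an illustration: the helper name find_places_db is hypothetical, and the default directory assumes the Windows location given above (on other platforms the Profiles directory lives elsewhere and has to be passed in).

```r
# List candidate places.sqlite files below a Firefox "Profiles" directory.
# The default location is an assumption based on the Windows path above.
find_places_db <- function(profiles_dir = file.path(Sys.getenv("APPDATA"),
                                                    "Mozilla", "Firefox",
                                                    "Profiles")) {
  # One sub-folder per Firefox profile
  profiles <- list.dirs(profiles_dir, full.names = TRUE, recursive = FALSE)
  candidates <- file.path(profiles, "places.sqlite")
  # Keep only the profiles that actually contain a history database
  candidates[file.exists(candidates)]
}

db_candidates <- find_places_db()
db_candidates
```

The function returns one path per profile that contains a places.sqlite file, or an empty character vector if none is found, so with several profiles you still have to pick the one you want.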

Then, proceed with the following:

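library(RSQLite)

# Connect to the Firefox places database
con <- dbConnect(drv = RSQLite::SQLite(), dbname = db)

# List the available tables (optional, useful for exploration)
tables <- dbListTables(con)

# moz_places holds one row per known URL
dt <- dbGetQuery(con, 'select * from moz_places')

# Keep only URLs that have been visited at least once
urls <- dt$url[dt$visit_count > 0]
urls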

Output:

[1] "https://duckduckgo.com/"
[2] "http://linkedin.com/"
[3] "https://linkedin.com/"
[4] "https://www.linkedin.com/"
[5] "https://www.sciencedirect.com/"
[6] "http://stackexchange.com/"
[7] "https://stackexchange.com/"
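
A slightly more defensive variant of the query, as a sketch rather than a definitive implementation: it works on a temporary copy of the database, does the filtering in SQL, and always closes the connection. The function name read_visited_urls is hypothetical. One assumption to be aware of: only the main places.sqlite file is copied, so very recent visits that Firefox has not yet written into that file may be missing.

```r
library(RSQLite)

# Query visited URLs from a temporary copy of places.sqlite.
# `db` is the path to places.sqlite.
read_visited_urls <- function(db) {
  # Work on a copy so the live profile database is never touched
  db_copy <- tempfile(fileext = ".sqlite")
  file.copy(db, db_copy)
  on.exit(unlink(db_copy), add = TRUE)

  con <- dbConnect(RSQLite::SQLite(), dbname = db_copy)
  # Registered with after = FALSE so the connection is closed
  # before the temporary copy is deleted
  on.exit(dbDisconnect(con), add = TRUE, after = FALSE)

  # Let SQLite do the filtering instead of loading the whole table
  dbGetQuery(con, "SELECT url FROM moz_places WHERE visit_count > 0")$url
}

# Usage, with db set to your own places.sqlite path:
# urls <- read_visited_urls(db)
```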

Edit:

If you want only the browsing history of the present day, use this:

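dt <- dbGetQuery(con, 'select * from moz_places')

# last_visit_date is stored in microseconds since 1970-01-01 (UTC).
# Convert it to a Date in the local time zone (tz = "") so that it is
# comparable with Sys.Date()
dt$last_visit_date <- as.Date(as.POSIXct(dt$last_visit_date / 1000000,
                                         origin = "1970-01-01"), tz = "")
# which() drops rows whose last_visit_date is missing
urls <- dt$url[which(dt$visit_count > 0 & dt$last_visit_date == Sys.Date())]
urls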
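
To make the unit conversion concrete, here is a small worked example. The helper name prtime_to_date is hypothetical; the timestamp is an illustrative value, not taken from a real profile.

```r
# Convert Firefox timestamps (microseconds since 1970-01-01 UTC) to Date.
# tz = "" means the current local time zone.
prtime_to_date <- function(x, tz = "") {
  as.Date(as.POSIXct(x / 1e6, origin = "1970-01-01", tz = "UTC"), tz = tz)
}

# 1586347200000000 microseconds = 2020-04-08 12:00:00 UTC
prtime_to_date(1586347200000000, tz = "UTC")
# [1] "2020-04-08"
```

The time zone matters near midnight: a visit late in the evening local time can fall on a different calendar day in UTC, which is why the comparison with Sys.Date() should use the local time zone.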
Mohanasundaram answered Oct 23 '22 05:10