I would like to use R to extract the URLs of all tabs that are currently open in a web browser. Consider the following example: suppose two tabs are open, one at https://www.google.de/ and one at https://www.amazon.com/.
How could I extract these two URLs from within R, to get the following output?
my_urls <- c("https://www.google.de/", "https://www.amazon.com/")
my_urls
### [1] "https://www.google.de/" "https://www.amazon.com/"
After some research, I'm suspecting that this may be possible with the RSelenium package, but unfortunately I couldn't figure out the appropriate R code.
Here is one way you can do this (on Windows, but the same idea applies to other platforms).
Firefox stores this information in a JSON recovery file in the user's profile directory. It would be straightforward to extract this data, except that Firefox saves the file using a custom variant of lz4 compression. I couldn't find a way to decompress it using Firefox itself without creating a potential security issue, so instead we have to rely on a third-party tool, dejsonlz4,
located here on GitHub. Once you've downloaded and extracted the tool, you can run the following. Just keep in mind there may be a small delay between opening or closing a tab and this information being written to the recovery file.
library(jsonlite)
library(dplyr)
library(purrr)
# Filepaths
recovery_filepath <- "C:/Users/{NAME}/AppData/Roaming/Mozilla/Firefox/Profiles/{PROFILE}/sessionstore-backups/recovery.jsonlz4"
filepath_to_tool <- "C:/Tools/dejsonlz4.exe"
output_file <- "rcvry.json"
# Uncompress recovery file
invisible(system(paste(filepath_to_tool, recovery_filepath, paste(dirname(recovery_filepath), output_file, sep = "/"))))
# Read uncompressed file
recovery_info <- read_json(paste(dirname(recovery_filepath), output_file, sep = "/"))
# Extract the active entry (url and title) of each open tab (expected result: 2 pages)
recovery_info %>%
  pluck("windows", 1, "tabs") %>%
  map_df(~ map_df(pluck(.x, "entries"),
                  ~ keep(.x, names(.) %in% c("url", "title")))[pluck(.x, "index"), ])
# A tibble: 2 x 2
url title
<chr> <chr>
1 https://stackoverflow.com/questions/61104900/create-vec~ webbrowser control - Create Vector of Currently ~
2 https://github.com/avih/dejsonlz4 GitHub - avih/dejsonlz4: Decompress Mozilla Fire~
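If you only need the plain URL vector from the question rather than a tibble, here is a minimal variant of the same idea, assuming the recovery file has already been uncompressed and read as above and follows the usual windows/tabs/entries/index layout:
library(purrr)
tabs <- pluck(recovery_info, "windows", 1, "tabs")
# For each tab, "index" points at the entry currently displayed (1-based)
my_urls <- map_chr(tabs, ~ pluck(.x, "entries", pluck(.x, "index"), "url"))
my_urls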
You can also do this using the RSQLite package.
Get the path of your Firefox profile:
Go to %APPDATA%\Mozilla\Firefox\Profiles\ in your file explorer. You will see the folder of your Firefox profile.
Open the folder and copy the location of the profile folder.
Set db to the copied location, adding 'places.sqlite' at the end. Once this is set, you don't have to change the db path next time.
db<- 'C:\\Users\\{user}\\AppData\\Roaming\\Mozilla\\Firefox\\Profiles\\{profilefolder}\\places.sqlite'
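If you would rather locate the profile folder from within R, here is a minimal sketch (it assumes Windows; the folder name shown is hypothetical, though default profiles typically end in '.default-release'):
# List candidate profile folders under %APPDATA%
profiles_dir <- file.path(Sys.getenv("APPDATA"), "Mozilla", "Firefox", "Profiles")
list.files(profiles_dir, full.names = TRUE)
# Pick your profile folder from the listing, e.g. (hypothetical name):
db <- file.path(profiles_dir, "xxxxxxxx.default-release", "places.sqlite")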
Then, proceed with the following:
library(RSQLite)
# Connect to the places database
con <- dbConnect(drv = RSQLite::SQLite(), dbname = db)
tables <- dbListTables(con)
# moz_places holds one row per known URL, with visit counts and timestamps
dt <- dbGetQuery(con, 'SELECT * FROM moz_places')
# Keep only URLs that have actually been visited
urls <- dt$url[dt$visit_count > 0]
urls
Output:
[1] "https://duckduckgo.com/"
[1] "http://linkedin.com/"
[2] "https://linkedin.com/"
[3] "https://www.linkedin.com/"
[4] "https://www.sciencedirect.com/"
[5] "http://stackexchange.com/"
[6] "https://stackexchange.com/"
Edit:
If you want the browsing history for the current day only, use this:
dt <- dbGetQuery(con, 'SELECT * FROM moz_places')
# last_visit_date is stored in microseconds since the Unix epoch
dt$last_visit_date <- as.Date(as.POSIXct(dt$last_visit_date / 1000000,
                                         origin = "1970-01-01"))
urls <- dt$url[dt$visit_count > 0 & dt$last_visit_date == Sys.Date()]
urls
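When you are finished, close the database connection with the standard DBI call:
dbDisconnect(con)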