Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data structure for sorted browsing history

Suppose i want to implement the browser history functionality. If i visit the the url for this first time it goes into the history , if i visit the same page again it comes up in the history list. lets say that i only display the top 20 sites, but i can choose to see history say for the last month , last week and so on .

what is the best approach for this ? i would use hash map for inserting / checking if it is visited earlier , but how do i sort efficiently for recently visited, i don't want to use tree map or tree set . also, how can i store history of weeks and months. Is it written on disk when browser closes ? and when i click clear history , how is the data structure deleted ?

like image 933
user775093 Avatar asked Feb 19 '14 21:02

user775093


1 Answers

This is in Java-ish code.

You'll need two data structures: a hash map and a doubly linked list. The doubly linked list contains History objects (which contain a url string and a timestamp) in order sorted by timestamp; the hash map is a Map<String, History>, with urls as the key.

class History {
  History prev
  History next
  String url
  Long timestamp
  void remove() {
    prev.next = next
    next.prev = prev
    next = null
    prev = null
  }
}

When you add a url to the history, check to see if it's in the hash map; if it is then update its timestamp, remove it from the linked list, and add it to the end of the linked list. If it's not in the hash map then add it to the hash map and also add it to the end of the linked list. Adding a url (whether or not it's already in the hash map) is a constant time operation.

class Main {
  History first // first element of the linked list
  History last // last element of the linked list
  HashMap<String, History> map

  void add(String url) {
    History hist = map.get(url)
    if(hist != null) {
      hist.remove()
      hist.timestamp = System.currenttimemillis()
    } else {
      hist = new History(url, System.currenttimemillis())
      map.add(url, hist)
    }
    last.next = hist
    hist.prev = last
    last = hist
  }
}

To get the history from e.g. the last week, traverse the linked list backwards until you hit the correct timestamp.

If thread-safety is a concern, then use a thread-safe queue for urls to be added to the history, and use a single thread to process this queue; this way your map and linked list don't need to be thread-safe i.e. you don't need to worry about locks etc.

For persistence you can serialize / deserialize the linked list; when you deserialize the linked list, reconstruct the hash map by traversing it and adding its elements to the map. Then to clear the history you'd null the list and map in memory and delete the file you serialized the data to.

A more efficient solution in terms of memory consumption and IO (i.e. (de)serialization cost) is to use a serverless database like SQLite; this way you won't need to keep the history in memory, and if you want to get the history from e.g. the last week you'd just need to query the database rather than traversing the linked list. However, SQLite is essentially a treemap (specifically a B-Tree, which is optimized for data stored on disk).

like image 192
Zim-Zam O'Pootertoot Avatar answered Sep 28 '22 06:09

Zim-Zam O'Pootertoot