Getting Data from website

Tags:

cookies

matlab

The website constantly updates the data it displays, and I want to fetch that data every few seconds and log it in a spreadsheet. The problem is that, to reach the page, I need a cookie that I get when I log in. Unfortunately, I only know how to program in MATLAB. MATLAB has a function for fetching pages, urlread, but it doesn't handle cookies. What can I do to get to that page? Can anyone point me in a direction where a programming noob like me can succeed?

asked Sep 20 '11 17:09 by gevo12321


2 Answers

You could use wget to download content while using HTTP cookies. I will be using StackOverflow.com as an example target. Here are the steps to follow:

1) Obtain the wget command-line tool. On most Linux distributions it is already installed; on a Mac you will probably have to install it yourself (for example with a package manager). On Windows, you can get it from the GnuWin32 project or from one of the many other ports (Cygwin, MinGW/MSYS, etc.).

2) Next we need to obtain an authenticated cookie by logging into the website in question. You can use your preferred browser for this.

In Internet Explorer, you can produce it using "File menu > Import and Export > Export Cookies". In Firefox, I used the Cookie Exporter extension to export cookies to a text file (wget expects the Netscape cookies.txt format). For Chrome, there should be similar extensions.

Obviously you only need to do this step once, as long as the cookies have not yet expired!

3) Once you have located the exported cookie file, you can use wget to fetch the web page and pass it this cookie file. This can of course be done from inside MATLAB using the SYSTEM function:

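%# fetch page and save it to disk
url = 'http://stackoverflow.com/';
%# -O overwrites index.html on every run (otherwise wget writes index.html.1, ...);
%# cookie handling is on by default, so --load-cookies is the only cookie flag needed
cmd = ['wget --load-cookies=./cookies.txt -O index.html "' url '"'];
status = system(cmd, '-echo');
if status ~= 0, error('wget failed with exit status %d', status); end

%# process page: I am simply viewing it using embedded browser
web( ['file:///' strrep(fullfile(pwd,'index.html'),'\','/')] )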

Parsing the web page is a whole other topic that I will not go into. Once you get the data you seek, you can interact with Excel spreadsheets using the XLSREAD and XLSWRITE functions.

4) Finally, you can wrap all of this in a function and make it execute at regular intervals using a TIMER object.
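As a rough sketch of how the steps above might fit together, the function below fetches the page with wget on a timer, pulls one number out of the HTML, and appends it to a log file. Several things here are assumptions and not part of the answer: the function names (startPageLogger, fetchAndLog, extractValue), the file names page.html and log.csv, and above all the markup pattern `<span id="reading">...</span>`, which is made up and has to be replaced with whatever the real page contains. It also appends to a CSV file, which Excel opens directly, instead of calling XLSWRITE, simply because appending a line to a text file is easier than tracking the next free spreadsheet row.

```matlab
function t = startPageLogger(periodSec)
% STARTPAGELOGGER  Fetch a page every periodSec seconds and log one value.
% Save as startPageLogger.m. Stop the logger later with: stop(t); delete(t);
    t = timer('ExecutionMode', 'fixedSpacing', ... % wait periodSec after each run ends
              'Period', periodSec, ...
              'TimerFcn', @(~,~) fetchAndLog());
    start(t);
end

function fetchAndLog()
    url = 'http://stackoverflow.com/';   % example target used in the answer
    % -q: quiet; -O: always overwrite the same output file
    cmd = ['wget -q --load-cookies=./cookies.txt -O page.html "' url '"'];
    status = system(cmd);
    if status ~= 0
        warning('pageLogger:wget', 'wget exited with status %d', status);
        return
    end

    value = extractValue(fileread('page.html'));
    if isnan(value), return, end         % pattern not found, nothing to log

    fid = fopen('log.csv', 'a');         % append mode
    if fid == -1, return, end
    fprintf(fid, '%s,%g\n', datestr(now, 'yyyy-mm-dd HH:MM:SS'), value);
    fclose(fid);
end

function value = extractValue(html)
% Hypothetical markup: <span id="reading">12.5</span>
    tok = regexp(html, '<span id="reading">\s*([^<]+?)\s*</span>', 'tokens', 'once');
    if isempty(tok)
        value = NaN;
    else
        value = str2double(tok{1});
    end
end
```

Calling `t = startPageLogger(5);` would start polling every five seconds. The 'fixedSpacing' mode is used so that a slow download cannot cause runs to pile up.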

answered Oct 23 '22 03:10 by Amro


Try using the java.net.* classes.

You should be able to use them directly in the MATLAB workspace, as described here: http://www.mathworks.co.uk/help/techdoc/matlab_external/f4863.html
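To make that suggestion a bit more concrete, here is a minimal sketch that opens a connection with java.net.URL and sends the cookie by hand in a Cookie request header. The function name fetchWithCookie and the cookie string are assumptions for illustration; the real cookie name and value have to be copied from the browser after logging in. It also assumes MATLAB is running with its JVM enabled.

```matlab
function html = fetchWithCookie(urlStr, cookieStr)
% FETCHWITHCOOKIE  Download a page, sending cookieStr as the Cookie header.
% cookieStr looks like 'name1=value1; name2=value2' (values are site specific).
    url  = java.net.URL(urlStr);
    conn = url.openConnection();
    conn.setRequestProperty('Cookie', cookieStr);

    reader = java.io.BufferedReader( ...
        java.io.InputStreamReader(conn.getInputStream(), 'UTF-8'));

    lines = {};
    line = reader.readLine();
    % readLine returns a char array for each line (possibly empty) and a
    % non-char empty value at end of stream, so test the type, not isempty
    while ischar(line)
        lines{end+1} = line; %#ok<AGROW>
        line = reader.readLine();
    end
    reader.close();

    html = sprintf('%s\n', lines{:});
end
```

A call would look like `html = fetchWithCookie('http://stackoverflow.com/', 'name=value');`, after which the text can be parsed the same way as a page downloaded by any other means.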

answered Oct 23 '22 03:10 by Nzbuu