 

Iterating through a daterange in python

Okay, so I am relatively new to programming and this has me absolutely stumped. I'm scraping data from a website whose data changes every week. I want to run my scraping process once for each weekly update, starting back on 09-09-2015 and running up to the present.

I know how to do this easily by running through every number, like 0909, then 0910, then 0911, but that is not what I need: it would send far too many pointless requests to the server.

Here is the format of the URL: http://www.myexamplesite.com/?date=09092015

I know the simple loop:

for i in range(startDate, endDate):
    url = 'http://www.myexamplesite.com/?date={}'.format(i)
    driver.get(url)

But one thing I've never been able to figure out is how to manipulate Python's datetime to produce the exact format the website uses.

i.e. MMDDYYYY, one date per week: 09092015, 09162015, 09232015, 09302015, 10072015 ... 09272017

If all else fails, I only need to do this once, so it wouldn't take too long to skip the loop altogether, manually enter each date I want to scrape, and then append all of my dataframes together. I'm mainly curious how to manipulate datetime this way for future projects that may require more data.

Stu Kruske asked Sep 27 '17 18:09


2 Answers

A good place to start is the documentation for the datetime, date and timedelta objects.

First, let's construct our starting date and ending date (today):

>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))

Now let's define the unit of increment -- one day:

>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)
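Since the site updates weekly, the increment that matters here is a week rather than a day. A small sketch (not part of the original answer) showing that timedelta also accepts a weeks keyword, which is equivalent to seven days:

```python
from datetime import date, timedelta

# timedelta accepts weeks directly; one week equals seven days
week = timedelta(weeks=1)
assert week == timedelta(days=7)

start = date(2015, 9, 9)
print(start + week)  # 2015-09-16
```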

A nice thing about dates (date/datetime) and time deltas (timedelta) is that they can be added:

>>> start + day
datetime.date(2015, 9, 10)
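Dates can also be subtracted, and the result is a timedelta. The sketch below uses that to compare how many requests a daily loop and a weekly loop would make. It assumes the fixed end date 2017-09-27 from the question instead of date.today(), so the numbers are reproducible:

```python
from datetime import date

start = date(2015, 9, 9)
# Fixed end date taken from the question; a live run would use date.today()
end = date(2017, 9, 27)

# Subtracting two dates gives a timedelta
span = end - start
print(span.days)  # 749

# One request per day, both endpoints included
daily_requests = span.days + 1
# One request per week, both endpoints included (749 is an exact multiple of 7)
n_requests = span.days // 7 + 1
print(daily_requests, n_requests)  # 750 108
```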

We can also use format() to render that date in the MMDDYYYY form the website expects (month first, then day, then year, with month and day zero-padded):

>>> "{date.month:02}{date.day:02}{date.year}".format(date=start+day)
'09102015'

So, when we put all this together:

from datetime import date, timedelta

start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)

mydate = start
# <= so that today is included when it falls on the weekly schedule
while mydate <= end:
    # month first (MMDDYYYY), matching the site's URLs
    print("{date.month:02}{date.day:02}{date.year}".format(date=mydate))
    mydate += week

we get a simple iteration over dates starting with 2015-09-09 and ending with the last weekly date on or before today, incremented by 7 days (a week):

09092015
09162015
09232015
09302015
10072015
...
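To go from printed dates to actual scraping, one option is to wrap the loop in a generator that yields the full URLs from the question. This is a sketch: weekly_urls and BASE_URL are hypothetical names, the end date is fixed only for illustration, and the driver.get call is left commented out because driver (the question's Selenium object) is not defined here:

```python
from datetime import date, timedelta

# URL pattern from the question
BASE_URL = "http://www.myexamplesite.com/?date={}"


def weekly_urls(start, end, step=timedelta(weeks=1)):
    """Yield one URL per step from start up to and including end (hypothetical helper)."""
    current = start
    while current <= end:
        # Month first, zero-padded: MMDDYYYY, e.g. 09092015
        stamp = "{d.month:02}{d.day:02}{d.year}".format(d=current)
        yield BASE_URL.format(stamp)
        current += step


# Fixed end date for illustration; use date.today() for a live run
for url in weekly_urls(date(2015, 9, 9), date(2015, 10, 7)):
    print(url)
    # driver.get(url)  # the Selenium call from the question
```

A generator keeps the date logic separate from the scraping code, so the same helper can feed a loop, a list, or a retry queue.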
randomir answered Oct 19 '22 09:10


Take a look here:

https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior

The table there lists the directives for formatting dates and times and shows how to use them.
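As a sketch of what that table offers: the %m, %d and %Y directives produce the site's MMDDYYYY strings directly, and strptime reverses the conversion, which can help if you ever need to read dates back out of scraped URLs. The linked page is the Python 2 documentation, but these directives behave the same in Python 3:

```python
from datetime import date, datetime

d = date(2015, 9, 16)

# %m = zero-padded month, %d = zero-padded day, %Y = four-digit year
stamp = d.strftime("%m%d%Y")
print(stamp)  # 09162015

# strptime parses the same format back; it returns a datetime, so take .date()
parsed = datetime.strptime("09162015", "%m%d%Y").date()
print(parsed == d)  # True
```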

Of course, if the format of the dates changes in the future or you are parsing different strings, you will have to make code changes. There really is no way around that.

Tyler Nichols answered Oct 19 '22 11:10