Okay so I am relatively new to programming and this has me absolutely stumped. Im scraping data from a website and the data changes every week. I want to run my scraping process each time the data changes starting back on 09-09-2015 and running to current.
I know how to do this easily running thru every number like 0909 then 0910 then 0911 but that is not what I need as that will be requesting way too many requests from the server that are pointless.
Here is the format of the URL http://www.myexamplesite.com/?date=09092015
I know the simple:
for i in range(startDate, endDate):
url = 'http://www.myexamplesite.com/?date={}'.format(i)
driver.get(url)
But one thing i've never been able to figure out is manipulate pythons dateTime to accurately reflect the format the website uses.
i.e: 09092015 09162015 09232015 09302015 10072015 ... 09272017
If all else fails I only need to do this once so it wouldnt take too long to just ignore the loop altogether and just manually enter the date I wish to scrape from and then just append all of my dataframes together. Im mainly curious on how to manipulate the datetime function in this sense for future projects that may require more data.
A good place to start are datetime
, date
and timedelta
objects docs.
First, let's construct our starting date and ending date (today):
>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))
Now let's define the unit of increment -- one day:
>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)
A nice thing about dates (date
/datetime
) and time deltas (timedelta
) is they and can be added:
>>> start + day
datetime.date(2015, 9, 10)
We can also use format()
to get that date in a human-readable form:
>>> "{date.day:02}{date.month:02}{date.year}".format(date=start+day)
'10092015'
So, when we put all this together:
from datetime import date, timedelta
start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)
mydate = start
while mydate < end:
print("{date.day:02}{date.month:02}{date.year}".format(date=mydate))
mydate += week
we get a simple iteration over dates starting with 2015-09-09
and ending with today, incremented by 7 days (a week):
09092015
16092015
23092015
30092015
07102015
...
Take a look here
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
You can see the table pictured here for formatting dates and times and the usage.
Of course, if the format of the dates changes in the future or you are parsing different strings, you will have to make code changes. There really is no way around that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With