Python question here:
I'm running a sort function to sort some data by dates, and get incorrect output. I've prepared a short version of my code with some sample data to show the error (the full code is uninteresting and the full real data is proprietary).
Here is the code:
import operator
mylist = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
          ['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
          ['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
          ['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]
sorted_list = sorted(mylist, key=operator.itemgetter(2))
print type(mylist)
print len(mylist)
for i in mylist:
    print i
print "" # just for a line break for convenience
for i in sorted_list:
    print i
and the output is:
<type 'list'>
4
['CustomerID_12345', 'TransactionID_1001', '12/31/2012']
['CustomerID_12345', 'TransactionID_1002', '3/12/2013']
['CustomerID_12345', 'TransactionID_1003', '1/7/2013']
['CustomerID_12345', 'TransactionID_1004', '12/31/2012']
['CustomerID_12345', 'TransactionID_1003', '1/7/2013']
['CustomerID_12345', 'TransactionID_1001', '12/31/2012']
['CustomerID_12345', 'TransactionID_1004', '12/31/2012']
['CustomerID_12345', 'TransactionID_1002', '3/12/2013']
The first block is the original data and the second is the sorted output. Since I tried to sort by date, it's easy to see that the sort didn't work properly: 1/7/2013 comes before 12/31/2012.
Can someone help explain the error and suggest how to correct it? Thanks in advance :)
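This is because Python compares them as strings, not as dates.
Strings are compared character by character: '1' is less than '2', which is less than '3', and '/' is less than any digit. That is why '1/7/2013' sorts before '12/31/2012' (at the second character, '/' is less than '2') and '3/12/2013' sorts last.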
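To see the string comparison at work, here is a minimal sketch using only the date strings from the question. The variable names are made up for illustration, and the parenthesized single-argument print is used so the snippet should run on both Python 2 and Python 3:

```python
# Characters are compared by their code points, and '/' comes before the digits.
print(ord('/') < ord('1') < ord('2') < ord('3'))

# Date strings taken from the question's sample data.
dates = ['12/31/2012', '3/12/2013', '1/7/2013', '12/31/2012']

# Sorting plain strings compares them character by character,
# so '1/7/2013' lands before '12/31/2012' ('/' < '2' at index 1).
as_strings = sorted(dates)
print(as_strings)
```

This reproduces the order seen in the question's second block of output.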
Instead, compare them as dates using the datetime module.
Here is a sample:
from datetime import datetime
your_date = datetime.strptime('1/1/2013', "%m/%d/%Y")
my_date = datetime.strptime('12/3/2011', "%m/%d/%Y")
print your_date > my_date
[Out]: True
Sort by date:
from datetime import datetime
mylist = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]
sorted_list = sorted(mylist, key=lambda x: datetime.strptime(x[2],'%m/%d/%Y'))
for item in sorted_list:
    print item
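The same idea can be sketched with a named key function instead of a lambda, which may be easier to reuse. The names transaction_date and DATE_FORMAT are hypothetical, and the month/day/year format is an assumption based on the sample data:

```python
from datetime import datetime

DATE_FORMAT = '%m/%d/%Y'  # assumption: month/day/year, as in the sample data

def transaction_date(row):
    """Hypothetical helper: parse the date string in the third column."""
    return datetime.strptime(row[2], DATE_FORMAT)

mylist = [['CustomerID_12345', 'TransactionID_1001', '12/31/2012'],
          ['CustomerID_12345', 'TransactionID_1002', '3/12/2013'],
          ['CustomerID_12345', 'TransactionID_1003', '1/7/2013'],
          ['CustomerID_12345', 'TransactionID_1004', '12/31/2012']]

# sorted() is stable, so rows with equal dates keep their original order.
sorted_list = sorted(mylist, key=transaction_date)
transaction_ids = [row[1] for row in sorted_list]
print(transaction_ids)
```

With the sample data, the two 12/31/2012 transactions (1001, then 1004) come first, followed by 1003 and 1002.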
Or you can store the dates as datetime objects in the first place. If they are strings for a good reason, you can first add a datetime column and sort on that:
for item in mylist:
    # append the parsed date as a fourth column
    item.append(datetime.strptime(item[2], '%m/%d/%Y'))
sorted_list = sorted(mylist, key=lambda x: x[3])
for item in sorted_list:
    print item[:3]  # show only the original three columns