Overview: I have data something like this (each row is a string):
81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M
3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M
B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M
61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M
I want to sort the rows by the first timestamp in each string, which for these four records is:
2016-07-14 01:28:59
2016-07-14 06:25:32
2016-07-14 08:26:45
2016-07-14 14:29:13
Now, I know the sort() method, but I don't understand how to use it here to sort all the rows by this timestamp. I also need to keep the final sorted data in the same format, because another service is going to consume it.
I also understand that I can pass a key function, but I am not clear on how to make it sort on the timestamp field.
In Python, there are two ways to sort a list (list) in ascending or descending order: the list.sort() method and the sorted() built-in function. To sort a string (str) or a tuple (tuple), use sorted(), which accepts any iterable and returns a new list.
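As a small illustration of the difference between the two, the following sketch uses made-up values (the variable names are only for this example):

```python
# list.sort() reorders the list in place and returns None.
numbers = [3, 1, 2]
result = numbers.sort()

# sorted() accepts any iterable and always returns a new list.
from_tuple = sorted((3, 1, 2))
from_string = sorted("bca")

# Both accept reverse=True for descending order.
descending = sorted([3, 1, 2], reverse=True)

print(numbers, result)          # [1, 2, 3] None
print(from_tuple, from_string)  # [1, 2, 3] ['a', 'b', 'c']
print(descending)               # [3, 2, 1]
```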
You can also sort a list without the sort function by using nested for loops with an if statement that swaps elements that are out of order. This is not the only way to do it; you can use your own logic.
Python does not guarantee that sort() will work if a list contains items of different data types. Sorting compares items with the < operator, so it works as long as every pair of items can be compared that way. Otherwise, in Python 3, a TypeError is raised.
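A quick way to see this is to try sorting a list that mixes integers and strings. This is only a sketch with made-up values; passing int as the key is one possible workaround, assuming every item can be converted to an integer:

```python
mixed = [3, "10", 1]

try:
    sorted(mixed)
    error_name = None
except TypeError as exc:
    # int and str cannot be compared with <, so sorting fails.
    error_name = type(exc).__name__

# A key function that maps every item to one comparable type avoids the error.
# The original items are returned unchanged; only the comparison uses int().
as_numbers = sorted(mixed, key=int)

print(error_name)  # TypeError
print(as_numbers)  # [1, 3, '10']
```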
You can use the list method list.sort, which sorts in place, or the sorted() built-in function, which returns a new list. The key argument takes a function that is applied to each element of the sequence to produce the value used for comparison. You can combine str.split(',') with indexing to get the second element, e.g. some_list[1], so:
In [8]: list_of_strings
Out[8]:
['81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M',
'3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
'61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M']
In [9]: sorted(list_of_strings, key=lambda s: s.split(',')[1])
Out[9]:
['3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
'61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M',
'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
'81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M']
Or, if you'd rather sort the list in place:
In [12]: list_of_strings
Out[12]:
['81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M',
'3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
'61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M']
In [13]: list_of_strings.sort(key=lambda s: s.split(',')[1])
In [14]: list_of_strings
Out[14]:
['3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
'61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M',
'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
'81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M']
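If the timestamps were ever not in a fixed-width format, comparing them as plain text could give the wrong order. A more defensive variant, sketched here under the assumption that the second field always matches the pattern %Y-%m-%d %H:%M:%S, parses the field into a datetime before comparing. The helper name first_timestamp is made up for this example:

```python
from datetime import datetime

rows = [
    "81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M",
    "3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M",
    "B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M",
    "61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M",
]


def first_timestamp(row):
    """Return the second comma-separated field parsed as a datetime."""
    field = row.split(",")[1].strip()  # drop the space after the comma
    return datetime.strptime(field, "%Y-%m-%d %H:%M:%S")


# The rows themselves are not modified; only the sort key is parsed.
rows_by_time = sorted(rows, key=first_timestamp)

for row in rows_by_time:
    print(row)
```

Because the key function only reads each row, the strings in the result are exactly the input strings, which keeps the format intact for the downstream service.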
If the format of the lines themselves must not change, a simple shell command may be a good fit (I do not know the wider context of the solution, and I know this is not a Python solution).
So:
$ sort -t, -k2,2 sort_me_on_first_timestamp_field.txt
3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M
61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M
B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M
81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M
Looks quite OK to me. The -t option tells sort to use the comma as the field delimiter, and -k2,2 requests sorting on the second field only (fields are counted from one). Sometimes it is important to switch to numerical sorting with -n, but here the ISO datetime strings have a fixed length, so lexical sorting gives the chronological order.
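The point about fixed-length ISO datetime strings can be checked in Python. The sketch below compares plain text ordering with ordering by parsed datetime for the four sample timestamps, and then shows a made-up counter-example without zero padding where text ordering goes wrong:

```python
from datetime import datetime

stamps = [
    "2016-07-14 14:29:13",
    "2016-07-14 01:28:59",
    "2016-07-14 08:26:45",
    "2016-07-14 06:25:32",
]

lexical = sorted(stamps)
chronological = sorted(
    stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
)
same_order = lexical == chronological

# Hypothetical unpadded timestamps: "10:00:00" sorts before "9:00:00" as text.
unpadded = sorted(["2016-7-14 9:00:00", "2016-7-14 10:00:00"])

print(same_order)  # True
print(unpadded)    # ['2016-7-14 10:00:00', '2016-7-14 9:00:00']
```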
Again: if you are looking for a pure Python solution, I suggest picking the Python-based answer. This one only offers a baseline alternative.
Update, to "measure" some scenario on some machine:
On the "machine of the developer", I concatenated the four sample lines repeatedly into files of 20, 200, 2000, ..., 2,000,000 lines. Sorting these with the sort command takes from 12 milliseconds up to 1.7 seconds (for 2 million lines) when writing to /dev/null, and 2 seconds when writing to a file.
A naive implementation of @juanpa.arrivillaga's proposed route sorting in-place:
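#! /usr/bin/env python
FILE_PATH_IN = './fhf.txt'
NL, FS = '\n', ','
# Read all lines; drop the empty string left after the final newline.
with open(FILE_PATH_IN) as f:
    list_of_strings = f.read().split(NL)[:-1]
# Sort in place on the second comma-separated field (the first timestamp).
list_of_strings.sort(key=lambda s: s.split(FS)[1])
with open(FILE_PATH_IN + ".out", "wt") as f:
    f.write(NL.join(list_of_strings))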
On the same machine, this takes approx. 3 seconds for the 2 million line case, as does the other variant (using sorted to generate a new list):
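#! /usr/bin/env python
FILE_PATH_IN = './fhf.txt'
NL, FS = '\n', ','
# Read all lines; drop the empty string left after the final newline.
with open(FILE_PATH_IN) as f:
    list_of_strings = f.read().split(NL)[:-1]
with open(FILE_PATH_IN + ".out", "wt") as f:
    # sorted() builds a new list ordered by the second field.
    f.write(NL.join(sorted(list_of_strings, key=lambda s: s.split(FS)[1])))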
So the suggestion is to use the pure Python solution.
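Since the output has to stay in the same format, one detail worth a sketch is preserving the line endings. The version below assumes the ^M in the sample stands for a carriage return (so lines end in "\r\n") and that every line, including the last, ends with a newline. The function name sort_file_by_second_field and the temporary file are made up for the demonstration:

```python
import os
import tempfile


def sort_file_by_second_field(path_in, path_out, sep=","):
    """Write the lines of path_in to path_out, ordered by their second field."""
    # newline="" turns off newline translation, so "\r\n" endings survive as-is.
    with open(path_in, newline="") as src:
        lines = src.readlines()  # each line keeps its own line ending
    lines.sort(key=lambda line: line.split(sep)[1])
    with open(path_out, "w", newline="") as dst:
        dst.writelines(lines)


# Small demonstration on a temporary file with "\r\n" line endings.
sample = (
    "81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,\r\n"
    "3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,\r\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path_in = os.path.join(tmp, "rows.txt")
    path_out = path_in + ".out"
    with open(path_in, "w", newline="") as f:
        f.write(sample)
    sort_file_by_second_field(path_in, path_out)
    with open(path_out, newline="") as f:
        result = f.read()

print(repr(result))
```

Reading with readlines() instead of splitting on "\n" means nothing has to be joined back together afterwards, so the bytes of each line are written out unchanged.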