Sort a list of strings based on a certain field

Overview: I have data something like this (each row is a string):

81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M
3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M
B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M
61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M

I want to sort the rows by the first timestamp that appears in each string, which for these four records gives this order:

2016-07-14 01:28:59

2016-07-14 06:25:32

2016-07-14 08:26:45

2016-07-14 14:29:13

I know the sort() method, but I don't understand how to use it here to sort all the rows by this timestamp. I also need to keep the final sorted data in the same format, because another service is going to consume it.

I also understand that I can pass a key function, but I am not clear on how to write one that sorts on the timestamp field.

asked Jul 15 '16 05:07

Vraj Solanki

2 Answers

You can use the list method list.sort, which sorts in place, or the built-in function sorted(), which returns a new list. The key argument takes a function that is applied to each element of the sequence; the elements are then ordered by the values it returns. You can combine str.split(',') with indexing to pick the second element, e.g. some_list[1], so:

In [8]: list_of_strings
Out[8]: 
['81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M',
 '3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M']

In [9]: sorted(list_of_strings, key=lambda s: s.split(',')[1])
Out[9]: 
['3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M']

Or, if you'd rather sort the list in place:

In [12]: list_of_strings
Out[12]: 
['81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M',
 '3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M']

In [13]: list_of_strings.sort(key=lambda s: s.split(',')[1])

In [14]: list_of_strings
Out[14]: 
['3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M',
 '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M',
 'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M',
 '81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M']
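
As a variation on the key function above, here is a minimal sketch that parses the field into a datetime before comparing. It is not part of the original answer: the helper name first_timestamp is hypothetical, the trailing ^M is left out of the sample rows for brevity, and the format string assumes every row uses the 'YYYY-MM-DD HH:MM:SS' layout shown in the question.

```python
from datetime import datetime

def first_timestamp(row):
    """Return the first timestamp of a row as a datetime (hypothetical helper)."""
    # Field 1 is the first timestamp; strip() drops the space after the comma.
    field = row.split(',')[1].strip()
    return datetime.strptime(field, '%Y-%m-%d %H:%M:%S')

rows = [
    '81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,',
    '3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,',
]

# sorted() returns the original strings unchanged; only their order differs.
sorted_rows = sorted(rows, key=first_timestamp)
```

Parsing costs more per row than comparing the raw substring, but it raises a ValueError on a malformed timestamp instead of silently sorting it somewhere unexpected.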

answered Nov 12 '22 20:11

juanpa.arrivillaga

If the format of each line must not be changed, a simple shell transformation may fit well (I do not know the wider context of the solution, and I know this is not a Python solution).

So:

$ sort -t, -k2,2 sort_me_on_first_timestamp_field.txt 
3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,^M 
61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,^M
B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,^M 
81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,^M

Looks quite OK to me. The -t option tells sort to use the comma as the field delimiter, and -k2,2 requests sorting on the second field only (fields are counted from one). Sometimes it is important to switch to numerical sorting with -n, but here the ISO datetime strings have a fixed length, so lexical sorting should work.
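
The claim about lexical sorting can be checked with a small Python sketch. This is an illustration under the assumption that all timestamps are zero-padded and share one format; the variable names are made up for the example.

```python
from datetime import datetime

stamps = ['2016-07-14 14:29:13', '2016-07-14 01:28:59',
          '2016-07-14 08:26:45', '2016-07-14 06:25:32']

# Plain string comparison versus comparison of parsed datetimes.
lexical = sorted(stamps)
chronological = sorted(
    stamps, key=lambda s: datetime.strptime(s, '%Y-%m-%d %H:%M:%S'))
same_order = lexical == chronological

# Without zero padding the two orders diverge: '1' sorts before '9',
# so the 10th ends up ahead of the 9th.
unpadded = sorted(['2016-7-9', '2016-7-10'])
```

With fixed-width, most-significant-first fields, same_order is True; the unpadded list shows why the fixed length matters.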

Again: if you are looking for a pure Python solution, I suggest picking the Python-based answer. This one only offers a baseline alternative.

Update, to "measure" some scenario on some machine:

On the "machine of the developer", sorting the 4 sample lines concatenated repeatedly into files of 20, 200, 2000, ..., 2,000,000 lines takes from 12 milliseconds up to 1.7 seconds (for 2 million lines) with the sort command writing to /dev/null, and 2 seconds when writing to a file.

A naive implementation of @juanpa.arrivillaga's proposed route sorting in-place:

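#!/usr/bin/env python
FILE_PATH_IN = './fhf.txt'
NL, FS = '\n', ','

# Read all rows; [:-1] drops the empty string after the final newline.
with open(FILE_PATH_IN) as f:
    list_of_strings = f.read().split(NL)[:-1]
# Sort in place on the second field (the first timestamp).
list_of_strings.sort(key=lambda s: s.split(FS)[1])
with open(FILE_PATH_IN + ".out", "wt") as f:
    f.write(NL.join(list_of_strings))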

takes approx. 3 seconds for the 2 million line case on the same machine, as does the other variant (using sorted to generate a new list):

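#!/usr/bin/env python
FILE_PATH_IN = './fhf.txt'
NL, FS = '\n', ','

# Read all rows; [:-1] drops the empty string after the final newline.
with open(FILE_PATH_IN) as f:
    list_of_strings = f.read().split(NL)[:-1]
# sorted() builds a new list ordered by the second field (the first timestamp).
with open(FILE_PATH_IN + ".out", "wt") as f:
    f.write(NL.join(sorted(list_of_strings, key=lambda s: s.split(FS)[1])))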

So the suggestion is to use the pure Python solution.
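
For anyone who wants to repeat a measurement like the one above, here is a rough sketch. It is not the script used for the numbers in this answer: the function time_sort, the temporary file location, and the choice of timeit.default_timer are all assumptions, and the ^M at the end of each row is omitted.

```python
import os
import tempfile
from timeit import default_timer

SAMPLE = [
    '81:0A:D7:19:25:7B, 2016-07-14 14:29:13, 2016-07-14 14:29:15, -69, 22:22:22:22:22:23,null,',
    '3B:3F:B9:0A:83:E6, 2016-07-14 01:28:59, 2016-07-14 01:29:01, -36, 33:33:33:33:33:31,null,',
    'B3:C0:6E:77:E5:31, 2016-07-14 08:26:45, 2016-07-14 08:26:47, -65, 33:33:33:33:33:32,null,',
    '61:01:55:16:B5:52, 2016-07-14 06:25:32, 2016-07-14 06:25:34, -56, 33:33:33:33:33:33,null,',
]

def time_sort(repeat):
    """Write SAMPLE `repeat` times to a temp file, then time read + sort + write."""
    path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
    with open(path, 'w') as f:
        f.write('\n'.join(SAMPLE * repeat) + '\n')

    start = default_timer()
    with open(path) as f:
        rows = f.read().split('\n')[:-1]  # drop the empty tail after the last newline
    rows.sort(key=lambda s: s.split(',')[1])
    with open(path + '.out', 'w') as f:
        f.write('\n'.join(rows) + '\n')
    return default_timer() - start, rows

# 5 repetitions give a 20-line file; raise the factor for larger inputs.
elapsed, rows = time_sort(5)
```

Calling time_sort(500000) would produce the 2 million line case; absolute timings will of course differ from machine to machine.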

answered Nov 12 '22 20:11

Dilettant


Tags: python, list, sorting, python-2.7
