I have a text file that contains more than 10 million lines. The lines look like this:
37024469;196672001;255.0000000000
37024469;196665001;396.0000000000
37024469;196664001;396.0000000000
37024469;196399002;85.0000000000
37024469;160507001;264.0000000000
37024469;160506001;264.0000000000
As you can see, the delimiter is ";". I would like to sort this text file by its second field using Python. I couldn't use the split function because it causes a MemoryError. How can I manage this?
Although there's no straightforward way to sort a text file, we can achieve the same net result by doing the following: 1) Use the FileSystemObject to read the file into memory; 2) Sort the file alphabetically in memory; 3) Replace the existing contents of the file with the sorted data we have in memory.
To sort CSV data by multiple columns, use the pandas sort_values() method. Sorting by multiple columns means that rows with the same value in the first column are ordered by the second column passed to sort_values().
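The tie-breaking rule described above can be illustrated without pandas. The following is a minimal sketch using only the standard library, not sort_values() itself; the `sort_rows` helper and the `SAMPLE` constant are hypothetical names, the rows are copied from the question, and every sort column is assumed to hold an integer. It only suits data small enough to fit in memory.

```python
import csv
import io

# Sample rows copied from the question; the first column repeats on every line.
SAMPLE = """\
37024469;196672001;255.0000000000
37024469;196665001;396.0000000000
37024469;160507001;264.0000000000
"""


def sort_rows(text, columns=(0, 1)):
    """Return rows sorted by the given column indexes, compared as integers."""
    rows = list(csv.reader(io.StringIO(text), delimiter=";"))
    # A tuple key compares column by column: later columns only break ties.
    return sorted(rows, key=lambda row: tuple(int(row[c]) for c in columns))


for row in sort_rows(SAMPLE):
    print(";".join(row))
```

Because the first column is identical on every sample row, the resulting order is decided entirely by the second column.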
Don't sort 10 million lines in memory. Split the work into batches instead:
1) Run 100 sorts of 100k lines each (using the file as an iterator, combined with islice() or similar to pick a batch). Write each sorted batch out to a separate file elsewhere.
2) Merge the sorted files. Here is a merge generator that you can pass 100 open files and it'll yield lines in sorted order. Write to a new file line by line:
import operator

def mergeiter(*iterables, **kwargs):
    """Given a set of sorted iterables, yield the next value in merged order

    Takes an optional `key` callable to compare values by.
    """
    # Map index -> [current value, index, iterator], skipping empty inputs
    heads = {}
    for i, it in enumerate(map(iter, iterables)):
        for value in it:
            heads[i] = [value, i, it]
            break

    if 'key' not in kwargs:
        key = operator.itemgetter(0)
    else:
        key = lambda item, key=kwargs['key']: key(item[0])

    while heads:
        # Pick the input whose current value sorts first
        value, i, it = min(heads.values(), key=key)
        yield value
        try:
            heads[i][0] = next(it)
        except StopIteration:
            # Input exhausted; the loop ends once every input is gone
            # (re-raising StopIteration is a RuntimeError since Python 3.7)
            del heads[i]
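The two steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses the standard library's heapq.merge in place of the mergeiter generator, the names `second_field` and `sort_large_file` are hypothetical, the batch size of 100,000 is the figure suggested above, and every line is assumed to have at least two ";"-separated fields with an integer in the second one (a blank line would raise an error).

```python
import heapq
import itertools
import os
import tempfile


def second_field(line):
    """Sort key: the second ';'-separated field, compared as an integer."""
    return int(line.split(";", 2)[1])


def sort_large_file(src_path, dst_path, batch_size=100_000, key=second_field):
    """Sort src_path into dst_path by sorting fixed-size batches, then merging."""
    chunk_paths = []
    try:
        # Step 1: sort one batch at a time and spill each to its own temp file.
        with open(src_path) as src:
            while True:
                batch = list(itertools.islice(src, batch_size))
                if not batch:
                    break
                # Make sure the last line of the file also ends with a newline.
                if not batch[-1].endswith("\n"):
                    batch[-1] += "\n"
                batch.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
                with os.fdopen(fd, "w") as chunk:
                    chunk.writelines(batch)
                chunk_paths.append(path)

        # Step 2: merge the sorted chunks, holding one line per chunk in memory.
        chunks = [open(path) for path in chunk_paths]
        try:
            with open(dst_path, "w") as dst:
                dst.writelines(heapq.merge(*chunks, key=key))
        finally:
            for chunk in chunks:
                chunk.close()
    finally:
        # Remove the temporary chunk files whether or not the merge succeeded.
        for path in chunk_paths:
            os.remove(path)
```

A call such as sort_large_file("data.txt", "data_sorted.txt") (both file names are placeholders) would then write the sorted copy without loading the whole input at once. Only one batch of lines is split at a time, which is what avoids the MemoryError from splitting everything up front.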