I have two lists in Python, `list_a` and `list_b`. `list_a` contains image links, and so does `list_b`. 99% of the items are the same, but I need to find the remaining 1%. All the surplus items are in `list_a`, which means every item in `list_b` is also in `list_a`. My initial idea is to subtract the items, `list_a - list_b = list_c`, where `list_c` holds my surplus items. My code is:
```python
list_a = []
list_b = []
list_c = []

arq_b = open('list_b.txt','r')
for b in arq_b:
    list_b.append(b)

arq_a = open('list_a.txt','r')
for a in arq_a:
    if a not in arq_b:
        list_c.append(a)

arq_c = open('list_c.txt','w')
for c in list_c:
    arq_c.write(c)
```
I think the logic is right, and with only a few items the code runs fast. But I don't have 10 items, or 1,000, or even 100,000. I have 78,514,022 items in `list_b.txt` and 78,616,777 in `list_a.txt`. I don't know the cost of the expression `if a not in arq_b`, but if I run this code, I don't think it will finish this year.
My PC has 8 GB of RAM, and I allocated 15 GB of swap so that I don't run out of memory.
My question is: is there another way to do this operation more efficiently (faster)?
`list_a` is sorted, but `list_b` is not.
You can create a set from the first file's contents, then use `difference` or `symmetric_difference`, depending on what you call a difference:
```python
with open("list_a.txt") as f:
    set_a = set(f)
with open("list_b.txt") as f:
    diffs = set_a.difference(f)
```
If `list_b.txt` contains more items than `list_a.txt`, you'll want to swap them or use `set_a.symmetric_difference(f)` instead, depending on what you need.
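To make the distinction between the two methods concrete, here is a minimal sketch on tiny in-memory data. The sample lines and the variable names `lines_a`, `lines_b`, `only_in_a`, and `in_exactly_one` are made up for illustration and are not taken from the question's files. Both methods accept any iterable, which is why a file object or a list can be passed directly.

```python
# Hypothetical sample data standing in for the lines of the two files.
lines_a = ["img1.png\n", "img2.png\n", "img3.png\n"]
lines_b = ["img2.png\n", "img4.png\n"]

set_a = set(lines_a)

# difference: items in set_a that are absent from lines_b (one direction only).
only_in_a = set_a.difference(lines_b)

# symmetric_difference: items that appear in exactly one of the two inputs.
in_exactly_one = set_a.symmetric_difference(lines_b)

print(sorted(only_in_a))
print(sorted(in_exactly_one))
```

With the question's data, where every item of `list_b` is also in `list_a`, both calls should give the same result; they only differ when the second input has items the first one lacks.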
Passing the file object to `difference(f)` works, but a new set still has to be constructed internally. It is not a great performance gain (see "set issubset performance difference depending on the argument type"), but it's shorter.
Try using sets:
```python
with open("list_a.txt") as f:
    set_a = set(f)
with open("list_b.txt") as f:
    set_b = set(f)

set_c = set_a - set_b

with open("list_c.txt","w") as f:
    for c in set_c:
        f.write(c)
```
Subtracting two sets takes O(n) time on average, where n is the size of `set_a`.
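Since the question mentions that `list_a.txt` is sorted and that memory is limited, one possible variant is to load only `list_b.txt` into a set and stream `list_a.txt` line by line. This is a sketch under assumptions, not a measured solution: the function name `write_surplus` is hypothetical, it assumes one item per line, and it strips the trailing newline before comparing so that a last line without a newline still matches. It keeps one set in memory instead of two and writes the surplus items in the order they appear in `list_a.txt`.

```python
def write_surplus(path_a, path_b, path_out):
    """Write lines of path_a that do not occur in path_b, keeping path_a's order.

    Returns the number of surplus lines written.
    """
    with open(path_b) as f_b:
        # Strip the newline so a final line without one still matches.
        seen_b = {line.rstrip("\n") for line in f_b}

    count = 0
    with open(path_a) as f_a, open(path_out, "w") as f_out:
        for line in f_a:
            item = line.rstrip("\n")
            if item not in seen_b:  # set lookup, O(1) on average
                f_out.write(item + "\n")
                count += 1
    return count
```

For the files in the question, the call would be `write_surplus("list_a.txt", "list_b.txt", "list_c.txt")`. Whether the set of `list_b.txt` fits comfortably in 8 GB depends on the length of the links, so this is worth trying on a sample first.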