How to remap ids to consecutive numbers quickly

Question

I have a large CSV file with lines that look like this:

stringa,stringb
stringb,stringc
stringd,stringa

I need to convert it so that the ids are numbered consecutively from 0. In this case, the following would work:

0,1
1,2
3,0

My current code looks like:

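import csv

names = {}    # maps each string id to its consecutive integer
counter = 0   # next unused integer id

with open('foo.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row[0] in names:
            # Already seen: reuse the number assigned earlier
            id1 = names[row[0]]
        else:
            names[row[0]] = counter
            id1 = counter
            counter += 1
        if row[1] in names:
            id2 = names[row[1]]
        else:
            names[row[1]] = counter
            id2 = counter
            counter += 1
        # Print inside the loop so every row is emitted, not just the last one
        print(id1, id2)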

Sadly, Python dicts use a lot of memory, and my input is large.

What can I do when the input is too large for the dict to fit in memory?
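
One possible direction, offered as an assumption and not as something taken from the answer below, is to keep the name-to-id mapping in an SQLite table through Python's standard sqlite3 module, so that only a counter has to live in memory. The DiskIdMap class and the ids table name are hypothetical, and each lookup will be considerably slower than a dict lookup.

```python
import sqlite3


class DiskIdMap:
    """Hypothetical helper: maps names to consecutive ids stored in SQLite."""

    def __init__(self, path=":memory:"):
        # Pass a file path to keep the table on disk; ":memory:" is only for trying it out.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS ids (name TEXT PRIMARY KEY, id INTEGER)"
        )
        # Resume numbering if the table already holds entries.
        self.counter = self.conn.execute("SELECT COUNT(*) FROM ids").fetchone()[0]

    def get(self, name):
        row = self.conn.execute(
            "SELECT id FROM ids WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            return row[0]
        # Unseen name: assign the next consecutive id and store it.
        new_id = self.counter
        self.conn.execute("INSERT INTO ids (name, id) VALUES (?, ?)", (name, new_id))
        self.counter += 1
        return new_id


id_map = DiskIdMap()
rows = [["stringa", "stringb"], ["stringb", "stringc"], ["stringd", "stringa"]]
remapped = [[id_map.get(a), id_map.get(b)] for a, b in rows]
print(remapped)
```

With a real file path you would also want to call id_map.conn.commit() from time to time so that the table is actually written out.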

I would also be interested if there is a better/faster way to solve this problem in general.
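
For the plain in-memory case, a more compact version of the same idea might look like the sketch below. It assumes ids should be numbered in order of first appearance, as in the example above; remap_rows is a hypothetical name, and the sample data is fed through io.StringIO in place of the real file.

```python
import csv
import io


def remap_rows(rows):
    """Yield each row with its strings replaced by first-seen consecutive ids."""
    names = {}
    for row in rows:
        # setdefault stores len(names) only when the key is new,
        # so ids are handed out as 0, 1, 2, ... in order of first appearance.
        yield [names.setdefault(name, len(names)) for name in row]


sample = io.StringIO("stringa,stringb\nstringb,stringc\nstringd,stringa\n")
result = list(remap_rows(csv.reader(sample)))
print(result)
```

Because remap_rows is a generator, the rows are processed one at a time and only the dict itself stays in memory. That does not shrink the dict, but it avoids holding the whole file as well.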

piRSquared · Accepted Answer

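import pandas as pd

df = pd.DataFrame([['a', 'b'], ['b', 'c'], ['d', 'a']])

# Collect every distinct id from both columns and sort them in place
v = df.stack().unique()
v.sort()
# factorize returns (codes, uniques); build a Series that maps each id to its code
f = pd.factorize(v)
m = pd.Series(f[0], f[1])

# Replace each id with its integer code, keeping the two-column shape
df.stack().map(m).unstack()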
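
If numbering by first appearance is acceptable, and sorted order is not required, a shorter variant is to call pd.factorize directly on the flattened values. This is a sketch under that assumption, not the answer's own method, and it uses the strings from the question as sample data.

```python
import pandas as pd

df = pd.DataFrame(
    [["stringa", "stringb"], ["stringb", "stringc"], ["stringd", "stringa"]]
)

# Flatten row by row, then let factorize assign codes in order of first appearance.
codes, uniques = pd.factorize(df.to_numpy().ravel())

# Put the integer codes back into the original two-column shape.
remapped = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
print(remapped)
```

Here uniques holds the original strings, so uniques[code] recovers the name for any id.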

Tags:

python

pandas

dataframe

graffe
