
How to convert a two-column CSV file to a dictionary in Python

Tags: python, pandas

I have the following CSV file:

Name1    Name2

JSMITH    J Smith
ASMITH    A Smith

How can I read it into a dictionary that maps the first column to the second, so that the result is:

data_dict = {'JSMITH': 'J Smith', 'ASMITH': 'A Smith'}

I have tried:

import pandas as pd

df = pd.read_csv('data.csv')
data_dict = df.to_dict(orient='list')

but that gives me a dictionary keyed by column name instead:

{'Name1': ['JSMITH', 'ASMITH'], 'Name2': ['J Smith', 'A Smith']}

I then want to use the dictionary as a mapping in pandas, for example:

df2['Name'] = df2['Name'].replace(data_dict, regex=True)

Any help would be much appreciated!

SOK asked Oct 23 '25 10:10


2 Answers

A trick that works if you always have exactly two columns:

dict(df.itertuples(index=False, name=None))

Or make it a pandas.Series and use to_dict:

df.set_index("Name1")["Name2"].to_dict()

Output:

{'ASMITH': 'A Smith', 'JSMITH': 'J Smith'}
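For anyone who wants to try both approaches without a file on disk, here is a minimal sketch. It assumes the file is comma-separated (the question does not show the delimiter) and uses an in-memory string as a stand-in for data.csv; if the real file is tab- or whitespace-delimited, the sep argument of read_csv would need to be set accordingly.

```python
import io

import pandas as pd

# Assumed stand-in for data.csv; the comma delimiter is a guess.
csv_text = "Name1,Name2\nJSMITH,J Smith\nASMITH,A Smith\n"
df = pd.read_csv(io.StringIO(csv_text))

# Approach 1: each row becomes a plain (Name1, Name2) tuple,
# and dict() turns the pairs into key/value entries.
by_tuples = dict(df.itertuples(index=False, name=None))

# Approach 2: index by Name1, select Name2, convert the Series to a dict.
by_series = df.set_index("Name1")["Name2"].to_dict()

print(by_tuples)
print(by_series)
```

Both variables should hold the same mapping from Name1 to Name2.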

Note that if you need a mapper for pd.Series.replace, a Series works just as well as a dict.

s = df.set_index("Name1")["Name2"]
df["Name1"].replace(s, regex=True)

0    J Smith
1    A Smith
Name: Name1, dtype: object

This also means you can skip to_dict and avoid some overhead:

large_df = df.sample(n=100000, replace=True)

%timeit large_df.set_index("Name1")["Name2"]
# 4.76 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit large_df.set_index("Name1")["Name2"].to_dict()
# 20.2 ms ± 976 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
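One thing worth checking before using replace with regex=True, as in the question: the keys are then treated as regular-expression patterns, so they can also match inside longer strings. The sketch below compares that with an exact-match replace and with Series.map. The df2 contents here are hypothetical (the question does not show them), including the "BJONES" and "JSMITH-old" values.

```python
import pandas as pd

mapping = {"JSMITH": "J Smith", "ASMITH": "A Smith"}

# Hypothetical df2; the real one is not shown in the question.
df2 = pd.DataFrame({"Name": ["JSMITH", "ASMITH", "BJONES", "JSMITH-old"]})

# regex=True: keys are patterns, so "JSMITH" also matches inside "JSMITH-old".
regex_replaced = df2["Name"].replace(mapping, regex=True)

# Default replace: only whole values that equal a key are replaced.
exact_replaced = df2["Name"].replace(mapping)

# map: exact lookup, values without a key become NaN unless filled back in.
mapped = df2["Name"].map(mapping)
mapped_keep = mapped.fillna(df2["Name"])

print(regex_replaced.tolist())
print(exact_replaced.tolist())
print(mapped_keep.tolist())
```

If the values in df2 are always exact codes, dropping regex=True or using map is probably the safer choice; if the codes appear inside longer strings, the regex form is the one that will catch them.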
Chris answered Oct 25 '25 23:10


You can use zip and dict:

dict(zip(df.Name1, df.Name2))
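A small sketch of how the zip approach behaves when the key column contains duplicates, which the question's sample does not have. The third row below is made up for illustration. With dict(zip(...)) the last value for a repeated key wins; if the first one should win instead, dropping duplicates beforehand is one option.

```python
import pandas as pd

# The duplicate "JSMITH" row is hypothetical, added to show the behaviour.
df = pd.DataFrame(
    {
        "Name1": ["JSMITH", "ASMITH", "JSMITH"],
        "Name2": ["J Smith", "A Smith", "John Smith"],
    }
)

# Later pairs overwrite earlier ones, so the last "JSMITH" row wins.
last_wins = dict(zip(df["Name1"], df["Name2"]))

# Keep only the first row per key before building the dict.
deduped = df.drop_duplicates("Name1", keep="first")
first_wins = dict(zip(deduped["Name1"], deduped["Name2"]))

print(last_wins)
print(first_wins)
```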
Dishin H Goyani answered Oct 25 '25 23:10


