I am trying to convert a data frame to a dictionary whose keys are built from four of its columns. I also have several other columns whose values I want to look up using keys built from those four columns. I wrote a version with loops, but it ends in a memory error. Is there a more efficient way to do this?
The data frame looks like this:
Service  Bill Weight  Zone  Resi  UPS    FedEx  USPS  DHL
1DEA     1            2     N     33.02  9999   9999  9999
1DEA     2            2     N     33.02  9999   9999  9999
1DEA     3            2     N     33.02  9999   9999  9999
I want to have a key for each of the carriers like this:
price[('1DEA', '1', '2', 'N', 'UPS')]=33.02
price[('1DEA', '1', '2', 'N', 'FedEx')]=9999
I have tried this:
price = {}
carriers = ['UPS', 'FedEx', 'USPS', 'DHL']
for carrier in carriers:
    for row in rate_keys.to_dict('records'):
        key = (row['Service'], row['Bill Weight'], row['Zone'],
               row['Resi'], carrier)
        rate_keys[key] = row[carrier]
To convert a pandas DataFrame to a dictionary, use the to_dict() method. Its orient parameter defaults to 'dict', which returns the data in the format {column -> {index -> value}}.
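As a small illustration of the default orientation next to the 'records' orientation used in the question, here is a minimal sketch. The two-row frame is a cut-down stand-in for the question's data, and the names by_column and by_row are only illustrative.

```python
import pandas as pd

# Cut-down stand-in for the question's frame (two carrier columns only).
df = pd.DataFrame({"UPS": [33.02, 33.02], "FedEx": [9999, 9999]})

# Default orient="dict": {column -> {index -> value}}
by_column = df.to_dict()

# orient="records": one {column -> value} dict per row
by_row = df.to_dict(orient="records")

print(by_column)
print(by_row)
```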
To create a dictionary from two columns, first build a pandas Series with the key column as its index and the other column as its values, then call to_dict() on that Series.
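A minimal sketch of that two-column case, assuming 'Bill Weight' is the key column and 'UPS' the value column (both taken from the question's sample; ups_by_weight is just an illustrative name):

```python
import pandas as pd

# Two columns from the question's sample data.
df = pd.DataFrame({"Bill Weight": [1, 2, 3], "UPS": [33.02, 33.02, 33.02]})

# Key column becomes the index, value column becomes the Series values.
ups_by_weight = df.set_index("Bill Weight")["UPS"].to_dict()

print(ups_by_weight)
```

If the key column contains duplicates, later rows overwrite earlier ones in the resulting dictionary.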
You can use df.to_dict() to convert the DataFrame to a dictionary.
For certain small, targeted purposes, a dict may be faster. And if that is all you need, then use a dict, for sure! But if you need/want the power and luxury of a DataFrame, then a dict is no substitute. It is meaningless to compare speed if the data structure does not first satisfy your needs.
Set the index to be all but the carrier columns, then stack.
df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).stack().to_dict()
{('1DEA', 1, 2, 'N', 'DHL'): 9999.0,
('1DEA', 1, 2, 'N', 'FedEx'): 9999.0,
('1DEA', 1, 2, 'N', 'UPS'): 33.02,
('1DEA', 1, 2, 'N', 'USPS'): 9999.0,
('1DEA', 2, 2, 'N', 'DHL'): 9999.0,
('1DEA', 2, 2, 'N', 'FedEx'): 9999.0,
('1DEA', 2, 2, 'N', 'UPS'): 33.02,
('1DEA', 2, 2, 'N', 'USPS'): 9999.0,
('1DEA', 3, 2, 'N', 'DHL'): 9999.0,
('1DEA', 3, 2, 'N', 'FedEx'): 9999.0,
('1DEA', 3, 2, 'N', 'UPS'): 33.02,
('1DEA', 3, 2, 'N', 'USPS'): 9999.0}
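Note that the keys above hold integers for Bill Weight and Zone, while the question asks for string keys such as ('1DEA', '1', '2', 'N', 'UPS'). If string keys are really wanted, one possible approach is to cast those columns before stacking. This is a sketch that rebuilds the sample frame by hand; the cast to str is an assumption about what the asker needs, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Service": ["1DEA"] * 3,
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N"] * 3,
    "UPS": [33.02] * 3,
    "FedEx": [9999] * 3,
    "USPS": [9999] * 3,
    "DHL": [9999] * 3,
})

# Assumption: the numeric key columns should appear as strings in the keys.
df = df.astype({"Bill Weight": str, "Zone": str})

# Key columns go into the index; stack() moves the carrier names
# into a fifth index level, giving one value per (keys..., carrier).
price = (
    df.set_index(["Service", "Bill Weight", "Zone", "Resi"])
      .stack()
      .to_dict()
)

print(price[("1DEA", "1", "2", "N", "UPS")])
```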
Comprehension
{(*r[:4], c): v for r in df.values for c, v in zip(df.columns[4:], r[4:])}
{('1DEA', 1, 2, 'N', 'DHL'): 9999,
('1DEA', 1, 2, 'N', 'FedEx'): 9999,
('1DEA', 1, 2, 'N', 'UPS'): 33.02,
('1DEA', 1, 2, 'N', 'USPS'): 9999,
('1DEA', 2, 2, 'N', 'DHL'): 9999,
('1DEA', 2, 2, 'N', 'FedEx'): 9999,
('1DEA', 2, 2, 'N', 'UPS'): 33.02,
('1DEA', 2, 2, 'N', 'USPS'): 9999,
('1DEA', 3, 2, 'N', 'DHL'): 9999,
('1DEA', 3, 2, 'N', 'FedEx'): 9999,
('1DEA', 3, 2, 'N', 'UPS'): 33.02,
('1DEA', 3, 2, 'N', 'USPS'): 9999}
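The same comprehension can be written as a self-contained script. This sketch swaps df.values for df.itertuples(index=False, name=None), which yields plain tuples row by row and keeps each column's own type; that swap is a variation, not the original answer's code.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Service": ["1DEA"] * 3,
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N"] * 3,
    "UPS": [33.02] * 3,
    "FedEx": [9999] * 3,
    "USPS": [9999] * 3,
    "DHL": [9999] * 3,
})

# Everything after the first four columns is treated as a carrier.
carriers = list(df.columns[4:])

price = {
    (*row[:4], carrier): value
    for row in df.itertuples(index=False, name=None)
    for carrier, value in zip(carriers, row[4:])
}

print(price[("1DEA", 1, 2, "N", "FedEx")])
```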
You probably shouldn't update rate_keys while looping over it. I guess the last line of your example script should read:
price[key] = row[carrier]
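Put together, the corrected loop might look like the sketch below. In the original script, assigning to rate_keys[key] adds a new column to the DataFrame on every iteration instead of an entry to price, which is probably what exhausts memory. The frame is rebuilt by hand from the question's sample so the script runs on its own.

```python
import pandas as pd

# Sample frame rebuilt from the question.
rate_keys = pd.DataFrame({
    "Service": ["1DEA"] * 3,
    "Bill Weight": [1, 2, 3],
    "Zone": [2, 2, 2],
    "Resi": ["N"] * 3,
    "UPS": [33.02] * 3,
    "FedEx": [9999] * 3,
    "USPS": [9999] * 3,
    "DHL": [9999] * 3,
})

price = {}
carriers = ["UPS", "FedEx", "USPS", "DHL"]

# Convert once, outside the carrier loop, so the records list is not rebuilt.
records = rate_keys.to_dict("records")

for carrier in carriers:
    for row in records:
        key = (row["Service"], row["Bill Weight"], row["Zone"],
               row["Resi"], carrier)
        price[key] = row[carrier]  # write to the dict, not the DataFrame

print(len(price))
```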