Convert pandas DataFrame to dictionary with multiple keys

I am trying to convert a DataFrame to a dictionary whose keys are built from four of its columns. I also have several other columns whose values I want to look up using keys built from those four columns. I wrote a loop-based approach, but it ends in a memory error. Is there a more efficient way to do this?

The data frame looks like this:

    Service  Bill Weight  Zone  Resi    UPS  FedEx  USPS   DHL
    1DEA               1     2     N  33.02   9999  9999  9999
    1DEA               2     2     N  33.02   9999  9999  9999
    1DEA               3     2     N  33.02   9999  9999  9999
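To experiment with the answers, the sample can be rebuilt as a DataFrame. This is a minimal sketch; the dtypes are assumptions (integers for Bill Weight and Zone, float for UPS, integers for the other carriers), chosen to match the integer keys and the 33.02/9999 values the answers print.

```python
import pandas as pd

# Hypothetical reconstruction of the sample table above; dtypes are assumed.
df = pd.DataFrame({
    'Service': ['1DEA', '1DEA', '1DEA'],
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N', 'N', 'N'],
    'UPS': [33.02, 33.02, 33.02],
    'FedEx': [9999, 9999, 9999],
    'USPS': [9999, 9999, 9999],
    'DHL': [9999, 9999, 9999],
})
print(df)
```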

I want to have a key for each of the carriers like this:

    price[('1DEA', '1', '2', 'N', 'UPS')]=33.02
    price[('1DEA', '1', '2', 'N', 'FedEx')]=9999

I have tried this:

    price = {}
    carriers = ['UPS', 'FedEx', 'USPS', 'DHL']
    for carrier in carriers:
        for row in rate_keys.to_dict('records'):
            key = (row['Service'], row['Bill Weight'], row['Zone'],
                   row['Resi'], carrier)
            rate_keys[key] = row[carrier]
asked Sep 05 '18 19:09 by Nazanin Zinouri

2 Answers

Set the index to be all but the carrier columns, then stack.

    df.set_index(['Service', 'Bill Weight', 'Zone', 'Resi']).stack().to_dict()

    {('1DEA', 1, 2, 'N', 'DHL'): 9999.0,
     ('1DEA', 1, 2, 'N', 'FedEx'): 9999.0,
     ('1DEA', 1, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 1, 2, 'N', 'USPS'): 9999.0,
     ('1DEA', 2, 2, 'N', 'DHL'): 9999.0,
     ('1DEA', 2, 2, 'N', 'FedEx'): 9999.0,
     ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 2, 2, 'N', 'USPS'): 9999.0,
     ('1DEA', 3, 2, 'N', 'DHL'): 9999.0,
     ('1DEA', 3, 2, 'N', 'FedEx'): 9999.0,
     ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 3, 2, 'N', 'USPS'): 9999.0}
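Why this works: set_index moves the four key columns into a MultiIndex, so only the carrier columns remain as columns. stack() then turns those column labels into a fifth index level, producing a Series with one value per (Service, Bill Weight, Zone, Resi, carrier) combination, and Series.to_dict() uses those index tuples as keys. Because the stacked values end up in a single column, the mixed int/float carrier columns are upcast to float, which is why 9999 appears as 9999.0. A small sketch, assuming the sample data with guessed dtypes:

```python
import pandas as pd

# Sample data from the question; the dtypes are assumed.
df = pd.DataFrame({
    'Service': ['1DEA'] * 3,
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N'] * 3,
    'UPS': [33.02] * 3,
    'FedEx': [9999] * 3,
    'USPS': [9999] * 3,
    'DHL': [9999] * 3,
})

key_cols = ['Service', 'Bill Weight', 'Zone', 'Resi']

# After set_index only the carrier columns remain; stack() moves their
# labels into a fifth index level, giving one entry per (keys..., carrier).
stacked = df.set_index(key_cols).stack()
price = stacked.to_dict()

print(stacked.index.nlevels)              # 5
print(stacked.dtype)                      # float64 (int columns upcast)
print(price[('1DEA', 1, 2, 'N', 'UPS')])  # 33.02
```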

Comprehension

    {(*r[:4], c): v for r in df.values for c, v in zip(df.columns[4:], r[4:])}

    {('1DEA', 1, 2, 'N', 'DHL'): 9999,
     ('1DEA', 1, 2, 'N', 'FedEx'): 9999,
     ('1DEA', 1, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 1, 2, 'N', 'USPS'): 9999,
     ('1DEA', 2, 2, 'N', 'DHL'): 9999,
     ('1DEA', 2, 2, 'N', 'FedEx'): 9999,
     ('1DEA', 2, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 2, 2, 'N', 'USPS'): 9999,
     ('1DEA', 3, 2, 'N', 'DHL'): 9999,
     ('1DEA', 3, 2, 'N', 'FedEx'): 9999,
     ('1DEA', 3, 2, 'N', 'UPS'): 33.02,
     ('1DEA', 3, 2, 'N', 'USPS'): 9999}
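The comprehension assumes the four key columns come first (r[:4]) and every remaining column is a carrier (r[4:]); because df.values builds an object array for the mixed dtypes, each carrier value keeps its own type (9999 rather than 9999.0). If the column order is not guaranteed, a variant that selects the columns by name might look like the sketch below. It swaps df.values for DataFrame.itertuples, so it is not the answer's code; the sample data is the question's, with assumed dtypes.

```python
import pandas as pd

# Sample data from the question; the dtypes are assumed.
df = pd.DataFrame({
    'Service': ['1DEA'] * 3,
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N'] * 3,
    'UPS': [33.02] * 3,
    'FedEx': [9999] * 3,
    'USPS': [9999] * 3,
    'DHL': [9999] * 3,
})

key_cols = ['Service', 'Bill Weight', 'Zone', 'Resi']
carriers = ['UPS', 'FedEx', 'USPS', 'DHL']

# itertuples(index=False, name=None) yields one plain tuple per row.
# Pairing the key-column tuples with the carrier-column tuples row by row
# avoids relying on column positions such as r[:4] and r[4:].
price = {
    (*keys, carrier): value
    for keys, values in zip(df[key_cols].itertuples(index=False, name=None),
                            df[carriers].itertuples(index=False, name=None))
    for carrier, value in zip(carriers, values)
}

print(len(price))                            # 12
print(price[('1DEA', 2, 2, 'N', 'FedEx')])   # 9999
```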
answered Oct 07 '22 12:10 by piRSquared

You probably shouldn't update rate_keys while looping over it: your loop writes into the rate_keys DataFrame instead of into the price dictionary. I guess the last line of your example script should read:

    price[key] = row[carrier]
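With that fix applied, the question's loop runs as intended. This sketch also calls to_dict('records') once instead of once per carrier, a small change that is not in the original; the sample data and its dtypes are assumed.

```python
import pandas as pd

# Sample data from the question; the dtypes are assumed.
rate_keys = pd.DataFrame({
    'Service': ['1DEA'] * 3,
    'Bill Weight': [1, 2, 3],
    'Zone': [2, 2, 2],
    'Resi': ['N'] * 3,
    'UPS': [33.02] * 3,
    'FedEx': [9999] * 3,
    'USPS': [9999] * 3,
    'DHL': [9999] * 3,
})

price = {}
carriers = ['UPS', 'FedEx', 'USPS', 'DHL']
# Convert the rows to dicts once, outside the carrier loop.
records = rate_keys.to_dict('records')
for carrier in carriers:
    for row in records:
        key = (row['Service'], row['Bill Weight'], row['Zone'],
               row['Resi'], carrier)
        price[key] = row[carrier]  # write into the dict, not the DataFrame

print(price[('1DEA', 1, 2, 'N', 'UPS')])  # 33.02
```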
answered Oct 07 '22 13:10 by jpeg