I am looking for a way in Python to dynamically build a dictionary of dictionaries from a desired structure.
I have the data below:
{'weather': ['windy', 'calm'], 'season': ['summer', 'winter', 'spring', 'autumn'], 'lateness': ['ontime', 'delayed']}
I specify the structure I want as a list of keys:
['weather', 'season', 'lateness']
and finally get the data in this format:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
This is the manual way that I thought for achieving this:
dtree = {}
for cat1 in category_cases['weather']:
    dtree.setdefault(cat1, {})
    for cat2 in category_cases['season']:
        dtree[cat1].setdefault(cat2, {})
        for cat3 in category_cases['lateness']:
            dtree[cat1][cat2].setdefault(cat3, 0)
Can you think of a way that lets me just change the structure I wrote and still get the desired result? Keep in mind that the structure might not be the same size every time.
Also, if you can think of a way other than dictionaries that lets me access the result, that will work for me too.
If you're not averse to using external packages, pandas.DataFrame might be a viable candidate, since it looks like you'll be using a table:
import pandas as pd
df = pd.DataFrame(
    index=pd.MultiIndex.from_product([d['weather'], d['season']]),
    columns=d['lateness'], data=0
)
Result:
              ontime  delayed
windy summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        0
And you can easily make changes with indexing:
df.loc[('windy', 'summer'), 'ontime'] = 1
df.loc[('calm', 'autumn'), 'delayed'] = 2  # one .loc call; chained indexing may assign to a copy
# Result:
              ontime  delayed
windy summer       1        0
      winter       0        0
      spring       0        0
      autumn       0        0
calm  summer       0        0
      winter       0        0
      spring       0        0
      autumn       0        2
The table can be constructed dynamically if you always use the last key for the columns, assuming your keys are in the desired insertion order:
df = pd.DataFrame(
    index=pd.MultiIndex.from_product(list(d.values())[:-1]),
    columns=list(d.values())[-1], data=0
)
Since you're interested in pandas, and given your structure, I would also recommend reading up on MultiIndex and advanced indexing to get an idea of how to work with your data. Here are some examples:
# Filters all the 'delayed' data in 'calm'
df.loc['calm', 'delayed']
# summer 5
# winter 0
# spring 0
# autumn 2
# Name: delayed, dtype: int64
# Apply a sum:
df.loc['calm', 'delayed'].sum()
# 7
# Gets the mean of all 'summer' (notice the `slice(None)` is required to return all of the 'calm' and 'windy' group)
df.loc[(slice(None), 'summer'), :].mean()
# ontime 0.5
# delayed 2.5
# dtype: float64
It is very handy and versatile, but you will want to read up on it before you get too deep into it; the framework can take some getting used to.
Otherwise, if you still prefer dict, there's nothing wrong with that. Here's a recursive function that generates the nested dict from the given keys (assuming your keys are in the desired insertion order):
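def gen_dict(d, level=0):
    # Past the last key: this is a leaf, so start it at 0
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    # One sub-dict (or leaf) for each value of the current key
    return {val: gen_dict(d, level+1) for val in d.get(key)}

gen_dict(d)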
Result:
{'calm': {'autumn': {'delayed': 0, 'ontime': 0},
          'spring': {'delayed': 0, 'ontime': 0},
          'summer': {'delayed': 0, 'ontime': 0},
          'winter': {'delayed': 0, 'ontime': 0}},
 'windy': {'autumn': {'delayed': 0, 'ontime': 0},
           'spring': {'delayed': 0, 'ontime': 0},
           'summer': {'delayed': 0, 'ontime': 0},
           'winter': {'delayed': 0, 'ontime': 0}}}
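Since the depth of the result changes with the structure, updating a leaf probably needs to be dynamic as well. Here is a minimal sketch of one way to do that, assuming the nested result above. The helper name `increment` and its `path` argument are made up for illustration, and `gen_dict` is repeated only so the snippet runs on its own:

```python
from functools import reduce

def gen_dict(d, level=0):
    """Same recursive builder as above, repeated so this snippet is self-contained."""
    if level >= len(d):
        return 0
    key = tuple(d.keys())[level]
    return {val: gen_dict(d, level + 1) for val in d[key]}

def increment(tree, path, amount=1):
    """Add `amount` to the leaf reached by following `path` (hypothetical helper)."""
    *parents, leaf = path
    # Walk down every level except the last, then update the leaf in place.
    node = reduce(lambda sub, key: sub[key], parents, tree)
    node[leaf] += amount

d = {'weather': ['windy', 'calm'],
     'season': ['summer', 'winter', 'spring', 'autumn'],
     'lateness': ['ontime', 'delayed']}

tree = gen_dict(d)
increment(tree, ['windy', 'summer', 'ontime'])
print(tree['windy']['summer'])  # {'ontime': 1, 'delayed': 0}
```

The path can be any length, so the same helper should keep working if the structure gains or loses a level.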
I think this might work for you.
def get_output(category, order, i=0):
    output = {}
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i+1)
    if output == {}:
        return 0
    return output
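As a quick usage sketch (the function is repeated so the snippet runs on its own, and `category_cases` is assumed to hold the data from the question), changing the `order` list is enough to change the nesting, including its depth:

```python
def get_output(category, order, i=0):
    output = {}
    # order[i:i+1] is an empty list once i runs past the last key
    for key in order[i:i+1]:
        for value in category[key]:
            output[value] = get_output(category, order, i + 1)
    if output == {}:
        return 0  # leaf reached
    return output

# Assumed to be the data from the question
category_cases = {'weather': ['windy', 'calm'],
                  'season': ['summer', 'winter', 'spring', 'autumn'],
                  'lateness': ['ontime', 'delayed']}

# Full structure, three levels deep
full = get_output(category_cases, ['weather', 'season', 'lateness'])
print(full['calm']['spring'])  # {'ontime': 0, 'delayed': 0}

# A shorter structure in a different order, two levels deep
by_season = get_output(category_cases, ['season', 'weather'])
print(by_season['summer'])  # {'windy': 0, 'calm': 0}
```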