How do you create nested dict in Python?

Tags: python, dictionary, mapping, nested, python-2.7

I have 2 CSV files: 'Data' and 'Mapping':

'Mapping' file has 4 columns: Device_Name, GDN, Device_Type, and Device_OS. All four columns are populated.
'Data' file has these same columns, with Device_Name column populated and the other three columns blank.
I want my Python code to open both files and for each Device_Name in the Data file, map its GDN, Device_Type, and Device_OS value from the Mapping file.

I know how to use a dict when only 2 columns are present (1 needs to be mapped), but I don't know how to do this when 3 columns need to be mapped.

Here is the code I used to try to map Device_Type:

    x = dict([])
    with open("Pricing Mapping_2013-04-22.csv", "rb") as in_file1:
        file_map = csv.reader(in_file1, delimiter=',')
        for row in file_map:
            typemap = [row[0], row[2]]
            x.append(typemap)

    with open("Pricing_Updated_Cleaned.csv", "rb") as in_file2, open("Data Scraper_GDN.csv", "wb") as out_file:
        writer = csv.writer(out_file, delimiter=',')
        for row in csv.reader(in_file2, delimiter=','):
            try:
                row[27] = x[row[11]]
            except KeyError:
                row[27] = ""
            writer.writerow(row)

It raises an AttributeError.

After some researching, I think I need to create a nested dict, but I don't have any idea how to do this.

405

asked May 02 '13 08:05

atams

2 Answers

A nested dict is a dictionary within a dictionary. A very simple thing.

    >>> d = {}
    >>> d['dict1'] = {}
    >>> d['dict1']['innerkey'] = 'value'
    >>> d['dict1']['innerkey2'] = 'value2'
    >>> d
    {'dict1': {'innerkey': 'value', 'innerkey2': 'value2'}}

You can also use a defaultdict from the collections module to make creating nested dictionaries easier.

    >>> import collections
    >>> d = collections.defaultdict(dict)
    >>> d['dict1']['innerkey'] = 'value'
    >>> d  # currently a defaultdict type
    defaultdict(<type 'dict'>, {'dict1': {'innerkey': 'value'}})
    >>> dict(d)  # but is exactly like a normal dictionary.
    {'dict1': {'innerkey': 'value'}}
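As a small illustration of the two approaches, the sketch below builds the same nested dictionary once with a plain dict and once with defaultdict(dict). The device names and values are made up for the example, and the code is written for Python 3.

```python
from collections import defaultdict

# Hypothetical sample records: (device name, field, value)
records = [
    ("iPhone 5", "Device_Type", "Phone"),
    ("iPhone 5", "Device_OS", "iOS"),
    ("Nexus 7", "Device_Type", "Tablet"),
]

# Plain dict: setdefault creates the inner dict the first time a key is seen.
plain = {}
for name, field, value in records:
    plain.setdefault(name, {})[field] = value

# defaultdict(dict): the inner dict is created automatically on first access.
auto = defaultdict(dict)
for name, field, value in records:
    auto[name][field] = value

# Both hold the same data.
same = dict(auto) == plain

# Caveat: merely reading a missing key from a defaultdict inserts it.
before = len(auto)
_ = auto["Unknown device"]
after = len(auto)
```

One thing to keep in mind with defaultdict is the caveat at the end: a lookup of a missing key adds an empty inner dict, so use `in` or `.get()` when you only want to check for a key.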

You can populate that however you want.

I would recommend in your code something like the following:

    d = {}  # can use defaultdict(dict) instead

    for row in file_map:
        # derive the row key from something, e.g. the first column
        row_key = row[0]
        # when using defaultdict, we can skip the next step of creating a dictionary on row_key
        d[row_key] = {}
        for idx, col in enumerate(row):
            d[row_key][idx] = col
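To make the resulting structure concrete, here is a minimal sketch of that loop run on in-memory rows. The sample rows are invented, and using the first column as the key is an assumption based on the question. It also shows a variant that names the inner keys after the header row instead of the column index.

```python
# Hypothetical rows, as csv.reader would yield them (header row first).
rows = [
    ["Device_Name", "GDN", "Device_Type", "Device_OS"],
    ["iPhone 5", "G1", "Phone", "iOS"],
    ["Nexus 7", "G2", "Tablet", "Android"],
]

header, data_rows = rows[0], rows[1:]

by_index = {}
by_name = {}
for row in data_rows:
    row_key = row[0]  # assumption: the first column is the lookup key
    # Inner keys are column positions, as in the loop above.
    by_index[row_key] = dict(enumerate(row))
    # Inner keys are column names, which is usually easier to read.
    by_name[row_key] = dict(zip(header[1:], row[1:]))
```

With the second form a lookup reads as `by_name["Nexus 7"]["Device_OS"]` rather than `by_index["Nexus 7"][3]`.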

According to your comment:

Maybe the above code is confusing the question. My problem in a nutshell: I have 2 files, a.csv and b.csv. a.csv has 4 columns i, j, k, l; b.csv also has these columns. i is the key column for these CSVs. The j, k, l columns are empty in a.csv but populated in b.csv. I want to map the values of the j, k, l columns from b.csv to a.csv, using i as the key column.

My suggestion would be something like this (without using defaultdict):

    a_file = "path/to/a.csv"
    b_file = "path/to/b.csv"

    # read from file a.csv
    with open(a_file) as f:
        # skip headers
        next(f)
        # collect the first column as keys; a set is built while the file is
        # still open and supports repeated membership tests
        keys = set(line.strip().split(',')[0] for line in f)

    # create empty dictionary:
    d = {}

    # read from file b.csv
    with open(b_file) as f:
        # gather headers except first key header
        headers = next(f).strip().split(',')[1:]
        # iterate lines
        for line in f:
            # gather the columns
            cols = line.strip().split(',')
            # check to make sure this key should be mapped.
            if cols[0] not in keys:
                continue
            # add key to dict
            d[cols[0]] = dict(
                # inner keys are the header names, values are columns
                (headers[idx], v) for idx, v in enumerate(cols[1:]))

Please note, though, that Python has a csv module for parsing CSV files.
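For reference, a rough sketch of the same mapping done with the csv module might look like the following. It is written for Python 3, uses in-memory strings in place of real files, and the sample values are invented; with real files you would pass `open(path, newline="")` instead of `io.StringIO(...)`.

```python
import csv
import io

# Hypothetical file contents standing in for the Mapping and Data files.
mapping_csv = (
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "iPhone 5,G1,Phone,iOS\n"
    "Nexus 7,G2,Tablet,Android\n"
)
data_csv = (
    "Device_Name,GDN,Device_Type,Device_OS\n"
    "Nexus 7,,,\n"
    "Lumia 920,,,\n"
)

# Build the nested dict: Device_Name -> {column name: value}.
mapping = {}
for row in csv.DictReader(io.StringIO(mapping_csv)):
    name = row.pop("Device_Name")
    mapping[name] = row

# Fill in the blank columns of the Data rows.
reader = csv.DictReader(io.StringIO(data_csv))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
for row in reader:
    # Unknown devices keep their blank columns.
    row.update(mapping.get(row["Device_Name"], {}))
    writer.writerow(row)

result = out.getvalue()
```

Because DictReader keys each row by the header names, the inner dictionaries need no index arithmetic, and quoted fields containing commas are handled, which a plain `split(',')` does not do.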

180

answered Sep 28 '22 00:09

Inbar Rose

UPDATE: For a nested dictionary of arbitrary depth, go to this answer.

Use defaultdict from the collections module.

Performance: there is no explicit "if key not in dict" check before each insert, which saves an extra lookup per row when the data set is large.

Low maintenance: the code is more readable and easier to extend.

    from collections import defaultdict

    target_dict = defaultdict(dict)
    target_dict[key1][key2] = val
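For nesting deeper than two levels, one common idiom is a defaultdict whose default factory is itself. The sketch below assumes that approach; the names `tree` and `to_dict` and the sample keys are chosen for this example only.

```python
from collections import defaultdict


def tree():
    """Return a defaultdict whose missing keys are themselves trees."""
    return defaultdict(tree)


def to_dict(node):
    """Recursively convert a tree back into plain dicts."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node


# Hypothetical keys, three levels deep; no intermediate dict is created by hand.
target = tree()
target["iPhone 5"]["os"]["name"] = "iOS"
target["iPhone 5"]["os"]["version"] = "6"
target["iPhone 5"]["type"] = "Phone"

plain = to_dict(target)
```

Converting back with `to_dict` is optional, but it gives you ordinary dicts that raise KeyError on missing keys instead of silently creating them.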

answered Sep 28 '22 00:09

Junchen
