I want to use pandas to read a CSV file that contains nodes and their attributes. Not all nodes have every attribute, and missing attributes are simply missing from the CSV file. When pandas reads the CSV file, the missing values appear as nan. I want to add the nodes in bulk from the dataframe, but avoid adding attributes that are nan.
For example, here is a sample CSV file called mwe.csv:
Name,Cost,Depth,Class,Mean,SD,CST,SL,Time
Manuf_0001,39.00,1,Manuf,,,12,,10.00
Manuf_0002,36.00,1,Manuf,,,8,,10.00
Part_0001,12.00,2,Part,,,,,28.00
Part_0002,5.00,2,Part,,,,,15.00
Part_0003,9.00,2,Part,,,,,10.00
Retail_0001,0.00,0,Retail,253,36.62,0,0.95,0.00
Retail_0002,0.00,0,Retail,45,1,0,0.95,0.00
Retail_0003,0.00,0,Retail,75,2,0,0.95,0.00
Here's how I'm currently handling this:
import pandas as pd
import numpy as np
import networkx as nx
node_df = pd.read_csv('mwe.csv')
graph = nx.DiGraph()
graph.add_nodes_from(node_df['Name'])
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['Cost'])), 'nodeCost')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['Mean'])), 'avgDemand')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['SD'])), 'sdDemand')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['CST'])), 'servTime')
nx.set_node_attributes(graph, dict(zip(node_df['Name'], node_df['SL'])), 'servLevel')
# Loop through all nodes and all attributes and remove NaNs.
for i in graph.nodes:
    # Copy the items first so entries can be deleted while looping.
    for k, v in list(graph.nodes[i].items()):
        if np.isnan(v):
            del graph.nodes[i][k]
It works, but it's clunky. Is there a better way, e.g., a way to avoid the nans when adding the nodes, rather than deleting the nans afterwards?
You can let pandas do the filtering for you. The function below takes a key column and a value column from your DataFrame, builds a Series of the values indexed by the keys, drops the NaN entries, and converts what remains to a dictionary:
# Requires: from pandas import Series
def create_node_attribs(key_col, val_col):
    # Up to you if you want to pass the dataframe as an argument.
    # In your case, since this was the only df, I only passed the columns.
    global node_df
    # Values indexed by key; dropna() removes the NaN entries before to_dict().
    return Series(node_df[val_col].values,
                  index=node_df[key_col]).dropna().to_dict()
Here is the complete code:
import pandas as pd
import networkx as nx
from pandas import Series
node_df = pd.read_csv('mwe.csv')
graph = nx.DiGraph()
def create_node_attribs(key_col, val_col):
    # Up to you if you want to pass the dataframe as an argument.
    # In your case, since this was the only df, I only passed the columns.
    global node_df
    return Series(node_df[val_col].values,
                  index=node_df[key_col]).dropna().to_dict()
graph.add_nodes_from(node_df['Name'])
nx.set_node_attributes(graph, create_node_attribs('Name', 'Cost'), 'nodeCost')
nx.set_node_attributes(graph, create_node_attribs('Name', 'Mean'), 'avgDemand')
nx.set_node_attributes(graph, create_node_attribs('Name', 'SD'), 'sdDemand')
nx.set_node_attributes(graph, create_node_attribs('Name', 'CST'), 'servTime')
nx.set_node_attributes(graph, create_node_attribs('Name', 'SL'), 'servLevel')
Link to Google Colab Notebook with the code.
Also, see this answer for a timing comparison of the method used here.
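If you would rather skip the NaNs at the moment the nodes are added, one possible alternative is to hand add_nodes_from a (node, attribute dict) pair per row and filter each dict with pd.notna. This is only a sketch, not a tested replacement for the code above: build_graph and ATTR_NAMES are names I made up for illustration, and the CSV text is inlined with io.StringIO so the example runs without mwe.csv on disk.

```python
import io

import networkx as nx
import pandas as pd

# Same rows as mwe.csv, inlined so the sketch runs without the file.
CSV_TEXT = """Name,Cost,Depth,Class,Mean,SD,CST,SL,Time
Manuf_0001,39.00,1,Manuf,,,12,,10.00
Manuf_0002,36.00,1,Manuf,,,8,,10.00
Part_0001,12.00,2,Part,,,,,28.00
Part_0002,5.00,2,Part,,,,,15.00
Part_0003,9.00,2,Part,,,,,10.00
Retail_0001,0.00,0,Retail,253,36.62,0,0.95,0.00
Retail_0002,0.00,0,Retail,45,1,0,0.95,0.00
Retail_0003,0.00,0,Retail,75,2,0,0.95,0.00
"""

# Hypothetical helper mapping: CSV column name -> node attribute name.
ATTR_NAMES = {
    'Cost': 'nodeCost',
    'Mean': 'avgDemand',
    'SD': 'sdDemand',
    'CST': 'servTime',
    'SL': 'servLevel',
}


def build_graph(df, key_col='Name', attr_names=ATTR_NAMES):
    """Add every node together with only its non-missing attributes."""
    graph = nx.DiGraph()
    # Keep just the attribute columns and rename them to the attribute names.
    renamed = df.set_index(key_col)[list(attr_names)].rename(columns=attr_names)
    # add_nodes_from accepts (node, attribute_dict) pairs, so NaNs can be
    # filtered out of each dict before the node is ever created.
    graph.add_nodes_from(
        (name, {attr: value for attr, value in row.items() if pd.notna(value)})
        for name, row in renamed.iterrows()
    )
    return graph


node_df = pd.read_csv(io.StringIO(CSV_TEXT))
graph = build_graph(node_df)
print(graph.nodes['Part_0001'])
```

With this, a node such as Part_0001 ends up with only nodeCost, because its other columns are empty in the CSV. Note that iterrows goes row by row, so for a very large table the column-wise dropna approach above may well be faster; I have not timed the two.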