I'm looking for the most appropriate tool for Python 3.x on Windows to create a Bayesian network, learn its parameters from data, and perform inference.
I want to define the network structure myself, as follows:
It is taken from this paper.
All the variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as a mixture of Gaussians.
The authors use the Expectation-Maximization algorithm to learn the parameters of the conditional probability tables and the junction-tree algorithm to compute exact inference.
As I understand it, all of this is implemented in MATLAB with the Bayes Net Toolbox by Murphy.
I searched for something similar in Python, and here are my results:
PyMC seems to be a powerful module, but I have problems importing it on 64-bit Windows with Python 3.3. I get this error when I install the development version:
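To make the EM step concrete: for a continuous node such as "Size", EM fits the weights, means, and variances of a Gaussian mixture. Here is a minimal sketch using scikit-learn's `GaussianMixture` on synthetic 1-D data (this stands in for the MoG part only, not the full network; the data and component values are made up for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D "Size" samples drawn from two components,
# standing in for real measurements.
rng = np.random.default_rng(0)
size = np.concatenate([rng.normal(5.0, 0.5, 300),
                       rng.normal(12.0, 1.0, 300)]).reshape(-1, 1)

# EM fits the mixture weights, means, and variances.
gmm = GaussianMixture(n_components=2, random_state=0).fit(size)
print(sorted(gmm.means_.ravel()))   # roughly [5.0, 12.0]
```

In the paper's setting these mixture parameters would additionally be conditioned on the discrete parents, which is what makes the joint learning problem harder than this standalone fit.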
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
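As the warning itself suggests, you can suppress it by setting the Theano `cxx` flag to an empty string (at the cost of falling back to the slow Python implementations). A sketch for Windows (file name and location as documented for Theano; adjust to your setup):

```shell
# Either per-invocation via an environment variable (Windows cmd):
set THEANO_FLAGS=cxx=
python your_script.py

# ...or persistently in %USERPROFILE%\.theanorc.txt:
# [global]
# cxx =
```

This only silences the warning; for real performance you would need a working g++ (e.g. via MinGW) on your PATH.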
UPDATE:
Any advice and concrete examples would be highly appreciated.
It looks like pomegranate was recently updated to include Bayesian Networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.
Try the bnlearn library; it contains many functions to learn parameters from data and perform inference.
pip install bnlearn
Your use case would look like this:
# Import the library
import bnlearn

# Define the network structure
edges = [('task', 'size'),
         ('lat var', 'size'),
         ('task', 'fill level'),
         ('task', 'object shape'),
         ('task', 'side graspable'),
         ('size', 'GrasPose'),
         ('task', 'GrasPose'),
         ('fill level', 'GrasPose'),
         ('object shape', 'GrasPose'),
         ('side graspable', 'GrasPose'),
         ('GrasPose', 'latvar'),
         ]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)

# The DAG is stored in an adjacency matrix
print(DAG['adjmat'])
# target          task   size  lat var  ...  side graspable  GrasPose  latvar
# source                                ...
# task           False   True    False  ...            True      True   False
# size           False  False    False  ...           False      True   False
# lat var        False   True    False  ...           False     False   False
# fill level     False  False    False  ...           False      True   False
# object shape   False  False    False  ...           False      True   False
# side graspable False  False    False  ...           False      True   False
# GrasPose       False  False    False  ...           False     False    True
# latvar         False  False    False  ...           False     False   False
#
# [8 rows x 8 columns]

# No CPDs are in the DAG yet. Let's see what happens if we print it.
bnlearn.print_CPD(DAG)
# >[BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.

# Plot the DAG. Note that it can be differently oriented if you re-make the plot.
bnlearn.plot(DAG)
Now we need data to learn the parameters. Suppose these are stored in your df. The variable names in the data file must match the node names in the DAG.
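A quick sanity check for that requirement can be done with plain pandas before fitting (the edges and data frame below are hypothetical placeholders; substitute your own):

```python
import pandas as pd

# Hypothetical edges and data frame; replace with your own.
edges = [('task', 'size'), ('size', 'GraspPose')]
df = pd.DataFrame({'task': [0, 1], 'size': [0, 1], 'GraspPose': [1, 0]})

# Every node named in the DAG must appear as a column in df.
nodes = {n for edge in edges for n in edge}
missing = nodes - set(df.columns)
assert not missing, f"columns missing from data: {missing}"
print("all DAG nodes present in df")
```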
# Read data
import pandas as pd
df = pd.read_csv('path_to_your_data.csv')

# Learn the parameters and store the CPDs in the DAG.
# Use the methodtype you desire. Options are 'maximumlikelihood' or 'bayes'.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')

# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)

# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level':1, 'size':0, 'task':1})
Below is a working example with a demo dataset (sprinkler). You can play around with this.
# Import example dataset
df = bnlearn.import_example('sprinkler')
print(df)
#      Cloudy  Sprinkler  Rain  Wet_Grass
# 0         0          0     0          0
# 1         1          0     1          1
# 2         0          1     0          1
# 3         1          1     1          1
# 4         1          1     1          1
# ..      ...        ...   ...        ...
# 995       1          0     1          1
# 996       1          0     1          1
# 997       1          0     1          1
# 998       0          0     0          0
# 999       0          1     1          1
#
# [1000 rows x 4 columns]

# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)

# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.

# Plot the DAG
bnlearn.plot(DAG)
# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)

# Print the learned CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
#
# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy       | Cloudy(0)          | Cloudy(1)          |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy  | Cloudy(0)          | Cloudy(1)           |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695  |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain         | Rain(0)            | Rain(0)             | Rain(1)             | Rain(1)             |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler    | Sprinkler(0)       | Sprinkler(1)        | Sprinkler(0)        | Sprinkler(1)        |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663  | 0.7441176470588236  | 0.6208955223880597  |
# +--------------+--------------------+---------------------+---------------------+---------------------+

# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})
# +--------------+------------------+
# | Wet_Grass    | phi(Wet_Grass)   |
# +==============+==================+
# | Wet_Grass(0) | 0.2559           |
# +--------------+------------------+
# | Wet_Grass(1) | 0.7441           |
# +--------------+------------------+

print(q1.values)
# array([0.25588235, 0.74411765])
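To see what the inference step is computing: exact inference sums the joint distribution over the unobserved variables, and here, with all three other variables observed, the posterior reduces to a column of the learned Wet_Grass CPT. A minimal enumeration sketch in pure Python, using the CPD values printed above (rounded; independent of bnlearn):

```python
# Learned CPDs, copied (rounded) from the print_CPD output above.
P_C = {0: 0.494, 1: 0.506}                    # P(Cloudy)
P_S = {(0, 0): 0.4808, (0, 1): 0.5192,        # P(Sprinkler | Cloudy)
       (1, 0): 0.7075, (1, 1): 0.2925}
P_R = {(0, 0): 0.6518, (0, 1): 0.3482,        # P(Rain | Cloudy)
       (1, 0): 0.3370, (1, 1): 0.6630}
P_W = {(0, 0, 0): 0.7554, (0, 0, 1): 0.2446,  # P(Wet_Grass | Rain, Sprinkler)
       (0, 1, 0): 0.3376, (0, 1, 1): 0.6624,
       (1, 0, 0): 0.2559, (1, 0, 1): 0.7441,
       (1, 1, 0): 0.3791, (1, 1, 1): 0.6209}

def joint(c, s, r, w):
    # Factorization implied by the DAG: P(C) P(S|C) P(R|C) P(W|R,S)
    return P_C[c] * P_S[(c, s)] * P_R[(c, r)] * P_W[(r, s, w)]

# P(Wet_Grass | Cloudy=1, Sprinkler=0, Rain=1) by enumeration:
# evaluate the joint for each Wet_Grass state, then normalize.
unnorm = [joint(1, 0, 1, w) for w in (0, 1)]
z = sum(unnorm)
posterior = [p / z for p in unnorm]
print([round(p, 4) for p in posterior])  # [0.2559, 0.7441]
```

This matches the `q1.values` result above; junction-tree and variable-elimination algorithms give the same answer while avoiding the exponential cost of full enumeration on larger networks.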
More examples can be found on the bnlearn documentation pages or in the blog.