 

Plot topics with bokeh or matplotlib

I'm trying to plot a topic visualization from a model. I want to do something like Bokeh's covariance example.

My data is:

data 1: index, topics
data 2: index, topics, weights (used for color)

where each topic is just a set of words.

How do I give this data to Bokeh to plot it? From the example, the data handling is not intuitive.

With matplotlib, it looks like this (image not shown).
Obviously, it is hard to see which topic corresponds to each circle. Here is my matplotlib code:

import numpy as np
import matplotlib.pyplot as plt

x = []
y = []
area = []

# each row of `joined` has 'index' and 'score' keys
for row in joined:
    x.append(row['index'])
    y.append(row['index'])
    #weight.append(row['score'])
    area.append(np.pi * (15 * row['score'])**2)
scale_values = 1000
plt.scatter(x, y, s=scale_values*np.array(area), alpha=0.5)
plt.show()

Any ideas or suggestions?

asked Mar 28 '14 at 14:03 by sb32134




1 Answer

UPDATE: The answer below is still correct in all major points, but the API has changed slightly to be more explicit as of Bokeh 0.7. In general, things like:

rect(...)

should be replaced with

p = figure(...)
p.rect(...)

Here are the relevant lines from the Les Mis example, simplified to your case. Let's take a look:

# A "ColumnDataSource" is like a dict: it maps names to columns of data.
# These names are not special; we can call the columns whatever we like.
source = ColumnDataSource(
    data=dict(
        x = [row['name'] for row in joined],
        y = [row['name'] for row in joined],
        color = list_of_colors_one_for_each_row, 
    )
)

# We need a list of the categorical coordinates
names = list(set(row['name'] for row in joined))

# rect takes center coords (x,y) and width and height. We will draw 
# one rectangle for each row.
rect('x', 'y',        # use the 'x' and 'y' fields from the data source
     0.9, 0.9,        # use 0.9 for both width and height of each rectangle 
     color = 'color', # use the 'color' field to set the color
     source = source, # use the data source we created above
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)

A few notes:

  • For numeric data, x_range and y_range usually get supplied automatically. We have to give them explicitly here because we are using categorical coordinates.

  • You can order the list of names for x_range and y_range however you like; this is the order in which they are displayed on the plot axes.

  • I'm assuming you want to use categorical coordinates. :) This is what the Les Mis example does. See the bottom of this answer if you want numerical coordinates.
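
For reference, here is a minimal sketch of the same ColumnDataSource idea written against the explicit figure API mentioned in the update at the top. The contents of `joined`, the `words` and `score` columns, and the literal colors are hypothetical stand-ins for your data; the hover tool is included only because it is one way to show which topic each rectangle represents.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical stand-in for the asker's `joined` rows.
joined = [
    {"name": "topic 0", "words": "model, data, learning", "score": 0.9},
    {"name": "topic 1", "words": "plot, axis, color", "score": 0.5},
    {"name": "topic 2", "words": "linux, kernel, driver", "score": 0.2},
]

# Categorical coordinates, in the order they should appear on the axes.
names = [row["name"] for row in joined]

# Column names are arbitrary; the glyph and tooltips refer to them by name.
source = ColumnDataSource(data=dict(
    x=names,
    y=names,
    words=[row["words"] for row in joined],
    score=[row["score"] for row in joined],
    color=["#1f77b4", "#ff7f0e", "#2ca02c"],  # one (hypothetical) color per row
))

# Categorical ranges are given to the figure, not to the glyph.
p = figure(x_range=names, y_range=names, title="Topics")
p.rect(x="x", y="y", width=0.9, height=0.9, color="color", source=source)

# "@words" and "@score" pull values from the matching source columns.
p.add_tools(HoverTool(tooltips=[("topic", "@words"), ("score", "@score")]))
```

Calling `show(p)` (imported from `bokeh.plotting`) after an `output_file(...)` call should render the plot to HTML.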

Also, the Les Mis example was a little more complicated (it had a hover tool) which is why we created a ColumnDataSource by hand. If you just need a simple plot you can probably skip creating a data source yourself, and just pass the data in to rect directly:

names = list(set(row['name'] for row in joined))

rect(names,    # x (categorical) coordinate for each rectangle
     names,    # y (categorical) coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)
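
The snippets above assume you already have one color per row. If all you have is a numeric weight, one possible approach is to interpolate between two colors by hand. This is only a sketch: the helper `weight_to_hex`, the example `weights`, and the two endpoint colors are assumptions made for illustration, not part of Bokeh.

```python
def weight_to_hex(weight, lo=0.0, hi=1.0):
    """Map a weight in [lo, hi] to a hex color between light and dark blue."""
    # Normalize to [0, 1], guarding against a zero-width range, then clamp.
    t = 0.0 if hi == lo else (weight - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    # Linearly interpolate each RGB channel between the two endpoint colors.
    light = (222, 235, 247)  # "#deebf7"
    dark = (8, 81, 156)      # "#08519c"
    r, g, b = (round(start + t * (end - start)) for start, end in zip(light, dark))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


weights = [0.2, 0.5, 0.9]  # hypothetical scores, one per row
some_colors = [weight_to_hex(w, min(weights), max(weights)) for w in weights]
```

The resulting `some_colors` list can be passed as the color argument, or stored as the `color` column of a data source.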

Another note: this only plots rectangles on the diagonal, where the x- and y-coordinates are the same. That seems to be what you want from your description. But just for completeness, it's possible to plot rectangles that have different x- and y-coordinates. The Les Mis example does this.
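
As a rough sketch of the off-diagonal case, using the explicit figure API from the update at the top: each rectangle gets its own (x, y) pair of categories. The `names`, the `pairs` list, and its colors are hypothetical values chosen for illustration.

```python
from bokeh.plotting import figure

names = ["topic 0", "topic 1", "topic 2"]  # hypothetical categories

# Hypothetical pairwise entries: (x category, y category, color).
pairs = [
    ("topic 0", "topic 1", "orange"),
    ("topic 1", "topic 0", "orange"),
    ("topic 0", "topic 2", "green"),
    ("topic 2", "topic 0", "green"),
]

# Split the pairs into parallel lists, one entry per rectangle.
xs = [x for x, _, _ in pairs]
ys = [y for _, y, _ in pairs]
colors = [c for _, _, c in pairs]

p = figure(x_range=names, y_range=names)
p.rect(x=xs, y=ys, width=0.9, height=0.9, color=colors)
```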

Finally, maybe you don't actually want categorical axes? If you just want to use the numeric index of the coordinates, it's even simpler:

inds = [row['index'] for row in joined]

rect(inds,    # x-coordinate for each rectangle
     inds,    # y-coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
)

Edit: Here is a complete runnable example that uses numeric coords:

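from bokeh.plotting import figure, output_file, show

output_file("foo.html")

inds = [2, 5, 6, 8, 9]
colors = ["red", "orange", "blue", "green", "#4488aa"]

# Explicit API (Bokeh 0.7 and later): create a figure, then call its glyph methods
p = figure()
p.rect(inds, inds, 1.0, 1.0, color=colors)

show(p)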

and here is one that uses the same values as categorical coords:

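from bokeh.plotting import figure, output_file, show

output_file("foo.html")

inds = [str(x) for x in [2, 5, 6, 8, 9]]
colors = ["red", "orange", "blue", "green", "#4488aa"]

# Explicit API (Bokeh 0.7 and later): categorical ranges go on the figure
p = figure(x_range=inds, y_range=inds)
p.rect(inds, inds, 1.0, 1.0, color=colors)

show(p)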

answered Sep 29 '22 at 14:09 by bigreddot