
pandas - plot sorted column to increasing integer index

Tags:

python

pandas

Let's say I have a pandas series with numerical values. What's the shortest way to plot the sorted series against an increasing integer index?

The plot should show:

x-axis: 0,1,2,3,4,...

y-axis: the sorted values of the series.

(Please note that I cannot plot it against the series' index, because the index is not necessarily increasing. In my case it's an id that I use for other purposes.)

Thanks

nivniv asked Aug 12 '15 20:08


3 Answers

This will first sort the series and then plot, ignoring the index of the series:

import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(100), index=pd.date_range('1/1/2000', periods=100))
ts.sort_values().plot(use_index=False)
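To see what `use_index=False` does, the following minimal sketch inspects the plotted line. The series `s` and its id-like index values are made up for illustration, and the `Agg` backend is only an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import numpy as np
import pandas as pd

# Hypothetical series whose index is an arbitrary id, not an increasing integer
s = pd.Series([3.2, -1.0, 7.5, 0.4], index=[907, 12, 455, 301])

# Sort by value, then plot against positions instead of the index
ax = s.sort_values().plot(use_index=False)

# Read back the coordinates that were actually drawn
line = ax.lines[0]
x = np.asarray(line.get_xdata())
y = np.asarray(line.get_ydata())
print(x)  # positions along the x-axis
print(y)  # sorted values along the y-axis
```

In an interactive session you would typically call `matplotlib.pyplot.show()` afterwards instead of reading the line data back.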

paulperry answered Nov 10 '22 09:11


Quick and easy: You can add a column with increasing integers and use this as x-values:

# some dataframe df
df['int_index'] = range(len(df))

df.plot(x='int_index', y='sorted_values')

If you don't want to keep the helper column, drop it afterwards:

df.drop('int_index', axis=1, inplace=True)
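The snippet above assumes a dataframe that already has a `sorted_values` column. As a rough sketch of how such a frame could be built from a series, assuming made-up data and id-like index values:

```python
import pandas as pd

# Hypothetical series indexed by arbitrary ids
s = pd.Series([3.2, -1.0, 7.5, 0.4], index=[907, 12, 455, 301])

# Sort by value and turn the result into a one-column dataframe;
# the original ids stay in the index, now in sorted-value order
df = s.sort_values().to_frame(name="sorted_values")

# Add the increasing integer column to use as x-values
df["int_index"] = range(len(df))

print(df)
```

From here, `df.plot(x='int_index', y='sorted_values')` plots the sorted values against 0, 1, 2, 3 while the ids remain available in `df.index`.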

Helper function:

The Pandas plot function does not take "external" data as indices. You can use matplotlib directly to plot as tnknepp showed, or keep the Pandas plotting (and formatting) with a helper function:

def plot_sorted(series, **kwargs):
    # sort the values, then discard the original index so x runs 0, 1, 2, ...
    df = pd.DataFrame({'data': series.sort_values()}).reset_index(drop=True)
    return df.plot(**kwargs)

With a wrapper like this you can quickly plot any Series and customize the plot: any keyword arguments are passed straight through to the DataFrame's plot method. Examples:

ts = pd.Series(np.random.randn(100), index=pd.date_range('1/1/2000', periods=100))
# default pandas plot (with integer indices)
plot_sorted(ts)

# scatter plot (using `data` for x and y)
plot_sorted(ts, x='data', y='data', kind="scatter")

[Figures: default plot; scatter plot]

chris-sc answered Nov 10 '22 09:11


Using matplotlib directly, the plot itself is one line:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)

# Series.sort() was deprecated in pandas 0.17 and later removed; use sort_values()
ax.plot(np.arange(df.shape[0]), df['A'].sort_values())

This is assuming your dataframe is called df and you want to plot the data stored in column "A".
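A related variation, sketched here with made-up data: when matplotlib's `plot` receives only y-values, it uses 0, 1, 2, ... as x-values on its own, so the explicit `np.arange` can be left out. The dataframe contents and the `Agg` backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical dataframe with an id-like index and a column "A"
df = pd.DataFrame({"A": [3.2, -1.0, 7.5, 0.4]}, index=[907, 12, 455, 301])

# Sort the raw values; converting to NumPy drops the pandas index entirely
sorted_a = np.sort(df["A"].to_numpy())

fig, ax = plt.subplots(figsize=(5, 5))

# Only y is given, so matplotlib supplies x = 0, 1, 2, ...
(line,) = ax.plot(sorted_a)
```

This trades the pandas plotting conveniences (automatic labels and legend) for a shorter call.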

tnknepp answered Nov 10 '22 09:11