 

pandas: plot mean values within bins - formatting help needed

I am doing some solar system dynamics simulations, and have been using this project as an excuse to teach myself some python/pandas. The resulting data set has a little over 1000 records, with values for orbital inclination, eccentricity and so on for each of the bodies involved.

I'm trying to use pandas to study the mean orbital inclinations of the ~1000 bodies ('test particles') in the result data, as a function of semi-major axis.

So far, what I've been doing is this:

1) read the data into a dataframe

df = pd.read_csv('final.csv')

2) limit the data to a range of semi-major axis values (the particles 'spread out' over the course of the simulations, but I want to limit my analysis to this range)

cf = df[df.a.between(30,80)]

3) plot the mean value for inclination for a given number of bins

cf.groupby(pd.cut(cf.a, 80))['inc'].mean().plot()

This creates an acceptable plot, but formatting-wise it has a couple of problems: unless it's completely maximized to fill my screen, the numbers along the x axis get squished together and overlap. They are also not exactly what I'd like to see: they show the max and min for bins, where I would prefer a straight ticking by 5s or something similar.

[Figure: pandas output from the above command]

I've tried passing values (x=None, xticks=None) to the plot() call in the line above, but this has had no effect on the resulting plot. Is plot/output control possible with the way I'm doing this?

Thanks,

G

Glenn Becker asked Mar 28 '26 22:03



1 Answer

When I want to do something like this, I go to matplotlib directly. I will show a small example with this sample data:

df = pd.DataFrame([[1, 2], [2, 7], [3, 6], [4, 7], [5, 3]], columns=['A', 'B'])

Instead of letting pd.cut choose the bins from a bin count, I define the bin edges myself with np.linspace and pass them to pd.cut:

bins = np.linspace(0,5,4)
group = df.groupby(pd.cut(df.A, bins))

Now, to plot it, I want the midpoint of each bin:

plot_centers = (bins[:-1] + bins[1:]) / 2
plot_values = group.B.mean()

and plot with

plt.plot(plot_centers, plot_values)
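Applied to the data in the question, the same steps might look like the sketch below. It is only an illustration under a few assumptions: the data frame is a synthetic stand-in for final.csv with columns 'a' and 'inc', the Agg backend is chosen so it runs without a display, and MultipleLocator(5) is one way to get the ticks "by 5s" that the question asks for. Passing observed=False keeps empty bins in the result, so the means stay the same length as the bin centers.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import numpy as np
import pandas as pd

# Synthetic stand-in for final.csv (assumption: columns 'a' and 'inc' as in the question)
rng = np.random.default_rng(0)
cf = pd.DataFrame({
    "a": rng.uniform(30, 80, size=1000),
    "inc": rng.uniform(0, 10, size=1000),
})

bins = np.linspace(30, 80, 81)        # 81 edges give 80 equal-width bins
centers = (bins[:-1] + bins[1:]) / 2  # midpoint of each bin

# include_lowest=True counts a value sitting exactly on the first edge;
# observed=False keeps bins that contain no rows (their mean is NaN)
binned = pd.cut(cf["a"], bins, include_lowest=True)
mean_inc = cf.groupby(binned, observed=False)["inc"].mean()

fig, ax = plt.subplots()
ax.plot(centers, mean_inc.to_numpy())
ax.xaxis.set_major_locator(MultipleLocator(5))  # one major tick every 5 units of a
ax.set_xlabel("a")
ax.set_ylabel("mean inc")
fig.tight_layout()
# Call plt.show() or fig.savefig(...) here to view or save the figure
```

Because the x values are now plain numbers rather than interval labels, matplotlib spaces the ticks itself and they no longer overlap when the window is small.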

You should be careful handling missing data, i.e. a bin with no data in it. In that case you can use fillna(0) to turn all NaNs into 0:

plot_values = group.B.mean().fillna(0)
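
To see what an empty bin does, here is a small sketch with made-up sample data in which nothing falls into the middle bin. It assumes observed=False is passed to groupby so that the empty bin is kept in the result.

```python
import numpy as np
import pandas as pd

# Made-up sample data: no value of A falls in the middle bin (2, 4]
df = pd.DataFrame([[1, 2], [2, 7], [5, 3]], columns=["A", "B"])
bins = np.linspace(0, 6, 4)  # edges 0, 2, 4, 6

# observed=False keeps the empty bin as a row instead of dropping it
group = df.groupby(pd.cut(df.A, bins), observed=False)

means = group.B.mean()    # the empty bin has a NaN mean
counts = group.B.count()  # number of rows per bin, 0 for the empty one
filled = means.fillna(0)  # same means with NaN replaced by 0

print(pd.DataFrame({"mean": means, "count": counts, "filled": filled}))
```

Whether to fill is a judgment call: if you plot the unfilled means, matplotlib leaves a gap in the line at the NaN, while fillna(0) draws the line down to zero as if that were a measured value. Checking the counts first tells you which bins are actually empty.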
KPLauritzen answered Apr 02 '26 13:04



