I have a pandas DataFrame that looks like this (output of training.head()):

The DataFrame has been sorted by date. I'd like to make a scatterplot where the date of the campaign is on the x-axis and the rate of success is on the y-axis. I was able to get a line graph by using training.plot(x='date', y='rate'). However, when I changed that to training.plot(kind='scatter', x='date', y='rate'), I got an error: KeyError: u'no item named date'
Why does my index column go away when I try to make a scatterplot? Also, I bet I need to do something with that date field so that it doesn't get treated like a simple string, don't I?
Extra credit, what would I do if I wanted each of the account numbers to plot with a different color?
Pandas has a built-in function, to_datetime(), that converts dates and times in string format to datetime objects. The 'date' column in the DataFrame is currently stored as strings, so passing it to to_datetime() returns a Series of the appropriate datetime64 dtype.
If I remember correctly, the scatter plotting code only considers numeric columns. Internally it selects just the numeric columns, so a non-numeric 'date' column is dropped before the lookup, and that's why you get the KeyError.
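To make the conversion concrete, here is a minimal sketch. The rows are made up for illustration; only the names training, 'date' and 'rate' come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; the values are invented.
training = pd.DataFrame({
    "date": ["2014-01-05", "2014-01-12", "2014-01-19"],
    "rate": [0.12, 0.18, 0.15],
})

# Before conversion the column holds plain strings.
print(training["date"].dtype)

# Parse the strings into real timestamps and store them back in the column.
training["date"] = pd.to_datetime(training["date"])

# After conversion the column has a datetime64 dtype.
print(training["date"].dtype)
```

If some strings might not parse, pd.to_datetime also accepts errors='coerce', which turns them into NaT instead of raising.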
What's the dtype of date? If it's a datetime64, you can recast it as an np.int64:
df['date_int'] = df.date.astype(np.int64)
And then plot, using date_int as the x column.
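Put together, the integer workaround might look like the sketch below. The data is invented, and the Agg backend is chosen only so the snippet runs without a display; neither is from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs

import numpy as np
import pandas as pd

# Hypothetical data; only the column names 'date' and 'rate' come from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-05", "2014-01-12", "2014-01-19"]),
    "rate": [0.12, 0.18, 0.15],
})

# Recast the datetime64 values as integer ticks since the Unix epoch.
df["date_int"] = df["date"].astype(np.int64)

# Both columns are numeric now, so the scatter plot can find the x column.
ax = df.plot(kind="scatter", x="date_int", y="rate")
```

The trade-off is that the x-axis tick labels show raw integers rather than readable dates, so you may want to relabel the ticks afterwards.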
For the color part, make a dictionary of {account number: color}. For example:
color_d = {1: 'k', 2: 'b', 3: 'r'}
Then when you plot:
training.plot(kind='scatter', x='date', y='rate', color=training.account.map(color_d))
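Here is a self-contained sketch of the color mapping. The account numbers and rates are invented, the x values are plain integers standing in for the converted dates, and the colors are passed through the scatter plot's c argument rather than color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs

import pandas as pd

# Hypothetical data; 'account' is assumed to be the name of the account-number column.
training = pd.DataFrame({
    "account": [1, 2, 3, 1, 2],
    "date_int": [1, 2, 3, 4, 5],  # placeholder for the integer dates
    "rate": [0.12, 0.18, 0.15, 0.20, 0.11],
})

# One matplotlib color code per account number.
color_d = {1: "k", 2: "b", 3: "r"}

# map() looks up each row's account number and returns its color.
colors = training["account"].map(color_d)

# Each point is drawn in the color of its account.
ax = training.plot(kind="scatter", x="date_int", y="rate", c=colors)
```

An account number missing from color_d maps to NaN, so the dictionary should cover every account that appears in the data.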
I've found it simpler to change the style of a line chart to not include the connecting lines:
cb_df.plot(figsize=(16, 6), style='o')
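A runnable sketch of that approach follows. It assumes cb_df has the dates as its index and a single numeric column; the values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs

import pandas as pd

# Hypothetical frame: dates in the index, one numeric column to plot.
cb_df = pd.DataFrame(
    {"rate": [0.12, 0.18, 0.15]},
    index=pd.to_datetime(["2014-01-05", "2014-01-12", "2014-01-19"]),
)

# style='o' draws a circle marker at each point and no connecting line.
ax = cb_df.plot(figsize=(16, 6), style="o")
```

Because this is still a line plot, the datetime index is handled natively and the x-axis shows readable dates.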
