I have a pandas DataFrame that looks like this training.head()
The DataFrame has been sorted by date. I'd like to make a scatterplot where the date of the campaign is on the x axis and the rate of success is on the y axis. I was able to get a line graph by using training.plot(x='date',y='rate'). However, when I changed that to training.plot(kind='scatter',x='date',y='rate') I get an error: KeyError: u'no item named date'
Why does my index column go away when I try to make a scatterplot? Also, I bet I need to do something with that date field so that it doesn't get treated like a simple string, don't I?
Extra credit, what would I do if I wanted each of the account numbers to plot with a different color?
Pandas has a built-in function called to_datetime() that converts dates and times in string format to datetime objects. As you can see, the 'date' column in the DataFrame currently holds strings. Calling to_datetime() on that column returns a Series with the appropriate datetime64 dtype.
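A minimal sketch of that conversion, assuming a small made-up frame that reuses the question's date and rate column names (the values here are invented for illustration, not taken from the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; values are invented.
training = pd.DataFrame({
    "date": ["2014-01-03", "2014-01-01", "2014-01-02"],
    "rate": [0.25, 0.10, 0.40],
})

# Convert the string column to real timestamps.
training["date"] = pd.to_datetime(training["date"])

# With a datetime column, sorting is chronological rather than lexical.
training = training.sort_values("date").reset_index(drop=True)
print(training.dtypes)
```

Checking training.dtypes afterwards is a quick way to confirm the column is no longer being treated as plain strings.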
If I remember correctly, the plotting code only considers numeric columns. Internally it selects just the numeric columns, so that's why you get the key error.
What's the dtype of date? If it's a datetime64, you can recast it as an np.int64:
df['date_int'] = df.date.astype(np.int64)
And then plot against date_int instead of date.
For the color part, make a dictionary of {account number: color}. For example:
color_d = {1: 'k', 2: 'b', 3: 'r'}
Then when you plot:
training.plot(kind='scatter', x='date_int', y='rate', color=training.account.map(color_d))
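Putting the two steps together, here is a rough end-to-end sketch. The frame contents and the account column name are assumptions made up for illustration, and it passes the colors through the c parameter as a plain list; the Agg backend is only there so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's DataFrame; values are invented.
training = pd.DataFrame({
    "date": pd.to_datetime(
        ["2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"]
    ),
    "account": [1, 2, 3, 1],
    "rate": [0.10, 0.40, 0.25, 0.30],
})

# Recast the timestamps as integers so the x column is numeric.
training["date_int"] = training["date"].astype(np.int64)

# Map each account number to a matplotlib color code.
color_d = {1: "k", 2: "b", 3: "r"}
colors = training["account"].map(color_d)

# One color per row, passed as a list of color strings.
ax = training.plot(kind="scatter", x="date_int", y="rate", c=colors.tolist())
```

Any account number missing from color_d would map to NaN, so the dictionary needs an entry for every account that appears in the data.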
I've found it simpler to change the style of a line chart to not include the connecting lines:
cb_df.plot(figsize=(16, 6), style='o')
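For context, a small sketch of how that might look when the dates are the index. The contents of cb_df are not shown in this answer, so the frame below is an invented placeholder with a datetime index and a single rate column:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical placeholder for cb_df; values are invented.
cb_df = pd.DataFrame(
    {"rate": [0.10, 0.40, 0.25]},
    index=pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
)

# style='o' draws a marker at each point and omits the connecting line,
# so the ordinary line plot ends up looking like a scatterplot.
ax = cb_df.plot(figsize=(16, 6), style="o")
```

Because this goes through the regular line-plot path, the datetime index is used for the x axis directly and no integer conversion is needed.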