This is all on a 64-bit Windows 7 machine, running 64-bit Python 3.4.3 in the PyCharm Educational Edition 1.0.1 IDE. The data used for this program comes from the Citi Bike program in New York City (data found here: http://www.citibikenyc.com/system-data).
I have sorted the data so that I have a new CSV file with just the unique bike IDs and how many times each bicycle was ridden (the file is called Sorted_Bike_Uses.csv). I am trying to make a graph of the bike IDs against the number of uses (bike IDs on the x-axis, number of uses on the y-axis). My code looks like this:
import pandas as pd
import matplotlib.pyplot as plt
# read in the file and separate it into two lists
a = pd.read_csv('Sorted_Bike_Uses.csv', header=0)
b = a['Bike ID']
c = a['Number of Uses']
# create the graph
plt.plot(b, c)
# label the x and y axes
plt.xlabel('Bicycles', weight='bold', size='large')
plt.ylabel('Number of Rides', weight='bold', size='large')
# format the x and y ticks
plt.xticks(rotation=50, horizontalalignment='right', weight='bold', size='large')
plt.yticks(weight='bold', size='large')
# give it a title
plt.title("Top Ten Bicycles (by # of uses)", weight='bold')
# displays the graph
plt.show()
It creates an almost correctly formatted graph. The only issue is that it sorts the bike IDs into numerical order, rather than keeping them in order of uses. I have tried repurposing old code that I used to make a similar graph, but it makes an even worse graph that somehow has two sets of data being plotted. The repurposed code looks like this:
my_plot = a.sort(columns='Number of Uses', ascending=True).plot(kind='bar', legend=None)
# labels the x and y axes
my_plot.set_xlabel('Bicycles')
my_plot.set_ylabel('Number of Rides')
# sets the labels along the x-axis to the bike IDs
my_plot.set_xticklabels(b, rotation=45, horizontalalignment='right')
# displays the graph
plt.show()
The second block of code uses the same data as the first, and has been changed from the original to fit the Citi Bike data. My Google-fu is exhausted. I have tried reformatting the xticks, adding pieces of the second block to the first, adding pieces of the first to the second, and so on. It is probably something staring me right in the face, but I can't see it. Any help is appreciated.
You want to plot just the number of uses using the plotting function, then set the x-labels to the bike ID numbers. So when you plot, don't include the bike ID numbers. Just do plt.plot(c). If you give the plot function only one argument, it creates the x-values itself, in this case as range(len(c)). Then you can change the labels on the x-axis to the bike IDs. This is done with plt.xticks. You need to pass it the list of x-values that it created and the list of labels. So that would be plt.xticks(range(len(c)), b).
Try this:
import pandas as pd
import matplotlib.pyplot as plt
# read in the file and separate it into two lists
a = pd.read_csv('Sorted_Bike_Uses.csv', header=0)
b = a['Bike ID']
c = a['Number of Uses']
# create the graph
plt.plot(c)
# label the x and y axes
plt.xlabel('Bicycles', weight='bold', size='large')
plt.ylabel('Number of Rides', weight='bold', size='large')
# format the x and y ticks
plt.xticks(range(len(c)), b, rotation=50, horizontalalignment='right', weight='bold', size='large')
plt.yticks(weight='bold', size='large')
# give it a title
plt.title("Top Ten Bicycles (by # of uses)", weight='bold')
# displays the graph
plt.show()
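To see the same idea without the CSV file, here is a minimal sketch that uses a small made-up table in place of Sorted_Bike_Uses.csv (the IDs and counts are invented for illustration). It also sorts by use count first, which is an assumption about the order you want, and it selects the non-interactive Agg backend only so that it runs without a display; drop that line and call plt.show() when working interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for Sorted_Bike_Uses.csv (values are illustrative only)
a = pd.DataFrame({"Bike ID": [5454, 3432, 4432, 3314],
                  "Number of Uses": [11, 23, 5, 9]})

# Order the rows by use count, most used first, and renumber them 0..n-1
a = a.sort_values(by="Number of Uses", ascending=False).reset_index(drop=True)

# Plot against row positions, not against the bike IDs themselves
positions = list(range(len(a)))
fig, ax = plt.subplots()
ax.plot(positions, a["Number of Uses"])

# Put one tick at each position and label it with the matching bike ID
ax.set_xticks(positions)
ax.set_xticklabels(a["Bike ID"].astype(str), rotation=50,
                   horizontalalignment="right")
ax.set_xlabel("Bicycles")
ax.set_ylabel("Number of Rides")
```

Because the x-values are plain positions, matplotlib has no reason to reorder anything: the bike IDs appear only as tick text, in whatever order the rows are in.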
If you use the .plot method of pandas.DataFrame, just grab the resulting axis and call set_xticklabels:
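import pandas as pd
a = pd.DataFrame({'Bike ID': [5454, 3432, 4432, 3314],
                  'Number of Uses': [11, 23, 5, 9]})
# DataFrame.sort has been removed from current pandas; sort_values replaces it
a.sort_values(by='Number of Uses', inplace=True)
# bar plots place the bars at positions 0..n-1 in row order
ax = a.plot(y='Number of Uses', kind='bar')
# relabel those positions with the bike IDs, which are now in sorted order
_ = ax.set_xticklabels(a['Bike ID'])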
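A possible variation on the same approach, sketched here with the same made-up data: pandas bar plots treat the x column as categories, so passing x='Bike ID' lets pandas write the tick labels itself, in row order. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line for interactive use
import pandas as pd

# Made-up data for illustration only
a = pd.DataFrame({"Bike ID": [5454, 3432, 4432, 3314],
                  "Number of Uses": [11, 23, 5, 9]})

# Sort by use count, then let pandas label each bar with its bike ID
a = a.sort_values(by="Number of Uses")
ax = a.plot(x="Bike ID", y="Number of Uses", kind="bar", legend=False)
ax.set_xlabel("Bicycles")
ax.set_ylabel("Number of Rides")

# The tick text follows the sorted row order, not numerical ID order
labels = [t.get_text() for t in ax.get_xticklabels()]
```

This avoids keeping a separate list of labels in sync with the sorted rows, which is where the mismatch in the question's second attempt comes from.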