I have a dataframe that includes a number of statistics on crime, including the date and time of the crime as well as the category.
0 5/13/2015 8:55 VEHICLE THEFT
1 5/13/2015 8:41 OTHER OFFENSES
2 5/13/2015 8:36 OTHER OFFENSES
3 5/13/2015 8:30 NON-CRIMINAL
4 5/13/2015 8:17 OTHER OFFENSES
5 5/13/2015 8:16 OTHER OFFENSES
6 5/13/2015 8:10 LARCENY/THEFT
7 5/13/2015 8:00 BURGLARY
8 5/13/2015 8:00 MISSING PERSON
9 5/13/2015 8:00 OTHER OFFENSES
10 5/13/2015 8:00 ASSAULT
---
So for the above sample, it would simply print: "Other Offenses."
It is a massive database, over 400,000 rows.
I need to write a function that will allow me to enter any given time range (using from and to) and then determine which crime category occurred most frequently. This is what I have, and it is not working:
import pandas as pd
import csv
import datetime
timeData = open("timeData.csv")
df = pd.read_csv('timeData.csv')
from datetime import timedelta, date
df['Dates'] = pd.to_datetime(df['Dates']) #this converts the values in the Dates column to datetime format
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2015, 5, 1)
end_date = date(2015, 6, 2)

for daterange(start_date, end_date):
    df['Category'].value_counts()
I want to iterate through the Dates column (Column A) and select only those dates that fit within my daterange. For the dates within my daterange, I then want to count the number of instances of each crime (Column B). Having done this, I want to print the Crime that occurs most frequently.
Suggestions?
First, to make this more efficient, set your index to the dates. Assume your dates column is Date and the crime categories are in Crime.
# make pd.Series with date as the index and crimes as the values
d1 = df.set_index('Date').Crime.sort_index()
# function that slices by date and uses value_counts
def most_freq(start, end):
    return d1[start:end].value_counts().index[0]
# demo
most_freq('2015-05', '2015-05')
'OTHER OFFENSES'
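The same slicing also works with full from/to dates, since partial-string indexing on a sorted DatetimeIndex accepts exact dates as well. A minimal sketch, assuming the same d1 as above (the bounds are just example values):

# slice between two exact dates, then take the most common category;
# idxmax() is equivalent to .index[0] here because value_counts() sorts by count
d1['2015-05-01':'2015-06-02'].value_counts().idxmax()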
If you just want to group your DF by date:
In [204]: df.groupby([pd.Grouper(key='date', freq='D')])['crime'].value_counts()
Out[204]:
date        crime
2015-05-13  OTHER OFFENSES    5
            ASSAULT           1
            BURGLARY          1
            LARCENY/THEFT     1
            MISSING PERSON    1
            NON-CRIMINAL      1
            VEHICLE THEFT     1
Name: crime, dtype: int64
or:
In [201]: df
Out[201]:
date crime
0 2015-05-13 08:55:00 VEHICLE THEFT
1 2015-05-13 08:41:00 OTHER OFFENSES
2 2015-05-13 08:36:00 OTHER OFFENSES
3 2015-05-13 08:30:00 NON-CRIMINAL
4 2015-05-13 08:17:00 OTHER OFFENSES
5 2015-05-13 08:16:00 OTHER OFFENSES
6 2015-05-13 08:10:00 LARCENY/THEFT
7 2015-05-13 08:00:00 BURGLARY
8 2015-05-13 08:00:00 MISSING PERSON
9 2015-05-13 08:00:00 OTHER OFFENSES
10 2015-05-13 08:00:00 ASSAULT
In [202]: df.groupby([pd.Grouper(key='date', freq='D'), 'crime']).size()
Out[202]:
date        crime
2015-05-13  ASSAULT           1
            BURGLARY          1
            LARCENY/THEFT     1
            MISSING PERSON    1
            NON-CRIMINAL      1
            OTHER OFFENSES    5
            VEHICLE THEFT     1
dtype: int64
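To answer the original question directly (an arbitrary from/to range rather than a single day), here is a hedged sketch on top of this layout, assuming the same date and crime column names as above and example bounds:

import pandas as pd

def most_frequent_crime(df, start, end):
    # keep only rows whose timestamp falls inside [start, end]
    mask = df['date'].between(pd.Timestamp(start), pd.Timestamp(end))
    # value_counts() sorts by count, so idxmax() returns the top category
    return df.loc[mask, 'crime'].value_counts().idxmax()

most_frequent_crime(df, '2015-05-01', '2015-06-02')
# 'OTHER OFFENSES' for the sample rows shown above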