Aggregating weekly crime and traffic accidents
So far in this chapter, we have taken a basic tour of pandas’ offerings for dealing with temporal data. Starting with small sample datasets has made it easy to visually inspect the output of our operations, but we are now at the point where we can start focusing on applications to “real world” datasets.
The Denver crime dataset is huge, with over 460,000 rows, each marked with the datetime at which the crime was reported. As you will see in this recipe, we can use pandas to easily resample these events and ask questions like “How many crimes were reported in a given week?”
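To make the idea of resampling concrete before turning to the real data, here is a minimal sketch that assumes a handful of made-up timestamps instead of the Denver dataset. The events frame and its OFFENSE column are hypothetical stand-ins; only the REPORTED_DATE index name mirrors the dataset used in this recipe.

```python
import pandas as pd

# Hypothetical stand-in for the crime data: one row per reported event,
# indexed by the timestamp at which it was reported.
events = pd.DataFrame(
    {"OFFENSE": ["theft", "assault", "theft", "burglary", "theft"]},
    index=pd.DatetimeIndex(
        [
            "2024-01-01 08:30",  # Monday
            "2024-01-03 17:45",  # Wednesday
            "2024-01-06 23:10",  # Saturday
            "2024-01-08 00:05",  # Monday of the following week
            "2024-01-10 12:00",  # Wednesday
        ],
        name="REPORTED_DATE",
    ),
)

# "W" groups rows into weekly bins that end on Sunday by default;
# size() counts how many rows fall into each bin.
weekly_counts = events.resample("W").size()
print(weekly_counts)
```

Each label in the result is the Sunday that closes its week, so every count answers the question for the seven days ending on that date.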
How to do it
To start, let’s read in the crime dataset, setting REPORTED_DATE as the index. This dataset was saved using pandas extension types, so there is no need to specify the dtype_backend= argument:
import pandas as pd

# Read the crime data and index it by the time each event was reported
df = pd.read_parquet(
    "data/crime.parquet",
).set_index("REPORTED_DATE")
df.head()
REPORTED_DATE...