
Crime Incident Data Collection in England (2011-2022)

This article describes how crime incident data for England from 2011 to 2022 was collected and combined into a single tabular file.

A crime incident here means a crime recorded in England and assigned to a chosen geographical level, in this case the Lower Layer Super Output Area (LSOA). Every crime incident falls within an LSOA, and the corresponding LSOA code and name are provided with each record. LSOAs are one of the types of small constituent area that make up England. They were produced by the Office for National Statistics for reporting small area statistics, and England currently has 32,844 of them. Table 1 below describes the related area types.

Table 1: Description of area types in England. Adapted from LSOAs

| Area Type | Description |
| --- | --- |
| Output Area (OA) | Has an average population of about 310 residents and is the smallest geography on which data is published. Apart from Census outputs, not much data is available |
| Lower Layer Super Output Area (LSOA) | Has an average population of 1500 people or 650 households. Much more data is available at this level |
| Middle Layer Super Output Area (MSOA) | Has an average population of 7500 residents or 4000 households |

OAs nest within the boundaries of LSOAs, LSOAs nest within the boundaries of MSOAs, and MSOAs in turn nest within the boundaries of local authorities. Figure 1 below illustrates how these geographies relate to one another.

Figure 1: Illustration of area types in England. Adapted from LSOAs

Table 2 below lists the types of crime incident and their descriptions.

Table 2: Description of type of crime incidents. Adapted from Police.UK

| Type of crime incident | Description |
| --- | --- |
| Anti-social behaviour | Personal, environmental and nuisance anti-social behaviour |
| Bicycle theft | Taking without consent or theft of a pedal cycle |
| Burglary | Offences where a person enters a house or other building with the intention of stealing |
| Criminal damage and arson | Damage to buildings and vehicles and deliberate damage by fire |
| Drugs | Offences related to possession, supply and production |
| Other crime | Forgery, perjury and other miscellaneous crime |
| Other theft | Theft by an employee, blackmail and making off without payment |
| Possession of weapons | Possession of a weapon, such as a firearm or knife |
| Public order | Offences which cause fear, alarm or distress |
| Robbery | Offences where a person uses force or threat of force to steal |
| Shoplifting | Theft from shops or stalls |
| Theft from the person | Crimes that involve theft directly from the victim (including handbag, wallet, cash, mobile phones) but without the use or threat of physical force |
| Vehicle crime | Theft from or of a vehicle or interference with a vehicle |
| Violence and sexual offences | Offences against the person such as common assaults, Grievous Bodily Harm and sexual offences |

Some of the crime types may appear to mean the same thing, but there are clear distinctions among them, most notably among theft, burglary, and robbery. Although these terms are often used interchangeably in day-to-day life, they must be kept distinct here, because they can carry different levels of severity depending on the case.

Because the full dataset is very large, data.police.uk offers the data in chunks as compressed files. An example is shown in Figure 2 below.

Figure 2: Example of chunks of crime data in compressed format. Adapted from data.police.uk

To cover this project's required period of 2011 to 2022, three compressed files need to be downloaded:

  1. Crime data from October-2019 to September-2022. Approximately 1.52GB in size.
  2. Crime data from October-2016 to September-2019. Approximately 1.57GB in size.
  3. Crime data from December-2010 to September-2016. Approximately 2.09GB in size.

Crime data for December 2010 is manually deleted after downloading, as it falls outside this project's timeline. The folder and file structure after extracting the compressed files are depicted in Figure 3 and Figure 4 respectively.

Figure 3: Folder structure after extracting compressed crime data

Figure 4: File structure inside the folders
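As an alternative to deleting the December 2010 data by hand, out-of-range months could be skipped in code. The following is only a minimal sketch, assuming the extracted month folders are named in `YYYY-MM` form and that the project window ends with the last downloaded month, September 2022. The names `in_project_window` and `month_folders` are hypothetical helpers, not part of the original notebook.

```python
import re
from pathlib import Path

# Project window assumed from the article: January 2011 to September 2022.
FIRST_MONTH = "2011-01"
LAST_MONTH = "2022-09"


def in_project_window(month: str) -> bool:
    """Return True if `month` is a 'YYYY-MM' string inside the project window.

    Zero-padded 'YYYY-MM' strings sort chronologically, so a plain string
    comparison is sufficient once the format has been checked.
    """
    if re.fullmatch(r"\d{4}-\d{2}", month) is None:
        return False
    return FIRST_MONTH <= month <= LAST_MONTH


def month_folders(root: Path) -> list[Path]:
    """List the 'YYYY-MM' sub-folders of `root` that fall inside the window."""
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and in_project_window(p.name)
    )
```

With this approach a folder such as `2010-12` is simply ignored, so nothing needs to be removed from disk.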

Figure 3 shows that the folders are organized by year and month. Figure 4 shows that the files inside each folder are named by year, month, police force name and crime file type. The crime file type is one of outcomes, stop-and-search, or street. For the purposes of this project, only files of type street are included. Figure 5 shows a snippet of the unprocessed crime incident data for the City of London in January 2011.

Figure 5: Example of unprocessed crime incident data
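The file-naming scheme can be used to pick out the street files programmatically. The sketch below assumes file names of the form `YYYY-MM-<name>-<type>.csv`, for example `2011-01-city-of-london-street.csv`, where the middle segment is the name component (in the data.police.uk archives this is the reporting police force). The pattern and the `parse_crime_filename` helper are illustrative assumptions, not code from the original notebook.

```python
import re
from typing import NamedTuple, Optional

# Assumed naming pattern: 'YYYY-MM-<name>-<file type>.csv'
FILE_PATTERN = re.compile(
    r"^(?P<month>\d{4}-\d{2})-"
    r"(?P<force>.+)-"
    r"(?P<kind>street|outcomes|stop-and-search)\.csv$"
)


class CrimeFile(NamedTuple):
    month: str  # e.g. '2011-01'
    force: str  # e.g. 'city-of-london'
    kind: str   # 'street', 'outcomes' or 'stop-and-search'


def parse_crime_filename(name: str) -> Optional[CrimeFile]:
    """Split a crime CSV file name into its parts, or return None if it does not match."""
    match = FILE_PATTERN.match(name)
    if match is None:
        return None
    return CrimeFile(match["month"], match["force"], match["kind"])


def is_street_file(name: str) -> bool:
    """Return True only for files of type 'street'."""
    parsed = parse_crime_filename(name)
    return parsed is not None and parsed.kind == "street"
```

Parsing the name rather than checking only the suffix also makes the month and name segments available, which can be handy for sanity checks against the `Month` column inside each file.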

The basic steps taken to combine all the required crime data into a single file are as follows:

  1. Iterate through the 3 main parent folders “2011-2016”, “2017-2019”, and “2020-2022”.
  2. Within each parent folder, access the crime data by year and month.
  3. Iterate through the crime files and select only those of type “street”.
  4. Read each “street” file into a Pandas DataFrame and append it to a Python list.
  5. Keep appending the DataFrames for each subsequent month to the same list.
  6. Repeat this for every month from 2011 to 2022, then merge the list into a single Pandas DataFrame, which can be exported in a tabular format.
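The steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: it assumes the layout `<parent>/<YYYY-MM>/<YYYY-MM>-<name>-street.csv`, and the function name `combine_street_files`, the `data` root folder and the output file name are hypothetical.

```python
from pathlib import Path

import pandas as pd

# Parent folder names as listed in the steps above.
PARENT_FOLDERS = ("2011-2016", "2017-2019", "2020-2022")


def combine_street_files(root: Path, parents=PARENT_FOLDERS) -> pd.DataFrame:
    """Read every '*-street.csv' file under the parent folders into one DataFrame."""
    frames = []
    for parent in parents:
        parent_dir = root / parent
        if not parent_dir.is_dir():
            continue  # skip parent folders that are not present
        # Assumed layout: one sub-folder per month, one street file per name.
        for csv_path in sorted(parent_dir.glob("*/*-street.csv")):
            frames.append(pd.read_csv(csv_path))
    if not frames:
        raise FileNotFoundError(f"No street files found under {root}")
    # Concatenate once at the end instead of growing a DataFrame in the loop.
    return pd.concat(frames, ignore_index=True)


# Example usage (paths are placeholders):
# combined = combine_street_files(Path("data"))
# combined.to_csv("crime_incidents_2011_2022.csv", index=False)
```

Collecting the monthly DataFrames in a list and calling `pd.concat` once avoids repeatedly copying an ever-growing DataFrame, which matters with tens of millions of rows. All of the data is still held in memory at once, so a machine with limited RAM may need a chunked approach instead.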

The combined file has 42 million rows and 10 columns, and is around 7.5GB in size. The columns are described in Table 3 below, and a snippet of the combined crime incident data is shown in Figure 6.

Table 3: Description of columns in the combined crime incident data

| Column | Description | Data type |
| --- | --- | --- |
| Month | Month of the crime incident | string |
| Reported by | The police force name that provided the data about the crime | string |
| Falls within | The police force name that provided the data about the crime | string |
| Longitude | Anonymised longitude coordinate of the crime incident | float |
| Latitude | Anonymised latitude coordinate of the crime incident | float |
| Location | Description of the crime location | string |
| LSOA code | References the LSOA code that the anonymised coordinate point falls into | string |
| LSOA name | References the LSOA name that the anonymised coordinate point falls into | string |
| Crime type | Types of crime category | string |
| Last outcome category | Reference to whichever of the outcomes associated with the crime occurred most recently | string |

Figure 6: Snippet of Combined Crime Incident Data
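When reading the combined file back, the column types listed above can be declared explicitly so that Pandas does not have to infer them. The sketch below is an assumption about how this could be done, not the notebook's code; `load_combined` and the file name in the usage comment are hypothetical, while the column names and types come from the column description above.

```python
import pandas as pd

# Column names and types as described in the article's column table.
COLUMN_DTYPES = {
    "Month": "string",
    "Reported by": "string",
    "Falls within": "string",
    "Longitude": "float64",
    "Latitude": "float64",
    "Location": "string",
    "LSOA code": "string",
    "LSOA name": "string",
    "Crime type": "string",
    "Last outcome category": "string",
}


def load_combined(path, nrows=None) -> pd.DataFrame:
    """Load the combined file, keeping only the ten documented columns.

    `nrows` limits how many rows are read, which is useful for previewing
    a multi-gigabyte file without loading all of it.
    """
    return pd.read_csv(
        path,
        usecols=list(COLUMN_DTYPES),
        dtype=COLUMN_DTYPES,
        nrows=nrows,
    )


# Example usage (file name is a placeholder):
# preview = load_combined("crime_incidents_2011_2022.csv", nrows=1000)
```

Reading a limited number of rows first is a cheap way to confirm the columns and types look right before committing to a full load of roughly 7.5GB.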

The Python notebook for the data collection described in this article is hosted in my GitHub repo.

This post is licensed under CC BY 4.0 by the author.