Introduction
The purpose of this exploratory data analysis is to determine the trend of crime in the United States from 1997 to 2016. There is a common misconception that the world is trending toward chaos and that crime is the worst it has ever been in American history. This analysis aims to shed light on the actual trend. It should be noted at the outset that the population of the United States increased steadily over this period. How is crime affected by a rising population? Is there more crime or less? Are there historical events that coincide with subtle increases in crime?
The Data
The data come from Table 1 of the FBI's Crime in the United States 2016 report (https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1), which gives crime statistics from 1997 to 2016. The table covers violent crime, murder and nonnegligent manslaughter, rape, robbery, aggravated assault, property crime, burglary, larceny-theft, and motor vehicle theft. It also gives the rate per 100,000 inhabitants. Python with pandas was used to read the CSV file. The data were cleaned and converted into a form more usable for analysis. Line charts were used to show the trend of each type of crime. A linear regression was fitted to violent crime to show its trend (violent crime includes the offenses of murder, rape (legacy definition), robbery, and aggravated assault). The linear regression shows a downward trend for violent crime, and each of the other crimes shows a downward trend as well. Crime in America in general appears to be decreasing even though the population is rising.
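Because the population grows over the period, raw counts and per-capita rates can tell different stories. The sketch below shows how a rate per 100,000 inhabitants is derived from a count and a population. The function name `rate_per_100k` and the figures are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the FBI table.

```python
def rate_per_100k(count, population):
    """Return the number of offenses per 100,000 inhabitants."""
    return count * 100_000 / population

# Hypothetical figures, for illustration only (not from the FBI table).
early = rate_per_100k(count=1_500_000, population=250_000_000)
late = rate_per_100k(count=1_200_000, population=320_000_000)

# The count falls by 20%, but the rate falls further because the
# population grew at the same time.
print(early, late)
```

Comparing rates rather than counts is one way to separate a real change in crime from a change that only reflects population growth.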
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Crime in the United States.csv')
print(df.columns)
df.head()
Year = []  # this for loop removes the extra trailing digit and converts each year to int
for i in df['Year'].dropna().apply(int).apply(str):
    nYear = i[:4]  # keep only the four-digit year
    nYear = int(nYear)
    Year.append(nYear)
print(Year)
Population = []  # this for loop removes commas and changes to int
for i in df['Population1'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Population.append(i)
print(Population)
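The same comma-stripping step is repeated for every count column. As a minimal sketch, assuming every cell is a string such as '1,234,567', the conversion could be wrapped in one small helper. The name `to_int` and the sample values are hypothetical and only illustrate the cleaning step.

```python
def to_int(value):
    """Convert a comma-formatted count such as '1,234,567' to an int."""
    return int(str(value).replace(',', '').strip())

# Hypothetical sample values, for illustration only.
raw = ['1,234,567', ' 89,012 ', '345']
cleaned = [to_int(v) for v in raw]
print(cleaned)
```

A helper like this keeps the cleaning rule in one place, so a change to it (for example, stripping whitespace) applies to every column at once.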
#'Violentcrime'
Violentcrime = []  # this for loop removes commas and changes to int
for i in df['Violentcrime'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Violentcrime.append(i)
print(Violentcrime)
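# for loop for rates: skip blank cells and convert each rate to a whole number
df['Violent \ncrime \nrate '].dropna()
crimerate = []
for i in df['Violent \ncrime \nrate '].dropna():
    if i == ' ':
        print('Yes')  # blank cell found; it is not added to the list
    else:
        i = int(float(i))  # truncates the decimal part of the rate
        crimerate.append(i)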
print(crimerate)
# Crime percentage: convert each rate per 100,000 inhabitants to a percentage of the population
crimepercentage = []
for i in crimerate:
    i = i/100000
    i = i*100
    crimepercentage.append(i)
print(crimepercentage)
# Linear regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
x = np.array(Year).reshape(-1, 1)  # keep the years in the same order as Violentcrime
y = Violentcrime
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.3, random_state=42)
reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
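The sign of the fitted slope is what indicates the direction of the trend. The sketch below computes a one-predictor least-squares line by hand to make that explicit. It is not the scikit-learn code used above; `fit_line` is a hypothetical helper and the yearly counts are made-up values for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly counts, for illustration only (not the FBI figures).
years = [2000, 2001, 2002, 2003, 2004]
counts = [1000, 980, 960, 940, 920]
slope, intercept = fit_line(years, counts)

# A negative slope means the count falls as the year increases.
print(slope, intercept)
```

With scikit-learn, the corresponding values are available after fitting as `reg.coef_` and `reg.intercept_`.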
#Murder/Manslaughter
df['Murder and\nnonnegligent \nmanslaughter'].dropna()
Murder = []
for i in df['Murder and\nnonnegligent \nmanslaughter'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Murder.append(i)
#Robbery
df['Robbery'].dropna()
Robbery = []
for i in df['Robbery'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Robbery.append(i)
#Aggravated Assault
df['Aggravated \nassault'].dropna()
Aggravated = []
for i in df['Aggravated \nassault'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Aggravated.append(i)
#Property Crime
df['Property \ncrime'].dropna()
Property = []
for i in df['Property \ncrime'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Property.append(i)
#Burglary
df['Burglary'].dropna()
Burglary = []
for i in df['Burglary'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Burglary.append(i)
#Larceny
df['Larceny-\ntheft'].dropna()
Larceny = []
for i in df['Larceny-\ntheft'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Larceny.append(i)
#Motor Vehicle
df['Motor \nvehicle \ntheft'].dropna()
Motor = []
for i in df['Motor \nvehicle \ntheft'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Motor.append(i)
plt.plot(Year, Population)
plt.title('Population and years')
plt.xlabel('years')
plt.ylabel('Population')
plt.show()
plt.plot(Year, crimepercentage)
plt.title('Percentage over years')
plt.xlabel('years')
plt.ylabel('Violent crime (% of population)')
plt.show()
plt.scatter(x_test, y_test, color='black')
plt.plot(x_test, y_pred, color='blue', linewidth=3)
plt.title('Violent Crime and Years')
plt.xlabel('Year')
plt.ylabel('Violent Crime')
plt.show()  # finish the regression figure before the next plot starts
#plt.plot(Year, Violentcrime)
plt.plot(Year, crimerate)
plt.title('Violent crime rate and years')
plt.xlabel('years')
plt.ylabel('Violent crime rate per 100,000 inhabitants')
plt.show()
plt.plot(Year, Murder)
plt.title('Murder and years')
plt.xlabel('years')
plt.ylabel('Murder')
plt.show()
plt.plot(Year, Robbery)
plt.title('Robbery and years')
plt.xlabel('years')
plt.ylabel('Robbery')
plt.show()
plt.plot(Year, Aggravated)
plt.title('Aggravated Assault and years')
plt.xlabel('years')
plt.ylabel('Aggravated Assault')
plt.show()
plt.plot(Year, Property)
plt.title('Property crime and years')
plt.xlabel('years')
plt.ylabel('Property crime')
plt.show()
plt.plot(Year, Burglary)
plt.title('Burglary and years')
plt.xlabel('years')
plt.ylabel('Burglary')
plt.show()
plt.plot(Year, Larceny)
plt.title('Larceny Theft and years')
plt.xlabel('years')
plt.ylabel('Larceny Theft')
plt.show()
plt.plot(Year, Motor)
plt.title('Motor vehicle theft and years')
plt.xlabel('years')
plt.ylabel('Motor vehicle theft')
plt.show()