Introduction
The purpose of this analysis is to determine the trend of crime in the United States from 1997 to 2016. There is a common misconception that the world is trending towards chaos and that crime is the worst it has ever been in American history. This exploratory data analysis aims to shed light on the actual trend. It should be noted at the outset that the population of the United States increased steadily over this period. How is crime affected by a rising population? Is there more or less crime? Do any historical events coincide with subtle increases in crime?

The Data
The data come from Table 1 of the FBI's Crime in the United States 2016 report (https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1), which covers crime statistics from 1997 to 2016. The table reports counts of violent crime, murder and nonnegligent manslaughter, rape, robbery, aggravated assault, property crime, burglary, larceny-theft, and motor vehicle theft, together with the rate per 100,000 inhabitants for each offense. Python with pandas was used to read the CSV file, and the data were cleaned into a form more usable for analysis. Line charts show the trend for each type of crime. A linear regression was fitted to violent crime (which includes the offenses of murder, rape (legacy definition), robbery, and aggravated assault) to show its trend. The regression shows a downward trend for violent crime, and the line charts for each of the other crimes show a downward trend as well. Overall, crime in America appears to be declining even though the population is rising.
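
As a quick check on how the rate per 100,000 inhabitants relates to the raw counts, the sketch below recomputes the violent crime rate for the first and last years and compares the change in population with the change in violent crime. The population and violent crime figures are the 1997 and 2016 values from the table; the helper names `rate_per_100k` and `percent_change` are illustrative and are not part of the notebook.

```python
# Figures for 1997 and 2016 taken from the FBI table used in this notebook.
population_1997, population_2016 = 267_783_607, 323_127_513
violent_1997, violent_2016 = 1_636_096, 1_248_185


def rate_per_100k(offenses, population):
    """Offenses per 100,000 inhabitants."""
    return offenses / population * 100_000


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


rate_1997 = rate_per_100k(violent_1997, population_1997)
rate_2016 = rate_per_100k(violent_2016, population_2016)

print(f"Violent crime rate 1997: {rate_1997:.1f} per 100,000")
print(f"Violent crime rate 2016: {rate_2016:.1f} per 100,000")
print(f"Population change:    {percent_change(population_1997, population_2016):+.1f}%")
print(f"Violent crime change: {percent_change(violent_1997, violent_2016):+.1f}%")
```

Comparing the two percentage changes side by side addresses the question raised in the introduction: whether crime rose along with the population.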

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [3]:
df = pd.read_csv('Crime in the United States.csv')
print(df.columns)
df.head()
Index(['Year', 'Population1', 'Violentcrime', 'Violent \ncrime \nrate ',
       'Murder and\nnonnegligent \nmanslaughter',
       'Murder and \nnonnegligent \nmanslaughter \nrate ',
       'Rape\n(revised \ndefinition3)', 'Rape\n(revised \ndefinition) \nrate3',
       'Rape\n(legacy \ndefinition4)', 'Rape\n(legacy \ndefinition) \nrate4',
       'Robbery', 'Robbery \nrate ', 'Aggravated \nassault',
       'Aggravated \nassault rate ', 'Property \ncrime',
       'Property \ncrime \nrate ', 'Burglary', 'Burglary \nrate ',
       'Larceny-\ntheft', 'Larceny-\ntheft rate ', 'Motor \nvehicle \ntheft',
       'Motor \nvehicle \ntheft \nrate '],
      dtype='object')
Out[3]:
Year Population1 Violentcrime Violent crime rate Murder and nonnegligent manslaughter Murder and nonnegligent manslaughter rate Rape (revised definition3) Rape (revised definition) rate3 Rape (legacy definition4) Rape (legacy definition) rate4 ... Aggravated assault Aggravated assault rate Property crime Property crime rate Burglary Burglary rate Larceny- theft Larceny- theft rate Motor vehicle theft Motor vehicle theft rate
0 1997.0 267,783,607 1,636,096 611 18,208 6.8 NaN NaN 96,153 35.9 ... 1,023,201 382.1 11,558,475 4,316.30 2,460,526 918.8 7,743,760 2,891.80 1,354,189 505.7
1 1998.0 270,248,003 1,533,887 567.6 16,974 6.3 NaN NaN 93,144 34.5 ... 976,583 361.4 10,951,827 4,052.50 2,332,735 863.2 7,376,311 2,729.50 1,242,781 459.9
2 1999.0 272,690,813 1,426,044 523 15,522 5.7 NaN NaN 89,411 32.8 ... 911,740 334.3 10,208,334 3,743.60 2,100,739 770.4 6,955,520 2,550.70 1,152,075 422.5
3 2000.0 281,421,906 1,425,486 506.5 15,586 5.5 NaN NaN 90,178 32.0 ... 911,706 324 10,182,584 3,618.30 2,050,992 728.8 6,971,590 2,477.30 1,160,002 412.2
4 20015.0 285,317,559 1,439,480 504.5 16,037 5.6 NaN NaN 90,863 31.8 ... 909,023 318.6 10,437,189 3,658.10 2,116,531 741.8 7,092,267 2,485.70 1,228,391 430.5

5 rows × 22 columns

In [4]:
Year = []  # this loop drops the stray footnote digit (e.g. 20015 -> 2001) and converts each year to int
for i in df['Year'].dropna().apply(int).apply(str):
    nYear = i[:4]
    nYear = int(nYear)
    Year.append(nYear)
print(Year)

Population = [] #this for loop removes commas and changes to int
for i in df['Population1'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Population.append(i)
print(Population)

#'Violentcrime'
Violentcrime = [] #this for loop removes commas and changes to int
for i in df['Violentcrime'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Violentcrime.append(i)
print(Violentcrime)

# Violent crime rate per 100,000 inhabitants
crimerate = []
for i in df['Violent \ncrime \nrate '].dropna():
    if i == ' ':
        print('Yes')  # blank cell: report it and skip
    else:
        # int() truncates the decimal part, e.g. 567.6 -> 567
        crimerate.append(int(float(i)))
print(crimerate)

# Violent crime rate expressed as a percentage of the population (rate / 100,000 * 100)
crimepercentage =[]
for i in crimerate:
    i = i/100000
    i = i*100
    crimepercentage.append(i)
print(crimepercentage)

# Linear regression 
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


x = np.unique(Year).reshape(-1, 1)
y = Violentcrime

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.3, random_state=42)

reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)


#Murder/Manslaughter 
df['Murder and\nnonnegligent \nmanslaughter'].dropna()
Murder = []
for i in df['Murder and\nnonnegligent \nmanslaughter'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Murder.append(i)   
    
#Robbery    
df['Robbery'].dropna()
Robbery = []
for i in df['Robbery'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Robbery.append(i)  
    
#Aggravated Assault
df['Aggravated \nassault'].dropna()
Aggravated = []
for i in df['Aggravated \nassault'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Aggravated.append(i)  
    
#property Crime 
df['Property \ncrime'].dropna()
Property = []
for i in df['Property \ncrime'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Property.append(i)  
    
#Burglary
df['Burglary'].dropna()
Burglary = []
for i in df['Burglary'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Burglary.append(i)  
    
#Larceny
df['Larceny-\ntheft'].dropna()
Larceny = []
for i in df['Larceny-\ntheft'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Larceny.append(i)  
    
#motor Vehicle
df['Motor \nvehicle \ntheft'].dropna()
Motor = []
for i in df['Motor \nvehicle \ntheft'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Motor.append(i)  
[1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
[267783607, 270248003, 272690813, 281421906, 285317559, 287973924, 290788976, 293656842, 296507061, 299398484, 301621157, 304059724, 307006550, 309330219, 311587816, 313873685, 316497531, 318907401, 320896618, 323127513]
[1636096, 1533887, 1426044, 1425486, 1439480, 1423677, 1383676, 1360088, 1390745, 1435123, 1422970, 1394461, 1325896, 1251248, 1206005, 1217057, 1168298, 1153022, 1199310, 1248185]
Yes
Yes
[611, 567, 523, 506, 504, 494, 475, 463, 469, 479, 471, 458, 431, 404, 387, 387, 369, 361, 373, 386]
[0.611, 0.567, 0.523, 0.506, 0.504, 0.494, 0.475, 0.46299999999999997, 0.469, 0.479, 0.471, 0.45799999999999996, 0.43099999999999994, 0.404, 0.387, 0.387, 0.369, 0.361, 0.373, 0.386]
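
The cleaning loops above repeat the same steps for every count column: drop missing values, remove the thousands separators, and convert to `int`. A minimal sketch of one reusable helper is shown below, assuming the count columns are read as comma-formatted strings, as they are in this CSV. The names `to_int_series` and `sample` are hypothetical, and the two sample rows are the 1997 and 1998 values from the table.

```python
import pandas as pd


def to_int_series(column: pd.Series) -> pd.Series:
    """Drop missing or blank cells, strip thousands separators, convert to int."""
    cleaned = (
        column.dropna()
        .astype(str)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    cleaned = cleaned[cleaned != ""]  # skip cells that held only whitespace
    return cleaned.astype(int)


# Small stand-in for the real DataFrame; the last row mimics an empty footer row.
sample = pd.DataFrame(
    {
        "Population1": ["267,783,607", "270,248,003", None],
        "Violentcrime": ["1,636,096", "1,533,887", None],
    }
)

population = to_int_series(sample["Population1"]).tolist()
violent_crime = to_int_series(sample["Violentcrime"]).tolist()
print(population)
print(violent_crime)
```

In the notebook, the same helper could be applied to `df['Robbery']`, `df['Burglary']`, and the other count columns in place of the individual loops.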
In [5]:
plt.plot(Year, Population)
plt.title('Population and years')
plt.xlabel('years')
plt.ylabel('Population')
plt.show()
In [6]:
plt.plot(Year, crimepercentage)
plt.title('Percentage over years')
plt.xlabel('years')
plt.ylabel('Violent crime')
plt.show()
In [7]:
plt.scatter(x_test, y_test, color='black')
plt.plot(x_test, y_pred, color='blue', linewidth=3)
plt.title('Violent Crime and Years')
plt.xlabel('Year')
plt.ylabel('Violent Crime')
Out[7]:
Text(0,0.5,'Violent Crime')
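
The plot shows the fitted line but not its numbers. The sketch below quantifies the trend by fitting a straight line to all 20 years of violent crime counts printed earlier and reporting the slope and R². Note that this uses `np.polyfit` on the full series, which is a simpler stand-in for the `LinearRegression` model with a train/test split used in the notebook, so its values will not match `reg.coef_` exactly. The variable names are illustrative.

```python
import numpy as np

years = np.arange(1997, 2017)
violent_crime = np.array([
    1636096, 1533887, 1426044, 1425486, 1439480, 1423677, 1383676,
    1360088, 1390745, 1435123, 1422970, 1394461, 1325896, 1251248,
    1206005, 1217057, 1168298, 1153022, 1199310, 1248185,
])

# Centre the years so the fit is numerically well conditioned;
# the slope is unaffected by this shift.
x = years - years.mean()
slope, intercept = np.polyfit(x, violent_crime, deg=1)

# Coefficient of determination (R^2) of the fitted line.
predicted = slope * x + intercept
ss_res = np.sum((violent_crime - predicted) ** 2)
ss_tot = np.sum((violent_crime - violent_crime.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Slope: {slope:,.0f} offenses per year")
print(f"R^2:   {r_squared:.3f}")
```

A negative slope means the fitted count of violent crimes falls from year to year, which is the downward trend described in the data section.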
In [8]:
#plt.plot(Year, Violentcrime)
plt.plot(Year, crimerate)
plt.title('Violent crime rate and years')
plt.xlabel('years')
plt.ylabel('Violent crime rate (per 100,000)')
plt.show()
In [9]:
plt.plot(Year, Murder)
plt.title('Murder and years')
plt.xlabel('years')
plt.ylabel('Murder')
plt.show()
In [10]:
plt.plot(Year, Robbery)
plt.title('Robbery and years')
plt.xlabel('years')
plt.ylabel('Robbery')
plt.show()
In [11]:
plt.plot(Year, Aggravated)
plt.title('Aggravated Assault and years')
plt.xlabel('years')
plt.ylabel('Aggravated Assault')
plt.show()
In [12]:
plt.plot(Year, Property)
plt.title('Property crime and years')
plt.xlabel('years')
plt.ylabel('Property crime')
plt.show()
In [13]:
plt.plot(Year, Burglary)
plt.title('Burglary and years')
plt.xlabel('years')
plt.ylabel('Burglary')
plt.show()
In [14]:
plt.plot(Year, Larceny)
plt.title('Larceny Theft and years')
plt.xlabel('years')
plt.ylabel('Larceny Theft')
plt.show()
In [15]:
plt.plot(Year, Motor)
plt.title('Motor vehicle theft and years')
plt.xlabel('years')
plt.ylabel('Motor vehicle theft')
plt.show()
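
The plotting cells above all follow the same pattern, so they could be collapsed into a single loop over a dictionary of series. The sketch below is one way to do that, using only the three series whose values are printed earlier in the notebook (population, violent crime counts, and the truncated violent crime rate). The function name `plot_trends` and its `ncols` parameter are hypothetical.

```python
import matplotlib.pyplot as plt

years = list(range(1997, 2017))
series = {
    "Population": [
        267783607, 270248003, 272690813, 281421906, 285317559, 287973924,
        290788976, 293656842, 296507061, 299398484, 301621157, 304059724,
        307006550, 309330219, 311587816, 313873685, 316497531, 318907401,
        320896618, 323127513,
    ],
    "Violent crime": [
        1636096, 1533887, 1426044, 1425486, 1439480, 1423677, 1383676,
        1360088, 1390745, 1435123, 1422970, 1394461, 1325896, 1251248,
        1206005, 1217057, 1168298, 1153022, 1199310, 1248185,
    ],
    "Violent crime rate (per 100,000)": [
        611, 567, 523, 506, 504, 494, 475, 463, 469, 479,
        471, 458, 431, 404, 387, 387, 369, 361, 373, 386,
    ],
}


def plot_trends(years, series, ncols=2):
    """Draw one line chart per series on a shared grid and return the figure."""
    nrows = -(-len(series) // ncols)  # ceiling division
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(6 * ncols, 3.5 * nrows), squeeze=False
    )
    flat = axes.ravel()
    for ax, (name, values) in zip(flat, series.items()):
        ax.plot(years, values)
        ax.set_title(f"{name} by year")
        ax.set_xlabel("Year")
        ax.set_ylabel(name)
    for ax in flat[len(series):]:
        ax.set_visible(False)  # hide unused grid cells
    fig.tight_layout()
    return fig


fig = plot_trends(years, series)
```

In the notebook, the `Murder`, `Robbery`, `Aggravated`, `Property`, `Burglary`, `Larceny`, and `Motor` lists could be added to the dictionary, followed by `plt.show()` to display the grid.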