Introduction

The purpose of this analysis is to determine the trend of crime in the United States from 1997 to 2016. There is a common misconception that the world is trending toward chaos and that crime is the worst it has ever been in American history. This exploratory data analysis aims to shed light on how crime in the United States has actually changed. It should be noted at the outset that the population of the United States has been rising steadily over this period. How will crime be affected by a rising population? Will there be more crime or less? Are there any historical factors behind the subtle increases in crime?

The Data

The data come from Table 1 of the FBI's Crime in the United States, 2016 report (https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1), which gives crime statistics for 1997 to 2016. The table covers violent crime, murder and nonnegligent manslaughter, rape, robbery, aggravated assault, property crime, burglary, larceny-theft, and motor vehicle theft, along with the rate of each offense per 100,000 inhabitants. Python and pandas were used to read the CSV file, and the data were cleaned into a form more usable for analysis.

Line charts were used to show each type of crime and its trend. A linear regression was fit to violent crime to show its trend (violent crime includes the offenses of murder, rape (legacy definition), robbery, and aggravated assault). The regression shows a downward trend for violent crime, and the line charts show a generally downward trend for each of the other crimes as well. Crime in America appears to be declining overall even though the population is rising.
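The rate columns normalize each count by population, which is what lets crime be compared across years while the population grows. As a quick check of how the rates relate to the counts, the sketch below recomputes the violent crime rate per 100,000 inhabitants from the 1997 and 1998 counts and populations shown in the table. The helper name `rate_per_100k` is hypothetical, and the sketch assumes the published rates are rounded to one decimal place.

```python
# Counts and populations copied from the first two rows of the FBI table
data = {
    1997: {"population": 267_783_607, "violent_crime": 1_636_096},
    1998: {"population": 270_248_003, "violent_crime": 1_533_887},
}


def rate_per_100k(count, population):
    """Offenses per 100,000 inhabitants."""
    return count / population * 100_000


for year, row in data.items():
    rate = rate_per_100k(row["violent_crime"], row["population"])
    print(year, round(rate, 1))
```

The recomputed values, 611.0 and 567.6, match the rates published in the table.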

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('Crime in the United States.csv')
print(df.columns)
df.head()

Index(['Year', 'Population1', 'Violentcrime', 'Violent \ncrime \nrate ',
       'Murder and\nnonnegligent \nmanslaughter',
       'Murder and \nnonnegligent \nmanslaughter \nrate ',
       'Rape\n(revised \ndefinition3)', 'Rape\n(revised \ndefinition) \nrate3',
       'Rape\n(legacy \ndefinition4)', 'Rape\n(legacy \ndefinition) \nrate4',
       'Robbery', 'Robbery \nrate ', 'Aggravated \nassault',
       'Aggravated \nassault rate ', 'Property \ncrime',
       'Property \ncrime \nrate ', 'Burglary', 'Burglary \nrate ',
       'Larceny-\ntheft', 'Larceny-\ntheft rate ', 'Motor \nvehicle \ntheft',
       'Motor \nvehicle \ntheft \nrate '],
      dtype='object')

Year = []  # year cells carry a trailing footnote digit (e.g. 20015.0 for 2001); keep the first four digits as an int
for i in df['Year'].dropna().apply(int).apply(str):
    nYear = i[:4]
    nYear = int(nYear)
    Year.append(nYear)
print(Year)
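Some year cells carry a trailing footnote digit (the table shows 20015.0 for 2001). A vectorized pandas alternative to the loop is sketched below on a small hypothetical Series whose values are copied from the table. It keeps the first four digits of each year, as the loop does.

```python
import pandas as pd

# Year values as read from the CSV; 20015.0 is 2001 with a footnote digit appended
raw_years = pd.Series([1997.0, 1998.0, 20015.0, None])

years = (
    raw_years.dropna()  # drop missing cells
    .astype(int)        # 20015.0 -> 20015
    .astype(str)        # 20015 -> "20015"
    .str[:4]            # keep the four-digit year: "20015" -> "2001"
    .astype(int)
    .tolist()
)
print(years)  # [1997, 1998, 2001]
```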

Population = [] #this for loop removes commas and changes to int
for i in df['Population1'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Population.append(i)
print(Population)
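The same comma-stripping loop is repeated for every count column. The sketch below does the conversion in one vectorized step with `Series.str.replace` and `pd.to_numeric`; `clean_counts` is a hypothetical helper name, and the sample values are copied from the table.

```python
import pandas as pd


def clean_counts(series):
    """Drop missing cells, remove thousands separators, and convert to int."""
    no_commas = series.dropna().str.replace(",", "", regex=False)
    return pd.to_numeric(no_commas).tolist()


raw_population = pd.Series(["267,783,607", "270,248,003", None])
print(clean_counts(raw_population))  # [267783607, 270248003]
```

pandas can also strip the separators at load time through the `thousands=','` argument of `pd.read_csv`, which would make the per-column loops unnecessary.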

#'Violentcrime'
Violentcrime = [] #this for loop removes commas and changes to int
for i in df['Violentcrime'].dropna():
    i = i.replace(',', '')
    i = int(i)
    Violentcrime.append(i)
print(Violentcrime)

# Violent crime rate per 100,000 inhabitants
crimerate = []
for i in df['Violent \ncrime \nrate '].dropna():
    if i == ' ':
        print('Yes')  # flag a blank cell and skip it
    else:
        crimerate.append(int(float(i)))  # int() truncates, e.g. 567.6 -> 567
print(crimerate)
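The rate column contains blank `' '` cells, which the loop detects by printing `'Yes'` and skipping them. The pandas sketch below treats those blanks as missing values with `pd.to_numeric(..., errors="coerce")`. It also shows that `int()` truncates (567.6 becomes 567), whereas `round()` returns the nearest integer. The sample values come from the table, with a blank cell added for illustration.

```python
import pandas as pd

raw_rates = pd.Series(["611", "567.6", " ", "523"])

# ' ' cannot be parsed as a number, so errors="coerce" turns it into NaN
rates = pd.to_numeric(raw_rates, errors="coerce").dropna().tolist()

truncated = [int(r) for r in rates]  # int() drops the fractional part
rounded = [round(r) for r in rates]  # round() goes to the nearest integer
print(truncated, rounded)  # [611, 567, 523] [611, 568, 523]
```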

# Convert the rate per 100,000 inhabitants to a percentage of the population
crimepercentage =[]
for i in crimerate:
    i = i/100000
    i = i*100
    crimepercentage.append(i)
print(crimepercentage)
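Because the rate is per 100,000 inhabitants, the percentage of the population is rate / 100,000 × 100, which simplifies to rate / 1,000. Dividing once avoids the extra floating-point step that produces values such as 0.46299999999999997 in the output. The sample rates below are taken from the printed list.

```python
rates = [611, 463, 458, 431]

# per 100,000 -> percent: (rate / 100_000) * 100 == rate / 1_000
percentages = [r / 1_000 for r in rates]
print(percentages)  # [0.611, 0.463, 0.458, 0.431]
```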

# Linear regression of violent crime counts on year
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


x = np.unique(Year).reshape(-1, 1)  # years as a single-feature column vector
y = Violentcrime

# Hold out 30% of the years for testing (fixed seed for reproducibility)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

reg = LinearRegression()
reg.fit(x_train, y_train)
y_pred = reg.predict(x_test)
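With only 20 yearly points, a 70/30 split leaves six test years. To read off the overall trend, one alternative is to fit the line on all years. The sketch below does an ordinary least-squares fit with `np.polyfit` (degree 1, equivalent to a single-feature linear regression rather than the scikit-learn model above) on the violent crime counts printed in this notebook. It reports the slope in offenses per year and the R² of the fit.

```python
import numpy as np

years = np.arange(1997, 2017)
violent_crime = np.array([
    1636096, 1533887, 1426044, 1425486, 1439480, 1423677, 1383676,
    1360088, 1390745, 1435123, 1422970, 1394461, 1325896, 1251248,
    1206005, 1217057, 1168298, 1153022, 1199310, 1248185,
])

# Least-squares line: violent_crime ~ slope * year + intercept
slope, intercept = np.polyfit(years, violent_crime, deg=1)
fitted = slope * years + intercept

# Coefficient of determination (R^2) on the data used for the fit
ss_res = np.sum((violent_crime - fitted) ** 2)
ss_tot = np.sum((violent_crime - violent_crime.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"slope: {slope:.0f} offenses/year, R^2: {r2:.2f}")
```

A negative slope corresponds to the downward trend described in the introduction. R² indicates how much of the year-to-year variation a straight line explains.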


def to_int_list(column):
    """Drop missing cells, remove thousands separators, and convert each value to int."""
    return [int(i.replace(',', '')) for i in df[column].dropna()]

Murder = to_int_list('Murder and\nnonnegligent \nmanslaughter')  # murder and nonnegligent manslaughter
Robbery = to_int_list('Robbery')
Aggravated = to_int_list('Aggravated \nassault')                 # aggravated assault
Property = to_int_list('Property \ncrime')                       # property crime
Burglary = to_int_list('Burglary')
Larceny = to_int_list('Larceny-\ntheft')                         # larceny-theft
Motor = to_int_list('Motor \nvehicle \ntheft')                   # motor vehicle theft

[1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
[267783607, 270248003, 272690813, 281421906, 285317559, 287973924, 290788976, 293656842, 296507061, 299398484, 301621157, 304059724, 307006550, 309330219, 311587816, 313873685, 316497531, 318907401, 320896618, 323127513]
[1636096, 1533887, 1426044, 1425486, 1439480, 1423677, 1383676, 1360088, 1390745, 1435123, 1422970, 1394461, 1325896, 1251248, 1206005, 1217057, 1168298, 1153022, 1199310, 1248185]
Yes
Yes
[611, 567, 523, 506, 504, 494, 475, 463, 469, 479, 471, 458, 431, 404, 387, 387, 369, 361, 373, 386]
[0.611, 0.567, 0.523, 0.506, 0.504, 0.494, 0.475, 0.46299999999999997, 0.469, 0.479, 0.471, 0.45799999999999996, 0.43099999999999994, 0.404, 0.387, 0.387, 0.369, 0.361, 0.373, 0.386]

plt.plot(Year, Population)
plt.title('Population and years')
plt.xlabel('years')
plt.ylabel('Population')
plt.show()

plt.plot(Year, crimepercentage)
plt.title('Violent crime as a percentage of population')
plt.xlabel('years')
plt.ylabel('Violent crime (% of population)')
plt.show()

plt.scatter(x_test, y_test, color='black', label='Actual (test years)')
plt.plot(x_test, y_pred, color='blue', linewidth=3, label='Linear fit')
plt.title('Violent Crime and Years')
plt.xlabel('Year')
plt.ylabel('Violent Crime')
plt.legend()
plt.show()

plt.plot(Year, crimerate)
plt.title('Violent crime rate and years')
plt.xlabel('years')
plt.ylabel('Violent crime per 100,000 inhabitants')
plt.show()
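To connect the charts back to the question in the introduction, the sketch below computes the percent change between 1997 and 2016 for population, the violent crime count, and the violent crime rate. It uses the first and last values of the printed lists. The helper name `percent_change` is hypothetical.

```python
def percent_change(start, end):
    """Percent change from start to end."""
    return (end - start) / start * 100


# First (1997) and last (2016) values from the printed lists
series = {
    "population": (267_783_607, 323_127_513),
    "violent crime (count)": (1_636_096, 1_248_185),
    "violent crime rate (per 100k)": (611, 386),
}

for name, (start, end) in series.items():
    print(f"{name}: {percent_change(start, end):+.1f}%")
```

Over the period, the population grows by about 21%. Over the same years, the violent crime count falls by about 24% and the rate by about 37%, so the decline holds even after accounting for the larger population.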

plt.plot(Year, Murder)
plt.title('Murder and years')
plt.xlabel('years')
plt.ylabel('Murder')
plt.show()

plt.plot(Year, Robbery)
plt.title('Robbery and years')
plt.xlabel('years')
plt.ylabel('Robbery')
plt.show()

plt.plot(Year, Aggravated)
plt.title('Aggravated Assault and years')
plt.xlabel('years')
plt.ylabel('Aggravated Assault')
plt.show()

plt.plot(Year, Property)
plt.title('Property crime and years')
plt.xlabel('years')
plt.ylabel('Property crime')
plt.show()

plt.plot(Year, Burglary)
plt.title('Burglary and years')
plt.xlabel('years')
plt.ylabel('Burglary')
plt.show()

plt.plot(Year, Larceny)
plt.title('Larceny Theft and years')
plt.xlabel('years')
plt.ylabel('Larceny Theft')
plt.show()

plt.plot(Year, Motor)
plt.title('Motor vehicle theft and years')
plt.xlabel('years')
plt.ylabel('Motor vehicle theft')
plt.show()

	Year	Population1	Violentcrime	Violent crime rate	Murder and nonnegligent manslaughter	Murder and nonnegligent manslaughter rate	Rape (revised definition3)	Rape (revised definition) rate3	Rape (legacy definition4)	Rape (legacy definition) rate4	...	Aggravated assault	Aggravated assault rate	Property crime	Property crime rate	Burglary	Burglary rate	Larceny- theft	Larceny- theft rate	Motor vehicle theft	Motor vehicle theft rate
0	1997.0	267,783,607	1,636,096	611	18,208	6.8	NaN	NaN	96,153	35.9	...	1,023,201	382.1	11,558,475	4,316.30	2,460,526	918.8	7,743,760	2,891.80	1,354,189	505.7
1	1998.0	270,248,003	1,533,887	567.6	16,974	6.3	NaN	NaN	93,144	34.5	...	976,583	361.4	10,951,827	4,052.50	2,332,735	863.2	7,376,311	2,729.50	1,242,781	459.9
2	1999.0	272,690,813	1,426,044	523	15,522	5.7	NaN	NaN	89,411	32.8	...	911,740	334.3	10,208,334	3,743.60	2,100,739	770.4	6,955,520	2,550.70	1,152,075	422.5
3	2000.0	281,421,906	1,425,486	506.5	15,586	5.5	NaN	NaN	90,178	32.0	...	911,706	324	10,182,584	3,618.30	2,050,992	728.8	6,971,590	2,477.30	1,160,002	412.2
4	20015.0	285,317,559	1,439,480	504.5	16,037	5.6	NaN	NaN	90,863	31.8	...	909,023	318.6	10,437,189	3,658.10	2,116,531	741.8	7,092,267	2,485.70	1,228,391	430.5