For one of my upper-level research classes I wrote a paper on how digital media alters personal risk assessment. The findings suggest that the shift to a digital environment forces us to process different types of dangers, which is worth studying because risk perception is fundamental to keeping ourselves alive. 
          Read My Paper Here
          
       
          Related to my paper, I found a dataset online recording different social relationships and how they influence perceived risk. The source can be found here: Risk Alteration Data 
          
        
          Source: Risk Alteration Data
          Sample of 391 Vietnamese respondents aged 15 to 47 years, from various geographical regions in Vietnam. 
          A data frame with 391 observations and 15 variables. I added 10 more columns that one-hot encode the response variable (Risk Perception) into classes, so the neural net can fit one output node per risk class. 
        
          ○ ID - Participant ID 
          ○ Age - Age in years 
          ○ Gender - 0 - Male; 1 - Female; 2 - Other or prefer not to answer
          ○ Religion - 0 - No religion; 1 - Buddhism; 2 - Christian and Protestant; 3 - Other religion 
          ○ Income - Income in millions of VND 
          ○ Career - 1 - Student; 2 - Government officer; 3 - Private sector 
          ○ Family Member - Number of people in the household (excluding the respondent) 
          ○ Social Media Usage Daily - 1 - Less than 1 hour; 2 - From 1 to 3 hours; 3 - More than 3 hours 
          ○ Province - 1 - North; 2 - Middle; 3 - South; 4 - Overseas 
          ○ Education - 1 - High school; 2 - University/College; 3 - Postgraduate 
          ○ nCoV-source - 1 - Official information; 2 - Social media and word of mouth; 3 - Others
          
          ○ Frequency - Number of times the respondent actively searched for coronavirus information 
          ○ Risk Perception - 1 to 10 
          ○ Fake News - From 1 to 10: perceived level of fake news about the coronavirus 
          ○ Officialnews - From 1 to 10: perceived level of official news about the coronavirus
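For exploratory checks, these numeric codes can be decoded into readable labels in R. A minimal sketch for the Gender codes, using hypothetical example values rather than the real column:
# Sketch: decode Gender codes (0/1/2) into labels per the codebook above
gender_codes <- c(0, 0, 1, 2, 1)   # hypothetical example values
factor(gender_codes, levels = c(0, 1, 2),
       labels = c("Male", "Female", "Other/No answer"))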
        
Dataset: Socioeconomic Dataset 
 
 
Load in dataset 
 The last 10 columns were created because a multi-class neural net cannot distinguish between factor levels stored in a single column, so each risk class gets its own 0/1 indicator column
library(readxl)
Data_in_brief <- read_excel("C:/Users/ksree/Desktop/senior year/statistics/wh9xk5mp9m-3/Data-in-brief.xlsx", 
                            sheet = "Dataset", range = "b1:y392")
View(Data_in_brief)
str(Data_in_brief)
## tibble [391 x 24] (S3: tbl_df/tbl/data.frame)
##  $ Age           : num [1:391] 26 21 22 25 22 20 25 35 24 19 ...
##  $ Gender        : num [1:391] 0 0 0 1 0 1 1 0 2 1 ...
##  $ Religion      : num [1:391] 1 0 0 0 0 0 0 0 0 0 ...
##  $ Income        : num [1:391] 10 7 8 15 4 8 15 10 7 4 ...
##  $ Career        : num [1:391] 1 3 3 3 3 1 3 3 3 1 ...
##  $ Familymember  : num [1:391] 4 6 0 4 4 5 5 4 4 2 ...
##  $ Socialmedia   : num [1:391] 2 2 2 3 3 3 2 3 2 3 ...
##  $ Province      : num [1:391] 1 3 3 3 3 1 2 3 3 1 ...
##  $ Education     : num [1:391] 3 2 2 2 2 2 2 2 2 1 ...
##  $ nCoVsource    : num [1:391] 1 1 1 1 2 1 2 2 2 2 ...
##  $ Frequency     : num [1:391] 0 4 2 2 1 4 8 24 2 2 ...
##  $ Fakenews      : num [1:391] 10 8 8 4 8 10 8 9 4 8 ...
##  $ Officialnews  : num [1:391] 5 7 5 3 5 5 4 2 5 4 ...
##  $ Riskperception: num [1:391] 1 10 9 8 8 7 10 8 6 10 ...
##  $ r1            : num [1:391] 1 0 0 0 0 0 0 0 0 0 ...
##  $ r2            : num [1:391] 0 0 0 0 0 0 0 0 0 0 ...
##  $ r3            : num [1:391] 0 0 0 0 0 0 0 0 0 0 ...
##  $ r4            : num [1:391] 0 0 0 0 0 0 0 0 0 0 ...
##  $ r5            : num [1:391] 0 0 0 0 0 0 0 0 0 0 ...
##  $ r6            : num [1:391] 0 0 0 0 0 0 0 0 1 0 ...
##  $ r7            : num [1:391] 0 0 0 0 0 1 0 0 0 0 ...
##  $ r8            : num [1:391] 0 0 0 1 1 0 0 1 0 0 ...
##  $ r9            : num [1:391] 0 0 1 0 0 0 0 0 0 0 ...
##  $ r10           : num [1:391] 0 1 0 0 0 0 1 0 0 1 ...
 Put the data into a variable minus the Riskperception column, because the response was encoded separately in the last ten columns; a sketch of that encoding follows
data = subset(Data_in_brief, select = -c(Riskperception))
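For reference, the ten indicator columns could be rebuilt directly from the response. A minimal sketch, assuming Riskperception holds integer scores from 1 to 10:
# Sketch: rebuild the r1..r10 one-hot columns from Riskperception
# (rk is 1 when the respondent's score equals k, and 0 otherwise)
for (k in 1:10) {
  Data_in_brief[[paste0("r", k)]] <- as.numeric(Data_in_brief$Riskperception == k)
}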
 Profile data to see any immediate anomalies and check variables
View(data)
 Load required packages and set seed to reproduce results
library(neuralnet)
## Warning: package 'neuralnet' was built under R version 4.0.3
library(nnet)
## Warning: package 'nnet' was built under R version 4.0.3
library(psych)
require(ggplot2)
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
set.seed(345)
 Get some descriptive statistics for the overall dataset 
 Identify any visual correlations among variables, visual outliers, mean and skew
plot(data)
summary(data)
##       Age            Gender          Religion          Income      
##  Min.   : 0.00   Min.   :0.0000   Min.   :0.0000   Min.   : 0.000  
##  1st Qu.:19.00   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 2.000  
##  Median :23.00   Median :0.0000   Median :0.0000   Median : 5.000  
##  Mean   :23.23   Mean   :0.5754   Mean   :0.4834   Mean   : 8.747  
##  3rd Qu.:28.00   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:10.000  
##  Max.   :47.00   Max.   :2.0000   Max.   :3.0000   Max.   :50.000  
##      Career       Familymember    Socialmedia       Province    
##  Min.   :1.000   Min.   : 0.00   Min.   :0.000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.: 4.00   1st Qu.:2.000   1st Qu.:2.000  
##  Median :2.000   Median : 4.00   Median :2.000   Median :3.000  
##  Mean   :1.926   Mean   : 4.21   Mean   :2.414   Mean   :2.315  
##  3rd Qu.:3.000   3rd Qu.: 5.00   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :3.000   Max.   :10.00   Max.   :3.000   Max.   :3.000  
##    Education       nCoVsource      Frequency         Fakenews     
##  Min.   :0.000   Min.   :1.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.: 2.000   1st Qu.: 7.000  
##  Median :2.000   Median :2.000   Median : 2.000   Median : 8.000  
##  Mean   :2.056   Mean   :1.885   Mean   : 4.751   Mean   : 7.767  
##  3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.: 8.000   3rd Qu.: 9.500  
##  Max.   :3.000   Max.   :3.000   Max.   :24.000   Max.   :10.000  
##   Officialnews          r1                 r2                 r3         
##  Min.   : 0.000   Min.   :0.000000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.: 5.000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.00000  
##  Median : 6.000   Median :0.000000   Median :0.000000   Median :0.00000  
##  Mean   : 5.795   Mean   :0.005115   Mean   :0.002557   Mean   :0.01535  
##  3rd Qu.: 7.000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.00000  
##  Max.   :10.000   Max.   :1.000000   Max.   :1.000000   Max.   :1.00000  
##        r4          r5          r6                r7               r8        
##  Min.   :0   Min.   :0   Min.   :0.00000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0   1st Qu.:0   1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0   Median :0   Median :0.00000   Median :0.0000   Median :0.0000  
##  Mean   :0   Mean   :0   Mean   :0.07928   Mean   :0.1611   Mean   :0.3043  
##  3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.00000   3rd Qu.:0.0000   3rd Qu.:1.0000  
##  Max.   :0   Max.   :0   Max.   :1.00000   Max.   :1.0000   Max.   :1.0000  
##        r9              r10        
##  Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000  
##  Mean   :0.1202   Mean   :0.1816  
##  3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000
describe(data)
 Split data into training and test sets to validate the model at the end
set.seed(4321)
dt <- sort(sample(nrow(data), nrow(data)*.75))
train <- data[dt,]
test <- data[-dt,]
describe(train)
describe(test)
 Min-max scale every column to the [0, 1] range so all inputs are on a comparable scale for the network
scl <- function(x){ (x - min(x))/(max(x) - min(x)) }
train[, 1:23] <- data.frame(lapply(train[, 1:23], scl))
test[, 1:23] <- data.frame(lapply(test[, 1:23], scl))
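One pitfall with this scaling: a constant column has max(x) equal to min(x), and the summary above shows r4 and r5 are all zero, so scl divides by zero for them:
scl(c(0, 0, 0))   # (0 - 0)/(0 - 0) is 0/0, which returns NaN for every entry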
 Define is.nan for data frames, then reset the NaN cells produced by those zero-range columns to 0
is.nan.data.frame <- function(x)
  do.call(cbind, lapply(x, is.nan))
train[is.nan(train)] <- 0
test[is.nan(test)] <- 0
 To build the formula, each response class encoded in its own column must be written out explicitly on the left-hand side
n <- names(train)
f <- as.formula(paste("r1 + r2 + r3 + r4 + r5 + r6 + r7 + r8 + r9 + r10 ~", paste(n[!n %in% c("r1","r2","r3","r4","r5","r6","r7","r8","r9","r10")], collapse = " + ")))
f## r1 + r2 + r3 + r4 + r5 + r6 + r7 + r8 + r9 + r10 ~ Age + Gender + 
##     Religion + Income + Career + Familymember + Socialmedia + 
##     Province + Education + nCoVsource + Frequency + Fakenews + 
##     Officialnews
 After trial and error I found hidden-layer sizes and neuron counts that minimize the error
nntrain <- neuralnet(f,
                data = train,
                hidden = c(290,270,260,150),
                threshold = 0.001,
                rep = 1,
                act.fct = "logistic",
                linear.output = FALSE,
                lifesign = "minimal")
## hidden: 290, 270, 260, 150    thresh: 0.001    rep: 1/1    steps:     218    error: 0.50139  time: 1.57 mins
plot(nntrain)
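That trial and error can be organized as a loop. A sketch (not part of the original analysis) that compares a few candidate architectures by the training error neuralnet reports:
# Sketch: compare candidate hidden-layer architectures by reported training error
architectures <- list(c(50), c(100, 50), c(290, 270, 260, 150))
errors <- sapply(architectures, function(h) {
  fit <- neuralnet(f, data = train, hidden = h,
                   threshold = 0.001, act.fct = "logistic",
                   linear.output = FALSE)
  fit$result.matrix["error", 1]   # final error from the fitted network
})
errors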
 Test the model on the test dataset
nntest <- neuralnet(f,
                data = test,
                hidden = c(290,270,260,150),
                threshold = 0.001,
                rep = 1,
                act.fct = "logistic",
                linear.output = FALSE,
                lifesign = "minimal")
## hidden: 290, 270, 260, 150    thresh: 0.001    rep: 1/1    steps:     249    error: 0.50073  time: 2.12 mins
 My notes on how feed-forward neural nets work, broken down into the basics:
    View My Notes Here 
After trial and error in arranging the hidden layers of perceptrons, I reduced the error to about 0.5 for both the training and testing data, which is a good indicator that the model is not overfitting the training data. However, the run time was long (over a minute per fit, even on this small dataset), so tuning this model further or applying it to a larger dataset would be costly. This was primarily an exercise in learning how to tune a neural net; I also experimented with the learning rate and different starting weights, but I present only the basic model here.
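A more standard generalization check would be to score the already-trained network on the held-out test rows rather than refitting. A minimal sketch, assuming a recent version of neuralnet that provides a predict method (older versions use compute()):
# Sketch: score the trained network on the held-out test set
probs <- predict(nntrain, test)                    # one column of probabilities per class
pred  <- max.col(probs)                            # predicted risk class, 1-10
truth <- max.col(as.matrix(test[, paste0("r", 1:10)]))
mean(pred == truth)                                # classification accuracy on test data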
       
    
    I researched the ideal number of hidden neurons, but there does not seem to be a unified answer. One popular heuristic is to take the square root of the number of observations and multiply it by the number of classes, but that did not reduce my error.
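For concreteness, here is what that heuristic works out to on this data (a sketch; the variable names are introduced here for illustration):
# Heuristic described above: sqrt(number of observations) * number of classes
n_train <- nrow(train)            # roughly 293 rows after the 75% split
n_class <- 10                     # the ten risk-perception classes
round(sqrt(n_train) * n_class)    # about 170 hidden neurons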
    Below is a visualization of the neural net.
With this model, a person's risk perception, specifically toward COVID-19, can be classified and predicted.
      
      Since the world has shifted online, the risks we encode are primarily based on our digital environment. Whether this is making us worse at judging real-world danger is an ongoing debate, but this much is clear:
 
      Our media consumption and relationships affect our decision making.