During my WhatsApp user experience redesign I started questioning:
Why do people spread misinformation?

And the answer is that people would rather trigger an intense emotion than a subtle one.
View My UX Case Study Here
This led me to question whether these strong feelings could be quantitatively analyzed and connected to misclassifying COVID-19-related claims.
  
            
I found a study conducted by Harvard students on South Korean adults and their responses to a COVID-19 survey. The source can be found here: Harvard Study Data
            
          
 
I used their dataset to build a predictive model with principal component analysis to determine whether a person would share misinformation based on the following (a toy sketch of the approach follows this list):
○ Belief of false COVID statements 
○ Age 
○ Gender 
○ Political Ideology 
○ Anger towards virus 
○ Anxiety towards virus 
○ Education
○ Income 
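
Before the full walkthrough, here is a small self-contained sketch of the PCA-then-regression idea on synthetic data. None of the names or numbers below come from the survey; they only illustrate the shape of the method.

set.seed(1)
X <- scale(matrix(rnorm(200 * 5), ncol = 5))   # 5 standardized predictors
y <- X %*% c(1, 0.5, 0, 0, 0) + rnorm(200, sd = 0.3)
p <- prcomp(X)                                 # unsupervised rotation of X
m <- lm(y ~ p$x[, 1:3])                        # regress on the leading 3 PCs
summary(m)$r.squared                           # fit using only 3 of 5 dimensions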
          
Source: Harvard Study Data
A sample of 513 people from South Korea, matched to the proportions of key Korean demographics.
A data frame with 513 observations and 37 variables. Responses to the following 12 false statements regarding COVID-19 were encoded as 0 or 1 for both trust and sharing.
            The following 12 statements were presented: 
 
Q1: Even if completely cured, COVID-19 patients suffer from life-long lung damage
Q2: A runny nose is a symptom of cold, not COVID-19
Q3: Hair dryers can kill the virus
Q4: Drinking alcohol can kill the virus
Q5: Only certain age groups, races, or ethnicities are vulnerable to the virus
Q6: Antibiotics or flu vaccines can prevent the disease
Q7: You can test yourself for COVID-19 by holding your breath for 10 seconds
Q8: Garlic can prevent infection
Q9: Gargling with salt water can eliminate the virus
Q10: The virus can penetrate into the body through contact
Q11: Smoking can kill the virus
Q12: Drinking hot water or hot tea can reduce the chances of getting infected
          
FNtrust(1-12) - responses for believing each of the 12 statements
FNsharing(1-12) - responses for sharing each of the 12 statements
Anger - anger towards the pandemic (6-point Likert scale)
Anxiety - fear towards the pandemic (6-point Likert scale)
Ideology - 7-point scale (1 = extremely conservative, 7 = extremely liberal)
Gender - gender (female is 1)
Age - age in years
Educ - education (1: no elementary school diploma to 12: master's degree or higher)
Income - income (1: less than ₩10,000,000 to 11: more than ₩100,000,000)
st - sum of trust responses per participant
ss - sum of sharing responses per participant
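
As a sanity check of the codebook, st and ss should be recoverable from the binary columns once the dataset is loaded below; a sketch (the sheet also appears to contain a 13th trust/sharing pair, so adjust the range if st actually sums 13 items):

st_check <- rowSums(harvard[, paste0("FNtrust", 1:12)])   # sum of trust answers
ss_check <- rowSums(harvard[, paste0("FNsharing", 1:12)]) # sum of sharing answers
all(st_check == harvard$st)  # expected TRUE if st sums these 12 items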
          
Dataset: Harvard Dataset 
 
 
Load in the dataset and create a working copy of it
library(readxl)
harvard <- read_excel("C:/Users/ksree/Desktop/comp_app/harvard.xlsx", sheet = "Sheet16")
View(harvard)
data <- cbind(harvard)  # work on a copy, leaving the raw sheet untouched
 
 
Remove the binary answers to statements 1-12 for trust and sharing (the per-participant sums st and ss stand in for these variables; a more compact equivalent drop is sketched after the code)
data = subset(data, select = -c(No, FNtrust1,FNtrust2,FNtrust3,FNtrust4,FNtrust5,FNtrust6,FNtrust7,FNtrust8,
                                  FNtrust9,FNtrust10,FNtrust11,FNtrust12,FNtrust13,  FNsharing1,FNsharing2,FNsharing3,FNsharing4,FNsharing5,
                                  FNsharing6,FNsharing7,FNsharing8,FNsharing9,FNsharing10,FNsharing11,FNsharing12, FNsharing13))
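
The same drop can be written more compactly with a name pattern. A sketch equivalent to the subset call above, starting from the raw frame, assuming all and only the binary answer columns start with "FN" and that No is the respondent ID:

data <- data[, !(names(data) == "No" | grepl("^FN", names(data)))]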
 
 
 Profile data to see any immediate anomalies and check variables
View(data)
 
 
 Needed packages
library(faraway)
library(psych)


Get descriptive statistics for the overall dataset: identify visual correlations among variables, visual outliers, means, and skew
plot(data)
summary(data)
##      anger          anxiety         ideology         gender      
  ##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :0.0000  
  ##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:0.0000  
  ##  Median :4.000   Median :4.000   Median :4.000   Median :1.0000  
  ##  Mean   :3.928   Mean   :3.926   Mean   :4.109   Mean   :0.5224  
  ##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:1.0000  
  ##  Max.   :6.000   Max.   :6.000   Max.   :7.000   Max.   :1.0000  
  ##       age            region            educ            income      
  ##  Min.   :17.00   Min.   : 1.000   Min.   : 2.000   Min.   : 1.000  
  ##  1st Qu.:35.00   1st Qu.: 3.000   1st Qu.: 8.000   1st Qu.: 3.000  
  ##  Median :49.00   Median : 6.000   Median :10.000   Median : 5.000  
  ##  Mean   :48.19   Mean   : 6.665   Mean   : 9.158   Mean   : 5.374  
  ##  3rd Qu.:62.00   3rd Qu.:10.000   3rd Qu.:10.000   3rd Qu.: 7.000  
  ##  Max.   :94.00   Max.   :19.000   Max.   :12.000   Max.   :12.000  
  ##        st               ss        
  ##  Min.   : 0.000   Min.   : 0.000  
  ##  1st Qu.: 2.000   1st Qu.: 1.000  
  ##  Median : 3.000   Median : 3.000  
  ##  Mean   : 3.639   Mean   : 3.238  
  ##  3rd Qu.: 5.000   3rd Qu.: 5.000  
##  Max.   :13.000   Max.   :13.000

describe(data)
 
 
Create a correlation plot to visualize the correlations among the variables. The correlations are not as high as hypothesized.
library(GGally)
ggcorr(data, method = c("everything", "pearson"))


Split the data into training and test sets so the model can be validated at the end
require(caTools)
set.seed(4321)
  sample = sample.split(data,SplitRatio = 0.75)
  train = subset(data,sample ==TRUE) 
  test = subset(data, sample==FALSE)
describe(train)
describe(test)


Standardize and scale the data so that each variable has mean 0 and standard deviation 1 (the variables are measured on different scales). It is important to scale after splitting into train/test sets.
train$anger_scaled<-scale(train$anger)
  train$anxiety_scaled<-scale(train$anxiety)
  train$ideology_scaled<-scale(train$ideology)
  train$gender_scaled<-scale(train$gender)
  train$age_scaled<-scale(train$age)
  train$educ_scaled<-scale(train$educ)
  train$income_scaled<-scale(train$income)
  train$region_scaled<-scale(train$region)
  train$st_scaled<-scale(train$st)
  train$ss_scaled<-scale(train$ss)
  train = subset(train, select = -c(anger, anxiety,ideology, gender, age, educ, income, region, st, ss))
describe(train)

test$anger_scaled<-scale(test$anger)
  test$anxiety_scaled<-scale(test$anxiety)
  test$ideology_scaled<-scale(test$ideology)
  test$gender_scaled<-scale(test$gender)
  test$age_scaled<-scale(test$age)
  test$educ_scaled<-scale(test$educ)
  test$income_scaled<-scale(test$income)
  test$region_scaled<-scale(test$region)
  test$st_scaled<-scale(test$st)
  test$ss_scaled<-scale(test$ss)
  test = subset(test, select = -c(anger, anxiety,ideology, gender, age, educ, income, region, st, ss))
describe(test)
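
As an aside, the repetitive scale() calls above can be collapsed into one helper. A sketch of an equivalent form (scale_cols is a name introduced here, meant to be applied to the unscaled train/test frames):

scale_cols <- function(df) {
  # scale() returns a 1-column matrix; as.numeric() flattens it back
  scaled <- lapply(df, function(x) as.numeric(scale(x)))
  setNames(as.data.frame(scaled), paste0(names(df), "_scaled"))
}
# train <- scale_cols(train); test <- scale_cols(test)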


Principal Component Analysis reduces dimensionality through unsupervised learning: it transforms a large set of variables into a smaller set of components that still contains most of the information, letting the model explain as much variation as possible with fewer inputs.
train.pca <- prcomp(train[,1:10])
train.pca
## Standard deviations (1, .., p=10):
  ##  [1] 1.4255319 1.2077033 1.1529020 1.0887612 0.9731955 0.8938451 0.8922933
  ##  [8] 0.8441184 0.7324568 0.4510464
  ## 
  ## Rotation (n x k) = (10 x 10):
  ##                         PC1         PC2          PC3         PC4         PC5
  ## anger_scaled    -0.25583327  0.57400924 -0.218816336  0.01419926  0.13902518
  ## anxiety_scaled  -0.26163825  0.58754240 -0.112973270 -0.10224780  0.22324146
  ## ideology_scaled  0.18730345 -0.13699372 -0.460041912  0.36959192  0.16250685
  ## gender_scaled    0.11918834 -0.09918661  0.547144849 -0.20349712  0.29557019
  ## age_scaled      -0.25644739  0.11560798  0.503856074 -0.04362025 -0.18781140
  ## educ_scaled      0.24571311  0.05528468  0.003909389 -0.45788526  0.61382793
  ## income_scaled    0.17087265 -0.20222358 -0.348418899 -0.50472856  0.02881812
  ## region_scaled    0.06003516 -0.06929543  0.183675363  0.57769944  0.58003623
  ## st_scaled       -0.57538926 -0.34135434 -0.088042282 -0.06086302  0.19860545
  ## ss_scaled       -0.57165693 -0.34360070 -0.107612108 -0.08706893  0.16819793
  ##                          PC6          PC7          PC8          PC9
  ## anger_scaled    -0.102424733 -0.215839558  0.073590176  0.689284240
  ## anxiety_scaled  -0.007255479 -0.030576195  0.172903418 -0.694421526
  ## ideology_scaled -0.434023289  0.374473811  0.488566733 -0.002866363
  ## gender_scaled   -0.543400395 -0.392750484  0.307983359  0.007919557
  ## age_scaled       0.244165044  0.538362771  0.505317276  0.158149055
  ## educ_scaled      0.027212835  0.500412493 -0.288059340  0.123776087
  ## income_scaled    0.435959716 -0.286391627  0.528033751  0.044457474
  ## region_scaled    0.489857141 -0.188509564  0.088777883  0.014324610
  ## st_scaled        0.001721200  0.010591295 -0.077848971  0.008858990
  ## ss_scaled       -0.124158628  0.008914722  0.007234805 -0.001805096
  ##                         PC10
  ## anger_scaled     0.006834423
  ## anxiety_scaled   0.013440128
  ## ideology_scaled  0.064748705
  ## gender_scaled    0.057986670
  ## age_scaled       0.003875763
  ## educ_scaled     -0.025648413
  ## income_scaled    0.007340747
  ## region_scaled   -0.064000244
  ## st_scaled        0.703740331
## ss_scaled       -0.701530164

summary(train.pca)
## Importance of components:
  ##                           PC1    PC2    PC3    PC4     PC5    PC6     PC7
  ## Standard deviation     1.4255 1.2077 1.1529 1.0888 0.97320 0.8938 0.89229
  ## Proportion of Variance 0.2032 0.1459 0.1329 0.1185 0.09471 0.0799 0.07962
  ## Cumulative Proportion  0.2032 0.3491 0.4820 0.6005 0.69524 0.7751 0.85475
  ##                            PC8     PC9    PC10
  ## Standard deviation     0.84412 0.73246 0.45105
  ## Proportion of Variance 0.07125 0.05365 0.02034
  ## Cumulative Proportion  0.92601 0.97966 1.00000
 
 
Visualize how much of the model's variation each principal component explains
plot(train.pca)
biplot(train.pca)
 
 
Reassess the correlations using the principal components (by construction, the PCs should be essentially uncorrelated)
library(corrplot)
cor1 <- cor(train.pca$x, method = "pearson")
  corrplot::corrplot(cor1, method= "color", order = "hclust", tl.pos = 'n')
 
 
Save the principal components as a separate data frame to use for model building
pcs <- as.data.frame(train.pca$x)
  data2 <- cbind(train, pcs)
  data2 <- subset(data2, select = -c(anger_scaled, anxiety_scaled, ideology_scaled, gender_scaled, age_scaled, educ_scaled,
                                    income_scaled, region_scaled, st_scaled))
  data2
 
 
The optimal number of principal components here is 3
Run a regression model using the first 3 PCs
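
Three is a heuristic choice rather than a theorem; a quick way to judge it is the scree plot's elbow together with the cumulative variance explained (the summary of train.pca above puts the first three components at roughly 48% of the total). A minimal check, reusing train.pca:

ve <- train.pca$sdev^2 / sum(train.pca$sdev^2)  # proportion of variance per PC
round(cumsum(ve), 3)                            # cumulative proportion
screeplot(train.pca, type = "lines")            # look for the elbow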
lmodel <- lm(ss_scaled ~ PC1 + PC2 + PC3, data = data2)
summary(lmodel)
## 
  ## Call:
  ## lm(formula = ss_scaled ~ PC1 + PC2 + PC3, data = data2)
  ## 
  ## Residuals:
  ##      Min       1Q   Median       3Q      Max 
  ## -1.31237 -0.23538  0.04115  0.22582  1.31479 
  ## 
  ## Coefficients:
  ##               Estimate Std. Error t value Pr(>|t|)    
  ## (Intercept) -1.956e-17  2.041e-02    0.00        1    
  ## PC1         -5.717e-01  1.434e-02  -39.87  < 2e-16 ***
  ## PC2         -3.436e-01  1.693e-02  -20.30  < 2e-16 ***
  ## PC3         -1.076e-01  1.773e-02   -6.07 3.29e-09 ***
  ## ---
  ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  ## 
  ## Residual standard error: 0.3868 on 355 degrees of freedom
  ## Multiple R-squared:  0.8517, Adjusted R-squared:  0.8504 
## F-statistic: 679.5 on 3 and 355 DF,  p-value: < 2.2e-16


Compare to a simple least squares regression using all predictors
lmod <- lm(ss_scaled ~ ., data = train)
summary(lmod)
## 
  ## Call:
  ## lm(formula = ss_scaled ~ ., data = train)
  ## 
  ## Residuals:
  ##      Min       1Q   Median       3Q      Max 
  ## -2.41925 -0.27647  0.06367  0.26180  2.72794 
  ## 
  ## Coefficients:
  ##                   Estimate Std. Error t value Pr(>|t|)    
  ## (Intercept)     -8.984e-17  3.238e-02   0.000   1.0000    
  ## anger_scaled     1.230e-02  3.668e-02   0.335   0.7375    
  ## anxiety_scaled   1.834e-02  3.673e-02   0.499   0.6179    
  ## ideology_scaled  4.701e-02  3.391e-02   1.386   0.1666    
  ## gender_scaled    3.810e-02  3.362e-02   1.133   0.2580    
  ## age_scaled       2.448e-02  3.421e-02   0.716   0.4748    
  ## educ_scaled     -5.381e-02  3.342e-02  -1.610   0.1083    
  ## income_scaled    7.737e-03  3.378e-02   0.229   0.8190    
  ## region_scaled   -6.915e-02  3.292e-02  -2.101   0.0364 *  
  ## st_scaled        7.847e-01  3.336e-02  23.522   <2e-16 ***
  ## ---
  ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  ## 
  ## Residual standard error: 0.6135 on 349 degrees of freedom
  ## Multiple R-squared:  0.6331, Adjusted R-squared:  0.6236 
  ## F-statistic:  66.9 on 9 and 349 DF,  p-value: < 2.2e-16
 
 
Use forward selection to find the most significant features in the training dataset
library(MASS)
library(olsrr)

model <- lm(ss_scaled ~ ., data = train)
  k <- ols_step_forward_p(model)
ols_step_forward_p(model)
## 
  ##                                Selection Summary                                
  ## -------------------------------------------------------------------------------
  ##         Variable                         Adj.                                      
  ## Step        Entered        R-Square    R-Square     C(p)       AIC        RMSE     
  ## -------------------------------------------------------------------------------
  ##    1    st_scaled            0.6226      0.6216    3.9338    673.9481    0.6152    
  ##    2    region_scaled        0.6269      0.6248    1.8961    671.8867    0.6126    
  ##    3    educ_scaled          0.6294      0.6263    1.4768    671.4311    0.6113    
  ##    4    ideology_scaled      0.6306      0.6264    2.3834    672.3158    0.6112    
  ##    5    gender_scaled        0.6318      0.6266    3.2080    673.1128    0.6111    
## -------------------------------------------------------------------------------

plot(k)
 
 
 Repeat for stepwise selection
k2 <- ols_step_both_p(model)
ols_step_both_p(model)
## 
  ##                                Stepwise Selection Summary                                 
  ## -----------------------------------------------------------------------------------------
  ##                           Added/                   Adj.                                      
  ## Step      Variable       Removed     R-Square    R-Square     C(p)       AIC        RMSE     
  ## -----------------------------------------------------------------------------------------
  ##    1      st_scaled      addition       0.623       0.622    3.9340    673.9481    0.6152    
  ##    2    region_scaled    addition       0.627       0.625    1.8960    671.8867    0.6126    
## -----------------------------------------------------------------------------------------

plot(k2)
summary(k2)
##            Length Class      Mode     
  ## orders      2     -none-     character
  ## method      2     -none-     character
  ## steps       1     -none-     numeric  
  ## predictors  2     -none-     character
  ## rsquare     2     -none-     numeric  
  ## aic         2     -none-     numeric  
  ## sbc         2     -none-     numeric  
  ## sbic        2     -none-     numeric  
  ## adjr        2     -none-     numeric  
  ## rmse        2     -none-     numeric  
  ## mallows_cp  2     -none-     numeric  
  ## indvar      9     -none-     character
  ## betas       5     -none-     numeric  
  ## lbetas      2     -none-     numeric  
  ## pvalues     5     -none-     numeric  
  ## beta_pval   4     data.frame list     
  ## model      12     lm         list
 
 
 View all possible models
library(olsrr)
  k3 <- ols_step_all_possible(model)
k3
plot(k3)
 
 
 Using these variable selection methods, formally write out each model to compare model accuracy
#pcr
  lmodpcr <- lm(train$ss_scaled ~ train.pca$x[,1:3])
  #simple 
  lmod <- lm(ss_scaled ~ ., data = train)
#forward (note: st_scaled, the first variable entered by the forward selection
#above, is omitted here, which explains this model's poor performance below)
forward <- lm(ss_scaled ~ region_scaled + educ_scaled + ideology_scaled + gender_scaled, data = train)
#stepwise 
stepwise <- lm(ss_scaled ~ st_scaled + region_scaled, data = train)


Perform cross-validation on these four models
library(caret)

set.seed(13245)
  train.control <- trainControl(method = "repeatedcv", number = 10,repeats=20)
  model_leastsquares <- train(ss_scaled ~ ., data = train, method = "lm",
                        trControl = train.control)
  model_pcr <- train(ss_scaled ~ PC1 + PC2 + PC3, data = data2, method = "lm",
                    trControl = train.control)
  model_forward <- train(ss_scaled ~ region_scaled + educ_scaled + ideology_scaled + gender_scaled, data = train, method = "lm",
                        trControl = train.control)
  model_stepwise <- train(ss_scaled ~ st_scaled + region_scaled, data = train, method = "lm",
                          trControl = train.control)
print(model_leastsquares)
## Linear Regression 
  ## 
  ## 359 samples
  ##   9 predictor
  ## 
  ## No pre-processing
  ## Resampling: Cross-Validated (10 fold, repeated 20 times) 
  ## Summary of sample sizes: 323, 323, 324, 323, 323, 322, ... 
  ## Resampling results:
  ## 
  ##   RMSE       Rsquared   MAE      
  ##   0.6134204  0.6174199  0.4311321
  ## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

print(model_pcr)
## Linear Regression 
  ## 
  ## 359 samples
  ##   3 predictor
  ## 
  ## No pre-processing
  ## Resampling: Cross-Validated (10 fold, repeated 20 times) 
  ## Summary of sample sizes: 323, 324, 324, 323, 324, 323, ... 
  ## Resampling results:
  ## 
  ##   RMSE       Rsquared  MAE     
  ##   0.3847179  0.850367  0.293216
  ## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

print(model_forward)
## Linear Regression 
  ## 
  ## 359 samples
  ##   4 predictor
  ## 
  ## No pre-processing
  ## Resampling: Cross-Validated (10 fold, repeated 20 times) 
  ## Summary of sample sizes: 322, 325, 323, 324, 322, 323, ... 
  ## Resampling results:
  ## 
  ##   RMSE       Rsquared    MAE      
  ##   0.9848961  0.04987417  0.7874916
  ## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

print(model_stepwise)
## Linear Regression 
  ## 
  ## 359 samples
  ##   2 predictor
  ## 
  ## No pre-processing
  ## Resampling: Cross-Validated (10 fold, repeated 20 times) 
  ## Summary of sample sizes: 323, 324, 322, 324, 323, 323, ... 
  ## Resampling results:
  ## 
  ##   RMSE       Rsquared   MAE      
  ##   0.6053524  0.6288003  0.4233915
  ## 
## Tuning parameter 'intercept' was held constant at a value of TRUE


Validate the models on the test data set
Set up PCA validation
testpca <- cbind(data2[1:154, ])
PC1 <- train.pca$rotation[, 1]
PC2 <- train.pca$rotation[, 2]
PC3 <- train.pca$rotation[, 3]
View(testpca)


Validate the models on the test data set
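
Note that testpca above reuses the first 154 rows of the training-derived data2 rather than the held-out split. A more standard alternative is to project the actual test frame onto the rotation learned on train; a sketch (test_pcs and testpca_alt are names introduced here, and the caveat in the comments applies):

# predict.prcomp recenters new data with the training means before rotating.
# Caveat: `test` was scaled with its own means/sds rather than train's.
test_pcs <- as.data.frame(predict(train.pca, newdata = test))
testpca_alt <- cbind(ss_scaled = test$ss_scaled, test_pcs[, c("PC1", "PC2", "PC3")])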
model_leastsquares_test <- train(ss_scaled ~ ., data = test, method = "lm", trControl = train.control)
model_PCA_test <- model_pcr_test <- train(ss_scaled ~ PC1 + PC2 + PC3, data = testpca,
                        method = "lm",
                        trControl = train.control)
model_forward_test <- train(ss_scaled ~ region_scaled + educ_scaled + ideology_scaled + gender_scaled, data = test,
                        method = "lm", trControl = train.control)
model_stepwise_test <- train(ss_scaled ~ st_scaled + region_scaled, data = test, method = "lm",
                        trControl = train.control)
model_leastsquares_test
## Linear Regression 
## 
## 154 samples
##   9 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 20 times) 
## Summary of sample sizes: 139, 139, 140, 138, 138, 139, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.6152499  0.6245287  0.4369182
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

model_PCA_test
## Linear Regression 
## 
## 154 samples
##   3 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 20 times) 
## Summary of sample sizes: 139, 138, 138, 139, 139, 139, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.3604114  0.8871406  0.2763031
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

model_forward_test
## Linear Regression 
## 
## 154 samples
##   4 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 20 times) 
## Summary of sample sizes: 138, 138, 138, 140, 138, 140, ... 
## Resampling results:
## 
##   RMSE       Rsquared    MAE      
##   0.9988511  0.07279995  0.7535439
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE

model_stepwise_test
## Linear Regression 
## 
## 154 samples
##   2 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 20 times) 
## Summary of sample sizes: 138, 138, 140, 139, 139, 138, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.5980537  0.6459783  0.4289118
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE


Based on cross-validation and performance on the held-out test set, the Principal Component Regression is the strongest model: it has the lowest RMSE and MAE (prediction errors) and the highest R-squared (fit of the model to the data).
 
      
The least squares fit usually matches the data best because it directly minimizes the sum of squared residuals, yet here the Principal Component Regression clearly outperforms even the least squares model.
Through PCA I was able to reduce the dimensionality of the model from 10 variables (a ten-dimensional space) to 3 components, which is far more interpretable.
      
      
According to my model, a person's likelihood of sharing COVID-19 misinformation could be estimated through a simple survey covering:
      ○ Belief in the 12 false COVID-19 claims 
      ○ Anger 
      ○ Anxiety 
      ○ Political Ideology 
      ○ Gender 
      ○ Age 
      ○ Education 
      ○ Income 
      ○ Region 
      (with very low prediction errors) 
This project makes me wonder whether people would be safer answering those questions before sharing information about the pandemic. The user experience side of me says no, but in terms of reducing incorrect information that potentially harms people, it appears to be an option.
However, this algorithm bases its decisions on the participant's demographic information in order to produce a more accurate result.
The alternative is the stepwise model, which yields a less accurate result but uses only the first and last features listed above.
Both are feasible strategies, but it would be up to the organization to weigh the trade-off between accuracy and requiring personal information.