To Be Finished
for (i in 1:length(luckfactor)) { for (j in 1:length(selected)) { for (k in 1:R) { simulationresults <- tibble(Skill = runif(n = N, min = minscore, max = maxscore), Luck = runif(n = N, min = minscore, max = maxscore)) %>% mutate(Score = Skill * (1 - luckfactor[i]) + Luck * luckfactor[i]) luckyFew <- simulationresults %>% mutate(SkillRank = N - rank(Skill) + 1, ScoreRank = N - rank(Score) + 1, SkillSelected = SkillRank <= selected[j] * N) %>% arrange(ScoreRank) %>% head(selected[j] * N) results[[i]][[j]][k,] <- c(mean(luckyFew$Skill), mean(luckyFew$Luck), sum(luckyFew$SkillSelected), luckfactor[i], selected[j]) } results[[i]][[j]] <- as.data.frame(results[[i]][[j]]) } results[[i]] <- bind_rows(results[[i]]) } results <- bind_rows(results) colnames(results) <- c("Skill", "Luck", "SkillSelected", "LuckFactor", "ProportionSelected") results$LuckFactor <- as.
Two weeks ago I wrote a post, Using R and Random Forest to predict Diabetes Risk. Since I am less experienced with using python in machine learning models, and this was a data set that worked out so nicely, I figured I would take an attempt at it.
First we need to load all the modules and functions we need to use.
import pandas as pd from matplotlib import pyplot as plt import numpy as np import seaborn as sns from sklearn.model_selection import train_test_split from sklearn.ensemble import RandomForestClassifier from sklearn import metrics The next thing is just like was done in R, load the data, clean it up a bit for using scikit-learn to create a classification model, and then split our factors from our classification variable.
Time for something a bit different from my previous posts, but hopefully more common in the future.
I was looking through the UCI Machine Learning Repository for a couple of data sets I could use for some simple machine learning problems to try interesting problems and keep my abilities sharp. This week I found the Early Stage Diabetes Risk Prediction. This comes from a paper on early prediction of diabetes risk. The abstract goes over the methods used, indicating Random Forest as doing particularly well. Since I don’t have access to the actual paper, I wanted to try out using R to try this model out.