Data Science

Posts with the tag Data Science:

Simulating the effects of luck in highly competitive events

2021-06-27 2 minutes Data Science R Simulation

To Be Finished

for (i in 1:length(luckfactor)) {
  for (j in 1:length(selected)) {
    for (k in 1:R) {
      simulationresults <- tibble(Skill = runif(n = N, min = minscore, max = maxscore),
                                  Luck = runif(n = N, min = minscore, max = maxscore)) %>%
        mutate(Score = Skill * (1 - luckfactor[i]) + Luck * luckfactor[i])
      luckyFew <- simulationresults %>%
        mutate(SkillRank = N - rank(Skill) + 1,
               ScoreRank = N - rank(Score) + 1,
               SkillSelected = SkillRank <= selected[j] * N) %>%
        arrange(ScoreRank) %>%
        head(selected[j] * N)
      results[[i]][[j]][k,] <- c(mean(luckyFew$Skill), mean(luckyFew$Luck), sum(luckyFew$SkillSelected), luckfactor[i], selected[j])
    }
    results[[i]][[j]] <- as.data.frame(results[[i]][[j]])
  }
  results[[i]] <- bind_rows(results[[i]])
}
results <- bind_rows(results)
colnames(results) <- c("Skill", "Luck", "SkillSelected", "LuckFactor", "ProportionSelected")
results$LuckFactor <- as.
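The excerpt above is cut off, so here is a rough Python sketch of what a single round of that simulation appears to do: draw a uniform Skill and Luck value for each candidate, blend them into a Score using the luck factor, keep the top proportion by Score, and count how many of those would also have been picked on Skill alone. The function name `simulate_selection`, the 0 to 100 score range, and the candidate and round counts are all assumptions for illustration, not values from the post.

```python
import random


def simulate_selection(n_candidates, proportion_selected, luck_factor, rng):
    """Run one round and return (mean skill, mean luck, skill overlap) of the selected group.

    Hypothetical helper: the 0-100 score range is an assumed stand-in for
    minscore and maxscore, which the excerpt does not define.
    """
    skill = [rng.uniform(0, 100) for _ in range(n_candidates)]
    luck = [rng.uniform(0, 100) for _ in range(n_candidates)]
    # Score is a weighted blend of skill and luck, as in the R excerpt.
    score = [s * (1 - luck_factor) + l * luck_factor for s, l in zip(skill, luck)]

    n_selected = max(1, round(proportion_selected * n_candidates))
    indices = range(n_candidates)
    # Candidates actually selected: the top n_selected by final score.
    by_score = sorted(indices, key=lambda i: score[i], reverse=True)[:n_selected]
    # Candidates who would have been selected on skill alone.
    by_skill = set(sorted(indices, key=lambda i: skill[i], reverse=True)[:n_selected])

    mean_skill = sum(skill[i] for i in by_score) / n_selected
    mean_luck = sum(luck[i] for i in by_score) / n_selected
    overlap = sum(1 for i in by_score if i in by_skill)
    return mean_skill, mean_luck, overlap


# Small demo with assumed settings: 1000 candidates, top 10% selected, 20 rounds.
rng = random.Random(42)
for luck_factor in (0.0, 0.05, 0.5):
    rounds = [simulate_selection(1000, 0.1, luck_factor, rng) for _ in range(20)]
    avg_overlap = sum(r[2] for r in rounds) / len(rounds)
    print(f"luck factor {luck_factor}: average skill overlap {avg_overlap:.1f} of 100")
```

Averaging over repeated rounds, as the R loop over `k` seems to do, smooths out the noise of any single draw.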

Using Python and Random Forest to predict Diabetes Risk

2021-01-08 3 minutes Python Data Science Machine Learning Random Forest

Two weeks ago I wrote a post, Using R and Random Forest to predict Diabetes Risk. Since I am less experienced with machine learning in Python, and this data set worked out so nicely in R, I figured I would attempt the same analysis here. First we need to load the modules and functions we will use.

import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

The next step is the same as in R: load the data, clean it up a bit so scikit-learn can build a classification model, and then split the predictor columns from the classification variable.
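Since the excerpt stops before the code, here is a minimal sketch of the clean, split, and fit steps it describes. It is not the post's code: the data frame is synthetic stand-in data generated on the spot, and the column names (`age`, `symptom_a`, `symptom_b`, `label`) and the labelling rule are hypothetical, chosen only so the example runs without the UCI file.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the UCI data set); all column names are hypothetical.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "age": rng.integers(20, 80, size=n_rows),
    "symptom_a": rng.choice(["Yes", "No"], size=n_rows),
    "symptom_b": rng.choice(["Yes", "No"], size=n_rows),
})
# Made-up labelling rule so the classifier has something to learn.
both_symptoms = (df["symptom_a"] == "Yes") & (df["symptom_b"] == "Yes")
df["label"] = np.where(both_symptoms, "Positive", "Negative")

# Clean up: scikit-learn needs numeric inputs, so map Yes/No to 1/0.
for column in ["symptom_a", "symptom_b"]:
    df[column] = df[column].map({"Yes": 1, "No": 0})

# Split the predictor columns from the classification variable.
X = df.drop(columns="label")
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a random forest and score it on the held-out rows.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = metrics.accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.3f}")
```

With the real data set, the synthetic block would be replaced by a `pd.read_csv` call on the downloaded file, and the Yes/No mapping applied to whichever columns need it.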

Using R and Random Forest to predict Diabetes Risk

2020-12-24 3 minutes R Random Forest Machine Learning Classification Data Science

Time for something a bit different from my previous posts, but hopefully more common in the future. I was looking through the UCI Machine Learning Repository for a couple of data sets I could use for some simple machine learning problems, both to work on something interesting and to keep my skills sharp. This week I found the Early Stage Diabetes Risk Prediction data set. It comes from a paper on early prediction of diabetes risk. The abstract goes over the methods used and indicates that Random Forest did particularly well. Since I don’t have access to the actual paper, I wanted to try fitting this model myself in R.