Elemental Data Science
A blog on Data Science and Statistics.

Simulating the effects of luck in highly competitive events

To Be Finished

for (i in 1:length(luckfactor)) {
  for (j in 1:length(selected)) {
    for (k in 1:R) {
      # Give each of the N competitors independent uniform Skill and Luck values,
      # then blend them into a final Score weighted by the current luck factor
      simulationresults <- tibble(Skill = runif(n = N, min = minscore, max = maxscore),
                                  Luck  = runif(n = N, min = minscore, max = maxscore)) %>%
        mutate(Score = Skill * (1 - luckfactor[i]) + Luck * luckfactor[i])

      # Keep the top fraction of competitors by Score, and note how many of them
      # would also have been selected on Skill alone
      luckyFew <- simulationresults %>%
        mutate(SkillRank = N - rank(Skill) + 1,
               ScoreRank = N - rank(Score) + 1,
               SkillSelected = SkillRank <= selected[j] * N) %>%
        arrange(ScoreRank) %>%
        head(selected[j] * N)

      results[[i]][[j]][k, ] <- c(mean(luckyFew$Skill), mean(luckyFew$Luck),
                                  sum(luckyFew$SkillSelected), luckfactor[i], selected[j])
    }
    results[[i]][[j]] <- as.data.frame(results[[i]][[j]])
  }
  results[[i]] <- bind_rows(results[[i]])
}

results <- bind_rows(results)
colnames(results) <- c("Skill", "Luck", "SkillSelected", "LuckFactor", "ProportionSelected")
results$LuckFactor <- as.factor(results$LuckFactor)  # excerpt cut off at "as."; as.factor() is an assumed completion for grouping in plots

Multivariate Statistical Process Control method comparison Part 2: Generating control limits and initial look

Earlier this year, I posted the first part of a write-up for a project I worked on last year. As a reminder, the project involved taking 4 different types of correlation, testing the baseline and 8 different data shifts, and generating 1000 simulations with 100 points of data each. I then compared how well three different multivariate control charts, Hotelling T^2, MEWMA, and MCUSUM, performed at detecting a shift while holding the false alarm rate fixed over all simulations. For this I used the fairly standard in-control average run length (ARL) of 300. This means that, on average, when the process is in control you can expect to see one falsely out-of-control data point in any 300 sequential points.
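To make the limit-generation step concrete, here is a minimal sketch in R of how an in-control ARL translates into a control limit, using the simplest case, a Phase II Hotelling T^2 chart with known parameters, where T^2 follows a chi-square distribution with p degrees of freedom; the choice of p = 4 is an illustrative assumption, not a detail from the project.

# An in-control ARL of 300 implies a per-point false alarm probability of 1/300
ARL0  <- 300
alpha <- 1 / ARL0                # ~0.0033

# Hypothetical example: T^2 chart on p = 4 variables with known parameters,
# where T^2 ~ chi-square with p degrees of freedom, so the upper control
# limit is the (1 - alpha) quantile of that distribution
p   <- 4
UCL <- qchisq(1 - alpha, df = p)
UCL                              # about 15.8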

Odds of winning constant probability games against one person

Probabilities and games. Two simple games used to illustrate certain features of calculating probabilities are as follows: You and a friend take turns flipping a fair coin. The first person to flip heads wins. If you go first, what are your chances of winning? You and a friend take turns rolling a fair six-sided die. The first person to roll a 6 wins. If you go first, what are your chances of winning? These end up being very similar problems, and they can be generalized. For convenience, I'll talk about the die-rolling version. To start, we'll walk through a couple of winning scenarios to look for a pattern.
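As a quick check on the pattern, here is a small R simulation of the die version; the exact answer in the comment comes from summing the geometric series of rounds where both players miss.

# First player wins immediately with probability p, or after both miss, etc.:
# P(win) = p + (1 - p)^2 * p + (1 - p)^4 * p + ... = p / (1 - (1 - p)^2)
p <- 1 / 6                        # chance of rolling a 6 on a fair die
p / (1 - (1 - p)^2)               # exact: 6/11, about 0.545

# Simulate many games to check: players alternate rolls until someone gets a 6
set.seed(1)
first_player_wins <- replicate(100000, {
  turn <- 1                       # 1 = you (going first), 0 = your friend
  while (sample(6, 1) != 6) turn <- 1 - turn
  turn == 1
})
mean(first_player_wins)           # should land near 6/11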

Using Python and Random Forest to predict Diabetes Risk

Two weeks ago I wrote a post, Using R and Random Forest to predict Diabetes Risk. Since I am less experienced with using Python in machine learning models, and this was a data set that worked out so nicely, I figured I would take a crack at it. First we need to load all the modules and functions we need to use.

import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

The next step is, just as was done in R, to load the data, clean it up a bit for using scikit-learn to create a classification model, and then split our features from our classification variable.

Multivariate Statistical Process Control method comparison Part 1: Simulating correlated data and optimizing data generation

Time to mix it up a bit from my usual game probability related simulation. This week I want to talk about a project I worked on last year doing simulations for Statistical Process Control. For a brief primer, Statistical Process Control or SPC is a method used for monitoring something like a manufacturing process. It's a form of time series analysis: you monitor some statistic computed from observations of the process and compare it to a set of limits on a control chart to determine whether or not the distribution of that statistic has likely changed.
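To make the primer concrete, here is a minimal sketch of the univariate version of that idea, a basic Shewhart-style chart with the usual three-sigma limits; the simulated data and the small mean shift are illustrative assumptions, not part of the project.

set.seed(42)

# Estimate the center line and limits from a baseline, in-control sample
baseline <- rnorm(100, mean = 10, sd = 1)
center   <- mean(baseline)
sigma    <- sd(baseline)
lower    <- center - 3 * sigma
upper    <- center + 3 * sigma

# Monitor new observations and flag any point falling outside the limits
new_obs <- rnorm(20, mean = 10.5, sd = 1)    # hypothetical shifted process
which(new_obs < lower | new_obs > upper)     # indices of out-of-control points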

The Effect of the Advantage/Disadvantage Mechanic in Dungeons and Dragons 5E on Success, Failure and Damage

So, after my last Dungeons and Dragons post, I got asked about the Advantage/Disadvantage mechanic in Dungeons and Dragons Fifth Edition, so let's look into that a bit. In D&D, to succeed at a task, you roll a 20-sided die (d20) and, with some math and checking with your Dungeon Master (DM), you determine whether you succeed or not. In previous editions you would add your character's modifier, as well as any additional bonuses or penalties due to the specific scenario. Fifth Edition got rid of those situational bonuses except for one specific case, and instead replaced them with Advantage/Disadvantage. Simply put, if your character is in some sort of advantageous situation when attempting a task, you get Advantage.
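For reference: Advantage means rolling two d20s and keeping the higher, Disadvantage means keeping the lower. Here is a quick R sketch of what that does to the chance of meeting or beating a target number; the target of 11 is just an illustrative example.

# Chance of rolling at or above a target number on a d20
target <- 11
p <- (21 - target) / 20           # straight roll: 0.50 here

p_advantage    <- 1 - (1 - p)^2   # succeed unless both rolls miss: 0.75
p_disadvantage <- p^2             # both rolls must succeed: 0.25
c(straight = p, advantage = p_advantage, disadvantage = p_disadvantage)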

Using R and Random Forest to predict Diabetes Risk

Time for something a bit different from my previous posts, but hopefully more common in the future. I was looking through the UCI Machine Learning Repository for a couple of data sets I could use for some simple machine learning problems, both to try interesting problems and to keep my abilities sharp. This week I found the Early Stage Diabetes Risk Prediction data set, which comes from a paper on early prediction of diabetes risk. The abstract goes over the methods used, indicating that Random Forest did particularly well. Since I don't have access to the actual paper, I wanted to use R to try this model out.
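As a rough sketch of what that looks like in R with the classic randomForest package (the file name, the class column, and the 70/30 split below are my assumptions, not details from the paper or the post):

library(randomForest)

# Hypothetical file and column names for the UCI early stage diabetes risk data
diabetes <- read.csv("diabetes_data_upload.csv", stringsAsFactors = TRUE)

# Hold out a test set, fit a random forest, and check accuracy
set.seed(42)
test_idx <- sample(nrow(diabetes), size = 0.3 * nrow(diabetes))
train    <- diabetes[-test_idx, ]
test     <- diabetes[test_idx, ]

model <- randomForest(class ~ ., data = train)
preds <- predict(model, newdata = test)
mean(preds == test$class)        # proportion correctly classified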

The likelihood of getting or beating an array of stats in D&D Part 2

This follows up on my post from last week about probabilities and stat arrays. Now, the way I determined which array is better there was clearly overly arbitrary. Under that system the standard array is better than an array of (7, 18, 18, 18, 18, 18), when in reality any player looking for maximal stats would take that over the standard array (8, 10, 12, 13, 14, 15). So maybe there is a different way to compare these. The first approach would be to just use the ability modifier. Each integer score from 3-18 maps to an integer from -4 to +4 that is used for almost everything in the game (except for carrying capacity, which uses the actual Strength score).
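That score-to-modifier mapping follows a simple formula, floor((score - 10) / 2), which is easy to verify in R:

# D&D 5E ability modifier: floor((score - 10) / 2)
score    <- 3:18
modifier <- floor((score - 10) / 2)
setNames(modifier, score)   # runs from -4 at a score of 3 up to +4 at 18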

The likelihood of getting or beating an array of stats in D&D

I got involved in another conversation about Dungeons and Dragons and probabilities. This time I saw something that I've seen come up in a variety of circumstances. So, let's take this scenario: a Dungeon Master is running a game where the players roll for their stats, and one player has suspiciously high results. The question, of course, is whether the player cheated with their stats, either by rolling a different way or by just picking the stats they wanted. Given that D&D is a cooperative game, this can be especially frustrating to the other players at the table, as this one player has a character that is well above average at doing just about everything.

Probabilities of Winning in the game Among Us

Back in August of this year, the game Among Us leapt in popularity. It's a social game where crewmates try to complete tasks while staying alive; at the same time, impostors try to kill off the innocent players while also sabotaging the crew. The result is a fun, interactive twist on in-person social games such as Werewolf or Mafia. With the interactive gameplay, rounds end when a player reports a dead body or calls an emergency meeting, and the remaining players get a chance to discuss and potentially vote out one player. This got me thinking: ignoring the various strategies for the game, and setting aside impostors killing off innocent players, if players were just removed/voted off randomly, what would be the odds of a crewmate or an impostor victory, and with that, the expected win rate overall?
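Here is a small R sketch of that random-removal setup; the 10-player lobby with 2 impostors, and the usual win conditions (crew wins once every impostor is gone, impostors win once they reach parity with the remaining crew), are my assumptions about a typical game rather than anything from the post.

set.seed(42)

# One random game: players are voted off uniformly at random, one per round
play_game <- function(n_players = 10, n_impostors = 2) {
  players <- c(rep("impostor", n_impostors),
               rep("crew", n_players - n_impostors))
  repeat {
    players <- players[-sample(length(players), 1)]       # random eviction
    imps <- sum(players == "impostor")
    if (imps == 0) return("crew")                         # all impostors gone
    if (imps >= length(players) - imps) return("impostor") # impostors at parity
  }
}

wins <- replicate(100000, play_game())
prop.table(table(wins))   # estimated win probabilities under random voting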