Elemental Data Science A blog on Data Science and Statistics.
Posts with the tag R:

Simmulating the effects of luck in highly competivie events

To Be Finished for (i in 1:length(luckfactor)) { for (j in 1:length(selected)) { for (k in 1:R) { simulationresults <- tibble(Skill = runif(n = N, min = minscore, max = maxscore), Luck = runif(n = N, min = minscore, max = maxscore)) %>% mutate(Score = Skill * (1 - luckfactor[i]) + Luck * luckfactor[i]) luckyFew <- simulationresults %>% mutate(SkillRank = N - rank(Skill) + 1, ScoreRank = N - rank(Score) + 1, SkillSelected = SkillRank <= selected[j] * N) %>% arrange(ScoreRank) %>% head(selected[j] * N) results[[i]][[j]][k,] <- c(mean(luckyFew$Skill), mean(luckyFew$Luck), sum(luckyFew$SkillSelected), luckfactor[i], selected[j]) } results[[i]][[j]] <- as.data.frame(results[[i]][[j]]) } results[[i]] <- bind_rows(results[[i]]) } results <- bind_rows(results) colnames(results) <- c("Skill", "Luck", "SkillSelected", "LuckFactor", "ProportionSelected") results$LuckFactor <- as.

Multivariate Statistic Process Control method comparison Part 2: Generating control limits and initial look

Earlier this year, I posted the first part of a write-up for a project I worked on during a project I worked on last year. As a reminder, my project involved taking 4 different types of correlation, test the baseline and 8 different data shifts, generating 1000 simulations with 100 points of data each. I then compared how three different multivariate control charts, Hotelling T^2, MEWMA, and MCUSUM, were in detecting a shift while holding the false rate fixed over all simulations. For this I use the pretty standard for in control average run length (ARL) of 300. This means, on average when the process is in control, you can expect to see an out of control data point in any 300 sequential points.

Multivariate Statistic Process Control method comparison Part 1: Simulating correlated data and optimizing data generation.

Time to mix it up a bit from my usual game probability related simulation. This week I want to talk about a project I worked on last year in doing simulations for Statistical Process Control. For a brief primer, Statistical Process Control or SPC is a method used for monitoring something like a manufacturing process. It’s a form of time series analysis where you monitor results of some statistic taken from observing the process and then compare that statistic to a set of limits on a control chart to determine whether or not the distribution of that statistic has likely changed.

The Effect of the Advantage/Disadvantage in Dungeons and Dragons 5E on Success, Failure and Damage

So, after my last Dungeons and Dragons post, I got asked about the Advantage/Disadvantage mechanic in Dungeons and Dragons Fifth Edition, so let’s look into that a bit. In D&D, to succeed at a task, you roll a 20-sided die(d20) and with some math and checking with your Dungeon Master (DM), you determine if you succeed or not. In previous editions you would add your character’s modifier, as well as any additional bonuses or penalties due to the specific scenario. Fifth edition got rid of that except for one specific case, and instead replaced it with Advantage/Disadvantage. Simply put, if your character is in some sort of advantageous situation when attempting a task, you get Advantage.

Using R and Random Forest to predict Diabetes Risk

Time for something a bit different from my previous posts, but hopefully more common in the future. I was looking through the UCI Machine Learning Repository for a couple of data sets I could use for some simple machine learning problems to try interesting problems and keep my abilities sharp. This week I found the Early Stage Diabetes Risk Prediction. This comes from a paper on early prediction of diabetes risk. The abstract goes over the methods used, indicating Random Forest as doing particularly well. Since I don’t have access to the actual paper, I wanted to try out using R to try this model out.

The likelihood of getting or beating an array of stats in D&D Part 2

So, after sharing my post from last week about probabilities and stat arrays. Now, the way I determine which array is better is clearly overly arbitrary. Under that system the standard array is better than an array of (7, 18, 18, 18, 18, 18), when in reality any player looking for maximal stats would take that over the standard array (8, 10, 12, 13, 14, 15). So maybe there is a different way to compare these. Now the first approach would be to just take the ability modifier. Each integer from 3-18 maps to an integer from -4 to +4 that is used for almost everything in the game (except for carrying capacity, which uses the actual Strength score).