I’ve been doing a lot of in-game testing of spells lately, as a part of making spreadsheets and other projects. In particular, with the new beta, I’m more inclined to vet the info for any spell I look at by measuring in-game, rather than simply putting the coefficient from wod.wowhead into a spreadsheet, because:

- The designers are changing spells a lot, and tooltips are out of date much more often than on live.
- The passives, talents, and Draenor Perks aren’t all familiar, and you have to make sure you know everything that needs to be multiplied in between the coefficient in the data and the final damage amount.
- There are frequent bugs on beta, and actually testing means you can help catch/report them.

There are a lot of various techniques and tricks you get used to for doing this stuff quickly, but I wanted to dash off a quick post on one that both saves work and is mathematically interesting. It looks like Theck is starting a series on general concepts of theorycrafting, and while I don’t expect to do anything that elaborate, I do want to write down ideas that are familiar to me but might be helpful to people who are just getting into it.

# Background

The focus of this post is how to measure a spell damage value in-game, but I should give at least an outline of what to do with that information once you get it.

Take your measured value and divide by your spellpower or attackpower, and you have your coefficient (including any modifiers). If it matches the Wowhead data, or the Wowhead data plus the modifiers you know about (20% from a Draenor Perk is common), then everything checks out and you’re done. Otherwise, you’re looking at a modifier you don’t know about, a bug, an incorrect tooltip, or something else.
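As a rough sketch of that check (the function names, the 1% tolerance, and all the numbers below are made up for illustration and don't come from any real spell), it might look something like this:

```python
def implied_coefficient(measured_value, power):
    """Observed damage/healing per point of spellpower or attackpower."""
    return measured_value / power


def matches_expected(measured_value, power, base_coefficient,
                     modifiers=(), tolerance=0.01):
    """Compare the implied coefficient against data coefficient * known modifiers.

    `tolerance` is a relative margin (an assumption, 1% here) to absorb
    rounding in the numbers the game displays.
    """
    expected = base_coefficient
    for modifier in modifiers:
        expected *= modifier  # e.g. 1.2 for a 20% Draenor Perk
    observed = implied_coefficient(measured_value, power)
    return abs(observed - expected) <= tolerance * expected


# Hypothetical example: a 1320 hit at 1000 spellpower, with a data
# coefficient of 1.1 and one known 20% modifier.
print(implied_coefficient(1320, 1000))
print(matches_expected(1320, 1000, 1.1, modifiers=[1.2]))
```

If the check fails with every modifier you know about included, that's the cue to go looking for an unknown modifier, a bug, or a stale tooltip.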

Two brief points, just in case they help people:

- Some passives don’t show up in the spellbook any longer (typically, ones that do nothing but give passive bonuses), but they usually show up in the “Specialization” tab on Wowhead (a submenu under “Spells”).
- Checking the tooltip in-game can help track down discrepancies (unless it turns out to be a case where it’s entirely wrong). Since the Wowhead data is itself from tooltips, if you know all the modifiers that should be included, they should always match. If the game tooltip is what you expect after taking bonuses into account, but your observed damage/heals are different, then the client tooltip data is wrong and you can’t rely on it. If the game tooltip differs from what you’d expect based on the Wowhead tooltip, the game one is factoring in a bonus that you don’t know about.

Weapon-based attacks are a little more complicated, but I won’t run through that all here.

# Spell Ranges

For spells with constant damage/healing (such as HoTs and DoTs), taking the in-game measurement is easy; you only have to look at one tick. Also, many non-DoT spells have constant damage/healing right now, since they no longer have base values like they used to (it’s just coefficient*spellpower, both constants). It looks like there’s a system for artificially re-inserting variance like there used to be, but it’s not done or not used everywhere yet.

But how do you measure the average damage/healing when it’s not constant? The average is what you want, because 1) when you need to compute a spellpower coefficient, that’s what you want to start from, and 2) when you’re actually making a model, you typically only care about average damage values. The instinct is to take a large data set with a lot of casts, and take the mean.

# An Abnormal Distribution

Why is that the instinct? Probably because it would be correct in nearly any real-world context. If you’re scientifically-trained, or have done statistics in basically any other context, it’s probably second nature to process a large set of measurements by taking their mean (and possibly standard deviation) and going from there. It may never even have occurred to you to do anything else.

However, the reason we do that rests on an assumption, one that’s so universal in the real world that it’s rarely worth thinking about: the assumption that measurements of uncertain phenomena are normally distributed. The “normal distribution” is a specific statistical distribution that earned that name because of its ubiquity. What makes the normal distribution special is the Central Limit Theorem, which states that when you add up a large number of independent random quantities that all follow the same distribution, the total is approximately normally distributed (I’ll defer to the Wikipedia article for a less abbreviated summary of how that works).

WoW spells, however, do not follow a normal distribution. Nor are they (like many real-world measurements) an aggregate of microscopic phenomena that, regardless of the behavior of each individual molecule, aggregates to a normal distribution due to the Central Limit Theorem. They follow a predictable, known, artificial distribution: a perfectly uniform spread between the min and max values.

What this means is that when playing the role of the scientist, looking at data and trying to figure out what the underlying behavior is, we have a huge advantage. We know the exact form of the phenomenon we’re observing, and all we’re missing is two numbers: the min and the max. Get those, and we know the complete underlying distribution with full precision.

# The Upshot

Which brings me to the thesis of this post. When you attack a target dummy 10 or even 100 times, there are only two trials you need to record: the min and the max observed values. The rest is quite irrelevant (which is logical: if you’ve seen a hit for 100 and a hit for 120, then a hit for 119 adds literally zero information to what you know about the spell*). Whether or not you also saw the 119, the mean value of the spell that you should put in your spreadsheet is (100 + 120) / 2 = 110.

It’s interesting to think about why this is different if you’re measuring something in a real-world scientific experiment. There, if you’d measured 100 and then 120, your best estimate of the true mean value would be 110. And then when you measured 119 on the next trial, your updated estimate would be the mean of 100, 120, and 119, which is 113. If those were the only three trials you did (bad procedure, I know), your best information going forward would be that the true value is 113.

So the difference is quite material. In the WoW setting, taking that extra step of averaging in the 119 data point is not only unnecessary work (possibly substantially so, depending on how much data you’re taking), but it’s also incorrect.
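If you'd rather see that than take my word for it, here is a minimal simulation sketch. It assumes hits are integers rolled uniformly between a min and max of 100 and 120; the function name, the cast and trial counts, and the seed are arbitrary choices for illustration, and it says nothing about how the game actually rolls damage. It compares how far off each estimate typically lands from the true mean of 110.

```python
import random


def compare_estimators(low=100, high=120, casts=20, trials=5000, seed=1):
    """Return (rmse_of_mean, rmse_of_midrange) over many simulated sessions."""
    rng = random.Random(seed)
    true_mean = (low + high) / 2
    mean_sq_err = 0.0
    midrange_sq_err = 0.0
    for _ in range(trials):
        # One "session" on the target dummy: `casts` uniform integer hits.
        hits = [rng.randint(low, high) for _ in range(casts)]
        mean_estimate = sum(hits) / len(hits)
        midrange_estimate = (min(hits) + max(hits)) / 2
        mean_sq_err += (mean_estimate - true_mean) ** 2
        midrange_sq_err += (midrange_estimate - true_mean) ** 2
    # Root-mean-square error: the typical size of each estimator's miss.
    return (mean_sq_err / trials) ** 0.5, (midrange_sq_err / trials) ** 0.5


mean_rmse, midrange_rmse = compare_estimators()
print(f"typical error using the mean of all hits: {mean_rmse:.2f}")
print(f"typical error using (min + max) / 2:      {midrange_rmse:.2f}")
```

Under these assumptions, the (min + max) / 2 estimate should come out with the smaller typical error for the same number of casts.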

*Don’t fall into the trap of thinking that maybe it at least allows a secondary inference of how close to the true min and max you’ve gotten so far. For example, if you get 100, 120, and a whole bunch of 118s and 119s, that’s not evidence that the 120 is less likely to be maximal. That’s your intuition for normal distributions talking.
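What does bear on how much to trust your observed extremes is simply how many casts you've taken relative to the width of the range. A small sketch, assuming every integer between the true min and max is equally likely (the function name is hypothetical, and in practice you'd plug in your observed range as a stand-in for the true one):

```python
def chance_extremes_seen(low, high, casts):
    """Probability that `casts` hits include both `low` and `high`.

    Assumes each integer in [low, high] is equally likely and high > low.
    """
    outcomes = high - low + 1
    # Chance of never rolling one particular extreme in `casts` tries.
    miss_one = ((outcomes - 1) / outcomes) ** casts
    # Chance of never rolling either extreme.
    miss_both = ((outcomes - 2) / outcomes) ** casts
    # Inclusion-exclusion: 1 - P(miss min) - P(miss max) + P(miss both).
    return 1 - 2 * miss_one + miss_both


for casts in (10, 50, 100, 200):
    print(casts, round(chance_extremes_seen(100, 120, casts), 3))
```

For a range as narrow as 100 to 120, a hundred casts is very likely to have shown you both ends; wider ranges need proportionally more.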

# Conclusion

There are two potential audiences for this post. The first is people who already did this intuitively when measuring WoW spells. I was in this category for a while before thinking about the statistical rationale. For them, I hope it was interesting to stop and focus on this odd method of taking data that feels like a shortcut (glancing through numbers for the min and max and ignoring the rest). Specifically, how it’s a product of the way that games generate random events and how it differs from the way that nature generally does. The other is people who are making WoW spreadsheets and can hopefully save some time and effort with this explanation.

In any event, I’m glad all the talk about theorycraft on beta prompted me to write this down. I do want to be better about writing down things I think about when working on WoW projects, especially when the topic has a little math intrigue. Hopefully there will be more to come.

About “So the difference is quite material. In the WoW setting, taking that extra step of averaging in the 119 data point is not only unnecessary work (possibly substantially so, depending on how much data you’re taking), but it’s also incorrect.” :

I don’t think it is incorrect to take the average; according to the Law of Large Numbers, the average of the observations converges towards the expected value (in this case, (a+b)/2). Also, the Central Limit Theorem states that the average will be distributed according to the normal distribution (for a large number of observations), no matter what the initial distribution is (uniform in our case), which allows you to compute a confidence interval for the expected value.

Both math points you make are correct: the CLT means the average of a large number of trials will be normally distributed, and its center will be at (a+b)/2. However, it doesn’t follow that at any point, the mean of all your trials so far is a better estimate than the (min+max)/2.

So you’re right that using the mean of all the trials will get you to the right place in the long run, but it’s “incorrect” in that you’re adding in noise that worsens your current estimate compared to simply using (min+max)/2.

His point is that, knowing you are dealing with a uniform distribution, you CAN just look at min and max. Is it definitive? Nope. Still, with a uniform distribution, it will get you very close. And with a large enough data set and small enough range, it will be good enough.
