Data Analysis #4: Pushing past a simple KD using Statshark data

Just a short(ish) one this time… please see Data Analysis #3 for a more detailed conversation, that goes on and on in the comments, on what Statshark has meant for players getting a better sense of how this game works than they had before.

This is just about the fact that K/D (kills over deaths) is actually a pretty simple and unrevealing stat. While it’s simple to calculate, and Statshark is allowing us to see our own K/Ds and those of vehicles generally better than before, you can’t tell by just looking at the one number how much of its value is in the numerator (kills) or the denominator (deaths). Is a vehicle tanky but not able to kill much (low denominator), or a “glass cannon” that kills a lot and dies a lot (high denominator)?

This is a question that has been asked before about real combat. The field of operational research is all about simplifying the conditions of individual battles in order to ask larger strategic questions. A key value for operational researchers is kills per unit of time (normally expressed as α for the red force, β for the blue force). Can we get a similar value from Statshark data, and what would it tell us?

For this purpose we’ll just look at a small set of vehicles currently the focus of some discussion, the battleships introduced in the Leviathans update. We take the Statshark data from the last six days of June, and add a couple new derived values:

Spoiler

So the first five columns are just straight Statshark, for RB above and AB below. Number of spawns, naval kills (NK), air kills (AK), total kills (TK) and deaths (D). I’ve thrown in the two new premiums, Gneisenau and Sevastopol, even though they’re not “top tier”… it doesn’t really work to put in other ships as BR decompression on June 25 will likely have significant effects on this data, so we can really just look at the new ships.

The next three columns are what’s of interest, and are color-coded for ease of reference (blue is good, red is bad). The first, standard K/D, is as shown on Statshark. You can then decompose this, however, into the next two columns… first measuring the chance a vehicle has of surviving a spawn (1 − deaths/spawns), giving you its survivability/tankiness value, and then using that to give how many kills it would have gotten on average if it didn’t die in that spawn (regular TK divided by survivability). A low number here means it wouldn’t have killed very much if it had just been left alone the whole game, basically.
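To make the decomposition concrete, here is a minimal Python sketch of the three derived values. The class and field names are hypothetical, the numbers are invented for illustration (they are not Statshark data), and the last column is computed on the assumption that “TK divided by survivability” means kills per spawn divided by survivability; the actual spreadsheet may do it differently.

```python
from dataclasses import dataclass


@dataclass
class SpawnStats:
    """Per-vehicle totals as read off a stats table (field names are hypothetical)."""
    spawns: int
    total_kills: int  # naval kills + air kills
    deaths: int

    def kd(self) -> float:
        # Standard K/D: total kills over deaths.
        return self.total_kills / self.deaths

    def survivability(self) -> float:
        # Chance of surviving a spawn: 1 - deaths/spawns.
        return 1 - self.deaths / self.spawns

    def kills_if_left_alone(self) -> float:
        # ASSUMPTION: "TK divided by survivability" is read here as
        # kills per spawn divided by survivability.
        return (self.total_kills / self.spawns) / self.survivability()


# Made-up numbers for illustration only, not Statshark data.
tanky = SpawnStats(spawns=1000, total_kills=400, deaths=200)
glass_cannon = SpawnStats(spawns=1000, total_kills=1600, deaths=800)

for name, v in (("tanky", tanky), ("glass cannon", glass_cannon)):
    print(f"{name}: K/D={v.kd():.2f} "
          f"survivability={v.survivability():.2f} "
          f"kills if left alone={v.kills_if_left_alone():.2f}")
```

Both invented vehicles have the same K/D of 2.0, but the other two values pull them apart, which is the whole point of splitting the number up. Note that a vehicle with zero deaths, or one that dies on every spawn, would raise a division-by-zero error in this sketch.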

What this shows in the case of Battleships is that Sovetsky Soyuz is both a defensive and an offensive problem. It has both high survivability and high destruction, which is why it’s dominant just now. This means fixing it will likely require both offensive and defensive nerfing.

Here’s a scatterplot of the same data.

Spoiler

A couple other conclusions one could draw from this:

  • The Yamato’s problem currently is basically survivability; in AB and RB it’s killing when left alone just fine.

  • The Vanguard is surprisingly tanky in early Leviathans play, which is elevating its stats above Bismarck or Richelieu.

  • The Roma is problematic both in terms of survivability and damage-dealing. Even if you left it alone it wouldn’t kill very much currently.

  • Also interesting is that AB’s overall curve is steeper than RB’s. In operational research this would reflect the greater “intensity” of AB battleship play over RB, in that more things are dying faster.

Of course these are all battleships, so they’re all relatively tanky and clustered on the right side of the scatterplot. PT boats and the like with low survivability could be clustered on the other side.

It’s important to note this is very much dependent on the opponents the vehicle you’re measuring is currently facing. A BR shift up will likely move any vehicle both to the right and up somewhat on the scatterplot. As such, this could suggest a way to look at a vehicle’s poor performance and determine whether it could be solved with a nerf/buff (producing movement on the scatterplot along only one axis, greater or lesser survivability or damage) or whether you need a BR change (which would move it diagonally up or down).

The value of kills per spawn if left alone, the last column in the chart, equates to kills in a fixed unit of time, which is a key value for operational research calculations (α or β). Lanchester’s Square Law, for instance, states that in modern warfare, the comparison of the two values, in terms of who will win a fight, goes as the square. So if the square law held for War Thunder, in an even fight a Soyuz (11^2 or 121) would have a roughly 50-50 chance against two Iowas (8^2 is 64, times 2 = 128) in RB (or 2v4, 3v6, etc.). This doesn’t mean that’s a true statement for War Thunder, but what you’d do for real wartime operational research is use real-world performance to “tune that in” and come up with an exponent less than 2 that gave good predictions… in some research they raise α/β to an exponent of 1.5, or even 1, to get good predictability out of Lanchester equations.
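Written out with the rounded RB values used above (11 for Soyuz, 8 for Iowa), the comparison as this post applies it looks like the following. The symbols N_A and N_B for the number of ships on each side are my own notation, and the n = 1.5 and n = 1 lines just show what the tuned-down exponents would do to the same pair of numbers.

```latex
% Parity condition as applied in this post (N_A, N_B are assumed notation):
N_B \,\beta^{\,n} = N_A \,\alpha^{\,n}
\quad\Longrightarrow\quad
\frac{N_B}{N_A} = \left(\frac{\alpha}{\beta}\right)^{n}

% With \alpha \approx 11 (Soyuz) and \beta \approx 8 (Iowa):
n = 2:   \quad 1 \cdot 11^{2} = 121 \ \text{vs.}\ 2 \cdot 8^{2} = 128,
         \quad (11/8)^{2} \approx 1.89
n = 1.5: \quad (11/8)^{1.5} \approx 1.61
n = 1:   \quad 11/8 \approx 1.38
```

So the lower the exponent you end up tuning to, the fewer Iowas it takes to make it an even fight.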

The next step past Lanchester is so-called “salvo” combat models, which incorporate a value for defensive toughness and are seen as more appropriate for punctuated battles with distinct you-go/I-go phasing, whereas Lanchester is more for warfare with “continuous” damage. The survivability factor here could give us a way to start introducing salvo-based analysis to War Thunder as well. But that gets us deep into the weeds. It’s not so much that operational research could help us understand this game better, really (that would take a much deeper dive to confirm), but Statshark data and a little math does allow us to know some of the variables it would be looking for. Which is kinda cool, I think anyway.

Also of note, Statshark has allowed the addition of custom columns for your personal records (not collective records yet). So if you ever wanted to determine what vehicles are your tanks, and what are your glass cannons, consider adding a deaths/spawn value there yourself. For me it produced some interesting, if perhaps unsurprising conclusions (my strike aircraft and coastal small boats die, A LOT… but on the other hand I’ve clearly been doing something right with keeping my StuG III G alive, which has not been a go-to TD for me before this, so there you go).

Previously: Data Analysis #3: The arrival of Statshark answers some old questions


Huh. Neat.

This work exceeds all expectations. Outstanding!

Lots of conclusions to unpack from this.


Thanks for this. Sad to see Yamato in such a state.

I’ve always taken K/Ds with a grain of salt in ground battles. For example, your first spawn is typically the vehicle with which you have the best chance to get into the right positions in battle, and the most support from a team that hasn’t one-death-left on you yet. Meanwhile you may have other vehicles in your line-up that you only bring out when everything has fallen apart. You might get some good kills picking off spawn campers with these, or you’re just as likely to be nuked right away with not much to show for it. If you cared more for such a vehicle and brought it as one of your earlier spawns, your K/D ratio in it would be much different.

Yep, to get really useful data you’d need kills per unit of actual time. K/D-based or K-per-spawn calculations are always going to be subject to “first spawn vs last spawn” bias. As you say, it’s inherent in the data… same as late-lineup vehicles (like SPAA tends to be) will tend to be higher in terms of win-loss too. Only a single-spawn mode like Air RB gets you away from that.

That said, it’s not too different statistically from just really liking a vehicle and getting good with it (which also tends to mean playing it first) improving your K/D. And you like vehicles and play them because they’re good, so there tend to be positive-negative feedback loops in all of this, even in Air RB.


Yeah I never liked KD as a stat in this game for 2 big reasons:

  1. The stock grind, especially on higher-tier vehicles, leads to a lot of difficult games that ruin a vehicle’s stats before I have a chance to use it at its full potential

  2. It’s a very old game; my account is 9 years old and I started playing when I was a kid. I was not very good at the game back then and my stats are damaged by it forever

I try to play the game as casually as I can now so I’m not very concerned about stats anyway. It makes the game a lot more enjoyable when I don’t worry about going positive every game


Roma can be easily fixed: just add the 250mm of foamed cement she had IRL, decrease the reload to 29.7s aced (as demonstrated in the various bug reports) and bring dispersion on par with Bismarck (also realistic). So basically all Gaijin has to do is implement it correctly.


Reducing everything to K/D alone as a measurement is terribly flawed, so well done in adding more factors to it.

(From now on I am referring to Air RB, as that is the mode I am most familiar with and, since it lacks respawns or lineups and you play a single vehicle, the mode that focuses on a single vehicle the most.)

Another example of how flawed any K/D ratio is as a means to judge a plane, is that it does not take into account your CONTRIBUTION to the overall success of your team:
I have had several air battles, in which I ended up working together with a teammate, in a way that I was playing a turn fighter, luring an enemy into a dogfight, and my spontaneous partner zoomed in and killed the distracted enemy. That is good teamwork - sadly mechanics give no reward to the “bait”.

Attackers who decisively reduce the tickets so the desperate enemy can be killed more easily are another example.

Or bombers in cases where there are bomber wins possible (by killing the airfield).

So how about considering the impact the vehicle has based on the OUTCOME of the match? Which with teams of very varying quality opens a totally other can of worms, I am aware… but shouldn’t that statistically level out?

Then there are uptiers and downtiers. From my experience there are planes that work pretty well in all matches, regardless of uptier or downtier, then there are others that work well in equal or downtiers but suffer terribly in uptiers, and some people (like @Real_K_Soze) describe their favourite plane as performing BETTER in uptiers than in downtiers.

At some point, I think all those factors should be considered… would you like some more data digging, @Bruce_R1?

Regards,
– E.


Same. One of the cool things about Statshark for me is I can look at just my stats since January (or any interval since).

Yes, the conclusion I would draw is that for Roma both offensive AND defensive buffs are required (and vice versa for S. Soyuz)… Just one or the other won’t do it.

The other alternative is a BR change, moving Roma up and right diagonally relative to other BBs on the scatterplot due to reduced competition. Similarly you could fix S.Soyuz by moving it in its current state to a higher BR, but for a top tier that creates new problems.


“Nerf Soyuz” level of analytics


btw. the previous patch was the best-balanced one, with barbette fires, one-shots and the accuracy buff.



Will you ask to nerf Marlborough, Pittsburgh, Iron Duke and Baltimore?

Statshark can filter based on a timeframe (monthly stats) so that part can be avoided; not sure about filtering out games when the vic wasn’t spaded.

No filter (yet) but there’s a nice little icon now for your spaded ones on the list, very useful.

Would be nice if they added that; people wouldn’t really have too much to complain about looking at monthly spaded stats.

Totally. It is a game after all! Your K/D ratio says precisely nothing about you or your abilities!

Well, there’s a monthly rating for players’ performance overall.

Also IIRC I saw someone filter out the vehicles one player played recently and only show those; I believe it was through “sessions”.

Kinda jumping through hoops to get that info but hey, at least it’s something.

So the usual troll stopped by to kvetch, I see. I’d never bother to respond, but there are two interesting questions in there:

  1. Can we prove the Sovetsky Soyuz is unbalanced; and
  2. Is the Sovetsky Soyuz the most unbalanced ship?

Turns out, that’s exactly the kind of question operational research math, which is all about determining who’s going to win fights in advance, actually can help with! So let’s answer them.

  1. Can we prove the Sovetsky Soyuz is unbalanced? Lanchester’s Square Law actually applies here, stating that in terms of that lethality value, our modified K/D, if you compare force A with a lethality of α, and force B with a lethality of β, the relationship of force sizes for force B to match force A can be found with Bβ^n = Aα^n, where n is a value between 1 and 2.

So in the case of a force of Sovetsky Soyuz vs Iowa in RB, where Soyuz has an α of 11.17 and Iowa has a β of 7.93, the lower bound of the force comparison (n = 1, so it’s just linear) is that a Soyuz is equivalent to 11.17/7.93 Iowas, or 1.4 Iowas. If n = 2, the upper bound, a Soyuz is equal to 11.17^2/7.93^2, or 2.0 Iowas.

So if it takes 1.4 to 2 Iowas to make an even fight with a Soyuz, statistically the Iowa is overmatched, and probably shouldn’t be at the same BR. For the other ship at the BR, the numbers are worse, with a Soyuz equal to between 2.2 and 4.9 Yamatos. I don’t think anyone can say they should be at the same BR with that kind of ratio.

  2. So is it the worst-balanced? Well, we can put the vessels that were suggested in, plus let’s throw in the Scharnhorst. Remember though, these are values for balance against the vessel’s current competition, not these ships against each other. And because a lot of BRs changed in June, we have to use the May data here for existing ships, pending an apples-to-apples comparison when we have a full month of data at the same BR for everyone. For that you get the resulting scatter plot, which shows that some of the ships mentioned have higher lethality values than Soyuz in RB, so yes, it’s fair to say they’re pretty unbalanced right now too, and one can assume if this were to continue some BR changes would have been in order for them as well. (And in fact this is what happened in June, with Baltimore and Pittsburgh going up two BR steps in decompression and Iron Duke and Marlborough going up one… we’ll see what that does in the July data).
Spoiler

  3. Bonus round! The same scatterplot can also be used to track a vehicle’s progression through time, as game rules and balance change, as far back as February on Statshark. So, to take Scharnhorst again, comparing Feb to June you get:
Spoiler

Scharnhorst, which started where Soyuz is today in Feb, is now down to a much lower status in terms of relative supremacy. Using this method to chart a vehicle over time can give an idea when major changes occurred, either buffs or nerfs, and whether they affected predominantly offense or defense as well, more than just a straight listing of K/D could. In this case, not knowing anything about Scharnhorst, you can still say the biggest changes were on the survivability side, and they came between March and April (so, the Hornet’s Sting update in March would be most of what changed Scharnhorst’s fortunes here, or thereabouts). It also went up a BR step in late June, too early to be well-reflected here, but we’ll see what happens to its fortunes in July.
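Going back to the force comparison in the first answer above, the two bounds can be sketched in a few lines of Python. This just evaluates the relation as written in this post, Bβ^n = Aα^n, for one unit of force A; the function name and constants are made up for the example, and the two lethality values are the RB figures quoted above.

```python
def equivalent_force(alpha: float, beta: float, n: float) -> float:
    """How many B units match one A unit if B * beta**n = A * alpha**n.

    This is the relation as written in the post, not a tuned model.
    """
    if not 1.0 <= n <= 2.0:
        raise ValueError("n is expected to lie between 1 and 2")
    return (alpha / beta) ** n


# RB lethality values quoted in the post.
SOYUZ_RB = 11.17
IOWA_RB = 7.93

lower = equivalent_force(SOYUZ_RB, IOWA_RB, n=1.0)  # linear bound
upper = equivalent_force(SOYUZ_RB, IOWA_RB, n=2.0)  # square bound

print(f"One Soyuz ~ {lower:.1f} to {upper:.1f} Iowas")
```

This reproduces the 1.4 to 2.0 range given above. Plugging in any other pair of lethality values gives the same kind of bracket, with the usual caveat that each value reflects a ship’s current competition rather than a head-to-head matchup.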


For me, I won’t judge you by your K/D, I’ll judge you by how you contribute to the team in getting the victory. I’d rather have someone with a poor K/D who actually cares about the team and tries to cap points and get the win over someone who just goes and sits at a predefined point of the map and farms kills all game without a care for how many caps the team has. In fact, the ones who have the most to gain from a focus on K/D are one-death leavers.

However, I didn’t want to derail this topic; Bruce has shown there is some value in K/D for demonstrating glaring imbalances between vehicles (or in the case of the OP, showing the suffering of our beloved Yamato).

If only we actually had a naval mode where you often didn’t spawn within range and within view of the entire enemy fleet, where survival is simply a case of not being the one targeted by said enemy fleet. That’s when I’ll come relive my Navyfield glory days. Right now, Gaijin is forcing 18th century-esque line battles on ships that were simply not designed for it.

Statistics don’t lie.


People do?

btw. have you ever heard about tendentious research or data gathering?

ps. Wrong conclusions are a thing too.

Indeed.

But nothing in the OP’s post suggests to me this is the case. The data aligns with what I’ve heard about the current state of naval and especially Sovetsky Soyuz.
