Data Analysis #4: Pushing past a simple KD using Statshark data

Maybe research some BBs yourself in the first place, rather than trusting the conclusions of some biased internet mates who don't even have them.

Why don’t you use a different metric? Kills/spawns shows the effectiveness of the vehicle much better and is less susceptible to stat abuse.

Ah I see I forgot to put the “don’t feed the troll” sign on this post. My bad, sorry.

Anyone who has seen "Moneyball" (great movie) recognizes all these tired "arguments," applied there against Jonah Hill's and Brad Pitt's characters' faith in sabermetrics in baseball. It's all just bluster and vibes-based thinking.

Now, one could talk here about Richard “nature will not be fooled” Feynman and space shuttles, or John Paul Vann and Vietnam, or Trent Telenko using very similar operational research math in 2022 to predict Russia would still be in a damaging war today, years after the vibes-based “analysts” said it would end. “He’s just a logistician, what does he know about artillery? He doesn’t understand the true power of our wishful thinking.” They all got to hear some version of that, but in the end, the math always tends to win. I COULD talk about all those vibes-based analytical fails.

But I'd rather shout out Statshark here (and do give them some scratch on Patreon etc. if you can, they're doing us all a great service)… A year ago an argument like this would just go on forever, vibes vs. vibes, because we had no data, or flawed data. For the last six months, thanks to Statshark, we've had a way to win these arguments fast and move on to something else, and the people who don't like the result just look worse and worse, sillier and sillier, with every vibes-based post they still try to put up in opposition. I've just tried to bring in some analytical techniques I use for real-life jobs to see if they apply here and what they could tell us… The real credit for this brief golden period of anti-troll, anti-vibes thinking on this forum and elsewhere goes to Statshark. Thanks, Statshark.

When will you bring in this Mr. Logistician?

Or maybe bring some proof that NRB got more RP gains than NAB, as you said.

KpS is better than KD, agreed. But it also can’t tease out the difference between a high numerator (kills) and a low denominator (spawns) without reference to that other data.

To get closest to an op-research lethality value with the data we have, you need to do what I did here: break the KD out into an x and a y, offense and defense. For the defensive value, we use the derived figure of deaths per spawn as our survivability. The offensive value is then just kills/deaths divided by deaths/spawn, really, which is a kills-per-spawn-based value because the "deaths" term cancels out, but with the boost to KpS you get simply from living longer also cancelled out. That gives something closer to a lethality value, kills per unit of time (in this case using a "game" as the time unit, because that's the best we can do), than either KpS or KD provides.
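To make that concrete, here is a minimal Python sketch of the arithmetic. The column names and the numbers are made up for illustration, and the "lethality" line just follows the wording above (kills/deaths divided by deaths/spawn); treat it as one reading of the derivation, not Statshark's own output format.

```python
import pandas as pd

# Made-up per-vehicle totals in a Statshark-like shape (columns assumed).
ships = pd.DataFrame({
    "vehicle": ["Ship A", "Ship B"],
    "kills":   [5200, 3100],
    "deaths":  [2600, 2900],
    "spawns":  [4000, 3200],
})

ships["kd"]  = ships["kills"] / ships["deaths"]                 # classic K/D
ships["kps"] = ships["kills"] / ships["spawns"]                 # kills per spawn
ships["deaths_per_spawn"] = ships["deaths"] / ships["spawns"]   # defensive axis

# Offensive axis as described above: K/D divided by deaths-per-spawn.
# (For comparison: K/D multiplied by deaths-per-spawn reduces exactly
# to kills per spawn.)
ships["lethality"] = ships["kd"] / ships["deaths_per_spawn"]

print(ships[["vehicle", "kd", "kps", "deaths_per_spawn", "lethality"]])
```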

I'm not interested in naval beyond looking at the models. I tried it, I didn't like the gameplay loop, and decided it's not for me.

Friends gave me their opinion. I trust them and know what their biases are.

People on the forum have talked extensively about the Sovetskyj Sojuz.

What the general playerbase thinks about the Sovetskyj Sojuz and what my friends think aligned for the most part.

Clearly there is some consensus about the Sovetskyj Sojuz's capabilities.

Now we have hard data which aligns with said consensus.

So far I have zero reason to believe that the OP misinterprets the data, or that he lied about anything.

You, on the other hand, have provided zero evidence or explanation to show that the OP is lying or misinterpreting data beyond "he's theorycrafting" and "I think he's wrong".


The good thing about this is the data is all right there to check, no barriers. I encourage people to go visit (and support, if you are as grateful as I am) Statshark, do their own analysis on its contents. I’m just offering a couple methods people could try themselves… or maybe come up with better ones. It’s a gold mine that as amateur analysts we’ve really never had before this year.

But that would require discipline and objectivity and effort. So much easier to complain, or to say "why don't you do THIS other analysis?", to which I would say, "why don't YOU?" Still, the trolls reliably kick my threads up to the top of the forum a couple of times every day, so I hope it continues anyway. :)


I agree, it's an amazing tool.

I personally don't have the time or energy for any deeper analysis, which is why I appreciate this series of yours.

Keep up the good work!


Sure thing, let's nerf Marlboro, Pittsburg, Duke and Baltimore because Statshark told us so.

And don't forget Kerch, because in NAB it's nuts.

And don't leave Hood and Rodney alone either, or Leningrad and Marat, because they pwn.

But no one said to go exclusively off Statshark.

Again, there's a consensus about the SS's performance, and the Statshark data supports it.

What you are presenting is just data without said consensus.


So let's listen to the forum whiners, or what?

I know what they're gonna say: let's nerf all Russian vehicles, they're OP, and the devs are biased because they're hiding in Hungary.

Define "forum whiners", because some of the "forum whiners" actually made an impact by bug-reporting the SLM during the dev server so that it released at least somewhat correct.

That kind of impact?

Impressive bug collection.

While I admit I didn't have time to read through all of it, how does that relate to the original issue of SS performance and the OP misinterpreting Statshark data or lying?


As I've mentioned a few times in the past, it's not even good data. It's comparing a bunch of ships that shifted their BRs, in some cases radically, on the 25th of the month against ships like the Soyuz that have only been at the one BR for the last six days of the month. It's exactly the kind of bad apples-to-oranges comparison that tells you nothing and that one should never put their name to :)
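If you wanted to guard against exactly that apples-to-oranges problem, one way (sketched below, assuming you can get per-day rows with a date column, which may or may not match what Statshark actually exposes) is to restrict every ship to the same post-change window before computing anything:

```python
import pandas as pd

# Hypothetical export: one row per vehicle per day; file and column names
# (date, vehicle, kills, deaths, spawns) are assumptions for illustration.
df = pd.read_csv("statshark_daily_export.csv", parse_dates=["date"])

# Only keep data gathered after the BR changes took effect on the 25th,
# so every ship in the sample is judged at its current BR.
BR_CHANGE_DATE = pd.Timestamp("2024-06-25")   # placeholder date
same_window = df[df["date"] >= BR_CHANGE_DATE]

totals = same_window.groupby("vehicle")[["kills", "deaths", "spawns"]].sum()
totals["kd"] = totals["kills"] / totals["deaths"]
totals["kps"] = totals["kills"] / totals["spawns"]
print(totals.sort_values("kd", ascending=False).head(10))
```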

The inclusion of the Leningrad highlights something else you can do here, though: note it's got a high K/D but a nearly as high K/S. Any time those two numbers are close, that vehicle is NOT very survivable (because the share of its spawns that end in a death is approaching 100%). This can be a quick way to eyeball a vehicle in Statshark and tell if it's "tanky" without going through the math I did. It turns out the Yamato actually has a survivability at the moment (0.34, RB), equating to TTL (time to live) in game terms, comparable to the Leningrad's (0.31). One could understand people being a little annoyed that the famous Yamato, which took 11 torpedoes and 6 bombs to sink, has a TTL comparable to a 4.3 destroyer, I think.
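That eyeball check is easy to automate: the ratio of K/S to K/D is just deaths per spawn, so anything approaching 1.0 almost always dies once spawned. A small sketch under assumptions (the "survivability" line uses 1 − deaths/spawn as one simple proxy, which is only my reading of the value above; the numbers are illustrative, not real Statshark figures):

```python
def survivability_screen(kills: int, deaths: int, spawns: int) -> dict:
    """Quick 'is it tanky?' screen from the three raw Statshark columns."""
    kd = kills / deaths
    kps = kills / spawns
    deaths_per_spawn = deaths / spawns   # equals kps / kd
    return {
        "kd": round(kd, 2),
        "kps": round(kps, 2),
        # Near 1.0 means nearly every spawn ends in a death (not survivable).
        "deaths_per_spawn": round(deaths_per_spawn, 2),
        # One simple proxy: the share of spawns that do NOT end in a death.
        "survivability": round(1 - deaths_per_spawn, 2),
    }

# Illustrative numbers only.
print(survivability_screen(kills=900, deaths=690, spawns=1000))
```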


Ah, here is that issue of English not being my first language. I should have said "context" instead of "consensus".

This is what I tried (and failed) to point out: what kweedko posted is just Statshark data, which is prone to being misinterpreted without any context (which is what kweedko is doing), and Statshark alone cannot provide that context.


So I'm the one doing it, not the TS? Yeah, sure.

Yes.

Your post is just a bunch of data without context, as Bruce_R1 explained:

We will likely see some radical shifts in the effectiveness of said ships during the next month.

As for the Soyuz, which the TS demanded be nerfed in the first place.

I mean, technically speaking, K/D is the only key figure which matters in a shooter, especially if the individual game impact is reduced to 5-25% of your matches as the MM tries its best to kill skill (applicable experience) advantages.

This would look quite reasonable if there were a way to measure the number of sole-survivor awards (playing 1 vs 1 as the last players alive). But thanks to the MM this is rather rare.

But considering the outcome of the match is even more flawed than looking at K/D ratios. As written above, in Air RB the WR is highly influenced by the MM. Therefore WR (= outcome) is even less valid than K/D vs players.

Even if you considered mission score, the guy killing 6 AI planes gets more score than the guy killing 18 AAA/arty pieces, whilst the second guy might have created a ticket win.

This is, from my PoV, an important factor, but planes which are perfectly balanced are rather rare, mainly because, if they are popular, they tend to be under-BR'd (like the P-39 N-0 or the Yak-3), dominate full downtiers, and need skilled opponents in full uptiers to tame them.

And aircraft which perform better in uptiers than in downtiers usually have superior turn performance at their own BR. So whilst planes at higher BRs tend to get heavier and less nimble, you see the opposite effect in downtiers, as your turn advantage gets strongly reduced or vanishes completely.

The best example (from my PoV) is the B7A2, followed by my new favorite, the Swiss C-3604. For me they perform way better in full uptiers than in full downtiers, thanks to good (B7A2) and insane (C-3604) turn performance.

Imho the OP's efforts make sense in niche modes like Naval or Simulator, but not for modes played by the masses like Air RB or Ground RB.

The main issue is that by looking at average statistical values, the confidence level of your data is severely affected. So the more clueless players use a certain vehicle, the more the data hide the real combat power of that vehicle flown by a veteran.

Imho Gaijin's BR-setting policy should consider only the performance of the top 5-10% of players using a specific vehicle; if you want to get better, you always look at the top.
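A rough sketch of how that top-bracket filter could look, assuming per-player, per-vehicle rows were available (which they may not be) and using each player's K/D in that vehicle as the ranking stat (one possible definition of "top"):

```python
import pandas as pd

# Hypothetical per-player, per-vehicle aggregates; the file and its columns
# (player, vehicle, kills, deaths) are assumptions for illustration.
df = pd.read_csv("per_player_vehicle_stats.csv")
df["kd"] = df["kills"] / df["deaths"].clip(lower=1)   # guard against /0

# Keep only each vehicle's top ~10% of players by their own K/D in it,
# then average that bracket instead of the whole playerbase.
cutoffs = df.groupby("vehicle")["kd"].transform(lambda s: s.quantile(0.90))
top_bracket = df[df["kd"] >= cutoffs]
top_bracket_kd = top_bracket.groupby("vehicle")["kd"].mean()

print(top_bracket_kd.sort_values(ascending=False).head(10))
```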

Have a good one!