Proper Mathematical Model for BR and MatchMaking

I’ve read several opinions about the MatchMaker, most of which are complaints. While I’m nearly certain that creating a matchmaker that satisfies everyone’s needs is impossible, it’s still interesting to consider how it could (or should) be designed and optimized purely from a mathematical standpoint. (We can safely assume there’s a vast amount of input data to work with, so we can use statistics.)

The current implementation is discussed in (more or less) detail under this forum post.
The exact statistical method is not known, though. (Do you have more information, maybe?)

Generally, if we want to solve a parameter optimization problem, we first need a target function, i.e., something to optimize for. According to the above post, we actually want to achieve two different goals using estimated BR values:

  1. balance sides in battles
  2. somehow “balance” personal efficiency (based on research points and silver lions amounts)

There is an inherent problem: a single parameter might not be adequate to describe two different goals (it can be used for both, but accuracy and certainty will suffer considerably).

The first goal is clear: we want to approach a 50% win rate for each side. However, there is a problem here: unless BR is weighted by player skill (an ELO-like system), this goal is hard to achieve. Skill definitely modifies the result and may even have a stronger effect on the outcome than BR. Therefore, if we factor BR out of the results, it will be inaccurate, because our input is loaded with unnecessary noise (randomness in skill level). I think incorporating skill would result in much better BRs (for this goal).
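For reference, here is a minimal sketch of the kind of ELO-style update I mean, using the standard chess constants (purely illustrative, and of course not Gaijin's actual system):

```python
# Minimal ELO-style skill tracker (standard chess constants; purely
# illustrative, not Gaijin's system).

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the ELO model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return the updated ratings of A and B after one match."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Example: a 1600-rated player beats a 1500-rated player.
print(update(1600.0, 1500.0, a_won=True))  # A gains ~11.5 points, B loses ~11.5
```

Once every player has a reasonably stable rating, win/loss outcomes can be de-noised before we try to extract BR from them.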

The target function of the second goal is fuzzy and ill-defined in the post:

it will provide maximum equal personal efficiency (based on research points and silver lions amounts) in battles (in average)

This doesn’t seem to be a good target, because it allows some strange (unfair) solutions where one player gets extremely high income while all the others get nothing (the resulting average is still high enough). The average is not a good fairness validator function (e.g., the median would be much better). I’m not sure what the intent in the original post was (please share if you know; it’s probably to achieve some kind of “efficiency fairness”), so let’s see what the alternatives are.
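A toy example of why the average hides this kind of unfairness while the median exposes it (all numbers made up):

```python
import statistics

# Hypothetical SL income of an 8-player team in one battle (made-up numbers):
# one player carries, everyone else earns almost nothing.
incomes = [45000, 200, 150, 100, 100, 50, 50, 0]

print(statistics.mean(incomes))    # 5706.25 -- looks "high enough" on average
print(statistics.median(incomes))  # 100.0   -- reveals almost everyone earned ~nothing
```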

Target: BR reflects the capability of the vehicle, i.e., how well it can be played (and not necessarily how well it is played).
Effect: this method favors skill. If you know your vehicle well, you earn a lot; if you don’t know how to use it properly, you will suffer. BR reflects the maximum potential, not the average usage.
Solution: The best way is to filter the data. Exclude casual users’ results entirely and use advanced players’ performance only. This can be done by using “clan matches” only (downside: a very small subset of the data) or by evaluating the top n players’ results and discarding the rest of the scoreboard. If we have enough data, it is even possible to factor out the effects of different aspects of the vehicles (e.g., the effect of a specific gun, or speed) with multivariate statistical analysis. This could help estimate the BR of new vehicles.
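A minimal sketch of the top-n filtering, assuming a hypothetical battle-results table with match_id / vehicle / score columns (the column names are my invention):

```python
import pandas as pd

def top_n_performance(battles: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Keep only the top-n scorers of each match, then aggregate per vehicle.

    The idea: estimate what a vehicle can do in capable hands by
    discarding the rest of the scoreboard.
    """
    top = (battles.sort_values("score", ascending=False)
                  .groupby("match_id")
                  .head(n))
    return top.groupby("vehicle")["score"].agg(["mean", "count"])
```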
Your alternative solution?

Target: BR reflects the vehicle’s usage efficiency, i.e., how effectively it is played by the community (probably the current implementation).
Effect: Battle rating represents the capabilities of the vehicle in the hands of an “average player”. This method will introduce some skewness in favor of popular vehicles and might not be stable (it’s hard to make it stable). It’s easy to see why: we can safely suppose usage (and popularity) depends on BR, so if we calculate BR from usage we get a dynamic (recursive) system, which is notoriously unstable and sensitive to initial conditions (the system may have an attractor, or even a fixed-point attractor, but it might also behave completely chaotically). It seems the current implementation falls into this category.
Solution: I’d suggest unknown-function estimation with regression. The preferred Silver Lion (SL) income ratio (preferred by Gaijin, at least) is already implemented as income multipliers per vehicle, therefore we can safely suppose we should optimize for equal silver lion income per vehicle. First, we calculate the target value: compute the average SL income for each vehicle, then average the averages. Supposedly, if we increase the BR of some vehicle we’ll overshoot; if we decrease it, we’ll undershoot.

Unfortunately, we do not know the exact mapping function M_v(BR) -> SL, which maps the battle rating of vehicle v to silver lion income, so we must estimate it. But how can that be done? Fortunately, we have plenty of data! The matchmaker constantly inserts vehicles uptier and downtier (into matches of higher or lower BR than the vehicle’s own), so we simply take the effective BR of the vehicle to be the tier of the match. We estimate the function M_v with linear (or polynomial) regression, e.g., using the simple income model M_v(b) = β0 + β1·b + ε, where b is the battle rating and ε is the error term. Now we can use our M_v model for each vehicle v to identify the BR that yields the average SL income. (It’s worth noting that the method would work much better if the uptier/downtier spread were larger (currently we have 7 data points), and polynomial functions are probably worthless at that sample size.)
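A minimal sketch of the regression step, with made-up income numbers for a hypothetical vehicle sitting at BR 5.7:

```python
import numpy as np

def fit_income_model(effective_br: np.ndarray, sl_income: np.ndarray):
    """Fit M_v(b) = beta0 + beta1*b by least squares (np.polyfit, degree 1)."""
    beta1, beta0 = np.polyfit(effective_br, sl_income, deg=1)
    return beta0, beta1

def br_for_target(beta0: float, beta1: float, target_sl: float) -> float:
    """Invert the linear model: the BR at which expected income hits the target."""
    return (target_sl - beta0) / beta1

# Made-up data: a vehicle at BR 5.7 observed at effective BRs 4.7 .. 6.7
# (the ~7 uptier/downtier buckets mentioned above), earning less when uptiered.
brs = np.array([4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7])
sl  = np.array([9200, 8600, 8100, 7400, 6800, 6300, 5600])

beta0, beta1 = fit_income_model(brs, sl)
target = 8000.0  # hypothetical average-of-averages SL income across all vehicles
print(round(br_for_target(beta0, beta1, target), 2))
# ~5.35: the model suggests lowering this vehicle's BR to reach the target income
```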

Target: Balanced battles; BR reflects true capabilities.
Effect: Handicapped games; veterans will have a harder time. This can be compensated by a Silver Lion multiplier for being a veteran, or by high bonuses for “rank doesn’t matter” events.
Solution: Use the first method to calculate BR (it’s stable) and modify the matchmaker to consider skill. Do not force an equal number of players. Even a simple multiplication might do the trick: weighting BR between, e.g., 0.3 (absolute noob) and 1 (veteran). BR can still be capped. This way, weaker players get a handicap either through BR or through numbers; e.g., the matchmaker can run 10 unskilled players against 5 skilled players, everyone having the same BR.
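A minimal sketch of the skill-weighted matchmaking idea; the 0.3 to 1.0 weights are from above, everything else (the greedy split in particular) is just one possible heuristic:

```python
# Sketch of the "weight BR by skill" idea. Skill weights run from
# 0.3 (novice) to 1.0 (veteran); all names and numbers are illustrative.

def effective_br(br: float, skill: float) -> float:
    """Scale a player's BR by a skill weight clamped to [0.3, 1.0]."""
    return br * max(0.3, min(1.0, skill))

def split_teams(players: list[tuple[str, float, float]]):
    """Greedy split minimizing the gap in total effective BR.

    players: (name, br, skill) tuples. Team sizes may end up unequal,
    as suggested above (e.g., 10 novices vs 5 veterans).
    """
    a, b, sum_a, sum_b = [], [], 0.0, 0.0
    # Assign the strongest players first, always to the currently weaker team.
    for name, br, skill in sorted(players, key=lambda p: -effective_br(p[1], p[2])):
        e = effective_br(br, skill)
        if sum_a <= sum_b:
            a.append(name); sum_a += e
        else:
            b.append(name); sum_b += e
    return a, b
```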

What is your preferred target / solution?
[your preferred target and solution here…]

Please don’t post “personal experiences”. Personal experience inevitably contains a huge amount of subjective bias. We should trust statistical data analysis and math, and only that. At least in this topic.

Disclaimer: sorry for possibly inaccurate mathematical formulations; I’m a computer scientist / researcher, but not a mathematician. :) Comments are welcome.


TL;DR: This is not a simple or easy undertaking, as the existing system does, to some degree, accomplish the tasks it set out to achieve (player retention / conversion(?) rates). War Thunder wouldn’t be as popular as it is / was otherwise. Not that there aren’t blatant issues and areas where it could be improved on, but the balance it has struck has obviously had some success, though I do think Gaijin has run it into the ground in some ways, in some areas, on occasion.


The first thing I think would be needed before attempting to produce an improvement to the system would be to come to an agreement on where the issues lie, what the replacement system intends to accomplish, and how it would go about doing so.

One of the most important things to realize, I think, is that a well-functioning matchmaker should have no intention of being impartial; it should avoid approximating a fair contest of skill (e.g., as a heuristic attempt at defining what constitutes “fair” in a game like War Thunder, a multiplayer game attempting to reduce the influence of individual skill: assume that any given player’s global win rate should be reflected by their ELO).

This should be so, at least to some degree, in order to account for the proportion of lower-skilled players relative to the playing population: with a low player count, the population would eat its own tail relatively quickly should even a fraction of the poorly performing players leave the pool, with higher-skilled players then falling below whatever critical waterline over time, causing a host of poor outcomes that implicate themselves in broad issues well beyond just the matchmaker.

Further, solely relying on the matchmaker to account for implicit issues in other areas (e.g., map design & knowledge, meta lineups, gameplay concessions, etc.) works to a point, but it can, to some degree, be gamed or otherwise worked around with effort and time, and so is not a wholesale solution to the presented problem; it should be but one element of a broader solution.

A "Minor" tangent

To illustrate my point, take the various changes to airspawns that have occurred over time. The airspawn was originally provided at a set height (per map) for vehicles of specific airframe classes (e.g., bombers & attackers) that were deemed not to have sufficient rates of climb to accomplish their objectives (bomb map targets / opposing airfields, destroy AI targets), in order to reduce the rate of encounters with intercepting fighters, since spawn & objective placements were fairly basic as a concession to map design & engine performance.

It was then provided to underperforming fighters and interceptors (with heights adjusted and caps placed on bomber counts per match) in order to increase the rate of interceptions, since bombers were ending games early, and at a significant rate. This dealt with the aforementioned issue, but then a group of enterprising individuals figured out that the relative energy advantage conferred provides significant offensive benefits: it could be used to disrupt climbing fighters that had lower energy, since routing was easily predicted (spawns and objectives being static and marked on the map) and spotting fairly forgiving. This of course remains a contributing factor to why AAB & ARB are practically a fighter (and select fast-bomber) dominated TDM these days, and incredibly stale, with little changing in the established meta for years at this point, at BRs before missiles.

The solution to the above comes in two (three) parts (a small lookup-table sketch follows the list):

  • Introduce a universal dynamic-altitude airspawn that is set on a per-vehicle basis, grounded in an optimal time-to-altitude metric specified by assigned role & rank (e.g., for Rank IV: bombers are set at 6 km, interceptors at 5.5 km, attackers at 4 km, fighters at 3 km, etc.).

  • Diversify spawn locations in some way, be that locations for flights (2 squads) picked at random from an existing set of curated locations, or dynamic spawns across a map edge / boundary (consider including the ability to exit the match from the same zone with 80% of the repair cost after 1/2 to 3/4 of the time limit, though this should not replace the existing airfields).

    • Spread objective locations over a greater area of the map (clustering and overlapping defenses is fine). This helps adjust the rate of encounters, rewards planning, luck, and reactive route planning, and refreshes and broadens the scope of potential intercepts (and their composition), since direction of travel, quantity, altitude, and potential intercept geometry can be much more varied when pre-existing knowledge is less usable, while still rewarding planning.
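As a rough illustration, the per-role altitudes could live in a simple lookup table; only the Rank IV numbers come from the example above, and the rest would have to be tuned per rank:

```python
# The dynamic-airspawn idea as a lookup table. Only the Rank IV numbers
# come from the example above; all other ranks would need their own tuning.
SPAWN_ALT_M: dict[tuple[str, str], int] = {
    ("IV", "bomber"):      6000,
    ("IV", "interceptor"): 5500,
    ("IV", "attacker"):    4000,
    ("IV", "fighter"):     3000,
}

def spawn_altitude(rank: str, role: str, default_m: int = 500) -> int:
    """Spawn altitude in metres for a (rank, role) pair, with a low fallback."""
    return SPAWN_ALT_M.get((rank, role.lower()), default_m)

print(spawn_altitude("IV", "Bomber"))  # 6000
```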

This also brings up the question of how player agency should be accounted for: if a player’s felt impact on the round is low, that produces a negative outcome, so things shouldn’t be too skewed. Yet skewed maps (& sidedness) and team composition (and count) are really the only levers the matchmaker has in order to achieve the desired outcome. Thus battle rating (& changes to said system) and the matchmaker, while ultimately influencing one another, should not be mechanically tethered tightly (if at all), though they should be considered in tandem when changes are made.

There is also the question of if / how the existing state of the game should influence the planning of future content additions or expedite timelines, and whether actual changes should be delayed to account for, prepare for, or be paired with new additions. Take, for example, the addition of guided air-to-surface missiles (and how it permitted the addition of surface-to-air, and later fire-and-forget, systems), which occurred in update 1.79 (still waiting on the VB-x series of guided bombs to be added); or how the poor performance of the Chinese Tor-M1 prompted it to be skipped in favor of the Pantsir for the Russian tree, a dramatic power hike in response to the 2S6’s degrading performance.

This is in some sense important, since content additions need to be planned about a year to a year and a half in advance for timelines to converge (not that it doesn’t cause significant crunch sometimes, considering how busy the devs are).

Thank you for your answer!

As far as I know there is no consensus (even the existing method/target is a black box to the community). Moreover, consensus without a deep understanding of the possible consequences is questionable. This topic is about pure theoretical data crunching in order to (somewhat) explore possible methods and their effects. You define your favorite goal, try to estimate its effect, and show us how to achieve it algorithmically (to see whether it is viable or not).

This can (/might) help evaluate different solutions and eventually reach some kind of consensus.

I don’t think we should try to fix everything in one run. This post is about BR balance, i.e., how we can find the best BR values using big-data analysis for a specific, set goal. There are of course many other aspects that could be changed which are not considered here. Even BR values could be determined by, e.g., careful manual investigation and testing, but that’s also not considered here. Only pure statistical methods.

Yes, that’s correct, I agree! However, I think if we keep the current goal (which is still a black box to me), no change is needed in the MatchMaker, and we could improve the BR evaluator separately. This is what the second suggestion attempts.

About the Matchmaker: we shouldn’t underestimate the power of data science. If the input is big enough (and I think it is), I’m pretty sure we could achieve an unbelievably tight level of balance (if we wanted to) by using proper methods. But I don’t think we should. For me, some imbalance is absolutely OK; I’d quite like to fight against the odds sometimes, it’s fun. My goal here was to find a more precise BR calculator (the current one might be a naive heuristic. Or not. Who knows?).

From a set of posts on the old forum, which I’m trying to track down

From what I remember, the mechanics behind BR changes were discussed in a release covering why the proposal to lower one of the German M48s (it was either the Super or the M48A2GA2) to 7.7, below the US’s basic M48 (at 8.0), made sense, even though in real terms it was far superior in all aspects. (After feedback, it was reverted before the changes went live.)

SL costs and efficiency / balancing were discussed in a separate post about adjusting the Sim economy.


We know that, at least for the Matchmaker, the aim is to produce a global 50% win rate (one would have to assume that “global” in this case refers to a per-player basis, since it doesn’t make sense otherwise), and thus it will skew matches to make this occur,

and the rewards, multipliers, and repair costs are set up around that to establish an average net-SL-income-per-hour target (efficiency), which is what they compare, since it already effectively accounts for most rewarded actions and K/D & W/L ratios. Thus vehicles deviating sufficiently from said targets (or that are outliers for their BR bracket) are not balanced, and so are moved to bring them back into line.

The issue with the above (seen historically with high-tier bombers like the B-29 / Tu-4) is that since repair costs are effectively the only moderator, and in most modes are paid only after dying, having a low deaths-per-battle rate distorts the metrics.
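A toy calculation of the distortion (all numbers invented): two vehicles with identical gross income and repair cost, differing only in deaths per battle:

```python
# Toy illustration: repair cost is only paid on death, so a low
# deaths-per-battle rate barely dents net SL/hour (all numbers invented).

def net_sl_per_hour(gross_sl: float, repair_cost: float,
                    deaths_per_battle: float, battle_minutes: float) -> float:
    net_per_battle = gross_sl - repair_cost * deaths_per_battle
    return net_per_battle * (60.0 / battle_minutes)

# Same gross income and repair cost; only the death rate differs.
print(net_sl_per_hour(12000, 15000, deaths_per_battle=0.9, battle_minutes=20))  # -4500.0
print(net_sl_per_hour(12000, 15000, deaths_per_battle=0.2, battle_minutes=20))  # 27000.0
```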

Conversely, vehicles with smaller populations of active use (see the German CL-13 at 9.x) are less affected by the law of large numbers, and so will tend to have a larger variance in performance, allowing the metrics to be more easily skewed by skilled / determined players.

Oh, I’m interested. (if you happen to find it).

That makes sense indeed. It’s easy to check, though: just ask top players if they lose about 50% of the time. If not, there is no such feature (or it is not working well); if yes, it’s proven. In order to force top players to get beaten 50% of the time, they must either be heavily uptiered or paired with a large number of weak players. I don’t think it can be done without some kind of skill tracking, so if it is true, there is probably a hidden skill-ranking system.
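For what it’s worth, the “ask top players” check can be made rigorous with a plain binomial test; the 620-wins-in-1000 record below is hypothetical:

```python
from scipy.stats import binomtest

# Hypothetical record: a top player won 620 of their last 1000 battles.
# If the matchmaker truly forced everyone toward 50%, this would be
# astronomically unlikely.
result = binomtest(k=620, n=1000, p=0.5, alternative="two-sided")
print(result.pvalue)  # on the order of 1e-14: reject "this player wins 50% of the time"
```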

Yup. Exactly what I’d like to see mathematically formulated.

Great post! However…

The genesis of the problem is that the vacuum in which we play these vehicles is a set of antiquated, poorly designed game modes lacking creativity and immersion.

They keep rebalancing vehicles relative to their performance (and earnings) within this vacuum, but the first step to solving the issue, particularly the one of how to balance vehicles of differing types, is reimagining the game modes themselves.

Until they do this they’re just rearranging deck chairs on a sinking ship.

Yes, sure. Maybe. (However, “poor” is a relative term; considering their user base, one could also argue it’s well designed :-) ). I always thought SB was immersive, but it has a steep learning curve.

Anyway, this topic is about pure data analysis, and I’m sure we can find a good, stable algorithm for any kind of game mode, be it bad or good.

The wiki says: “Battle rating is a number assigned to every aircraft, ground unit, and naval vessel in the game that correlates with their effectiveness in combat.” It’s a mathematical term. Without any subjective opinion, it can be objectively achieved. (Correlation, in the Pearson sense, measures linear association, so that is what we should shoot for.)