The Player Similarity algorithm compares a "target" player at a given age (the "target season") with other players
who played the same position, season by season, matching seasons by season age.
The algorithm begins by determining the number of seasons to compare. A player who has played more than 15 seasons
(as of the target season) is compared over the preceding 12 seasons; more than 12 seasons, 9 comparison seasons;
more than 9 seasons, 7 comparison seasons; otherwise 5 comparison seasons, or as many prior seasons as the target
player had played by the target age.
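The season-count rule above can be sketched as a small function. This is only an illustration: the function name is hypothetical, and it assumes that the count of seasons played includes the target season and caps the window for players with short careers.

```python
def comparison_season_count(seasons_played: int) -> int:
    """Return the number of comparison seasons for a target player.

    Assumption: seasons_played counts seasons up to and including the
    target season, and a player is never compared over more seasons
    than he has played.
    """
    if seasons_played > 15:
        window = 12
    elif seasons_played > 12:
        window = 9
    elif seasons_played > 9:
        window = 7
    else:
        window = 5
    # Cap the window for players with fewer seasons than the default.
    return min(window, seasons_played)
```

Under these assumptions, a player in his 13th season is compared over 9 seasons, and a player in his 3rd season over 3.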
Next, the algorithm identifies the comparison cohort of players who played the same position during the comparison years.
Comparison seasons are seasons at the same season age as the target player's seasons.
The rules for selecting the comparison cohort are:
For pitchers, the Starting Pitcher cohort is used for target players who started 80%
or more of their games during the comparison seasons. The Starting Pitcher cohort includes pitchers who started
at least 75% of their games during the comparison seasons.
The Relief Pitcher cohort is used for target players who started 20% or fewer of their games during the
comparison seasons, and includes pitchers who started no more than 25% of their games in the comparison seasons.
All other target pitchers use the Mixed Pitcher cohort, which includes pitchers who started between 15% and 85%
of their games during the comparison seasons.
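A minimal sketch of the pitcher rules, using the percentages given above. The function and constant names are hypothetical, and the sketch assumes the Mixed Pitcher bounds (15% and 85%) are inclusive, which the text does not state.

```python
# Share of games started that a comparison pitcher needs to join each cohort.
# The inclusive bounds on the mixed cohort are an assumption.
COHORT_BOUNDS = {
    "starting": (0.75, 1.00),
    "relief": (0.00, 0.25),
    "mixed": (0.15, 0.85),
}


def target_pitcher_cohort(start_share: float) -> str:
    """Pick the cohort for a target pitcher from his share of games started."""
    if start_share >= 0.80:
        return "starting"
    if start_share <= 0.20:
        return "relief"
    return "mixed"


def in_pitcher_cohort(cohort: str, start_share: float) -> bool:
    """Check whether a comparison pitcher qualifies for the given cohort."""
    low, high = COHORT_BOUNDS[cohort]
    return low <= start_share <= high
```

The cohort bounds are wider than the target thresholds, so a pitcher who started 76% of his games can appear in the Starting Pitcher cohort even though, as a target player, he would be compared against the Mixed Pitcher cohort.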
For non-pitchers, games played in Right Field and Left Field are combined into a position called Corner Outfield,
for both target players and potential comparison players.
Players who played 70% or more of their games at one position are assigned to the Primary Position cohort
for that position. This cohort includes players who played at least 60% of their games at that position during
the comparison seasons.
Other non-pitchers who played at least 40% of their games at one position are assigned to a Two Position cohort,
which includes players who played at least 25% of their games at the target player's primary position, and at least
20% of their games at the target player's secondary position.
Other non-pitchers are assigned to the Three Position cohort, which includes players who played at least 10% of
their games at each of the target player's top three positions during the comparison years.
For all cohorts, players must have at least n-3 comparison years (where n is the number of comparison seasons
for the target player).
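The non-pitcher rules can be sketched as follows. All names here (position_shares, target_cohort, in_cohort, the "COF" label) are hypothetical, and the sketch assumes the target player has at least one game at a non-pitching position.

```python
from collections import Counter

# Minimum share of games a comparison player needs at the target player's
# first, second and third positions, per cohort type.
MIN_SHARES = {
    "primary": (0.60,),
    "two": (0.25, 0.20),
    "three": (0.10, 0.10, 0.10),
}


def position_shares(games_by_position: dict[str, int]) -> dict[str, float]:
    """Fold LF and RF into Corner Outfield ("COF") and return game shares."""
    combined = Counter()
    for position, games in games_by_position.items():
        key = "COF" if position in ("LF", "RF") else position
        combined[key] += games
    total = sum(combined.values())
    return {pos: games / total for pos, games in combined.items()} if total else {}


def target_cohort(shares: dict[str, float]) -> tuple[str, list[str]]:
    """Return the cohort type and the target player's ranked positions."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    top_share = ranked[0][1]
    if top_share >= 0.70:
        return "primary", [ranked[0][0]]
    if top_share >= 0.40:
        return "two", [pos for pos, _ in ranked[:2]]
    return "three", [pos for pos, _ in ranked[:3]]


def in_cohort(kind: str, positions: list[str], candidate_shares: dict[str, float],
              candidate_years: int, n: int) -> bool:
    """Check the position shares and the n-3 minimum of comparison years."""
    if candidate_years < n - 3:
        return False
    return all(candidate_shares.get(pos, 0.0) >= minimum
               for pos, minimum in zip(positions, MIN_SHARES[kind]))
```

For example, a target player with 50 games in left field, 30 in right field and 20 at first base has an 80% Corner Outfield share and falls in the Primary Position cohort for Corner Outfield.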
For unusual position combinations (such as Ernie Banks after he moved to first base),
the comparison cohort may be too small to give a meaningful comparison; in that case an error message is displayed.
The target player's statistics are compared with every player in the comparison cohort, matching the season age,
following these steps:
Short seasons (1981, 1994, 2020) are adjusted to a 162-game equivalent.
When there are at least 5 comparison seasons, single-season stats are converted to a three-year rolling total.
For example, to compare two players' age 28 seasons, the totals for their age 26-28 seasons are compared.
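The first two steps might look like the sketch below. It assumes that the short-season adjustment is a simple linear scaling by games scheduled, and that a season missing from the three-year window counts as zero; neither detail is stated above, and both function names are hypothetical.

```python
def scale_to_full_season(stat: float, games_in_season: int,
                         full_season: int = 162) -> float:
    """Scale a counting stat from a short season to a 162-game equivalent.

    Assumption: linear scaling by the number of games in the season.
    """
    return stat * full_season / games_in_season


def rolling_three_year_totals(values_by_age: dict[int, float]) -> dict[int, float]:
    """Replace each season's value with the total for that age and the two before.

    Assumption: an age with no recorded season contributes zero.
    """
    return {
        age: sum(values_by_age.get(age - offset, 0.0) for offset in range(3))
        for age in values_by_age
    }
```

With these assumptions, 20 home runs in a 60-game season scale to 54, and a player with 10, 20 and 30 home runs at ages 26 to 28 has a rolling total of 60 at age 28.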
A recency factor is assigned to each comparison year: 1.0 for the last comparison year, increasing by 0.05
for each preceding year.
For each statistic (see below), the difference is calculated as:
((targetPlayer - comparisonPlayer) * recencyFactor)**2
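As an illustration of the recency factor and the difference formula, the sketch below works on one statistic, with each player's values listed from the oldest comparison season to the most recent. The function names are hypothetical, and the two lists are assumed to have the same length.

```python
def recency_factors(n_seasons: int, step: float = 0.05) -> list[float]:
    """Return factors ordered oldest to most recent; the last one is 1.0."""
    return [1.0 + step * (n_seasons - 1 - i) for i in range(n_seasons)]


def squared_differences(target: list[float], comparison: list[float]) -> list[float]:
    """Apply ((targetPlayer - comparisonPlayer) * recencyFactor)**2 per season.

    Both lists hold one statistic, ordered oldest season first.
    """
    factors = recency_factors(len(target))
    return [((t - c) * f) ** 2 for t, c, f in zip(target, comparison, factors)]
```

For three comparison seasons the factors are 1.10, 1.05 and 1.00, so a gap of 2 in the second-to-last season contributes (2 * 1.05)**2, or about 4.41.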
For each statistic, the 75th percentile of the differences across all comparison players is set as the
threshold; differences higher than the threshold are assigned a score of zero.
Differences below the threshold are assigned a score between 0 and 1, where 1 is an exact match and
0 corresponds to the 75th percentile.
Each statistic's score is multiplied by the weight shown in the table below, and the weighted scores are totaled to
give a similarity score between 0 and 100.
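The last two steps can be sketched as below. Several details are assumptions rather than part of the description: the percentile uses linear interpolation, the score falls linearly from 1 at an exact match to 0 at the threshold, and the weights sum to 100. The statistic names and weights in the usage note are made up for illustration and are not taken from the weight table.

```python
def percentile(values: list[float], q: float) -> float:
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def difference_scores(differences: list[float]) -> list[float]:
    """Score one statistic's differences, one entry per comparison player."""
    threshold = percentile(differences, 0.75)
    if threshold == 0:
        # Degenerate case: only exact matches can score.
        return [1.0 if d == 0 else 0.0 for d in differences]
    # Assumption: linear scale, 1 at an exact match, 0 at or above the threshold.
    return [max(0.0, 1.0 - d / threshold) for d in differences]


def similarity_score(scores_by_stat: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Total the weighted scores for one comparison player."""
    return sum(weights[stat] * score for stat, score in scores_by_stat.items())
```

With hypothetical weights of 60 and 40 for two statistics, scores of 1.0 and 0.5 total 80.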