# SIMPLE Game Project - Terminology
This page lists the important terms and definitions used in this project, to help clarify concepts and reduce confusion.
## Key Terms
Term | Definition |
---|---|
Reinforcement Learning | A type of machine learning where agents learn by interacting with an environment and receiving feedback in the form of rewards or penalties. |
Multiplayer Game | A game where multiple players interact and compete against each other, often requiring strategies that adapt to opponents' actions. |
SIMPLE | An acronym for Self-play In MultiPlayer Environments, a framework that trains AI by pitting versions of itself against one another. |
Agent | An AI or program that interacts with an environment to make decisions and perform actions. |
Self-Play | A training mechanism where an AI agent competes against itself or past versions to improve and learn strategies. |
Policy | The strategy or rule set that an agent follows when making decisions in a given state within the environment. |
Environment | The virtual world or game scenario where the agent interacts, performs actions, and receives feedback. |
State | A representation of the current situation or configuration in the environment that the agent can perceive. |
Action | A move or decision taken by the agent to interact with the environment and progress toward a goal. |
Reward | Feedback given to an agent after an action, used to reinforce learning by encouraging positive outcomes. |
Training Epoch | A full pass over the collected training data (in reinforcement learning, typically a batch of gameplay experience) used to refine the agent's model. |
Model | A mathematical framework or neural network that an agent uses to process data, predict outcomes, and make decisions. |
Opponent Modeling | A technique where an AI learns or predicts the strategies and actions of an opponent to adapt and compete effectively. |
Exploration vs Exploitation | A trade-off in AI training where the agent must balance trying new actions (exploration) and using known successful strategies (exploitation). |
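
To make terms such as environment, state, action, reward, and policy concrete, here is a minimal sketch of an agent-environment interaction loop in the Gym style. The toy environment and the random policy are illustrative placeholders, not part of the SIMPLE codebase.

```python
import random

class CoinFlipEnv:
    """A toy two-action environment used only to illustrate the terms above."""
    def reset(self):
        self.steps = 0
        return 0  # initial state

    def step(self, action):
        # The agent is rewarded when its action matches a random coin flip.
        self.steps += 1
        reward = 1 if action == random.randint(0, 1) else -1
        done = self.steps >= 10          # episode ends after 10 steps
        next_state = self.steps          # state = number of steps taken so far
        return next_state, reward, done

def random_policy(state):
    """A policy maps a state to an action; this one simply acts at random."""
    return random.randint(0, 1)

env = CoinFlipEnv()
state = env.reset()
done = False
total_reward = 0
while not done:
    action = random_policy(state)            # agent chooses an action
    state, reward, done = env.step(action)   # environment returns feedback
    total_reward += reward                   # reward is the learning signal
print("episode reward:", total_reward)
```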
## Optimization Metrics
Metric | Meaning |
---|---|
Pol_surr (Policy Surrogate Loss) | The PPO clipped surrogate policy loss. It measures how the expected advantage changes under the updated policy, with the probability ratio constrained by PPO's clipping mechanism. Negative values indicate the policy is improving (assigning higher probability to better actions). |
Pol_entpen (Policy Entropy Penalty) | Represents the entropy term encouraging exploration. Higher entropy means the policy explores more; as training progresses, entropy should gradually decrease as the policy converges. |
Vf_loss (Value Function Loss) | Measures the error in the value function approximation. Lower values indicate the agent is better at predicting expected returns. |
Kl (Kullback-Leibler Divergence) | Measures the change in the policy before and after the update. A small KL divergence suggests the policy update is within acceptable bounds. |
Ent (Entropy) | Reflects the randomness of the policy. High entropy indicates more exploration; as training progresses, entropy decreases as the policy stabilizes. |
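
As an illustration of how these quantities relate, the sketch below computes a clipped surrogate loss, a rough KL estimate, and an entropy value for a made-up batch using NumPy. The array values and the clip range are assumptions chosen for illustration; this is not the project's actual training code.

```python
import numpy as np

clip_range = 0.2  # assumed PPO clipping parameter

# Toy batch: log-probabilities of the taken actions under the old and new
# policy, plus an advantage estimate for each sample.
old_log_prob = np.array([-0.9, -1.2, -0.5, -1.0])
new_log_prob = np.array([-0.8, -1.3, -0.4, -1.1])
advantages   = np.array([ 0.5, -0.2,  1.0, -0.4])

ratio = np.exp(new_log_prob - old_log_prob)             # probability ratio new/old
unclipped = ratio * advantages
clipped = np.clip(ratio, 1 - clip_range, 1 + clip_range) * advantages
pol_surr = -np.mean(np.minimum(unclipped, clipped))     # clipped surrogate loss

approx_kl = np.mean(old_log_prob - new_log_prob)        # rough KL estimate

# Entropy of an action distribution (a single 3-action distribution here).
action_probs = np.array([0.2, 0.5, 0.3])
entropy = -np.sum(action_probs * np.log(action_probs))

print("pol_surr:", pol_surr, "approx_kl:", approx_kl, "entropy:", entropy)
```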
## Evaluation Results
Metric | Meaning |
---|---|
EpLenMean (Episode Length Mean) | Average number of steps per episode. If this stabilizes, it might indicate that the agent is learning an optimal strategy. |
EpRewMean (Episode Reward Mean) | Average reward per episode. This is a primary measure of learning progress, where an increase implies the agent is performing better in the environment. |
EpThisIter (Episodes This Iteration) | The number of episodes completed in the current training iteration. |
EpisodesSoFar | Total episodes completed so far. |
TimestepsSoFar | Total timesteps processed. |
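
EpLenMean, EpRewMean, and EpThisIter are simple aggregates over the episodes collected in an iteration. A short sketch, using invented episode records:

```python
# Hypothetical episode records from one training iteration: (length, total reward).
episodes = [(34, 1.0), (41, -1.0), (29, 1.0), (37, 0.0)]

ep_len_mean = sum(length for length, _ in episodes) / len(episodes)   # EpLenMean
ep_rew_mean = sum(reward for _, reward in episodes) / len(episodes)   # EpRewMean
ep_this_iter = len(episodes)                                          # EpThisIter

print(ep_len_mean, ep_rew_mean, ep_this_iter)
```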
## Loss Metrics
Metric | Meaning |
---|---|
Ev_tdlam_before | The explained variance of the value function's predictions against the TD(lambda) returns, measured before the update. Values close to 1 indicate good alignment between predicted values and actual returns. |
Loss_ent | Entropy of the policy (should decrease over time as policy converges). |
Loss_kl | KL divergence for the policy update. |
Loss_pol_entpen | Entropy penalty term for the policy. |
Loss_vf_loss | Loss related to the value function approximation. |
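
The sketch below shows how an explained-variance score such as Ev_tdlam_before can be computed from value predictions and TD(lambda) returns; the arrays are invented for illustration.

```python
import numpy as np

# Hypothetical value predictions and TD(lambda) returns for one batch.
value_predictions = np.array([0.9, 0.4, -0.2, 1.1])
tdlam_returns     = np.array([1.0, 0.5, -0.1, 1.0])

# Explained variance: 1 when predictions match returns exactly,
# 0 when they explain nothing, negative when worse than a constant predictor.
explained_variance = 1 - np.var(tdlam_returns - value_predictions) / np.var(tdlam_returns)
print(explained_variance)
```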
## Meaning of test.py Results
Item | Details |
---|---|
Most Recent Model | The first model listed is the most recently trained model, while the second is the previous model it plays against. |
Cumulative Scores | Each line shows the cumulative scores after a certain number of games. Positive scores indicate success or winning, while negative scores indicate losses. |
Winning Score | A win adds 1 to the agent’s total score. |
Losing Score | A loss subtracts 1 from the agent’s total score. |
Draw | A draw results in no change. |
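
The scoring rule can be summarised in a few lines of Python; the game outcomes below are invented for illustration.

```python
# +1 for a win, -1 for a loss, 0 for a draw, accumulated over games.
results = ["win", "loss", "win", "draw", "win"]   # hypothetical game outcomes
score_for = {"win": 1, "loss": -1, "draw": 0}

cumulative = 0
for game_number, result in enumerate(results, start=1):
    cumulative += score_for[result]
    print(f"after game {game_number}: cumulative score = {cumulative}")
```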