
STAR REALMS
ML Platform

A ground-up Star Realms engine with two modes: a tight headless environment for AI self-play and a clean UI for humans. On top of it sits a hybrid learning framework combining human gameplay data, supervised bootstrapping, and reinforcement learning: it trains from scratch, learns what wins, and turns that into insights that make you a better player.

Multi-Paradigm ML Self-Play RL Supervised Learning Data Analysis Game AI
// Core Concept

The Learning Loop

A closed feedback loop between gameplay, data collection, training, and evaluation. Every game played, human or AI, feeds the next iteration of the model.

🧑‍💻
Human Play
Real matches logged as structured training data
→
episode
logs
🗄️
Data Pipeline
Preprocessing, analysis, reward signal design
→
training
batches
🧠
ML Training
Supervised bootstrap, then RL self-play
Improved Model → Self-Play → More Data → Repeat
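The loop above can be sketched as a minimal driver. Everything here is a stand-in: `play_game`, `train`, and the toy "model" are illustrative placeholders, not the project's real training code.

```python
import random

def play_game(policy, rng):
    # Stand-in for one logged episode: a few policy decisions plus an outcome.
    return {"moves": [policy(rng) for _ in range(3)], "winner": rng.choice([0, 1])}

def train(dataset):
    # Stand-in for a training step: a "model" that remembers how much data built it.
    def policy(rng):
        return rng.randrange(10)
    policy.games_seen = len(dataset)
    return policy

def learning_loop(iterations=3, games_per_iter=4, seed=0):
    rng = random.Random(seed)
    policy = train([])            # cold-start model
    dataset = []
    for _ in range(iterations):
        dataset += [play_game(policy, rng) for _ in range(games_per_iter)]
        policy = train(dataset)   # improved model -> more self-play -> repeat
    return policy, dataset

policy, dataset = learning_loop()
```

The point is the shape, not the contents: every episode lands in the dataset, and every training pass produces the policy that generates the next batch.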
Data Type 01

General Human Play

High volume, natural strategies and mistakes. Broad coverage of diverse game states.

Data Type 02

Expert Gameplay

Smaller dataset, higher strategic quality. Powers initial supervised bootstrapping.

Data Type 03

AI Self-Play

Unlimited scale. Discovers strategies beyond human play through policy iteration.
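Mixing three data sources of different quality suggests tagging each episode with its origin so the pipeline can weight them. A minimal sketch, with hypothetical field names and weights:

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """Hypothetical episode record; each logged game carries its source tag."""
    source: str                  # "human", "expert", or "selfplay"
    winner: int
    moves: list = field(default_factory=list)

# Illustrative weighting: expert games count most, raw self-play least.
SOURCE_WEIGHTS = {"human": 1.0, "expert": 3.0, "selfplay": 0.5}

def sample_weight(ep: Episode) -> float:
    return SOURCE_WEIGHTS[ep.source]

batch = [Episode("human", 0), Episode("expert", 1), Episode("selfplay", 0)]
weights = [sample_weight(ep) for ep in batch]
```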

// Architecture

System Design

Five layers, each with a clear responsibility. The engine handles gameplay; training is fully external so the live system is never affected.

01

Game Engine

Fully faithful Star Realms base set implementation. Card effects, faction synergies, ally triggers, scrap mechanics: every rule encoded and covered by tests. Seeded RNG for deterministic replay and reproducible experiments.

rules engine pytest coverage seeded rng deterministic
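Seeded, engine-owned randomness is what makes replay deterministic. A minimal sketch (deck contents and function names are illustrative, not the engine's real API):

```python
import random

def new_deck(seed: int) -> list:
    """Build and shuffle a toy deck with an engine-owned RNG, never the global one."""
    rng = random.Random(seed)
    deck = [f"card_{i}" for i in range(10)]
    rng.shuffle(deck)
    return deck

# The same seed reproduces the exact same game setup, so any game can be replayed.
replay_a = new_deck(seed=42)
replay_b = new_deck(seed=42)
```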
02

Headless ML Environment

Zero UI, zero I/O: just state, actions, and outcomes. Gym-style step/reset interface so any framework plugs straight in. Discrete action space, serializable state, runs thousands of games per second.

gymnasium api discrete actions state serialization high throughput
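The step/reset shape can be shown without depending on the library. This skeleton mimics the Gymnasium return signature; the state, action count, and combat placeholder are all simplifications, not the real environment:

```python
class StarRealmsEnv:
    """Gym-style interface sketch: reset() -> (obs, info), step() -> 5-tuple."""

    N_ACTIONS = 32                      # discrete action space (illustrative size)

    def reset(self, seed=0):
        self.turn = 0
        self.authority = [50, 50]       # both players start at 50 authority
        return self._observe(), {}

    def step(self, action: int):
        assert 0 <= action < self.N_ACTIONS
        self.turn += 1
        self.authority[1] -= 1          # placeholder for real combat resolution
        terminated = self.authority[1] <= 0
        reward = 1.0 if terminated else 0.0
        return self._observe(), reward, terminated, False, {}

    def _observe(self):
        # Plain tuple: trivially serializable and hashable.
        return tuple(self.authority) + (self.turn,)

env = StarRealmsEnv()
obs, info = env.reset()
obs2, reward, terminated, truncated, info = env.step(0)
```

Keeping the observation a plain serializable value is what lets thousands of parallel games be logged and replayed cheaply.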
03

Supervised Bootstrapping

Before RL begins, the agent trains on human gameplay data, learning to predict strong moves and evaluate positions from real games. Eliminates the cold-start problem and gives RL a head start beyond random exploration.

imitation learning move prediction position evaluation
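The bootstrapping objective is plain imitation: softmax cross-entropy on "which move did the human pick". The project uses PyTorch; this NumPy sketch with synthetic data just shows the objective, with every name and size illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, n_actions = 256, 8, 4

# Synthetic "human" data: states, and moves generated by a hidden linear scorer.
X = rng.normal(size=(n_states, n_features))
true_W = rng.normal(size=(n_features, n_actions))
y = (X @ true_W).argmax(axis=1)          # the moves to imitate

W = np.zeros((n_features, n_actions))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)            # softmax move probabilities
    grad = X.T @ (p - np.eye(n_actions)[y]) / n_states   # dCE/dW, full batch
    W -= 0.5 * grad

accuracy = float(((X @ W).argmax(axis=1) == y).mean())
```

A policy pretrained this way starts RL already preferring human-plausible moves instead of uniform noise.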
04

Self-Play Reinforcement Learning

The agent improves through self-play against versioned snapshots of itself. Policy improvement and value estimation evolve from game outcomes alone. Discovers strategies unreachable by human data.

self-play policy iteration value estimation snapshot versioning
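Snapshot versioning usually pairs with a gating check: the candidate plays the frozen pool and is kept only if it wins often enough. A toy sketch where scalar "strengths" stand in for real policies and the 0.55 threshold is illustrative:

```python
import random

def play(strength_a, strength_b, rng):
    # Toy outcome model: higher strength wins proportionally more often.
    return rng.random() < strength_a / (strength_a + strength_b)

def evaluate(current, snapshots, rng, games=1000):
    """Win rate of the candidate against randomly drawn frozen snapshots."""
    wins = sum(play(current, rng.choice(snapshots), rng) for _ in range(games))
    return wins / games

rng = random.Random(0)
snapshots = [1.0]                 # version-0 agent
current = 1.6                     # candidate after a round of training
win_rate = evaluate(current, snapshots, rng)
if win_rate > 0.55:               # gating threshold (illustrative)
    snapshots.append(current)     # candidate becomes the newest frozen version
```

Playing the whole pool, not just the latest version, guards against the policy cycling through strategies that only beat its immediate predecessor.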
05

Analysis and Visualization

Structured episode logs feed an external analysis pipeline: win rates by faction, card impact by turn, deck composition correlations. Policy and data are distilled into a playable UI and a data-backed play guide.

episode logging win rate analysis card impact scoring play guide
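The faction win-rate cut is a one-line groupby once episodes are tabular. Column names here are illustrative, not the project's actual log schema:

```python
import pandas as pd

# Toy slice of an episode log; real logs would have thousands of rows.
logs = pd.DataFrame({
    "faction": ["Blob", "Blob", "Trade Federation", "Star Empire", "Blob"],
    "won":     [1, 1, 0, 1, 0],
    "turns":   [14, 12, 18, 15, 16],
})

win_rates = logs.groupby("faction")["won"].mean().sort_values(ascending=False)
```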
// Components

What Was Built

Four distinct components with clean separation of concerns. The engine knows nothing about the UI. Training happens entirely outside the live system.

01 / ENGINE

Game Engine

Full rules implementation with modular effect system, ally resolution, trade row management, and deck cycling. Dedicated test suite per feature area with extensive coverage.

⚙️
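A modular effect system typically means each card is data plus a list of small effect callables applied to the game state. A minimal sketch; the card effects below are simplified, not exact card text:

```python
def gain_trade(n):
    def effect(state):
        state["trade"] += n
    return effect

def gain_combat(n):
    def effect(state):
        state["combat"] += n
    return effect

# Each card is just a list of composable effects.
CARDS = {
    "Scout":  [gain_trade(1)],
    "Viper":  [gain_combat(1)],
    "Cutter": [gain_trade(2)],   # simplified: the real card has more effects
}

def play_card(name, state):
    for effect in CARDS[name]:
        effect(state)

state = {"trade": 0, "combat": 0}
for card in ("Scout", "Viper", "Cutter"):
    play_card(card, state)
```

New cards then become data entries rather than engine changes, which is what keeps per-feature test suites tractable.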
02 / ENVIRONMENT

ML Environment

Headless game wrapper with standardized observation space, discrete action interface, and serializable state. Designed for high-throughput simulation compatible with standard ML tooling.

🧠
03 / INTERFACE

Human UI

Graphical interface for playing and observing. Real-time game log panel alongside the board. Watch the AI play and compare its decisions against your own in the same view.

🖥️
04 / ANALYSIS

Data Pipeline

External training and analysis system processing structured episode logs. Produces win rate analytics, card impact scores, and the reward signal design underpinning RL training.

📊
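Reward signal design in a long game like this often combines a terminal win/loss reward with a small shaped term so mid-game decisions still carry gradient. A sketch with illustrative field names and weights:

```python
def reward(prev_state, state, done, won):
    """Terminal +/-1 plus a small bonus for improving the authority differential."""
    terminal = (1.0 if won else -1.0) if done else 0.0
    # Shaped component: change in (my authority - opponent authority), scaled down
    # so it can never dominate the terminal signal.
    delta = (state["my_auth"] - state["opp_auth"]) - (
        prev_state["my_auth"] - prev_state["opp_auth"]
    )
    return terminal + 0.01 * delta

prev = {"my_auth": 50, "opp_auth": 50}
mid = {"my_auth": 48, "opp_auth": 43}
r_step = reward(prev, mid, done=False, won=False)                    # small shaped reward
r_win = reward(mid, {"my_auth": 48, "opp_auth": 0}, done=True, won=True)
```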
// Outcomes

What It Produces

The system becomes both a research platform and a practical tool for improving at the game.

Trained Agent

A multi-paradigm agent trained from scratch: supervised bootstrapping on human data, then refined through self-play RL. Strategy emerges from data, not hardcoded heuristics.

Data-Backed Play Guide

Faction win rates, card priority by turn, deck pacing, all grounded in simulation results. Every claim backed by a number, not conventional wisdom.

Real-Time Advisor

Query the trained policy mid-game. Given the current board state, what does the agent buy? Compare your line against the agent's and understand where they diverge.
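An advisor query reduces to scoring the legal actions with the policy and returning the top few. A stub sketch where the lookup table stands in for a real policy forward pass, and every card score is hypothetical:

```python
# Hypothetical policy output for one specific board state.
POLICY_SCORES = {
    "Battle Blob": 0.41,
    "Cutter": 0.27,
    "Explorer": 0.18,
    "pass": 0.14,
}

def advise(trade_row, policy_scores, top_k=2):
    """Rank only the actions actually available on this turn."""
    legal = [a for a in policy_scores if a in trade_row or a == "pass"]
    ranked = sorted(legal, key=policy_scores.get, reverse=True)
    return ranked[:top_k]

# Cutter is not in the trade row this turn, so it cannot be suggested.
suggestions = advise(["Battle Blob", "Explorer"], POLICY_SCORES)
```

Masking to legal actions before ranking is the important part: the raw policy may assign probability to buys that are not currently on the board.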

Training Visualization

Watch strategy evolve over training iterations. See which cards the agent learned to value, when it discovered faction synergies, and how play style shifted over time.

Research Platform

A reusable environment for experimenting with ML approaches against a complex hidden-information game with discrete actions and long-horizon strategy.

Game Analysis

Quantified analysis of what drives wins, not opinion. Authority trajectories, optimal scrap timing, faction purity vs. splashing: all evaluated at scale.

// Stack

Technologies

Engine & Simulation

Python Pygame Pytest JSON

ML & Training

PyTorch Gymnasium NumPy Self-Play RL Supervised Learning

Analysis & Viz

Pandas Matplotlib Data Analysis
Status: Active
Engine Refactor + Tests · Headless Env · Logging · Analysis · Training