MininetGym Manual

MininetGym is a browser-accessible, open-source framework for training and evaluating Reinforcement Learning (RL) agents on cybersecurity tasks inside a live Software-Defined Network (SDN) emulated with Mininet and controlled by OpenDayLight.

The web dashboard exposes three main panels — Configuration Setup, Training Dashboard, and Results Panel — that cover the full experiment lifecycle without requiring any command-line interaction.

▶ Screensaver Presentation ▶ Demo Video

MininetGym System Architecture — System Architecture — from Web Dashboard down to Mininet SDN network and OpenDayLight controller.

Access: open http://<host>:5000 in any modern browser (Chrome, Firefox, Edge). The UI is fully responsive and works on mobile devices too.

Configuration Setup

This panel lets you define every aspect of an experiment before starting training. Changes are saved immediately with the Save button.

Network Topology

The switch (OVS) sits at the center. All hosts and IoT devices connect to it in a star topology, as shown in the schema below.

Hosts — number of regular hosts in the Mininet topology (default 5).
IoT Devices — number of IoT nodes attached to the switch (default 5).
Controller — OpenDayLight address; usually 127.0.0.1:8181.

Screenshots — Configuration Panel

Configuration Desktop — Configuration Setup — Desktop

Configuration Mobile — Configuration Setup — Mobile

Gym Type (Scenario)

Select the RL environment the agents will train on. Four scenarios are available:

Scenario	Task	Action
Classification	Classify traffic: None / Ping / UDP / TCP	Alert only
Attack-Net	Binary detection: Normal vs. Attack (global)	Alert only
Attack-PerHost	Per-host: Normal / Victim / Attacker	SDN link block
MARL	Hierarchical multi-agent (Coordinator + Host agents)	Distributed SDN block

Attack-PerHost is the primary demonstration scenario: a correct Attacker classification triggers an OpenDayLight flow rule that drops traffic from that host at the switch level.

Agent Configuration

Use the Add Agent button to include one or more agents in the experiment. Each agent entry has:

Algorithm — choose from Q-Learning, SARSA, DQN, PPO, A2C, Supervised.
Name — a free-form label shown in all charts and tables.
Episodes — number of training episodes.
Steps per episode — maximum steps before an episode resets.

Multiple agents can run sequentially in the same experiment for easy comparison.

For a first run, leave all hyperparameters at their defaults — they are tuned for the Attack-PerHost scenario with 5 hosts.

Full Configuration Reference

All parameters are stored in config/default.yaml and editable live from the Configuration panel. The table below documents every key available in the configuration file, organised by section.

Global Settings

Parameter	Description
`training_directory`	Directory where experiment outputs and training artefacts are stored.
`enable_web_interface`	Enable or disable the web UI.
`server_user`	Unix user used by the runtime when launching Mininet-related processes.
`log_level`	Logging verbosity used by the application.
`random_seed`	Seed used to make runs reproducible.

Environment Parameters (`env_params`)

Parameter	Description
`gym_type`	Select which scenario pipeline to execute. Use `*_from_dataset` variants when replaying pre-recorded status JSON files; use non-dataset variants for live/synthetic traffic generation.
`episodes`	Number of training episodes.
`max_steps`	Maximum number of steps per episode.
`test_episodes`	Number of evaluation episodes run after training.
`n_bins`	Number of bins used by state discretisation. Higher values preserve more detail but increase the state-space size for tabular agents.
`steps_min_percentage`	Minimum fraction of episode steps required before early-stop checks are considered valid. Typical range: 0.7–0.95.
`accuracy_min`	Accuracy threshold used by early-stop logic. If reached (with `steps_min_percentage` satisfied), training can terminate early.
`wait_after_read`	Delay (seconds) after reading network counters before the next observation is collected.
`show_normal_traffic`	Show normal traffic generation events in the logs and UI.
`print_training_chart`	Generate chart images to disk during training.
`must_check_env`	Enable extra Gymnasium environment validity checks before running.

Scenario source (UI): The Scenario Source selector controls how the environment is initialised.

Generate new (default) — a new scenario is created when training starts.
Generate test scenario — creates a preview in memory and opens the analysis popup; no file is saved.
Load existing — click a row to select an existing scenario.json. When navigating to Training, episodes, max_steps and test_episodes are overridden with the values stored in that file.

Attack Parameters (`env_params.attacks`)

Active only for attack-oriented scenarios: Attack-Net, Attack-PerHost, MARL.

Parameter	Description
`likely`	Base probability that a host starts an attack when idle. Can evolve dynamically during generation depending on scenario logic.
`likely_train`	Attack likelihood during training episodes (Attack-PerHost only). Range: 0.0–1.0. Higher values create frequent attacks for policy learning.
`likely_eval`	Attack likelihood during evaluation episodes (Attack-PerHost only). Range: 0.0–1.0. Lower values simulate realistic sparse attack conditions.
`max_attack_percentage`	Upper cap for effective attack probability (0.0–1.0). Example: 0.9 means attack likelihood scales up to 90% in high-pressure phases.
`short_attack_duration`	Duration (steps) of SHORT_ATTACK windows. Lower values create bursty spikes; higher values create sustained micro-attacks.
`long_attack_duration`	Duration (steps) of LONG_ATTACK windows. Increase for persistent attacks; reduce for more fragmented behaviour.
`no_attack_timeout`	Cooldown (steps) after an attack before the same host can attack again. Lower values increase attack density.
`unblock_min_hold_rounds`	Minimum full rounds to keep a host blocked before allowing unblock.
`unblock_required_normal_streak`	Consecutive NORMAL decisions required while blocked before the host is unblocked.
`apply_drop_rules`	If `true`, the environment applies OpenDayLight drop rules when the agent takes the block action. If `false`, the environment provides reward feedback only without modifying network traffic.

Detection Thresholds

Thresholds used by rule-based labelling in both attack and classification scenarios.

Parameter	Applies to	Description
`attacks.thresholds.packets`	Attack scenarios	Absolute packet-count threshold. Values above this are treated as anomalous.
`attacks.thresholds.var_packets`	Attack scenarios	Allowed packet variation (%) versus baseline. Above this delta, traffic may be flagged as unstable.
`attacks.thresholds.bytes`	Attack scenarios	Absolute byte-volume threshold used with packet thresholds to characterise attack intensity.
`attacks.thresholds.var_bytes`	Attack scenarios	Allowed byte variation (%) versus baseline. Useful for volumetric anomaly detection.
`classification.thresholds.packets`	Classification	Packet threshold for traffic class labelling.
`classification.thresholds.bytes`	Classification	Byte threshold for traffic class labelling.
`classification.thresholds.var_packets`	Classification	Allowed packet variation percentage for classification.
`classification.thresholds.var_bytes`	Classification	Allowed byte variation percentage for classification.

Network Topology (`env_params.net_params`)

Parameter	Description
`num_hosts`	Number of regular hosts in the simulated network.
`num_switches`	Number of OVS switches in the simulated network.
`num_iot`	Number of IoT nodes in the simulated network.
`traffic_types`	Allowed normal-traffic generators. Common set: `none`, `ping`, `udp`, `tcp`.
`start_cli`	Open the Mininet CLI after creating the network (useful for manual inspection).

SDN Controller (`env_params.net_params.controller`)

Parameter	Description
`ip`	Controller IP address (e.g. `127.0.0.1` or Docker host IP).
`port`	Controller TCP port (default `8181` for OpenDayLight REST API).
`usr`	Controller username (default `admin`).
`pwd`	Controller password.

Ensure IP, port and credentials match your OpenDayLight instance before starting training. A wrong controller address will prevent SDN drop rules from being applied.

Agent Common Parameters (`agents.*`)

Each entry in the agents list represents one agent configuration. Multiple agents can be enabled in the same run.

Parameter	Description
`name`	Display name shown in charts, tables and PDF exports.
`algorithm`	Algorithm identifier: `q_learning`, `sarsa`, `dqn`, `ppo`, `a2c`, `supervised`.
`enabled`	Enable or disable this agent without removing its configuration.
`progress_bar`	Show a progress bar in the terminal while training this agent.
`skip_learn`	Skip learning and use the agent for evaluation or inspection only.
`show_action`	Log the chosen action at every step (verbose).
`load`	If `true`, attempt to load saved weights/artefacts before training.
`load_dir`	Relative path (from training root) to a saved run folder. Must contain artefacts compatible with the selected algorithm.
`save`	Save the trained model artefacts at the end of the run.
`state_input_mode`	Input representation for deep agents: `normalized` (recommended) or `raw`. Keep consistent when loading checkpoints.
`net_arch`	Neural network architecture list for deep agents (e.g. `[64, 64]`).

Algorithm-Specific Hyperparameters

Q-Learning & SARSA

Parameter	Description
`learning_rate`	Step size for Q-table updates.
`discount_factor`	γ — how much future rewards are discounted (0–1).
`exploration_rate`	Initial ε for ε-greedy exploration.
`exploration_decay`	Multiplicative factor applied to ε after each episode.

DQN (Deep Q-Network)

Parameter	Description
`learning_rate`	Adam optimiser learning rate.
`gamma`	Discount factor for future rewards.
`buffer_size`	Capacity of the experience replay buffer.
`batch_size`	Mini-batch size drawn from the replay buffer per gradient update.
`target_update_interval`	Number of steps between target network weight copies.
`learning_starts`	Number of environment steps collected before learning begins.
`exploration_fraction`	Fraction of total training steps over which ε is annealed.
`exploration_initial_eps`	Starting value of ε.
`exploration_final_eps`	Final (minimum) value of ε.

PPO & A2C

Parameter	Description
`learning_rate`	Adam optimiser learning rate.
`gamma`	Discount factor for future rewards.
`n_steps`	Number of rollout steps collected before each policy update. Must satisfy `batch_size < n_steps × n_envs`.
`batch_size`	Mini-batch size used for gradient updates (PPO only).
`ent_coef`	Entropy coefficient — higher values encourage exploration; lower values promote exploitation.

Rule of thumb: batch_size must be strictly less than n_steps × n_envs. Violating this causes a Stable-Baselines3 assertion error at startup.

Training Dashboard

Start the experiment from the Configuration panel and switch to Training Dashboard to monitor progress in real time. All data is pushed via WebSocket (Socket.IO) — no page refresh needed.

Screenshots — Training in Action

Training Dashboard Desktop — Training Dashboard — Desktop

Experiment Dashboard Desktop — Experiment Dashboard — Desktop (detail)

Training Dashboard Mobile — Training Dashboard — Mobile

Live Charts

Reward per Episode — cumulative reward obtained by the agent in each episode.
Accuracy per Episode — percentage of correct classifications over the episode.
ε (Epsilon) Decay — shows how exploration decreases over time (tabular / DQN).

Host Status Monitor

A live grid shows the current classification label for every host: Normal, Attacker, or Victim. When an SDN block is issued, the attacker cell displays a lock icon.

During heavy attacks the reward may drop sharply — this is expected. A well-trained agent quickly recovers accuracy and triggers the SDN mitigation within a few steps.

Controls

Pause / Resume — suspends the training thread; the network keeps running.
Stop — terminates the current experiment and saves partial results.

Results Panel

After training completes (or is stopped), switch to the Results Panel to inspect per-agent metrics and export a PDF report.

Screenshots — Results Panel

Results Analysis Desktop — Results Analysis — Desktop

Result Dashboard Desktop — Result Dashboard — Desktop (detail)

Result Detail Mobile — Result Detail — Mobile

Metrics Summary Table

For each agent the table shows:

Accuracy, Precision, Recall, F1-score
Mitigation Ratio — SDN blocks issued / total attack events detected.
False Negative Rate (FNR) — missed attacks; penalised ×2 in the reward function.
Attack Latency — average steps from attack start to SDN block.

Comparison Charts

Bar chart — side-by-side accuracy / F1 across all agents.
Radar chart — multi-dimensional agent profile (accuracy, precision, recall, FNR, mitigation).
Confusion Matrix — per-agent heatmap of predicted vs. true labels.

PDF Export

Click Export PDF to download a formatted report containing all charts, the metrics table, and the experiment configuration.

You can also download the raw results as a ZIP archive (CSV + JSON + charts) from the Results panel toolbar for offline analysis.

Supported RL Agents

Q-Learning

Tabular off-policy TD method. The observation space is log-bin discretized to keep the state table manageable. Best suited for low-dimensional scenarios (Classification, Attack-Net).

SARSA

On-policy variant of Q-Learning. Uses the action actually taken (rather than the greedy action) for the TD update, making it slightly more conservative in exploration.

DQN — Deep Q-Network

Uses a neural network to approximate the Q-function. Includes an experience replay buffer and a target network. Implemented via Stable-Baselines3.

PPO — Proximal Policy Optimization

Actor-critic on-policy algorithm with a clipped surrogate objective. Robust and sample-efficient; generally the best baseline for continuous or large discrete observation spaces.

A2C — Advantage Actor-Critic

Synchronous on-policy actor-critic. Faster per-step than PPO but may require more episodes to converge. Good for quick experiments.

Supervised Agent

A classification baseline trained with supervised learning on labelled traffic data. Supports incremental learning: the model is updated after each episode with the new observations collected during training. Used as a performance ceiling reference.

Attack Scenarios

The Attack Generator runs on dedicated Mininet hosts and can produce:

UDP Flood — high-rate UDP packets toward a victim host.
TCP Flood — repeated TCP connection attempts.
ICMP Flood — continuous ping storm.
SYN Flood — half-open TCP connections to exhaust victim resources.
Slowloris — slow HTTP attack that holds connections open.

Attacks are injected randomly during training so that agents learn to detect and mitigate them across varying conditions.

MARL — Multi-Agent Mode

In the MARL scenario, two agent types collaborate via an internal message bus:

Coordinator — observes global network statistics (ℝ⁵), decides whether to signal an alert to host agents.
Host Agents (one per host) — observe per-host traffic (ℝ⁹), take local block / no-block actions.

All agents run in parallel threads. The coordinator's signal is included as an extra feature in each host agent's observation vector.

MARL training is more resource-intensive. Reduce episodes or steps per episode if the machine is under heavy load.

Tips & Troubleshooting

Training does not start

Make sure the Mininet topology is running (sudo python main.py).
Check that OpenDayLight is reachable at the configured address.
Verify no previous experiment is still running (use Stop first).

Charts are not updating

Confirm the browser has an active WebSocket connection (look for the green dot in the status bar).
Reload the page — the Socket.IO client will reconnect automatically.

SDN blocks are never triggered

Only the Attack-PerHost and MARL scenarios issue SDN drop rules.
Make sure OpenDayLight credentials are correct in the config.

Low accuracy after many episodes

Increase episodes or lower learning_rate for slower but more stable convergence.
For tabular agents (Q-Learning / SARSA) try increasing epsilon_decay to explore longer.
PPO / A2C may need more n_steps for complex scenarios.

PDF export fails

Ensure wkhtmltopdf or WeasyPrint is installed on the server.
Check server logs (app.log) for the detailed error.

Keyboard & UI Shortcuts

Action	How
Close this manual	Press `Esc` or click outside the panel
Navigate sections	Click any entry in the left Contents sidebar
Switch panel	Header navigation buttons (Configuration / Training / Results)
Mobile menu	Tap the ☰ button in the top-right corner

About

MininetGym is developed at DISMI — University of Modena and Reggio Emilia by Salvo Finistrella, Stefano Mariani, and Franco Zambonelli.

Source code: github.com/dipi-unimore/mininet-gym
Contact: salvo.finistrella@unimore.it
Paper: ScienceDirect — doi.org/10.1016/j.simpa.2025…