MininetGym Manual

MininetGym is a browser-accessible, open-source framework for training and evaluating Reinforcement Learning (RL) agents on cybersecurity tasks inside a live Software-Defined Network (SDN) emulated with Mininet and controlled by OpenDayLight.

The web dashboard exposes three main panels — Configuration Setup, Training Dashboard, and Results Panel — that cover the full experiment lifecycle without requiring any command-line interaction.

Figure: MininetGym system architecture, from the web dashboard down to the Mininet SDN network and the OpenDayLight controller.

Access: open http://<host>:5000 in any modern browser (Chrome, Firefox, Edge). The UI is fully responsive and works on mobile devices too.

Configuration Setup

This panel lets you define every aspect of an experiment before starting training. Changes are saved when you click the Save button.

Network Topology

The switch (OVS) sits at the center. All hosts and IoT devices connect to it in a star topology, as shown in the schema below.

Figure: MininetGym network topology schema, with the OVS switch at the center and regular hosts and IoT nodes on the perimeter.

  • Hosts — number of regular hosts in the Mininet topology (default 5).
  • IoT Devices — number of IoT nodes attached to the switch (default 5).
  • Controller — OpenDayLight address; usually 127.0.0.1:8181.
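
The star layout described above can be sketched in a few lines of plain Python. This is only an illustration of the wiring, not MininetGym's own topology builder: the node names (h1, iot1, s1) and the function name are assumptions chosen for the example.

```python
def star_links(num_hosts=5, num_iot=5, switch="s1"):
    """Return (nodes, links) for a single-switch star topology.

    Node and switch names are hypothetical; the defaults mirror the
    Hosts and IoT Devices defaults listed above.
    """
    hosts = [f"h{i}" for i in range(1, num_hosts + 1)]
    iot = [f"iot{i}" for i in range(1, num_iot + 1)]
    nodes = hosts + iot
    # In a star, every end node has exactly one link, and it goes to the switch.
    links = [(node, switch) for node in nodes]
    return nodes, links


nodes, links = star_links()
print(len(nodes), "end nodes,", len(links), "links, all attached to", links[0][1])
```

In a real Mininet script each pair would typically become one link between a host and the switch.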

Gym Type (Scenario)

Select the RL environment the agents will train on. Four scenarios are available:

Scenario | Task | Action
Classification | Classify traffic: None / Ping / UDP / TCP | Alert only
Attack-Net | Binary detection: Normal vs. Attack (global) | Alert only
Attack-PerHost | Per-host: Normal / Victim / Attacker | SDN link block
MARL | Hierarchical multi-agent (Coordinator + Host agents) | Distributed SDN block

Attack-PerHost is the primary demonstration scenario: a correct Attacker classification triggers an OpenDayLight flow rule that drops traffic from that host at the switch level.

Agent Configuration

Use the Add Agent button to include one or more agents in the experiment. Each agent entry has:

  • Algorithm — choose from Q-Learning, SARSA, DQN, PPO, A2C, Supervised.
  • Name — a free-form label shown in all charts and tables.
  • Episodes — number of training episodes.
  • Steps per episode — maximum steps before an episode resets.

Multiple agents can run sequentially in the same experiment for easy comparison.

For a first run, leave all hyperparameters at their defaults — they are tuned for the Attack-PerHost scenario with 5 hosts.

Full Configuration Reference

All parameters are stored in config/default.yaml and editable live from the Configuration panel. The tables below document every key available in the configuration file, organised by section.

Global Settings

Parameter | Description
training_directory | Directory where experiment outputs and training artefacts are stored.
enable_web_interface | Enable or disable the web UI.
server_user | Unix user used by the runtime when launching Mininet-related processes.
log_level | Logging verbosity used by the application.
random_seed | Seed used to make runs reproducible.

Environment Parameters (env_params)

Parameter | Description
gym_type | Select which scenario pipeline to execute. Use *_from_dataset variants when replaying pre-recorded status JSON files; use non-dataset variants for live/synthetic traffic generation.
episodes | Number of training episodes.
max_steps | Maximum number of steps per episode.
test_episodes | Number of evaluation episodes run after training.
n_bins | Number of bins used by state discretisation. Higher values preserve more detail but increase the state-space size for tabular agents.
steps_min_percentage | Minimum fraction of episode steps required before early-stop checks are considered valid. Typical range: 0.7–0.95.
accuracy_min | Accuracy threshold used by early-stop logic. If reached (with steps_min_percentage satisfied), training can terminate early.
wait_after_read | Delay (seconds) after reading network counters before the next observation is collected.
show_normal_traffic | Show normal traffic generation events in the logs and UI.
print_training_chart | Generate chart images to disk during training.
must_check_env | Enable extra Gymnasium environment validity checks before running.

Scenario source (UI): The Scenario Source selector controls how the environment is initialised.
  • Generate new (default) — a new scenario is created when training starts.
  • Generate test scenario — creates a preview in memory and opens the analysis popup; no file is saved.
  • Load existing — click a row to select an existing scenario.json. When navigating to Training, episodes, max_steps and test_episodes are overridden with the values stored in that file.
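
The interaction between steps_min_percentage and accuracy_min in the table above can be read as a two-part gate. The sketch below is one plausible interpretation, not the framework's actual code: the function name, the default values, and the choice to measure progress against max_steps are all assumptions.

```python
def should_stop_early(step, max_steps, accuracy,
                      steps_min_percentage=0.8, accuracy_min=0.95):
    """Return True when an early stop would be allowed.

    Assumed reading: the accuracy check only counts once at least
    steps_min_percentage of the episode's steps have been executed.
    The defaults are illustrative, not MininetGym's.
    """
    min_steps = steps_min_percentage * max_steps
    enough_steps = step >= min_steps
    accurate_enough = accuracy >= accuracy_min
    return enough_steps and accurate_enough


# High accuracy too early in the episode does not trigger the stop.
print(should_stop_early(step=50, max_steps=100, accuracy=0.99))
# Same accuracy after the minimum fraction of steps does.
print(should_stop_early(step=80, max_steps=100, accuracy=0.99))
```

Raising steps_min_percentage makes the check stricter, since a lucky early streak is less likely to end training prematurely.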

Attack Parameters (env_params.attacks)

Active only for attack-oriented scenarios: Attack-Net, Attack-PerHost, MARL.

Parameter | Description
likely | Base probability that a host starts an attack when idle. Can evolve dynamically during generation depending on scenario logic.
likely_train | Attack likelihood during training episodes (Attack-PerHost only). Range: 0.0–1.0. Higher values create frequent attacks for policy learning.
likely_eval | Attack likelihood during evaluation episodes (Attack-PerHost only). Range: 0.0–1.0. Lower values simulate realistic sparse attack conditions.
max_attack_percentage | Upper cap for effective attack probability (0.0–1.0). Example: 0.9 means attack likelihood scales up to 90% in high-pressure phases.
short_attack_duration | Duration (steps) of SHORT_ATTACK windows. Lower values create bursty spikes; higher values create sustained micro-attacks.
long_attack_duration | Duration (steps) of LONG_ATTACK windows. Increase for persistent attacks; reduce for more fragmented behaviour.
no_attack_timeout | Cooldown (steps) after an attack before the same host can attack again. Lower values increase attack density.
unblock_min_hold_rounds | Minimum full rounds to keep a host blocked before allowing unblock.
unblock_required_normal_streak | Consecutive NORMAL decisions required while blocked before the host is unblocked.
apply_drop_rules | If true, the environment applies OpenDayLight drop rules when the agent takes the block action. If false, the environment provides reward feedback only without modifying network traffic.
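
The two unblock parameters above describe a small per-host state machine. The following sketch shows how they could combine. It assumes both conditions must hold at the same time and that any non-NORMAL decision resets the streak; the class, function name, and default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockState:
    """Bookkeeping for one blocked host (hypothetical structure)."""
    rounds_blocked: int = 0
    normal_streak: int = 0


def update_block(state, decision, min_hold_rounds=3, required_normal_streak=2):
    """Advance one round for a blocked host; return True if it may be unblocked.

    min_hold_rounds stands in for unblock_min_hold_rounds and
    required_normal_streak for unblock_required_normal_streak.
    """
    state.rounds_blocked += 1
    if decision == "NORMAL":
        state.normal_streak += 1
    else:
        state.normal_streak = 0  # assumption: the streak must be consecutive
    return (state.rounds_blocked >= min_hold_rounds
            and state.normal_streak >= required_normal_streak)


state = BlockState()
for decision in ["ATTACKER", "NORMAL", "NORMAL", "NORMAL"]:
    print(decision, "->", update_block(state, decision))
```

With these illustrative values the host becomes eligible for unblocking on the third round, once the hold time has elapsed and two consecutive NORMAL decisions have been seen.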

Detection Thresholds

Thresholds used by rule-based labelling in both attack and classification scenarios.

Parameter | Applies to | Description
attacks.thresholds.packets | Attack scenarios | Absolute packet-count threshold. Values above this are treated as anomalous.
attacks.thresholds.var_packets | Attack scenarios | Allowed packet variation (%) versus baseline. Above this delta, traffic may be flagged as unstable.
attacks.thresholds.bytes | Attack scenarios | Absolute byte-volume threshold used with packet thresholds to characterise attack intensity.
attacks.thresholds.var_bytes | Attack scenarios | Allowed byte variation (%) versus baseline. Useful for volumetric anomaly detection.
classification.thresholds.packets | Classification | Packet threshold for traffic class labelling.
classification.thresholds.bytes | Classification | Byte threshold for traffic class labelling.
classification.thresholds.var_packets | Classification | Allowed packet variation percentage for classification.
classification.thresholds.var_bytes | Classification | Allowed byte variation percentage for classification.
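
To make the threshold semantics above concrete, here is a minimal rule-based check. It is a sketch under stated assumptions: the source does not say how the four thresholds are combined, so a simple logical OR is used, and the function name, the baseline arguments, and the sample threshold values are all hypothetical.

```python
def is_anomalous(packets, byte_count, base_packets, base_bytes, thresholds):
    """Flag a sample as anomalous if any threshold is exceeded.

    thresholds uses the key names from the table above:
    packets, var_packets, bytes, var_bytes.
    """
    def variation_pct(value, baseline):
        # Percentage change versus baseline; a zero baseline is treated
        # as infinite variation unless the value is also zero.
        if baseline == 0:
            return 0.0 if value == 0 else float("inf")
        return abs(value - baseline) / baseline * 100.0

    return (
        packets > thresholds["packets"]
        or byte_count > thresholds["bytes"]
        or variation_pct(packets, base_packets) > thresholds["var_packets"]
        or variation_pct(byte_count, base_bytes) > thresholds["var_bytes"]
    )


# Illustrative values only, not the defaults shipped in config/default.yaml.
limits = {"packets": 1000, "var_packets": 50, "bytes": 100_000, "var_bytes": 50}
print(is_anomalous(120, 9_000, 100, 8_000, limits))    # small drift
print(is_anomalous(5_000, 9_000, 100, 8_000, limits))  # packet spike
```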

Network Topology (env_params.net_params)

Parameter | Description
num_hosts | Number of regular hosts in the simulated network.
num_switches | Number of OVS switches in the simulated network.
num_iot | Number of IoT nodes in the simulated network.
traffic_types | Allowed normal-traffic generators. Common set: none, ping, udp, tcp.
start_cli | Open the Mininet CLI after creating the network (useful for manual inspection).

SDN Controller (env_params.net_params.controller)

Parameter | Description
ip | Controller IP address (e.g. 127.0.0.1 or Docker host IP).
port | Controller TCP port (default 8181 for OpenDayLight REST API).
usr | Controller username (default admin).
pwd | Controller password.

Ensure IP, port and credentials match your OpenDayLight instance before starting training. A wrong controller address will prevent SDN drop rules from being applied.
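
To show where the four controller settings end up, the sketch below assembles the pieces of a RESTCONF request that would install a drop rule for one source address. The manual does not describe how MininetGym builds its rules, so everything here is an assumption: the URL follows the older /restconf/config layout of the OpenFlow plugin (newer controller releases expose a different base path), and the node id, table, flow id, priority, and password are placeholders. The function only builds the request; it does not send it.

```python
import base64
import json


def build_drop_flow(ip, port, usr, pwd, src_ip,
                    node="openflow:1", table=0, flow_id="block-1"):
    """Build (url, headers, body) for a drop rule matching one IPv4 source.

    All defaults are hypothetical placeholders for illustration.
    """
    url = (f"http://{ip}:{port}/restconf/config/opendaylight-inventory:nodes/"
           f"node/{node}/flow-node-inventory:table/{table}/flow/{flow_id}")
    # HTTP Basic authentication built from the usr / pwd settings.
    token = base64.b64encode(f"{usr}:{pwd}".encode()).decode()
    headers = {"Authorization": f"Basic {token}",
               "Content-Type": "application/json"}
    payload = {
        "flow-node-inventory:flow": [{
            "id": flow_id,
            "table_id": table,
            "priority": 500,
            "match": {
                # 2048 (0x0800) selects IPv4 frames.
                "ethernet-match": {"ethernet-type": {"type": 2048}},
                "ipv4-source": f"{src_ip}/32",
            },
            "instructions": {"instruction": [{
                "order": 0,
                "apply-actions": {"action": [{"order": 0, "drop-action": {}}]},
            }]},
        }]
    }
    return url, headers, json.dumps(payload)


url, headers, body = build_drop_flow("127.0.0.1", 8181, "admin", "secret", "10.0.0.3")
print(url)
```

If the address or credentials are wrong, a request like this fails at the HTTP level, which is why a misconfigured controller silently prevents blocking.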

Agent Common Parameters (agents.*)

Each entry in the agents list represents one agent configuration. Multiple agents can be enabled in the same run.

Parameter | Description
name | Display name shown in charts, tables and PDF exports.
algorithm | Algorithm identifier: q_learning, sarsa, dqn, ppo, a2c, supervised.
enabled | Enable or disable this agent without removing its configuration.
progress_bar | Show a progress bar in the terminal while training this agent.
skip_learn | Skip learning and use the agent for evaluation or inspection only.
show_action | Log the chosen action at every step (verbose).
load | If true, attempt to load saved weights/artefacts before training.
load_dir | Relative path (from training root) to a saved run folder. Must contain artefacts compatible with the selected algorithm.
save | Save the trained model artefacts at the end of the run.
state_input_mode | Input representation for deep agents: normalized (recommended) or raw. Keep consistent when loading checkpoints.
net_arch | Neural network architecture list for deep agents (e.g. [64, 64]).

Algorithm-Specific Hyperparameters

Q-Learning & SARSA

Parameter | Description
learning_rate | Step size for Q-table updates.
discount_factor | γ — how much future rewards are discounted (0–1).
exploration_rate | Initial ε for ε-greedy exploration.
exploration_decay | Multiplicative factor applied to ε after each episode.
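
The effect of exploration_rate and exploration_decay is easiest to see as a sequence of ε values. This is a minimal sketch of the multiplicative schedule the table describes; the function name and the sample numbers are illustrative, and any lower bound on ε that the framework may apply is not modelled.

```python
def epsilon_schedule(exploration_rate, exploration_decay, episodes):
    """Return the ε used in each episode under per-episode multiplicative decay."""
    epsilon = exploration_rate
    values = []
    for _ in range(episodes):
        values.append(epsilon)
        epsilon *= exploration_decay  # applied once after every episode
    return values


# A decay of 0.5 halves ε each episode; values close to 1 keep exploration alive longer.
print(epsilon_schedule(1.0, 0.5, 4))   # [1.0, 0.5, 0.25, 0.125]
```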

DQN (Deep Q-Network)

Parameter | Description
learning_rate | Adam optimiser learning rate.
gamma | Discount factor for future rewards.
buffer_size | Capacity of the experience replay buffer.
batch_size | Mini-batch size drawn from the replay buffer per gradient update.
target_update_interval | Number of steps between target network weight copies.
learning_starts | Number of environment steps collected before learning begins.
exploration_fraction | Fraction of total training steps over which ε is annealed.
exploration_initial_eps | Starting value of ε.
exploration_final_eps | Final (minimum) value of ε.
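
The three exploration parameters above define a linear annealing of ε followed by a constant floor. The sketch below reproduces that shape in plain Python. The function name is hypothetical, and the default values are the ones Stable-Baselines3 uses for DQN, which are not necessarily MininetGym's defaults.

```python
def linear_epsilon(step, total_steps, exploration_fraction=0.1,
                   exploration_initial_eps=1.0, exploration_final_eps=0.05):
    """ε at a given training step under linear annealing."""
    anneal_steps = exploration_fraction * total_steps
    if anneal_steps <= 0 or step >= anneal_steps:
        # After the annealing window ε stays at its final value.
        return exploration_final_eps
    progress = step / anneal_steps
    return exploration_initial_eps + progress * (
        exploration_final_eps - exploration_initial_eps)


for step in (0, 500, 1000, 5000):
    print(step, round(linear_epsilon(step, total_steps=10_000), 3))
```

With 10,000 total steps and a fraction of 0.1, ε reaches its floor after 1,000 steps, so most of the run is spent exploiting the learned Q-values.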

PPO & A2C

Parameter | Description
learning_rate | Adam optimiser learning rate.
gamma | Discount factor for future rewards.
n_steps | Number of rollout steps collected before each policy update. Must satisfy batch_size < n_steps × n_envs.
batch_size | Mini-batch size used for gradient updates (PPO only).
ent_coef | Entropy coefficient — higher values encourage exploration; lower values promote exploitation.

Rule of thumb: batch_size must be strictly less than n_steps × n_envs. Violating this causes a Stable-Baselines3 assertion error at startup.
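
The rule of thumb above is cheap to verify before launching a run. The helper below is a hypothetical pre-flight check, not part of MininetGym or Stable-Baselines3; it assumes a single environment (n_envs=1) unless told otherwise.

```python
def check_ppo_batch(n_steps, batch_size, n_envs=1):
    """Raise ValueError if batch_size is not strictly below the rollout size.

    Returns the rollout size (n_steps * n_envs) when the check passes.
    """
    rollout_size = n_steps * n_envs
    if batch_size >= rollout_size:
        raise ValueError(
            f"batch_size={batch_size} must be < n_steps * n_envs = {rollout_size}")
    return rollout_size


print(check_ppo_batch(n_steps=2048, batch_size=64))  # 2048
```

Calling this on the values entered in the Configuration panel surfaces the problem as a readable message instead of a failure at training start.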

Training Dashboard

Start the experiment from the Configuration panel and switch to Training Dashboard to monitor progress in real time. All data is pushed via WebSocket (Socket.IO) — no page refresh needed.

Live Charts

  • Reward per Episode — cumulative reward obtained by the agent in each episode.
  • Accuracy per Episode — percentage of correct classifications over the episode.
  • ε (Epsilon) Decay — shows how exploration decreases over time (tabular / DQN).

Host Status Monitor

A live grid shows the current classification label for every host: Normal, Attacker, or Victim. When an SDN block is issued, the attacker cell displays a lock icon.

During heavy attacks the reward may drop sharply — this is expected. A well-trained agent quickly recovers accuracy and triggers the SDN mitigation within a few steps.

Controls

  • Pause / Resume — suspends the training thread; the network keeps running.
  • Stop — terminates the current experiment and saves partial results.

Results Panel

After training completes (or is stopped), switch to the Results Panel to inspect per-agent metrics and export a PDF report.

Metrics Summary Table

For each agent the table shows:

  • Accuracy, Precision, Recall, F1-score
  • Mitigation Ratio — SDN blocks issued / total attack events detected.
  • False Negative Rate (FNR) — missed attacks; penalised ×2 in the reward function.
  • Attack Latency — average steps from attack start to SDN block.
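
For reference, the classification metrics in the list above follow the standard confusion-matrix definitions. The sketch below computes them for the binary case (attack versus not attack). How MininetGym aggregates the three per-host labels is not described in this manual, so the binary reduction and the function name are assumptions.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1 and FNR from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fnr": ratio(fn, fn + tp),  # share of real attacks that were missed
    }


print(detection_metrics(tp=8, fp=2, tn=88, fn=2))
```

Note that FNR equals 1 minus recall, which is why a reward function that penalises missed attacks more heavily pushes the agent toward higher recall.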

Comparison Charts

  • Bar chart — side-by-side accuracy / F1 across all agents.
  • Radar chart — multi-dimensional agent profile (accuracy, precision, recall, FNR, mitigation).
  • Confusion Matrix — per-agent heatmap of predicted vs. true labels.

PDF Export

Click Export PDF to download a formatted report containing all charts, the metrics table, and the experiment configuration.

You can also download the raw results as a ZIP archive (CSV + JSON + charts) from the Results panel toolbar for offline analysis.

Supported RL Agents

Q-Learning

Tabular off-policy TD method. The observation space is log-bin discretized to keep the state table manageable. Best suited for low-dimensional scenarios (Classification, Attack-Net).
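
Log-bin discretisation can be sketched as follows. This shows the general technique only: the exact binning used by MininetGym is not given in this manual, and max_value (the largest counter value expected) as well as the function name are assumptions.

```python
import math


def log_bin(value, max_value, n_bins=10):
    """Map a non-negative counter to a bin index in [0, n_bins - 1] on a log scale."""
    value = min(max(value, 0.0), max_value)            # clamp into range
    scaled = math.log1p(value) / math.log1p(max_value)  # 0.0 .. 1.0
    return min(int(scaled * n_bins), n_bins - 1)


# Small counters get fine-grained bins; large counters share the top bins.
for packets in (0, 10, 1_000, 100_000, 1_000_000):
    print(packets, "->", log_bin(packets, max_value=1_000_000))
```

With k observed features the Q-table has up to n_bins to the power k states, which is why raising n_bins quickly inflates memory and training time for tabular agents.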

SARSA

On-policy variant of Q-Learning. Uses the action actually taken (rather than the greedy action) for the TD update, making it slightly more conservative in exploration.
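
The difference between the two tabular methods comes down to one term in the TD target. The sketch below uses the textbook update rules, with parameter names taken from the configuration reference (learning_rate, discount_factor); the function names are illustrative.

```python
def q_learning_target(reward, next_q_values, discount_factor):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + discount_factor * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, discount_factor):
    """On-policy target: bootstrap from the action that will actually be taken."""
    return reward + discount_factor * next_q_values[next_action]


def td_update(q_value, target, learning_rate):
    """Move the current estimate a fraction learning_rate toward the target."""
    return q_value + learning_rate * (target - q_value)


next_q = [0.0, 2.0]  # Q-values of the next state for actions 0 and 1
print(q_learning_target(1.0, next_q, 0.5))             # 2.0
print(sarsa_target(1.0, next_q, next_action=0, discount_factor=0.5))  # 1.0
print(td_update(0.0, 2.0, learning_rate=0.5))          # 1.0
```

When the ε-greedy policy picks an exploratory action, SARSA's target reflects that choice, while Q-Learning's target ignores it.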

DQN — Deep Q-Network

Uses a neural network to approximate the Q-function. Includes an experience replay buffer and a target network. Implemented via Stable-Baselines3.

PPO — Proximal Policy Optimization

Actor-critic on-policy algorithm with a clipped surrogate objective. Robust and, for an on-policy method, sample-efficient because each rollout is reused for several optimisation epochs; generally a strong baseline for continuous or large discrete observation spaces.

A2C — Advantage Actor-Critic

Synchronous on-policy actor-critic. Faster per-step than PPO but may require more episodes to converge. Good for quick experiments.

Supervised Agent

A classification baseline trained with supervised learning on labelled traffic data. Supports incremental learning: the model is updated after each episode with the new observations collected during training. Used as a performance ceiling reference.


Attack Scenarios

The Attack Generator runs on dedicated Mininet hosts and can produce:

  • UDP Flood — high-rate UDP packets toward a victim host.
  • TCP Flood — repeated TCP connection attempts.
  • ICMP Flood — continuous ping storm.
  • SYN Flood — half-open TCP connections to exhaust victim resources.
  • Slowloris — slow HTTP attack that holds connections open.

Attacks are injected randomly during training so that agents learn to detect and mitigate them across varying conditions.


MARL — Multi-Agent Mode

In the MARL scenario, two agent types collaborate via an internal message bus:

  • Coordinator — observes global network statistics (ℝ⁵), decides whether to signal an alert to host agents.
  • Host Agents (one per host) — observe per-host traffic (ℝ⁹), take local block / no-block actions.

All agents run in parallel threads. The coordinator's signal is included as an extra feature in each host agent's observation vector.

MARL training is more resource-intensive. Reduce episodes or steps per episode if the machine is under heavy load.

Tips & Troubleshooting

Training does not start

  • Make sure the Mininet topology is running (sudo python main.py).
  • Check that OpenDayLight is reachable at the configured address.
  • Verify no previous experiment is still running (use Stop first).

Charts are not updating

  • Confirm the browser has an active WebSocket connection (look for the green dot in the status bar).
  • Reload the page — the Socket.IO client will reconnect automatically.

SDN blocks are never triggered

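  • Only the Attack-PerHost and MARL scenarios issue SDN drop rules.
  • Check that apply_drop_rules is true; when it is false the environment only provides reward feedback and leaves network traffic untouched.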
  • Make sure OpenDayLight credentials are correct in the config.

Low accuracy after many episodes

  • Increase episodes or lower learning_rate for slower but more stable convergence.
  • For tabular agents (Q-Learning / SARSA) try increasing exploration_decay (closer to 1) so that ε shrinks more slowly and the agent explores longer.
  • PPO / A2C may need more n_steps for complex scenarios.

PDF export fails

  • Ensure wkhtmltopdf or WeasyPrint is installed on the server.
  • Check server logs (app.log) for the detailed error.

Keyboard & UI Shortcuts

Action | How
Close this manual | Press Esc or click outside the panel
Navigate sections | Click any entry in the left Contents sidebar
Switch panel | Header navigation buttons (Configuration / Training / Results)
Mobile menu | Tap the ☰ button in the top-right corner

About

MininetGym is developed at DISMI — University of Modena and Reggio Emilia by Salvo Finistrella, Stefano Mariani, and Franco Zambonelli.

Source code: github.com/dipi-unimore/mininet-gym
Contact: salvo.finistrella@unimore.it
Paper: ScienceDirect — doi.org/10.1016/j.simpa.2025…