IoT-23 — Labeled IoT Malware & Benign Traffic [325M Flows, 500+ Hours]
Abstract
"Real IoT malware traffic dataset with 325M labeled network flows from 20 malware and 3 benign device captures over 500+ hours. PCAP and Zeek conn.log formats. Used for IoT botnet detection, malware traffic classification, and ML security research."
Description
Overview
IoT-23 (Aposemat IoT-23) is a large-scale, publicly available dataset of labeled network traffic from real IoT devices captured at the Stratosphere Laboratory, AIC Group, FEL, Czech Technical University in Prague. It is the first dataset to combine actual malware execution on physical IoT devices with benign device traffic, making it uniquely representative of real-world IoT threat scenarios.
The dataset contains 23 scenarios: 20 network captures from IoT devices infected with real malware samples (including Mirai, Torii, Okiru, Muhstik, and IRCBot variants) and 3 captures of completely benign IoT devices (Philips Hue smart light bridge, Amazon Echo, and a Somfy smart door lock). In total, it contains more than 760 million packets and 325 million labeled flows spanning over 500 hours of network traffic captured between 2018 and 2019.
Flows are labeled in Zeek (formerly Bro) conn.log files, with labels including Benign, C&C, DDoS, PartOfAHorizontalPortScan, FileDownload, Attack, and Okiru, among others. The research and dataset collection were funded by Avast Software.
Column Schema
| Column | Description |
|---|---|
| ts | Timestamp of the connection record. |
| uid | Unique connection identifier. |
| id.orig_h / id.resp_h | Originator and responder IP addresses. |
| id.orig_p / id.resp_p | Originator and responder port numbers. |
| proto | Transport protocol (tcp, udp, icmp). |
| service | Application-layer service detected. |
| duration | Connection duration in seconds. |
| orig_bytes / resp_bytes | Bytes transferred by originator and responder. |
| label | Traffic class: Benign, C&C, DDoS, PartOfAHorizontalPortScan, FileDownload, etc. |
| detailed-label | Granular attack sub-label. |
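The columns above can be read with a few lines of standard-library Python. What follows is a minimal sketch, not the dataset authors' tooling: it assumes the labeled files follow the usual Zeek text layout, with metadata lines starting with `#`, a `#fields` line naming the columns, and `-` marking an unset value. It splits on any whitespace, on the assumption that no field value contains a space, so that label columns separated by spaces instead of tabs are still parsed. The function name `read_zeek_log` and the sample rows are hypothetical and synthetic.

```python
import io

def read_zeek_log(handle):
    """Yield one dict per record from a Zeek-style text log.

    Assumes a '#fields' header line and whitespace-separated values
    with no spaces inside any single field.
    """
    fields = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Metadata line; only '#fields' carries the column names.
            if line.startswith("#fields"):
                fields = line.split()[1:]
            continue
        if fields is None:
            raise ValueError("data line seen before '#fields' header")
        values = line.split()
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {len(values)}")
        yield dict(zip(fields, values))

# Synthetic rows for illustration only; these are not taken from IoT-23.
SAMPLE = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto"
    "\tservice\tduration\torig_bytes\tresp_bytes\tlabel\tdetailed-label\n"
    "1545403816.96\tCabc1\t192.0.2.10\t41040\t198.51.100.7\t80\ttcp"
    "\thttp\t3.14\t120\t512\tBenign\t-\n"
    # Second row uses spaces before the label columns on purpose.
    "1545403824.18\tCabc2\t192.0.2.10\t41042\t198.51.100.7\t23\ttcp"
    "\t-\t-\t-\t-   PartOfAHorizontalPortScan   -\n"
    "#close\t2019-01-01-00-00-00\n"
)

records = list(read_zeek_log(io.StringIO(SAMPLE)))
for rec in records:
    print(rec["id.orig_h"], "->", rec["id.resp_h"], rec["id.resp_p"], rec["label"])
```

Because the function is a generator, the same loop can be pointed at an open file handle and will process a large log one line at a time instead of loading it into memory.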
Key Statistics
- Total Flows: 325+ million labeled flows
- Total Packets: 760+ million
- Traffic Duration: 500+ hours
- Scenarios: 23 (20 malware + 3 benign)
- Malware Families: Mirai, Torii, Okiru, Muhstik, IRCBot, and others
- File Format: PCAP and Zeek conn.log (labeled)
- Download Size: ~20 GB (full); ~8.7 GB (light version, flows only)
- Capture Period: 2018–2019; Published: January 2020
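For a rough sense of scale, the headline figures can be turned into averages. This is a back-of-the-envelope sketch that treats the rounded lower bounds above (325 million flows, 760 million packets, 500 hours) as if they were exact, so the results are only indicative.

```python
# Rounded figures from the statistics above, treated as exact for illustration.
TOTAL_FLOWS = 325_000_000
TOTAL_PACKETS = 760_000_000
TOTAL_HOURS = 500

packets_per_flow = TOTAL_PACKETS / TOTAL_FLOWS
flows_per_hour = TOTAL_FLOWS / TOTAL_HOURS

print(f"{packets_per_flow:.2f} packets per flow on average")  # 2.34
print(f"{flows_per_hour:,.0f} flows per hour on average")     # 650,000
```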
Use Cases
- IoT malware traffic detection and botnet identification
- Behavioral analysis of compromised IoT devices vs. benign devices
- C&C communication detection and lateral movement analysis
- ML-based multi-class network traffic classification for IoT security
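As a starting point for the detection and classification use cases above, parsed flow records can be converted into numeric features and a binary target. The sketch below is one possible preparation step and not a method prescribed by the dataset: the helper names `to_number` and `to_example` are hypothetical, the sample records are synthetic, and treating every non-Benign label as the positive class is an assumption made for illustration.

```python
from collections import Counter

def to_number(value):
    """Convert a Zeek text field to float; '-' (unset) becomes None."""
    return None if value in ("-", "") else float(value)

def to_example(record):
    """Map one flow record to (features, target); target 0 = benign, 1 = other."""
    features = {
        "duration": to_number(record["duration"]),
        "orig_bytes": to_number(record["orig_bytes"]),
        "resp_bytes": to_number(record["resp_bytes"]),
        "proto": record["proto"],
    }
    target = 0 if record["label"].lower() == "benign" else 1
    return features, target

# Synthetic records for illustration only; these are not taken from IoT-23.
RECORDS = [
    {"proto": "tcp", "duration": "3.14", "orig_bytes": "120",
     "resp_bytes": "512", "label": "Benign"},
    {"proto": "tcp", "duration": "-", "orig_bytes": "-",
     "resp_bytes": "-", "label": "PartOfAHorizontalPortScan"},
    {"proto": "udp", "duration": "0.5", "orig_bytes": "48",
     "resp_bytes": "0", "label": "DDoS"},
]

examples = [to_example(r) for r in RECORDS]
class_counts = Counter(target for _, target in examples)
print(class_counts)
```

Checking `class_counts` before training is worthwhile, since the balance between benign and malicious flows affects the choice of metric and sampling strategy. How to handle the `None` values (drop, impute, or encode as a separate indicator) is left open here.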
Source & Attribution
Created by Sebastian Garcia, Agustin Parmisano, and Maria Jose Erquiaga at the Stratosphere Laboratory, Czech Technical University in Prague. Funded by Avast Software. Available for download from the Stratosphere IPS website and mirrored on Zenodo (record 4743746).
View Data Structure
To explore column names, data types, and sample rows, visit the official dataset page on Zenodo.
Cite This Dataset
Garcia, Sebastian, Parmisano, Agustin, & Erquiaga, Maria Jose (2020). IoT-23 — Labeled IoT Malware & Benign Traffic [325M Flows, 500+ Hours]. [Dataset]. Zenodo. https://doi.org/10.5281/ZENODO.4743745
Source: Zenodo (2020) · DOI: 10.5281/ZENODO.4743745
Indexed by IoTDataset.com on Apr 13, 2026