Bot-IoT Dataset - Large-Scale IoT Botnet Traffic with Full Packet Capture
Abstract
A comprehensive, large-scale IoT botnet dataset that combines legitimate IoT network traffic with realistic botnet attack scenarios. It includes full packet captures (PCAP) and extracted flow features for four attack types: DDoS, DoS, reconnaissance, and data theft.
Description
Dataset Overview
The Bot-IoT dataset is a large-scale IoT security resource that combines realistic benign IoT network traffic with diverse botnet attack scenarios. It provides both raw packet captures and processed network flow features, so it supports packet-level analysis as well as flow-based machine learning.
Dataset Composition
The dataset integrates two distinct traffic sources:
1. Legitimate IoT Traffic
Normal operational traffic from IoT devices and services including:
- Smart home device communications
- IoT sensor data transmissions
- Device-to-cloud service interactions
- Inter-device communications in smart environments
- Firmware updates and maintenance traffic
2. Botnet Attack Traffic
Realistic attack scenarios simulating compromised IoT devices participating in malicious activities across four major categories.
Attack Categories
DDoS Attacks (Distributed Denial of Service)
- UDP flooding from multiple compromised devices
- TCP SYN flood attacks
- HTTP flood targeting web services
- DNS amplification attacks
Reconnaissance and Information Gathering
- Network scanning to identify vulnerable devices
- Port scanning for open services
- OS fingerprinting attempts
- Service enumeration
Data Theft and Exfiltration
- Keylogging traffic patterns
- Data exfiltration through covert channels
- Credential harvesting communications
DoS Attacks (Single-Source)
- Resource exhaustion attacks
- Protocol-specific DoS targeting IoT services
Data Formats
PCAP Files (Full Packet Capture)
Raw network packets captured at wire level, enabling deep packet inspection, protocol analysis, payload examination, and the development of signature-based detection systems.
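As a rough illustration of what working with raw captures involves, the sketch below walks the records of a capture using only the Python standard library. It assumes the files are in the classic libpcap format with microsecond timestamps (not pcapng), which this page does not confirm; the function name `iter_pcap_records` is hypothetical. In practice a dedicated packet-parsing library would usually be a better choice.

```python
import struct

GLOBAL_HEADER_LEN = 24   # classic libpcap global header size
RECORD_HEADER_LEN = 16   # per-packet record header size


def iter_pcap_records(data: bytes):
    """Yield (timestamp_seconds, captured_bytes) for each packet record.

    Assumption: `data` holds a classic libpcap file with microsecond
    timestamps. pcapng and nanosecond variants are not handled.
    """
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("too short to contain a PCAP global header")

    # The magic number tells us the byte order used by the writer.
    magic = struct.unpack("<I", data[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution PCAP file")

    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + RECORD_HEADER_LEN]
        )
        offset += RECORD_HEADER_LEN
        payload = data[offset:offset + incl_len]
        if len(payload) < incl_len:
            break  # truncated final record; stop cleanly
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, payload
```

Reading a file would then look like `for ts, pkt in iter_pcap_records(open(path, "rb").read()): ...`, where `path` points at one of the downloaded capture files. Loading a whole file into memory is only reasonable for small captures; large ones would need a streaming reader.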
Flow Features (Extracted Statistics)
Processed network flow statistics providing efficient machine learning features without requiring packet-level processing:
- Flow durations and packet counts
- Byte statistics and protocol distributions
- Flag counts and connection states
- Inter-arrival times and burst patterns
- Bidirectional flow characteristics
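To make the kinds of statistics listed above concrete, here is a minimal sketch that summarises one bidirectional flow from a list of packets. The `Packet` type and the feature names (`duration`, `mean_iat`, and so on) are illustrative assumptions; they are not the dataset's actual column names, which should be checked on the dataset page.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Packet:
    ts: float    # capture timestamp in seconds
    src: str     # source address
    dst: str     # destination address
    size: int    # packet length in bytes


def flow_features(packets):
    """Compute simple per-flow statistics (hypothetical feature names)."""
    if not packets:
        raise ValueError("a flow needs at least one packet")

    pkts = sorted(packets, key=lambda p: p.ts)
    initiator = pkts[0].src  # treat the first sender as the forward direction
    fwd = [p for p in pkts if p.src == initiator]
    bwd = [p for p in pkts if p.src != initiator]
    # Inter-arrival times between consecutive packets of the flow.
    gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]

    return {
        "duration": pkts[-1].ts - pkts[0].ts,
        "packets": len(pkts),
        "bytes": sum(p.size for p in pkts),
        "fwd_packets": len(fwd),
        "bwd_packets": len(bwd),
        "fwd_bytes": sum(p.size for p in fwd),
        "bwd_bytes": sum(p.size for p in bwd),
        "mean_iat": mean(gaps) if gaps else 0.0,
    }
```

Grouping packets into flows (for example by the 5-tuple of addresses, ports, and protocol) is left out here and would be a separate step.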
Scale and Diversity
At this scale, the dataset provides:
- Millions of network flows
- Diverse attack implementations
- Realistic traffic mixing (benign and malicious)
- Multiple device types and manufacturers
- Temporal patterns spanning extended periods
Research Applications
- Deep Learning IDS: Train neural networks on flow features for intrusion detection
- Signature Development: Use PCAP files to create attack signatures
- Behavioral Analysis: Study differences between legitimate and botnet traffic patterns
- Protocol Analysis: Examine protocol-level characteristics of attacks
- Real-Time Detection: Develop systems using flow-based features for online detection
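As one hedged illustration of flow-based online detection, the sketch below keeps a sliding time window of flow arrivals per source and flags a source whose rate exceeds a threshold. This is a simple heuristic standing in for a trained model, not a method described by the dataset; the class name `RateMonitor`, the window length, and the threshold are all illustrative assumptions.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flag sources that open too many flows within a sliding window."""

    def __init__(self, window_s: float = 10.0, threshold: int = 100):
        self.window_s = window_s      # window length in seconds (assumed value)
        self.threshold = threshold    # max flows per window (assumed value)
        self._events = defaultdict(deque)

    def observe(self, src: str, ts: float) -> bool:
        """Record a flow from `src` at time `ts`; return True if over threshold.

        Timestamps are expected in non-decreasing order per source.
        """
        q = self._events[src]
        q.append(ts)
        # Drop arrivals that have fallen out of the window.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q) > self.threshold
```

A learned classifier over flow features could replace the threshold check while keeping the same streaming structure.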
Advantages for ML Research
- Labels for both binary (benign vs. attack) and multi-class (attack type) classification
- Rich feature set reducing preprocessing requirements
- Sufficient data volume for deep learning approaches
- Realistic class imbalance reflecting real networks
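Because the classes are imbalanced, training often benefits from per-class weights. The sketch below derives a binary label from a category label and computes weights with the common "balanced" heuristic, n_samples / (n_classes * class_count). The label strings used here ("Normal", "DDoS") are placeholders, not confirmed values from the dataset.

```python
from collections import Counter


def to_binary(category: str, benign_label: str = "Normal") -> int:
    """Map a multi-class category to 0 (benign) or 1 (attack).

    `benign_label` is an assumed placeholder; check the real label values.
    """
    return 0 if category == benign_label else 1


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}
```

Rare classes receive larger weights, so a loss function that uses them penalises mistakes on those classes more heavily.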
Data Structure
Column names, data types, and sample rows are available on the official dataset page on Kaggle.
Cite This Dataset
Vignesh Venkateswaran (2023). Bot-IoT Dataset - Large-Scale IoT Botnet Traffic with Full Packet Capture. [Dataset]. Kaggle. https://www.kaggle.com/datasets/vigneshvenkateswaran/bot-iot
Original source: Kaggle (2023). Visit the official page for more details.
Indexed by IoTDataset.com on Jan 23, 2026