IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification (NAACL 2025)

Smart Home
Jan 31, 2026

Abstract

Multimodal dataset combining Text-to-SQL natural language queries with IoT network traffic classification, featuring 10,985 Text-to-SQL examples (split into training, validation, and test sets) and labeled network traffic (benign/malicious) from IoT-23 and Smart Building sensors for NLP and security research.

Description

Overview

The IoT-SQL Dataset was published at the TrustNLP Workshop (Fifth Workshop on Trustworthy Natural Language Processing) at NAACL 2025. It is designed to advance research at the intersection of natural language processing and IoT security.

Dataset Components

  • IoT Database (iot_database.sql.gz): SQL schema and data from the IoT-23 network logs and Smart Building Sensor datasets, structured as a relational database.
  • Text-to-SQL Data (text-to-SQL-data.zip): 10,985 total examples split into training (6,591), validation (2,197), and test (2,197) sets with natural language questions paired with corresponding SQL queries.
  • Network Traffic Data (network_traffic_data.zip - 315 MB): Labeled IoT network traffic with features including timestamps, IP addresses, ports, protocols, byte counts, and connection history.
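As a quick sanity check on the split sizes listed above, the short sketch below sums the three reported counts and computes each split's share of the total. It uses only the numbers stated in this description and does not read any dataset file.

```python
# Split sizes as reported in the dataset description.
splits = {"train": 6591, "validation": 2197, "test": 2197}

total = sum(splits.values())
fractions = {name: round(count / total, 3) for name, count in splits.items()}

print(total)      # 10985
print(fractions)  # {'train': 0.6, 'validation': 0.2, 'test': 0.2}
```

The counts add up to the stated 10,985 examples and correspond to a 60/20/20 split.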

Text-to-SQL Features

  • Queries include joins, aggregations, temporal conditions, and nested clauses designed for IoT security contexts.
  • Natural language questions cover device identification, anomaly detection, traffic pattern analysis, and threat assessment.
  • The data is designed to train and evaluate language models that generate SQL queries from natural language in cybersecurity domains.
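To make the query features above concrete, here is a minimal sketch of what one question/SQL pair could look like and how it executes. The schema, table names, column names, rows, and the question itself are all hypothetical and invented for illustration; they are not taken from iot_database.sql.gz or text-to-SQL-data.zip. The query combines a join, an aggregation, and a temporal condition, run against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical two-table schema, NOT the schema shipped in iot_database.sql.gz.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (device_id INTEGER PRIMARY KEY, name TEXT, ip TEXT);
CREATE TABLE connections (
    conn_id INTEGER PRIMARY KEY,
    device_id INTEGER REFERENCES devices(device_id),
    ts TEXT,              -- ISO 8601 timestamp
    resp_bytes INTEGER,
    label TEXT            -- 'benign' or 'malicious'
);
INSERT INTO devices VALUES
    (1, 'thermostat', '192.168.1.10'),
    (2, 'camera', '192.168.1.11');
INSERT INTO connections VALUES
    (1, 1, '2025-01-01T08:00:00', 120, 'benign'),
    (2, 2, '2025-01-01T09:30:00', 4800, 'malicious'),
    (3, 2, '2025-01-02T10:00:00', 5100, 'malicious'),
    (4, 1, '2025-01-02T11:15:00', 90, 'malicious');
""")

# One illustrative question/query pair (invented for this sketch).
example = {
    "question": "How many malicious connections did each device make "
                "on or after January 2, 2025?",
    "sql": """
        SELECT d.name, COUNT(*) AS malicious_count
        FROM connections AS c
        JOIN devices AS d ON d.device_id = c.device_id
        WHERE c.label = 'malicious' AND c.ts >= '2025-01-02T00:00:00'
        GROUP BY d.name
        ORDER BY d.name
    """,
}

rows = conn.execute(example["sql"]).fetchall()
print(rows)  # [('camera', 1), ('thermostat', 1)]
```

Storing timestamps as ISO 8601 strings lets the temporal filter work as a plain string comparison in SQLite; the real database may represent time differently.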

Network Traffic Classification

  • Each record is labeled as benign or malicious for binary classification.
  • Malicious traffic includes DDoS attacks, Command & Control (C&C) communications, and botnet-related activity.
  • The feature set supports both traditional machine learning and deep learning approaches to intrusion detection.
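As a rough illustration of how labeled traffic records might be prepared for a binary classifier, the sketch below turns a few records into numeric feature vectors and 0/1 targets. The field names (`proto`, `resp_port`, `orig_bytes`, `resp_bytes`, `label`), the sample values, and the protocol vocabulary are assumptions made for this example; the actual columns in network_traffic_data.zip may differ.

```python
# Hypothetical record layout; the actual column names in
# network_traffic_data.zip may differ.
records = [
    {"proto": "tcp", "resp_port": 80, "orig_bytes": 120, "resp_bytes": 900, "label": "benign"},
    {"proto": "udp", "resp_port": 53, "orig_bytes": 60, "resp_bytes": 0, "label": "benign"},
    {"proto": "tcp", "resp_port": 23, "orig_bytes": 0, "resp_bytes": 0, "label": "malicious"},
]

PROTOCOLS = ["tcp", "udp", "icmp"]  # assumed vocabulary for one-hot encoding


def encode(record):
    """Return (feature_vector, target); target is 1 for malicious, 0 for benign."""
    one_hot = [1.0 if record["proto"] == p else 0.0 for p in PROTOCOLS]
    numeric = [
        float(record["resp_port"]),
        float(record["orig_bytes"]),
        float(record["resp_bytes"]),
    ]
    target = 1 if record["label"] == "malicious" else 0
    return one_hot + numeric, target


X, y = zip(*(encode(r) for r in records))
print(X[2])  # [1.0, 0.0, 0.0, 23.0, 0.0, 0.0]
print(y)     # (0, 0, 1)
```

The resulting `X` and `y` can be passed to any classifier that accepts numeric vectors; in practice, byte counts and ports would usually be scaled or bucketed first.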

Use Cases

  • Text-to-SQL research: Training and evaluating NLP models for database query generation in IoT security contexts.
  • IoT threat detection: Developing intrusion detection systems using network traffic features.
  • Multimodal learning: Combining structured database queries with network security classification in unified frameworks.
  • Trustworthy AI research: Evaluating reliability, explainability, and robustness of AI models in security-critical IoT applications.


Cite This Dataset

Palvich, R., Ebadi, N., Tarbell, R., Linares, B., Tan, A., Humphreys, R., Das, J. K., Ghandiparsi, R., Haley, H., George, J., Slavin, R., Choo, K.-K. R., Dietrich, G., & Rios, A. (2025). IoT-SQL Dataset: A Benchmark for Text-to-SQL and IoT Threat Classification (NAACL 2025). [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.15000588


Original source: Zenodo (2025). Visit the official page for more details.

Indexed by IoTDataset.com on Jan 31, 2026
