UCI Air Quality — Italian City Chemical Multisensor Device [9,358 Hourly Records]
Abstract
"Longest freely available on-field IoT air quality sensor deployment: 9,358 hourly records from 5 metal oxide gas sensors in an Italian city. CSV/XLSX format. Used for gas sensor regression, drift correction, and pollution forecasting research."
Description
Overview
The UCI Air Quality dataset contains 9,358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device deployed at road level in a significantly polluted area of an Italian city. Data were recorded continuously from March 2004 to February 2005, making it the longest freely available recording of on-field deployed air quality chemical sensor device responses.
The five sensors are nominally targeted at carbon monoxide (CO), non-methane hydrocarbons (NMHC), nitrogen oxides (NOₓ), nitrogen dioxide (NO₂), and ozone (O₃). Co-located certified reference analyzer measurements of CO, NMHC, benzene (C₆H₆), NOₓ, and NO₂ are included as ground truth, enabling cross-calibration and sensor drift correction studies. The dataset is a foundational IoT benchmark for chemical sensing, regression, and environmental data analytics.
Missing values are coded as -200 and must be replaced or dropped before analysis; left in place, they would be treated as valid negative readings. The dataset is widely used to benchmark regression models that predict pollutant concentrations from sensor responses, and to study sensor cross-sensitivity and temporal drift, both critical challenges in long-term IoT environmental deployments.
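The missing-value handling can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset authors' procedure. It assumes the CSV is semicolon-delimited with comma decimal separators (check the downloaded file, since this is not stated above); the sample rows are illustrative values, and the helper names `parse_value` and `load_rows` are hypothetical.

```python
import csv
import io

MISSING = -200  # sentinel the dataset uses for missing readings

# Illustrative rows only (not copied from the dataset). Assumes ';' as the
# field delimiter and ',' as the decimal separator.
SAMPLE = """Date;Time;CO(GT);PT08.S1(CO);T
10/03/2004;18.00.00;2,6;1360;13,6
10/03/2004;19.00.00;-200;1292;13,3
10/03/2004;20.00.00;2,2;-200;11,9
"""

def parse_value(text):
    """Convert a numeric field to float; empty fields and -200 become None."""
    text = text.strip()
    if not text:
        return None
    value = float(text.replace(",", "."))
    return None if value == MISSING else value

def load_rows(handle):
    """Yield one dict per record, keeping Date/Time as text."""
    reader = csv.DictReader(handle, delimiter=";")
    for row in reader:
        yield {
            key: (val if key in ("Date", "Time") else parse_value(val))
            for key, val in row.items()
            if key  # skip unnamed trailing columns, if any
        }

rows = list(load_rows(io.StringIO(SAMPLE)))
missing_co = sum(1 for r in rows if r["CO(GT)"] is None)
print(missing_co)
```

Mapping the sentinel to `None` rather than 0 keeps missing readings out of averages and model fits until an explicit imputation or row-dropping policy is chosen.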
Column Schema
| Column | Description |
|---|---|
| Date | Measurement date (DD/MM/YYYY). |
| Time | Measurement time (HH.MM.SS). |
| CO(GT) | True hourly averaged CO concentration in mg/m³ (reference analyzer). |
| PT08.S1(CO) | Tin oxide sensor hourly averaged response (CO-targeted). |
| NMHC(GT) | True hourly averaged NMHC concentration in µg/m³. |
| C6H6(GT) | True hourly averaged benzene concentration in µg/m³. |
| PT08.S2(NMHC) | Titania sensor response (NMHC-targeted). |
| NOx(GT) | True hourly averaged NOₓ concentration in ppb. |
| PT08.S3(NOx) | Tungsten oxide sensor response (NOₓ-targeted). |
| NO2(GT) | True hourly averaged NO₂ concentration in µg/m³. |
| PT08.S4(NO2) | Tungsten oxide sensor response (NO₂-targeted). |
| PT08.S5(O3) | Indium oxide sensor response (O₃-targeted). |
| T | Temperature in °C. |
| RH | Relative humidity (%). |
| AH | Absolute humidity. |
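
Because the date and time are stored in separate columns with the formats listed above, most analyses first combine them into a single timestamp. A minimal sketch, assuming the `DD/MM/YYYY` and `HH.MM.SS` formats given in the schema; the function name `parse_timestamp` is hypothetical.

```python
from datetime import datetime

def parse_timestamp(date_text, time_text):
    """Combine a DD/MM/YYYY date and an HH.MM.SS time into one datetime."""
    return datetime.strptime(f"{date_text} {time_text}", "%d/%m/%Y %H.%M.%S")

stamp = parse_timestamp("10/03/2004", "18.00.00")
print(stamp.isoformat())  # 2004-03-10T18:00:00
```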
Key Statistics
- Total Records: 9,358 hourly instances
- Features: 15 columns (5 sensor responses, 5 reference values, temperature, relative humidity, absolute humidity, date, and time)
- Missing Values: coded as -200 (requires preprocessing)
- File Format: CSV and XLSX
- File Size: 767 KB (CSV), 1.2 MB (XLSX)
- Time Period: March 2004 – February 2005
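
A quick arithmetic check on these figures: if the hourly records are contiguous, 9,358 hours span about 390 days, which is longer than the widest reading of a March-to-February window. The calendar bounds below are an assumption (first day of March 2004 to last day of February 2005), so it is worth confirming the first and last timestamps in the file itself.

```python
from datetime import date

records = 9358               # hourly records stated above
days_covered = records / 24  # days spanned if the hours are contiguous

# Assumed widest reading of the stated period: 1 Mar 2004 to 28 Feb 2005.
period_days = (date(2005, 2, 28) - date(2004, 3, 1)).days + 1

print(round(days_covered, 1), period_days)  # 389.9 365
```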
Use Cases
- Gas concentration regression modeling from low-cost IoT chemical sensors
- Sensor cross-sensitivity analysis and drift correction for environmental IoT
- Air pollution forecasting and anomaly detection in urban environments
- Benchmarking ML regression models (Random Forest, LSTM, XGBoost) on real sensor data
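
As a point of reference for the regression use case, the simplest possible baseline is a single-sensor linear calibration against the reference analyzer. The sketch below is illustrative only: it is not the method from the original publication, the data pairs are synthetic, and `fit_line` and `predict` are hypothetical helper names. Benchmarks like those listed above would normally use several sensor channels plus temperature and humidity.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic (sensor response, reference concentration) pairs; None stands in
# for readings that were coded as -200 in the raw file.
pairs = [(1000.0, 1.0), (1200.0, 2.0), (None, 2.5), (1400.0, 3.0), (1600.0, None)]

# Drop any pair with a missing value before fitting.
clean = [(x, y) for x, y in pairs if x is not None and y is not None]
slope, intercept = fit_line([x for x, _ in clean], [y for _, y in clean])

def predict(x):
    """Estimate the reference concentration from a raw sensor response."""
    return slope * x + intercept

print(round(predict(1300.0), 3))
```

A baseline like this gives a floor against which multivariate models can be compared; it ignores cross-sensitivity and drift, which are the effects the dataset is most often used to study.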
Source & Attribution
Donated to the UCI Machine Learning Repository in 2016 by Saverio De Vito (ENEA — National Agency for New Technologies, Energy and Sustainable Economic Development), based on the work published in Sensors and Actuators B: Chemical. It remains one of the most accessed environmental IoT datasets on UCI.
View Data Structure
To explore column names, data types, and sample rows, visit the official dataset page on UCI.
Cite This Dataset
De Vito, Saverio (2008). UCI Air Quality: Italian City Chemical Multisensor Device [9,358 Hourly Records]. [Dataset]. UCI. https://archive.ics.uci.edu/ml/datasets/air+quality
Source: UCI (2008)
Indexed by IoTDataset.com on Apr 17, 2026