Kaggle

Bosch Production Line Performance — Assembly Line Fault Detection [1.18M Parts]

Industrial IoT

Abstract

"One of Kaggle's largest IIoT manufacturing datasets with 1.18 million parts measured across Bosch's assembly lines. Thousands of anonymized sensor features split across numeric, categorical, and date files. CSV format. Used for quality control and failure prediction."

Description

Overview

The Bosch Production Line Performance dataset was released as part of a Kaggle competition hosted by Bosch, one of the world's leading manufacturing companies. It captures measurements from internal production line sensors tracking each part as it moves through multiple stations and lines at a real manufacturing facility.

The dataset is one of the largest ever hosted on Kaggle in terms of feature count, with thousands of anonymized sensor measurements per part. Data are split across three file types: numeric measurement features, categorical features, and date/timestamp features — reflecting the multi-dimensional nature of industrial IoT sensor networks in modern assembly lines.
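Because the three file types describe the same parts, they can be joined on the Id column. The following is a minimal sketch using pandas, not an official loader: it assumes each file has an Id column and that the first rows of each file refer to the same parts. The function name `load_parts`, the `nrows` default, and the categorical and date column names in the toy data are all illustrative.

```python
import io

import pandas as pd


def load_parts(numeric_src, categorical_src, date_src, nrows=1000):
    """Read the first `nrows` parts of each file type and join them on Id.

    Reading a limited number of rows keeps memory use small, which matters
    for files with thousands of columns.
    """
    numeric = pd.read_csv(numeric_src, nrows=nrows)
    # Read categorical values as strings to avoid mixed-type guessing.
    categorical = pd.read_csv(categorical_src, nrows=nrows, dtype=str)
    categorical["Id"] = categorical["Id"].astype("int64")
    dates = pd.read_csv(date_src, nrows=nrows)

    # Left joins keep every part present in the numeric file.
    merged = numeric.merge(categorical, on="Id", how="left")
    merged = merged.merge(dates, on="Id", how="left")
    return merged


# Toy in-memory stand-ins for the three CSV files (values are made up).
numeric_csv = io.StringIO("Id,L0_S0_F0,Response\n4,0.03,0\n6,,0\n")
categorical_csv = io.StringIO("Id,L0_S1_F25\n4,T1\n6,\n")
date_csv = io.StringIO("Id,L0_S0_D1\n4,82.24\n6,\n")

parts = load_parts(numeric_csv, categorical_csv, date_csv)
print(parts.shape)  # (2, 5)
```

With the real files, the same function would take file paths instead of the in-memory buffers; for full-size reads, processing in chunks is likely a better fit than loading everything at once.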

The goal is to predict which parts will fail quality control (binary Response label). Its scale, real-world provenance, and manufacturing context make it a valuable benchmark for IIoT quality assurance, failure mode detection, and industrial machine learning at scale.

Column Schema

Column | Description
Id | Unique part identifier.
L[line]_S[station]_F[feature] | Anonymized sensor measurement; the name encodes production line, station, and feature number (e.g., L3_S36_F3939).
Response | Binary quality control outcome: 0 = pass, 1 = fail.
Date columns | Timestamps indicating when each part passed through each station.
Categorical columns | Categorical sensor measurements per station.
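
The naming convention in the schema can be decoded mechanically. The sketch below parses a column name of the form L[line]_S[station]_F[feature] into its three numbers; the `SensorColumn` type and `parse_column` helper are hypothetical names introduced here, and the pattern covers only the F-style names described above.

```python
import re
from typing import NamedTuple, Optional


class SensorColumn(NamedTuple):
    line: int
    station: int
    feature: int


# Matches names such as "L3_S36_F3939" and nothing else.
_PATTERN = re.compile(r"^L(\d+)_S(\d+)_F(\d+)$")


def parse_column(name: str) -> Optional[SensorColumn]:
    """Return the line, station, and feature numbers, or None if no match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    line, station, feature = (int(part) for part in match.groups())
    return SensorColumn(line, station, feature)


print(parse_column("L3_S36_F3939"))  # SensorColumn(line=3, station=36, feature=3939)
print(parse_column("Response"))      # None
```

Grouping columns by the parsed line or station is one way to build per-station summaries without relying on the anonymized feature meanings.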

Key Statistics

  • Total Parts: 1,183,747 (about 1.18 million)
  • Features: thousands of anonymized sensor columns across 3 file types
  • File Format: CSV (split into numeric, categorical, and date files)
  • Scale: among the largest Kaggle datasets by feature count
  • Production Lines: multiple lines and stations represented
  • Release Year: 2016

Use Cases

  • Industrial quality control and defect prediction at scale
  • Sparse, high-dimensional sensor data processing techniques
  • Gradient boosting and ensemble methods for manufacturing IIoT analytics
  • Feature selection and dimensionality reduction for large-scale sensor networks
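
As one simple illustration of feature selection on sparse sensor data, the sketch below drops columns whose share of missing values exceeds a threshold. It is an assumption-laden example rather than a recommended pipeline: the helper `drop_sparse_columns` and the 0.5 threshold are illustrative, and the data are the five rows from the Data Preview section of this page.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_missing=0.5, keep=("Id", "Response")):
    """Keep columns whose fraction of missing values is at most `max_missing`.

    Columns listed in `keep` are always retained.
    """
    missing = df.isna().mean()  # fraction of NaN values per column
    selected = [c for c in df.columns if c in keep or missing[c] <= max_missing]
    return df[selected]


# The five preview rows shown on this page.
preview = pd.DataFrame({
    "Id": [4, 6, 7, 9, 11],
    "L0_S0_F0": [0.0300, np.nan, np.nan, 0.0590, 0.0412],
    "L0_S0_F2": [-0.0340, np.nan, -0.0374, np.nan, -0.0211],
    "L3_S36_F3939": [np.nan, 0.0714, np.nan, np.nan, 0.0601],
    "Response": [0, 0, 0, 1, 0],
})

filtered = drop_sparse_columns(preview, max_missing=0.5)
print(list(filtered.columns))  # ['Id', 'L0_S0_F0', 'L0_S0_F2', 'Response']
```

On the full dataset a missing-value filter like this is only a first pass; whether a sparse column carries signal is something a model-based importance check would have to decide.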

Source & Attribution

The dataset was provided by Bosch and released via a Kaggle competition in 2016. It represents real production line measurements from one of the world's largest industrial manufacturing companies. Access requires a Kaggle account.

Data Preview

Id | L0_S0_F0 | L0_S0_F2 | L3_S36_F3939 | Response
4 | 0.0300 | -0.0340 | NaN | 0
6 | NaN | NaN | 0.0714 | 0
7 | NaN | -0.0374 | NaN | 0
9 | 0.0590 | NaN | NaN | 1
11 | 0.0412 | -0.0211 | 0.0601 | 0

Showing first few rows for preview

Cite This Dataset

Meg Risdal, Prasanth, RumiGhosh, soundar, Stefanie W., & Will Cukierski (2016). Bosch Production Line Performance — Assembly Line Fault Detection [1.18M Parts]. [Dataset]. Kaggle. https://www.kaggle.com/c/bosch-production-line-performance/data

Source: Kaggle (2016)

Indexed by IoTDataset.com on Apr 10, 2026
