Bosch Production Line Performance — Assembly Line Fault Detection [1.18M Parts]
Abstract
One of Kaggle's largest IIoT manufacturing datasets, covering about 1.18 million parts measured on Bosch assembly lines. Thousands of anonymized sensor features are split across numeric, categorical, and date files in CSV format. The dataset is used for quality control and failure prediction.
Description
Overview
The Bosch Production Line Performance dataset was released as part of a Kaggle competition hosted by Bosch, one of the world's leading manufacturing companies. It captures measurements from internal production line sensors tracking each part as it moves through multiple stations and lines at a real manufacturing facility.
The dataset is among the largest hosted on Kaggle by feature count, with thousands of anonymized sensor measurements per part. The data are split across three file types (numeric measurement features, categorical features, and date/timestamp features), reflecting the many stations and sensors a part passes through on a modern assembly line.
The goal is to predict which parts will fail quality control, recorded in the binary Response label. The dataset's scale, real-world provenance, and manufacturing context make it a valuable benchmark for IIoT quality assurance, failure mode detection, and industrial machine learning at scale.
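Because the features for one part are spread over three files, a typical first step is to bring them together by part identifier. The following is a minimal sketch of that idea using only the Python standard library. The helper name `merge_on_id`, the placeholder column names (`num_feature`, `cat_feature`, `date_feature`), and the sample values are all illustrative assumptions rather than contents of the real files; only the `Id` and `Response` column names come from the schema.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def merge_on_id(sources: Iterable[TextIO]) -> Dict[str, Dict[str, str]]:
    """Combine rows from several CSV streams into one record per Id."""
    merged: Dict[str, Dict[str, str]] = {}
    for source in sources:
        for row in csv.DictReader(source):
            # Each stream contributes its own columns to the part's record.
            merged.setdefault(row["Id"], {}).update(row)
    return merged


# Tiny made-up samples standing in for the numeric, categorical, and date files.
# Column names and values here are placeholders, not real data.
numeric = io.StringIO("Id,num_feature,Response\n4,0.03,0\n6,,0\n")
categorical = io.StringIO("Id,cat_feature\n4,\n6,T1\n")
dates = io.StringIO("Id,date_feature\n4,82.24\n6,\n")

merged = merge_on_id([numeric, categorical, dates])
for part_id, record in merged.items():
    print(part_id, record)
```

Holding every record in a dictionary is only practical for small samples. With roughly 1.18 million parts and thousands of columns, a chunked or column-subset approach in a dataframe library is likely to be a better fit.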
Column Schema
| Column | Description |
|---|---|
| Id | Unique part identifier. |
| L[line]_S[station]_F[feature] | Anonymized sensor measurement — naming encodes production line, station, and feature number (e.g., L3_S36_F3939). |
| Response | Binary quality control outcome: 0 = pass, 1 = fail. |
| Date columns | Timestamps indicating when each part passed through each station. |
| Categorical columns | Categorical sensor measurements per station. |
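The naming convention in the schema can be decoded programmatically, which is handy for grouping features by line or station. Below is a small sketch, assuming column names follow the `L[line]_S[station]_F[feature]` pattern exactly as shown in the table. The names `SensorColumn` and `parse_column` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import NamedTuple, Optional

# Pattern assumed from the schema table: L<line>_S<station>_F<feature>.
COLUMN_PATTERN = re.compile(r"^L(\d+)_S(\d+)_F(\d+)$")


class SensorColumn(NamedTuple):
    line: int
    station: int
    feature: int


def parse_column(name: str) -> Optional[SensorColumn]:
    """Return the line, station, and feature encoded in a column name, or None."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None  # e.g. Id, Response, or any column with a different scheme
    line, station, feature = (int(group) for group in match.groups())
    return SensorColumn(line, station, feature)


print(parse_column("L3_S36_F3939"))  # SensorColumn(line=3, station=36, feature=3939)
print(parse_column("Response"))      # None
```

Columns that do not match the pattern, such as `Id` and `Response`, return `None` so callers can filter them out explicitly.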
Key Statistics
- Total Parts: 1,183,747 (about 1.18 million)
- Features: thousands of anonymized sensor columns across 3 file types
- File Format: CSV (split into numeric, categorical, and date files)
- Scale: among the largest Kaggle datasets by feature count
- Production Lines: multiple lines and stations represented
- Release Year: 2016
Use Cases
- Industrial quality control and defect prediction at scale
- Sparse, high-dimensional sensor data processing techniques
- Gradient boosting and ensemble methods for manufacturing IIoT analytics
- Feature selection and dimensionality reduction for large-scale sensor networks
Source & Attribution
The dataset was provided by Bosch and released via a Kaggle competition in 2016. It represents real production line measurements from one of the world's largest industrial manufacturing companies. Access requires a Kaggle account.
Data Preview
| Id | L0_S0_F0 | L0_S0_F2 | L3_S36_F3939 | Response |
|---|---|---|---|---|
| 4 | 0.0300 | -0.0340 | NaN | 0 |
| 6 | NaN | NaN | 0.0714 | 0 |
| 7 | NaN | -0.0374 | NaN | 0 |
| 9 | 0.0590 | NaN | NaN | 1 |
| 11 | 0.0412 | -0.0211 | 0.0601 | 0 |
Preview of the first few rows, showing a small subset of columns.
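The preview already hints at how sparse the sensor columns are. As a rough sketch, the snippet below computes the fraction of missing values per feature column and the share of failing parts for the five preview rows only. The values are copied from the table above; treating both `NaN` and empty fields as missing is an assumption, and five rows say nothing reliable about the full dataset.

```python
import csv
import io

# The five preview rows from the table above, rewritten as CSV.
PREVIEW_CSV = """Id,L0_S0_F0,L0_S0_F2,L3_S36_F3939,Response
4,0.0300,-0.0340,NaN,0
6,NaN,NaN,0.0714,0
7,NaN,-0.0374,NaN,0
9,0.0590,NaN,NaN,1
11,0.0412,-0.0211,0.0601,0
"""


def is_missing(value: str) -> bool:
    """Treat empty fields and the literal NaN as missing (assumption)."""
    return value.strip() in ("", "NaN")


rows = list(csv.DictReader(io.StringIO(PREVIEW_CSV)))
feature_columns = [name for name in rows[0] if name not in ("Id", "Response")]

# Fraction of preview rows in which each feature has no measurement.
missing_fraction = {
    name: sum(is_missing(row[name]) for row in rows) / len(rows)
    for name in feature_columns
}
# Share of preview rows labeled as failures (Response = 1).
failure_rate = sum(int(row["Response"]) for row in rows) / len(rows)

print(missing_fraction)  # {'L0_S0_F0': 0.4, 'L0_S0_F2': 0.4, 'L3_S36_F3939': 0.6}
print(failure_rate)      # 0.2
```

The same per-column missing fraction, computed over the full files, is one simple input to the feature selection and dimensionality reduction work listed under Use Cases.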
Cite This Dataset
Meg Risdal, Prasanth, RumiGhosh, soundar, Stefanie W., & Will Cukierski (2016). Bosch Production Line Performance — Assembly Line Fault Detection [1.18M Parts]. [Dataset]. Kaggle. https://www.kaggle.com/c/bosch-production-line-performance/data
Source: Kaggle (2016)
Indexed by IoTDataset.com on Apr 10, 2026