Project Summary

Project objective: classify steel surface defects from annotated NEU-DET images and export structured evaluation artifacts for downstream manufacturing analysis.
Modeling approach: convert Pascal VOC annotations to defect chips, fine-tune a ResNet-18 classifier, and summarize performance with per-class and confusion-matrix outputs.
Key metrics: validation accuracy, macro and weighted F1, per-class precision and recall, and normalized confusion patterns.
Limitations: the current workflow takes localization from the annotations and evaluates at the chip level rather than as full-image detection; hyperparameter search is limited.
Possible extensions: add full object detection, class-balancing experiments, confidence calibration, and deployment-oriented inference packaging.
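
The annotation-to-chip step in the modeling approach can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name voc_chip_boxes and the sample annotation (file name, labels, box coordinates) are hypothetical, and the clamping of boxes to the image bounds is an assumed design choice. It uses only the standard Pascal VOC element names (size, object, name, bndbox).

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout; file name, labels, and boxes are made up.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>200</width><height>200</height><depth>1</depth></size>
  <object>
    <name>crazing</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>180</ymax></bndbox>
  </object>
  <object>
    <name>scratches</name>
    <bndbox><xmin>-5</xmin><ymin>150</ymin><xmax>60</xmax><ymax>230</ymax></bndbox>
  </object>
</annotation>"""


def voc_chip_boxes(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) per object, clamped to the image size."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    chips = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        # Clamp coordinates so a box that spills past the image edge still crops cleanly.
        xmin = max(0, int(float(box.findtext("xmin"))))
        ymin = max(0, int(float(box.findtext("ymin"))))
        xmax = min(width, int(float(box.findtext("xmax"))))
        ymax = min(height, int(float(box.findtext("ymax"))))
        # Skip degenerate boxes that would produce an empty chip.
        if xmax > xmin and ymax > ymin:
            chips.append((label, (xmin, ymin, xmax, ymax)))
    return chips


chips = voc_chip_boxes(SAMPLE_XML)
for label, box in chips:
    print(label, box)
```

Each returned box is in (left, upper, right, lower) order, so it could be passed directly to an image crop call such as Pillow's Image.crop to produce the chip that the classifier sees.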

Results Summary

Class	Support	Precision	Recall	F1 score
Crazing	162	0.8308	1.0000	0.9076
Inclusion	159	0.9813	0.9874	0.9843
Patches	193	0.9895	0.9793	0.9844
Pitted surface	87	0.9663	0.9885	0.9773
Rolled-in scale	132	0.9796	0.7273	0.8348
Scratches	121	1.0000	1.0000	1.0000
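
The aggregate metrics named under Key metrics can be approximated from the per-class rows above. The sketch below is an illustration only: it works from the rounded table values, so its results are estimates rather than reported figures, and the variable names are assumptions.

```python
# (class, support, precision, recall, f1) copied from the table above.
ROWS = [
    ("Crazing", 162, 0.8308, 1.0000, 0.9076),
    ("Inclusion", 159, 0.9813, 0.9874, 0.9843),
    ("Patches", 193, 0.9895, 0.9793, 0.9844),
    ("Pitted surface", 87, 0.9663, 0.9885, 0.9773),
    ("Rolled-in scale", 132, 0.9796, 0.7273, 0.8348),
    ("Scratches", 121, 1.0000, 1.0000, 1.0000),
]

total_support = sum(support for _, support, _, _, _ in ROWS)

# Macro F1: unweighted mean of per-class F1, so every class counts equally.
macro_f1 = sum(f1 for *_, f1 in ROWS) / len(ROWS)

# Weighted F1: per-class F1 averaged with class support as the weight.
weighted_f1 = sum(support * f1 for _, support, _, _, f1 in ROWS) / total_support

# Accuracy estimate: recall * support recovers the correct count per class
# (rounded, because the table values are themselves rounded to four decimals).
correct = sum(round(recall * support) for _, support, _, recall, _ in ROWS)
accuracy = correct / total_support

print(f"chips={total_support} macro_f1={macro_f1:.4f} "
      f"weighted_f1={weighted_f1:.4f} accuracy={accuracy:.4f}")
```

Macro F1 weights the weakest class as heavily as the others, while weighted F1 follows the class distribution, which makes the pair useful for spotting class imbalance effects.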

True class	Main confusion
Rolled-in scale	32 validation chips predicted as crazing
Inclusion	Minor confusion with crazing and patches
Patches	Minor confusion with inclusion and rolled-in scale

Interpretation:

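Five of the six classes reach an F1 score above 0.90, but rolled-in scale is the primary recall bottleneck at 0.7273.
The confusion pattern (32 rolled-in scale chips predicted as crazing) suggests texture overlap between the two classes at the chip level, which is also consistent with crazing's lower precision of 0.8308.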
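
As a rough cross-check on this interpretation, the counts implied by the rolled-in scale and crazing rows can be recovered from the reported support, precision, and recall. This is a sketch on rounded table values, so the derived counts are estimates and the variable names are illustrative.

```python
# Values copied from the results and confusion tables.
rolled_support, rolled_recall = 132, 0.7273
crazing_support, crazing_precision, crazing_recall = 162, 0.8308, 1.0000
rolled_to_crazing = 32  # rolled-in scale chips predicted as crazing

# Rolled-in scale chips that were missed (false negatives).
rolled_missed = rolled_support - round(rolled_recall * rolled_support)

# Crazing false positives: predicted crazing minus correct crazing.
crazing_tp = round(crazing_recall * crazing_support)
crazing_predicted = round(crazing_tp / crazing_precision)
crazing_fp = crazing_predicted - crazing_tp

# How much of each error pool the rolled-in scale -> crazing confusion covers.
share_of_misses = rolled_to_crazing / rolled_missed
share_of_crazing_fp = rolled_to_crazing / crazing_fp

print(f"rolled-in scale misses: {rolled_missed}, crazing false positives: {crazing_fp}")
print(f"share of misses: {share_of_misses:.2f}, "
      f"share of crazing false positives: {share_of_crazing_fp:.2f}")
```

If the single confusion accounts for most of both error pools, then improving separation between these two classes would address the recall gap and the crazing precision gap together.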