The semiconductor industry relies on silicon wafers as the foundation for integrated circuit manufacturing, where even minor surface or structural defects can cause substantial yield and financial losses. Traditional inspection methods, such as rule-based image processing and manual visual inspection, are labour-intensive, prone to human error, and adapt poorly to diverse defect patterns. This study proposes a deep learning-based framework for automated wafer defect classification that combines convolutional neural network (CNN) architectures with generative data augmentation to address class imbalance and improve classification performance. The framework uses the publicly available WM-811K dataset, which comprises 811,457 wafer maps from 46,393 semiconductor lots. Its nine original categories (Center, Donut, Edge-Loc, Edge-Ring, Loc, Random, Scratch, Near-Full, and None) were consolidated into four primary classes (Redundant, Crystal, Mechanical, and Defect-Free) to simplify classification and reduce inter-class confusion. The research was conducted in two phases. The first phase evaluated three baseline models: WDD-Net, MobileNet-V2, and VGG-16. Results indicated that VGG-16 outperformed the other baselines with an accuracy of 80%. Building on this, the second phase explored the deeper architectures VGG-19 and GoogLeNet, alongside StyleGAN-generated synthetic images to counter dataset imbalance. VGG-19 demonstrated better generalization and stability than VGG-16, while GoogLeNet, which uses inception modules, achieved competitive accuracy at lower computational complexity. StyleGAN-based augmentation notably improved classification of underrepresented defect classes. Performance was measured using accuracy, precision, recall, and F1-score, along with detailed confusion matrix analysis.
Among all tested models, MobileNet-V2 achieved the highest accuracy (92.42%) and recall (92.41%), indicating strong detection capability. GoogLeNet and VGG-19 offered balanced performance, with F1-scores of 90.66% and 90.41%, respectively, reflecting robust generalization. WDD-Net, while computationally efficient, overfitted significantly and yielded the lowest accuracy (79.90%).
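As an illustration of how the reported metrics relate to a confusion matrix, the following minimal sketch computes accuracy and macro-averaged precision, recall, and F1-score for the four consolidated classes. The abstract does not state which averaging scheme the study used, so macro-averaging is an assumption here, and the labels in the usage example are toy values, not results from WM-811K. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

# The four consolidated classes named in the abstract; the index order is arbitrary.
CLASSES = ["Redundant", "Crystal", "Mechanical", "Defect-Free"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only (two wafers per class).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
cm = confusion_matrix(y_true, y_pred, len(CLASSES))
metrics = macro_metrics(cm)
```

With imbalanced classes, macro-averaging weights every class equally, so weak performance on an underrepresented class lowers the score even when overall accuracy stays high. A weighted average would behave differently.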