Distributed Denial of Service (DDoS) attacks remain one of the most damaging threats to online services, and they continue to grow in both scale and sophistication. Whether they are launched by cybercriminals, cyberterrorists, or other malicious actors, their impact is severe: DDoS attacks undermine the reliability and availability of online services, and Internet of Things (IoT) and cloud infrastructures are especially exposed. This study proposes a hybrid deep learning model that combines a Vision Transformer (ViT), which learns spatial representations, with Long Short-Term Memory (LSTM) networks, which capture temporal patterns, to detect the growing range of cyber threats that IoT environments face. To address class imbalance, Borderline-SMOTE oversampling is applied to increase the representation of minority classes and thereby make classification more robust. The model is evaluated on the CICIDS2017 dataset, a realistic benchmark containing both benign traffic and malicious attacks. Experimental results show that the ViT+LSTM framework achieves an accuracy of 99.78% and remains robust to class imbalance in the data.
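The abstract names Borderline-SMOTE but does not give its settings. The sketch below illustrates, under stated assumptions, how the Borderline-SMOTE1 variant oversamples only the minority samples that lie near the class boundary. The variant, the neighborhood sizes `m` and `k`, Euclidean distance, and the helper names `find_danger` and `borderline_smote` are illustrative assumptions, not the study's configuration. A real pipeline would more likely call `BorderlineSMOTE` from the imbalanced-learn library on the CICIDS2017 flow features.

```python
import math
import random


def find_danger(X, y, minority_label, m=5):
    """Return indices of minority samples on the class border (the "DANGER" set).

    Assumption: Borderline-SMOTE1 rule with Euclidean distance. A minority sample is
    DANGER when at least half, but not all, of its m nearest neighbors (over the whole
    training set) belong to another class. All m foreign -> noise; fewer than half -> safe.
    """
    danger = []
    for i, label in enumerate(y):
        if label != minority_label:
            continue
        # m nearest neighbors of sample i, excluding i itself
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:m]
        n_other = sum(1 for j in neighbors if y[j] != minority_label)
        if m / 2 <= n_other < m:
            danger.append(i)
    return danger


def borderline_smote(X, y, minority_label, n_new, m=5, k=5, seed=0):
    """Generate n_new synthetic minority samples (hypothetical helper).

    Each new point is interpolated between a DANGER sample and one of its
    k nearest minority-class neighbors, chosen at random.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    danger = find_danger(X, y, minority_label, m)
    if not danger or len(minority) < 2:
        return []
    synthetic = []
    for s in range(n_new):
        i = danger[s % len(danger)]  # cycle through the border samples
        nearest_minority = sorted(
            (j for j in minority if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )[:k]
        partner = X[rng.choice(nearest_minority)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], partner)))
    return synthetic


# Toy 2-D data (illustrative only, not CICIDS2017 features)
X = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0),     # majority, label 0
     (5, 0), (6, 0), (50, 0), (51, 0), (52, 0)]  # minority, label 1
y = [0] * 5 + [1] * 5

print(find_danger(X, y, minority_label=1, m=3))  # [5, 6]
new_points = borderline_smote(X, y, minority_label=1, n_new=4, m=3, k=2)
```

Only the minority points whose neighborhood is mostly, but not entirely, majority-class (here x = 5 and x = 6) seed new samples; the isolated minority cluster around x = 50 is left alone. This focus on the boundary is what distinguishes Borderline-SMOTE from plain SMOTE, which interpolates from every minority sample.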