EmoVision: Real-Time Facial Emotion Detection with Transformers
Project type: Deep Learning, Computer Vision
📹 EmoVision is a real-time facial emotion detection system built on the Vision Transformer (ViT) architecture. Trained on the FER-2013 dataset, it classifies emotions from both live webcam feeds and uploaded images (a sketch of the webcam loop appears after the tech stack).
📹 I began with a ViT model pretrained on ImageNet-21k and applied transfer learning: with the backbone frozen, only the classification head was trained for 20 epochs. This produced a baseline validation accuracy of 48.69% on FER-2013, a challenging dataset of diverse real-world expressions.
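The snippet below is a minimal sketch of this stage, assuming timm and its ViT-B/16 ImageNet-21k checkpoint; the library, checkpoint name, optimizer, and learning rate are not stated in the project and are illustrative assumptions.

```python
# Minimal sketch of the transfer-learning stage. Library (timm), checkpoint
# name, optimizer, and learning rate are assumptions, not the project's code.
import timm
import torch
import torch.nn as nn

# Load a ViT-B/16 pretrained on ImageNet-21k with a fresh 7-class head
# (FER-2013 labels: angry, disgust, fear, happy, sad, surprise, neutral).
model = timm.create_model(
    "vit_base_patch16_224.augreg_in21k", pretrained=True, num_classes=7
)

# Freeze the whole backbone, then unfreeze only the classification head.
for param in model.parameters():
    param.requires_grad = False
for param in model.head.parameters():
    param.requires_grad = True

# Optimize only the trainable (head) parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader, device="cuda"):
    # loader: a hypothetical FER-2013 DataLoader yielding 224x224 images.
    model.train().to(device)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```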
📹 To improve performance, I fine-tuned the model by unfreezing 4 additional layers and training for 10 more epochs, adapting it to the facial-emotion domain. This raised validation accuracy to 64.15%, approaching state-of-the-art results (66–79% on the Papers with Code leaderboard), while maintaining strong generalization on unseen data, as demonstrated by reliable real-time webcam detection and image uploads.
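Continuing the sketch above, and assuming the 4 unfrozen layers are the last 4 transformer blocks (the project does not say which layers), the fine-tuning stage might look like this; `train_loader` is the hypothetical FER-2013 loader from before.

```python
# Sketch of the fine-tuning stage, continuing from the model above.
# Assumption: the "4 additional layers" are the last 4 transformer blocks;
# the smaller learning rate is also an assumed, typical fine-tuning choice.
for param in model.blocks[-4:].parameters():
    param.requires_grad = True

# Re-create the optimizer so it covers the head plus the newly unfrozen
# blocks, with a lower learning rate to preserve pretrained features.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

for epoch in range(10):  # 10 additional epochs, as in the results above
    train_one_epoch(train_loader)
```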
📹 Unlike traditional CNNs, which build up from local patches and can miss broader context, ViT uses a self-attention mechanism to analyze the entire face at once, capturing global relationships between features such as the eyes and mouth for robust emotion detection.
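As a toy illustration of that global view (not EmoVision's code): a 224×224 face becomes a grid of 196 patch tokens, and a single attention layer lets any patch, say an eye region, attend directly to any other, say the mouth, in one step.

```python
# Toy illustration of ViT-style global self-attention over patch tokens.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)                    # one face image
patches = image.unfold(2, 16, 16).unfold(3, 16, 16)    # 14x14 grid of 16x16 patches
tokens = patches.reshape(1, 3, 14 * 14, 16 * 16)
tokens = tokens.permute(0, 2, 1, 3).reshape(1, 196, 3 * 256)  # (1, 196, 768)

attention = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
out, weights = attention(tokens, tokens, tokens)
print(weights.shape)  # (1, 196, 196): every patch attends to all 196 patches
```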
Tech Stack:
- Python, PyTorch, Deep Learning, OpenCV, MediaPipe
- Techniques: Vision Transformer (ViT), Transfer Learning, Fine-tuning, Model Training
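To show how the pieces fit together at inference time, here is a minimal sketch of the real-time webcam loop using OpenCV and MediaPipe's face-detection solution. The label order, the simple preprocessing, and the reuse of the fine-tuned `model` from the earlier sketches are assumptions, not the project's exact code.

```python
# Minimal sketch of the real-time webcam loop: MediaPipe finds the face,
# OpenCV handles capture/display, and the fine-tuned ViT classifies the crop.
import cv2
import mediapipe as mp
import torch

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)
model.eval().cpu()  # run inference on CPU for this sketch
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    for det in results.detections or []:
        # Convert MediaPipe's relative bounding box to pixel coordinates.
        box = det.location_data.relative_bounding_box
        h, w, _ = frame.shape
        x, y = int(box.xmin * w), int(box.ymin * h)
        bw, bh = int(box.width * w), int(box.height * h)
        face = frame[max(y, 0):y + bh, max(x, 0):x + bw]
        if face.size == 0:
            continue
        # Resize to ViT input size; simple 0-1 scaling here, though real
        # code would apply the model's own normalization statistics.
        face = cv2.resize(face, (224, 224))
        tensor = torch.from_numpy(face[:, :, ::-1].copy()).permute(2, 0, 1)
        tensor = tensor.float().div(255).unsqueeze(0)
        with torch.no_grad():
            label = EMOTIONS[model(tensor).argmax(1).item()]
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("EmoVision", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```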