May 2025 • Machine Learning
A comparative analysis of three neural network architectures for image classification on CIFAR-10: a baseline shallow CNN, DenseNet, and a pre-trained Vision Transformer (ViT). The primary goal was to demonstrate the effectiveness of parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA) on a state-of-the-art vision transformer and to compare its performance against traditional convolutional approaches.
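The report does not specify the exact layer configuration of the baseline CNN; the following is only an illustrative PyTorch sketch of a shallow CNN in the same size range, not the architecture actually used.

```python
import torch
import torch.nn as nn

class ShallowCNN(nn.Module):
    """Illustrative shallow CNN for 32x32 CIFAR-10 images: two conv blocks + MLP head."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = ShallowCNN()
# Parameter count of this sketch is ~4.3M; the project's baseline reports ~4.8M.
print(sum(p.numel() for p in model.parameters()))
```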
Framework: PyTorch, Torchvision
Fine-Tuning: PEFT library with LoRA
Hardware: NVIDIA Tesla V100 GPU with CUDA
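A typical PyTorch/Torchvision data pipeline for this setup is sketched below; the 224x224 resize (to match the ViT's patch grid), ImageNet normalization, and batch sizes are assumptions rather than recorded settings.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed preprocessing: 32x32 CIFAR-10 images are upsampled to the ViT input
# resolution and normalized with ImageNet statistics.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False, num_workers=4, pin_memory=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```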
Fine-tuned the DINOv2-Small ViT with LoRA, updating only 8.06% of the model's 22.06M parameters. Leveraged the backbone's self-supervised pre-trained weights so that parameter-efficient fine-tuning sharply reduced training time and compute while maintaining high accuracy, and ran a systematic comparison of all three architectures on CIFAR-10.
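A minimal sketch of applying LoRA to a DINOv2-Small backbone with the PEFT library is shown below. The model source (torch.hub), classification head, LoRA rank, alpha, and target modules are illustrative assumptions; the report only states that 8.06% of parameters were trainable.

```python
import torch
from peft import LoraConfig, get_peft_model

# Assumed model source: DINOv2-Small backbone from the official torch.hub repo,
# with a linear classification head added for the 10 CIFAR-10 classes.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")

class DinoClassifier(torch.nn.Module):
    def __init__(self, backbone: torch.nn.Module, num_classes: int = 10):
        super().__init__()
        self.backbone = backbone
        self.classifier = torch.nn.Linear(backbone.embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The hub backbone returns the normalized CLS token embedding.
        return self.classifier(self.backbone(x))

model = DinoClassifier(backbone)

# Assumed LoRA hyperparameters; the configuration that yields the reported
# 8.06% trainable parameters is not specified in the write-up.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["qkv"],          # fused attention projection in DINOv2 blocks
    modules_to_save=["classifier"],  # train the new classification head in full
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # reports trainable vs. total parameter counts
```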
| Model | Test Accuracy | Test Loss | Total Parameters | Fine-Tuned Params (% of total) |
|---|---|---|---|---|
| Shallow CNN | 83.46% | 0.5299 | ~4.8M | 100% |
| DenseNet | 91.22% | 0.3692 | ~7.4M | 100% |
| DINOv2-Small (ViT) | 95.95% | 0.1243 | 22.06M | 8.06% |
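Test accuracy and test loss above are standard top-1 accuracy and average cross-entropy on the CIFAR-10 test set; a minimal evaluation sketch (function and variable names assumed) is:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, loader, device):
    """Return (average cross-entropy loss, top-1 accuracy) over a data loader."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        total_loss += F.cross_entropy(logits, labels, reduction="sum").item()
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return total_loss / total, correct / total

# Usage with the loaders and model defined above (assumed names):
# test_loss, test_acc = evaluate(model.to(device), test_loader, device)
# print(f"test loss {test_loss:.4f} | test accuracy {test_acc:.2%}")
```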
The LoRA fine-tuned ViT achieved 95.95% test accuracy while updating only 8.06% of its parameters, demonstrating the advantage of transfer learning with modern transformer architectures over convolutional networks trained from scratch on this task.