Nakul Sharma

I'm building cool stuff! Previously, I was a Research Engineer at SpreeAI, where I worked on developing new training schemes and architectures for virtual try-on using diffusion and flow-based models in close collaboration with Dr. Aayush Bansal and Dr. Minh Vo. Prior to that, I completed my Bachelor of Technology in AI and Data Science at IIT Jodhpur in 2024. At IIT Jodhpur, I was fortunate to be a part of Prof. Anand Mishra's Vision, Language and Learning Group (VL2G), where I worked on 2D generative models, representation learning, and multi-modal LLMs.

Email  /  CV  /  Google Scholar  /  GitHub


Research

I'm broadly interested in improving representation learning for various downstream tasks and in multi-modal generative modeling.

Efficient Long-Tail Learning in Latent Space by sampling Synthetic Data
Nakul Sharma
ICCV 2025 Workshop on Curated Data and Efficient Learning
paper

This preliminary study introduces a lightweight way to utilize the strong semantic representations of visual foundation models for long-tail recognition.

PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shreya Shukla*, Nakul Sharma*, Manish Gupta, Anand Mishra
Proceedings of AAAI, 2025
(*: equal contribution)
project page / arXiv / slides / code & dataset

We curate the first large-scale dataset for the task. Current models perform poorly because of their visual representations, so we train a weakly supervised visual encoder, PatentMME, with which PatentLMM significantly surpasses GPT-4V.

Sketch-guided Image Inpainting with Partial Discrete Diffusion Process
Nakul Sharma, Aditay Tripathi, Anirban Chakraborty, Anand Mishra
Proceedings of CVPR Workshops, 2024
arXiv / slides / code

We propose the Partial Discrete Diffusion Process for sketch-guided inpainting, wherein only regions of interest are corrupted during the forward process, which aligns the forward and reverse processes for inpainting.

Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification
Nakul Sharma, Abhirama Subramanyam Penamakuri, Anand Mishra
Proceedings of ICVGIP, 2022
project page / arXiv / slides / code / data

We study the problem of identifying logos in an open-set setting and propose a method to encode textual and visual information from logos, along with a better contrastive loss.

Miscellanea

Academic Service

Reviewer for CVPR: 2025
Reviewer for ACL ARR: 2025, 2024

Teaching

Teaching Assistant for CSL 2010: Introduction to Machine Learning, taught by Prof. Yashaswi Verma at IIT Jodhpur, Fall 2022.



This website template was ethically stolen from Jon Barron.