Nakul Sharma

I'm building cool stuff! Previously, I was a Research Engineer at SpreeAI, where I worked on developing new training schemes and architectures for virtual try-on using diffusion and flow-based models in close collaboration with Dr. Aayush Bansal and Dr. Minh Vo. Prior to that, I completed my Bachelor's in Technology in AI and Data Science at IIT Jodhpur in 2024. At IIT Jodhpur, I was fortunate to be a part of Prof. Anand Mishra's Vision, Language and Learning Group (VL2G), where I worked on 2D generative models, representation learning and multi-modal LLMs.

Email / CV / Google Scholar / GitHub

Research

I'm broadly interested in improving representation learning for various downstream tasks, and multi-modal generative modeling.

	Efficient Long-Tail Learning in Latent Space by sampling Synthetic Data Nakul Sharma ICCV 2025 Workshop on Curated Data and Efficient Learning paper This preliminary study introduces a lightweight way to utilize strong semantic representations of visual foundation models for long-tail recognition.
	PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures Shreya Shukla, Nakul Sharma*, Manish Gupta, Anand Mishra Proceedings of AAAI, 2025 (*: equal contribution) project page / arXiv / slides / code & dataset We curate the first large-scale dataset for the task. Current models perform poorly because of their visual representations, so we propose to train a weakly supervised visual encoder, PatentMME, with which PatentLMM significantly surpasses GPT-4V.
	Sketch-guided Image Inpainting with Partial Discrete Diffusion Process Nakul Sharma, Aditay Tripathi, Anirban Chakraborty, Anand Mishra Proceedings of CVPR Workshops, 2024 arXiv / slides / code We propose Partial Discrete Diffusion Process for sketch-guided inpainting, wherein only regions of interest are corrupted during the forward process — aligning the forward process and the reverse process for inpainting.
	Contrastive Multi-View Textual-Visual Encoding: Towards One Hundred Thousand-Scale One-Shot Logo Identification Nakul Sharma, Abhirama Subramanyam Penamakuri, Anand Mishra Proceedings of ICVGIP, 2022 project page / arXiv / slides / code / data We study the problem of identifying logos in an open-set setting and propose a method to encode textual and visual information from logos, along with a better contrastive loss.

Miscellanea

Academic Service	Reviewer for CVPR: 2025; Reviewer for ACL ARR: 2025, 2024
Teaching	Teaching Assistant for CSL 2010: Introduction to Machine Learning, taught by Prof. Yashaswi Verma at IIT Jodhpur, Fall 2022.

This website template was ethically stolen from Jon Barron.
