Nityanand Mathur
Education
Indian Institute of Information Technology Guwahati
B. Tech in Computer Science & Engineering
GPA: 8.16/10
Publications
CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis
Nityanand Mathur, Shyam Marjit, Abhra Chaudhuri, Anjan Dutta
DiffuseKronA: Parameter Efficient Finetuning Method for Personalized Diffusion Models
Shyam Marjit, Harshit Singh, Nityanand Mathur, Sayak Paul, Chia-Mu Yu, Pin-Yu Chen
Work Experience
Smallest AI | Data Scientist
October 2024 - Present
- Developed a real-time, flow-based instant voice cloning model. Built professional voice cloning using LoRA.
- Optimized a TTS model, reducing latency from 300–400 ms to 150–250 ms and improving real-time performance.
- Implemented an end-to-end text normalization module to handle emails, websites, dates, numbers, etc.
Bosch Research | Machine Learning Intern
January 2024 - August 2024
- Created novel algorithms for synthetic dataset generation by object augmentation using latent diffusion models.
- Created parameter-efficient, label-preserving pipelines for source-free domain adaptation and localised editing.
University of Surrey | Research Intern | Dr. Anjan Dutta
January 2023 - December 2023
- Worked on introducing explainability to CLIP-based models using simple primitives, with an LDM-powered initialization for faster convergence. Introduced Primitive-level Dropout for noiseless sketch synthesis.
IBM | Research Intern | Dr. Pin-Yu Chen
June 2023 - December 2023
- Added parameter-efficient, Kronecker-product-based adapters to personalized T2I models that are ~35% more efficient than SOTA while generating images with high fidelity and text alignment.
Osaka University | Data Science Research Intern | Dr. Manas Kala
July 2023 - September 2023
- Applied counterfactual machine learning to a thermal comfort dataset to simulate the comfort of students in winter conditions and assess the impact of clothing, age, grade, and gender. Work is ongoing at the university.
CogXR Labs | Computer Vision, MLOps Intern
November 2022 - March 2023
- Implemented large-scale image classification algorithms on healthcare datasets with high accuracy.
- Created end-to-end production pipelines using Docker containers, DVC, W&B, and pytest.
IIIT Guwahati | Research Intern | Dr. Ferdous Ahmed Barbhuiya
May 2022 - August 2022
- Implemented VisualBERT-based multimodal hateful meme classification on social media.
- Integrated CLIP-based embeddings to improve accuracy from 75% to 81%.
Projects
Python, Computer Vision
- Created DINO Explorer, a tool for exploring and visualizing DINOv2 embeddings. Given a list of folders containing images, it extracts their DINOv2 embeddings and creates an interactive visualization using Voxel51.
Python, PyTorch, Segmentation, GANs, Neural Style Transfer
- Implemented U-Net-based image segmentation for humans, and aerial image segmentation to detect roads in maps.
- Implemented object localization for food items and neural style transfer using EfficientNet.
Python, TensorFlow, Computer Vision, Docker, DVC, W&B, Git
- Implemented a deep learning multi-label classifier to process an 11 GB medical image dataset, classifying X-ray images into 13 disease categories. Later merged it with the Classify-Covid project to add a COVID-19 category.
Skills
Languages:
Python, SQL, Bash, C, Java, LaTeX
Frameworks/Libraries:
PyTorch, TensorFlow, Keras, Pandas, NumPy, scikit-learn, OpenCV
Tools:
Docker, Git/GitHub, Unix Shell, pytest, Weights & Biases, DVC, Hydra, Hugging Face, AWS, Gradio