Samhita Kolluri

Let's create something together. Scroll down to learn more about me.

About Me

I am a graduate student at Northeastern University specializing in Data Science, Machine Learning, and AI, with a deep focus on Large Language Models (LLMs) and generative AI techniques. I currently work as an AI Research Engineer at Stellis Labs and serve as a Teaching Assistant at Northeastern University.

Over about two years as a Data Engineer at Cognizant, I optimized complex property and casualty insurance data pipelines, transforming large-scale data into actionable insights. My passion lies in harnessing AI to solve real-world challenges and building innovative solutions that make a difference.

I’m eager to connect with like-minded innovators and explore opportunities to push the boundaries of technology. Feel free to reach out. Let’s collaborate and create something!


Contact Details

Samhita Kolluri
Boston, MA
samhita.kolluri@gmail.com

Skills

I have experimented a lot with the aim of finding my niche, and have thus gained varied skills along the way. Some of my major skills are highlighted below.

Python
R
SQL
MySQL
Fine-tuning LLMs
Research
Data Analysis
Data Visualization
Informatica PowerCenter
IBM Tivoli
dbt
Databricks
Deep Learning
Image Processing
Computer Vision
OpenCV
Regression
Classification
Clustering
Random Forest
Naïve Bayes
Support Vector Machine
CNN
LSTM
Multi-Agents
Prompt Engineering
Pre-trained Transformers
BERT
Reinforcement Learning
Natural Language Processing
Vector Databases
Tableau
Power BI
Snowflake

Work Experience

AI Research Engineer

Stellis Labs - Bear Brown and Company
January 2025 - May 2025

  • Developing multi-agent AI systems for autonomous decision-making under the guidance of Prof. Nik Brown.
  • Building memory layers, observability tools, and API integrations to enhance agent intelligence.
  • Deploying and fine-tuning LLM-powered agents using LangChain, OpenAI APIs, and vector databases.

Graduate Teaching Assistant

Northeastern University - Graduate Courses
December 2024 - May 2025

  • Coordinate course materials and schedules for IE 5374 Storytelling with Data, Applied Gen AI under the guidance of Prof. Mohammad Dehghani, ensuring efficient class operations and timely resource dissemination.
  • Design and conduct interactive lab sessions to enhance student proficiency in data analysis and dynamic visualizations.
  • Facilitate student learning by assisting with tools such as Power BI and Excel for data wrangling and interactive dashboards.

Senior Data Engineer

Cognizant - Artificial Intelligence and Analytics
August 2022 - July 2023

  • Achieved a 99% acceleration in ETL reconciliation and optimized data processing efficiency.
  • Implemented audit mails in QA and Production environments, leading to a 25% increase in system efficacy.
  • Delivered data warehousing solutions with Informatica ETL processes, supporting data integration and improving reporting accuracy by 20%.
  • Designed Informatica code, developed SQL queries, and modified mappings and workflows using Informatica PowerCenter, improving system performance by 25%.
  • Streamlined code migration between QA and Production environments by implementing a backup system, reducing deployment time by 40%.
  • Led debugging and issue management using Agile methodology and JIRA, ensuring timely resolution and high-quality delivery.

ETL Developer

Cognizant - Artificial Intelligence and Analytics
August 2021 - August 2022

  • Optimized ETL processes with Informatica PowerCenter mappings, achieving a 27% boost in data transformation efficiency and a 25% enhancement in documentation accuracy.
  • Reduced errors by 20% and improved testing efficiency by 15% through optimized workflows and rigorous unit testing.
  • Converted multiple SQL files to PySpark for sample data modeling and debugged the code on Databricks Community Edition.

Research Experience

Publications and Patent

December 2022

  • December 2022 - An Artificial Intelligence and Internet of Things based Integrated Approach for COVID-19 Prevention, Application No. 202141054101.
  • September 2022 - AI-based Screening System for COVID-19, IEEE 7th International Conference for Convergence in Technology (I2CT)
  • September 2021 - Brain-Computer Interface paper presented at Annual Technical Symposium, India

Research Intern

Bennett University
March 2020 - June 2020

  • Managed a cross-functional team throughout the lifecycle of the Automated Sign Language Recognition project, fostering effective collaboration and developing a neural network model that reached 98.56% validation accuracy.
  • Built a CNN-based facial expression recognition system using the Kirsch compass mask operator for preprocessing.

Education

Northeastern University

Master of Science in Data Analytics Engineering
December 2025

College of Engineering
Related Coursework: Gen AI w/ LLM in Data Engineering, Natural Language Processing, Special Topic: Large-Language-Model based Dialogue Agents, Data Mining in Engineering, Data Management for Analytics, Computation and Visualization for Analytics, Foundations for Data Analytics Engineering

VNR Vignana Jyothi Institute of Engineering and Technology

Bachelor of Technology in Computer Science and Engineering
July 2021

Related Coursework: Business Economics & Financial Analysis, Entrepreneurship, Artificial Intelligence & Neural Network, Computer Graphics and Animation, Introduction to Internet of Things, Cognitive Science, Information Security Assessment and Audits, Cyber Security

Applied Projects

PhysioPro

Generative AI, Snowflake Cloud, OpenCV, Cortex, Data Engineering, Large Language Models (LLM)

  • Developed a physiotherapy assistance tool that improved movement correction accuracy by 30% using Procrustes analysis and Dynamic Time Warping (DTW) for keypoint alignment.
  • Developed a 3D motion pipeline with MediaPipe and OpenCV, cutting feedback cycles by 50%.
  • Integrated AI recommendations with Cortex Mistral 7B and Snowflake, boosting adherence by 25%.

Contrastive Idea Search Module

Generative AI, Chroma DB, LLaMA, Hugging Face, Fine-Tuning, Agent-based Modeling

  • Designed and developed a Contrastive Ideas Search Module to identify semantically opposing viewpoints on shared topics.
  • Built a hybrid embedding framework leveraging mxbai-embed-large and a fine-tuned LLaMA 3 model trained on SNLI for semantic opposition detection.
  • Implemented multi-vector indexing with ChromaDB, enabling efficient retrieval and ranking.

SEMANTIC

Python, Deep Learning, PyTorch, NLP, Transformers (BERT)

  • Developed a multi-class classification system "SEMANTIC" to identify and categorize news articles using an IAB-labelled dataset from Hugging Face.
  • Fine-tuned transformer models for efficient, context-aware classification in imbalanced data scenarios.

SysTune: LLM-Based Hardware-Software Parameter Optimization

Python, OpenAI GPT-4, HPC

  • Developed an LLM-based autotuning system to optimize hardware and software parameters iteratively, enhancing resource utilization and throughput for high-performance computing by 30%.
  • Built a Prompt Generator for context-aware LLM prompts, enhancing parameter suggestion quality.
  • Designed an Option Evaluator to parse LLM responses and extract optimal parameters.

Thrift Store Inventory Management System

Python, R, SQL/NoSQL, Machine Learning

Designed a full-stack system for inventory tracking, donation management, and customer loyalty programs, using dynamic pricing and customer analytics to increase revenue and efficiency by 35%.

Data Unveiled: NYC Airbnb Analysis

Python, Data Analysis, Machine Learning, Visualization, XGBoost/Random Forest

Performed predictive modeling and interactive geo-mapping of NYC Airbnb data to identify price-driving factors, improve booking rates, and empower hosts with data-driven insights.