Hey, I am Sourabh Zanwar

A Data Scientist based in Germany, with a Master's degree in Computer Science and extensive experience in NLP, ML, Data Analysis, and Cloud Computing, seeking to utilize my expertise in a growth-driven organization to drive product innovation, operational efficiency, and meaningful insights, ultimately making a positive impact in society.

About Me

Get to know me!

Hey! It's Sourabh Zanwar and I'm a Data Scientist based in Germany. With a Master’s degree in Computer Science from RWTH Aachen University, I am a seasoned Machine Learning and NLP enthusiast who thrives in a research-driven and collaborative environment. My experience spans from the conception to the deployment of intelligent systems. As an innovative and solutions-oriented Computer Science professional with extensive experience in NLP, ML, Data Analysis, and Cloud Computing, I seek to employ my expertise in a growth-driven organization to create a significant impact. I aim to contribute through cutting-edge technology, research, and collaboration to drive product innovation, operational efficiency, and meaningful insights, ultimately enhancing organizational goals and making a positive difference in society. You can contact me at sourabh.zanwar@rwth-aachen.de.

My Skills

Python
LLMs
Generative AI
Databricks
PyTorch
TensorFlow
Machine Learning
Deep Learning
NLP
Computer Vision
MLOps
Docker
AWS
GCP
BigQuery
Data Visualization
SQL
Tableau
Web Development
JavaScript
HTML-CSS
A/B testing
Git
spaCy

Work Experience

Machine Learning Reply GmbH, Munich, Germany

Data Science Consultant

October 2023 - Current

  • - Worked on multiple projects across fields such as Data Engineering, NLP, and Generative AI.
  • - For one of the biggest DIY companies in Europe, I supported building and migrating legacy data pipelines to more efficient and modular ETL transformations using Databricks.
  • - As part of internal offerings and developments, I led the creation of a Q&A tool that uses retrieval-augmented generation (RAG) to answer questions from a set of documents that may contain the answers. It can be powered by any LLM, run locally or through a cloud service, and can be easily self-hosted on any system. The tool helped generate sales to 3 big telecom clients, which further developed the service for use as a customer service agent.
  • - Apart from technical tasks, I prepared and presented sales pitches to multiple companies and taught over 30 participants in a lab camp on prompt engineering.
  • - I was also part of the trend scouting team, where we analysed upcoming trends in tech and successfully identified and presented trends such as RAG, LLM agents, and platform engineering to the company, which helped us keep our tech stack up to date and adopt these trends early on.

RWTH Aachen University, Aachen, Germany

Research Assistant, NLP and ML

June 2020 - May 2023

  • - Trained prediction models using advanced approaches such as RNNs, Information infusion, and Multi-task models, utilising features extracted from the ATS and pretrained language models like BERT and RoBERTa.
  • - Collaborated with a psychiatrist and a psychologist to build a large dataset of mental health patients, processed and stored the data, and built classification models for early detection of conditions like Alzheimer's.
  • - Executed various projects, such as building predictive models and dashboards, using technologies such as Python, JavaScript, AWS, REST APIs, PyTorch, spaCy, etc.
  • - Published and presented research at prestigious conferences like ACL, EACL, COLING, LREC, EMNLP, etc.
  • - Supervised five undergraduate students from the U.S. in their research projects under the UROP program for tasks such as ideology detection, author profiling, style change detection, and abuse detection using ML.

BMW Group, Munich, Germany

Machine Learning Intern

October 2021 - April 2022

  • - Developed a triplet network-based application for efficient detection of duplicate defect tickets using Python (PyTorch).
  • - Employed various Machine Learning techniques, including dimensionality reduction for feature selection from tabular data and feature extraction using DistilBERT for textual multilingual data.
  • - Collaborated with a multidisciplinary team to integrate the developed model into a web application dashboard.
  • - Presented and elucidated the implemented techniques and results to technical and non-technical stakeholders.

Renwegreen Solar Energy, Remote

Machine Learning Engineer

May 2019 - June 2020

  • - Developed a predictive maintenance model using Python (PyTorch) to identify potential failures in solar energy equipment, resulting in a 25% reduction in maintenance costs.
  • - Integrated machine learning models into a real-time monitoring system and dashboard, enabling proactive identification of anomalies and immediate action to prevent equipment failures.
  • - Implemented automated model retraining and deployment schedules based on predefined triggers, such as data drift or performance degradation.

Sun Computers, Pune, India

Software Engineer

June 2018 - February 2019

  • - Developed and maintained Python-based web applications for the company's clients, ensuring high-quality code and adherence to project requirements.
  • - Collaborated with cross-functional teams to design and implement scalable and secure cloud-based solutions, leveraging services such as AWS.
  • - Conducted research and implemented programming best practices and standards, incorporating programming references and documentation to enhance code quality and maintainability.

Education

RWTH Aachen University

M.Sc. Computer Science

April 2019 - August 2023

  • Relevant Courses
      Machine Learning
      Data Science
      Artificial Intelligence
      Computer Vision
      Text Mining
      Business Process Intelligence
      Social Computing
      Automatic Speech Recognition
      ML applications in Process Mining
      Empirical research methods and experiment design
      Applied Statistics and Stochastics
  • Thesis: Analysis of the Wikipedia talk pages
    • - Developed and evaluated machine learning models for sentiment analysis of Wikipedia talk pages, using strategies such as lexicons, neural networks, and transfer learning.
    • - Incorporated network analysis and clustering techniques to better understand sentiment dynamics in Wikipedia talk pages.
    • - Used a range of evaluation strategies, including clustering evaluation metrics and unsupervised learning techniques, and employed crowdsourcing for human annotations to build accurate models and provide a realistic evaluation of predicted sentiment.

University of Pune

Bachelor of Engineering, Computer Engineering

August 2014 - June 2018

  • Relevant Courses
      Data structures
      Design and analysis of algorithms
      Artificial Intelligence
      Data management systems and application
      Business analytics and intelligence
      Business Process Intelligence
      Data mining techniques and applications
      Software Engineering
      Software Design Methodology and Testing
      Theory of Computation
  • Thesis: Cloud Cost Analyser and Optimiser
    • - Developed and implemented a monitoring scheme using the CloudWatch API for VMs (EC2 / Elastic Compute) on public clouds like AWS or GCP, aimed at reducing infrastructure costs from the customer's perspective.
    • - Contributed to the optimization of cloud infrastructure by identifying underutilized resources, proposing cost-saving measures, and ensuring efficient allocation of cloud resources.
    • - Published the paper "Cloud Cost Analyser and Optimizer" in IRJET, 2018.

ANN Junior College, Jaysingpur

High School Certificate (12th Class)

May 2012 - June 2014

  • Focus courses
      Mathematics
      Physics
      Chemistry
      Electronics

Latest Publications

What to Fuse and How to Fuse: Exploring Emotion and Personality Fusion Strategies for Explainable Mental Disorder Detection, ACL 2023, July 2023

SMHD-GER: A Large-Scale Benchmark Dataset for Automatic Mental Health Detection from Social Media in German, EACL 2023, May 2023

Improving the Generalizability of Text-Based Emotion Detection by Leveraging Transformers with Psycholinguistic Features, NLP+CSS 2022, November 2022

Certifications & Projects

Certification: Coursera Deep Learning Specialization

Topics: Neural Networks and Deep Learning; Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization; Structuring Machine Learning Projects; Convolutional Neural Networks; Sequence Models

Certification: Google Data Analytics

Topics: Practices and processes of a data analyst; key analytical skills such as data cleaning, analysis, and visualisation, with tools like SQL and Tableau; analysing data using spreadsheets, SQL, and R programming; visualising and presenting findings in dashboards, presentations, and visualisation platforms.

Certification: MLOps Specialization

Topics: Designing an end-to-end ML production system; project scoping; establishing model baselines and addressing concept drift; developing, deploying, and continuously improving a productionized ML app; applying best practices and progressive delivery techniques to maintain and monitor a continuously operating production system.

Project: Mental health detection in Social Media posts in English and German

Built explainable AI models for detection of 5 mental health conditions from social media, using techniques such as attention-based importance scores and SP-LIME to explain the Deep Learning models. Created, processed, and managed large datasets (>2M data points) for this project. Trained information-infused models with additional information in the form of the personality and emotion expressed in the text.

Project: Vehicle fleet health detection using Computer Vision

Built a dashboard to keep track of a fleet of vehicles and the external damage caused to them using Computer Vision (PyTorch, Detectron2, self-curated dataset). Fine-tuned the Detectron2 model on the COCO dataset along with an additional self-curated dataset. Built a frontend using React and a Python backend for prediction and tracking.

Project: Credit Risk Prediction Using MLOps

Collected and preprocessed credit risk data using web scraping tools, implemented feature engineering, and developed predictive machine learning models. Evaluated model performance using precision, recall, and AUC-ROC metrics, and applied hyperparameter tuning to achieve optimal results. Deployed the optimized model into a scalable production environment using Docker and Kubernetes, implemented Jenkins-based CI/CD pipelines for integration and deployment, and monitored model performance using MLflow, incorporating automated retraining to keep the model updated.

Project: Personality prediction from text and speech

Built personality prediction models from crowdsourced datasets (speech and text) and their various features, such as fluency, hesitation words, and over 400 linguistic markers. Built hybrid models that combine pretrained language models like BERT with these features.