My Intro

About Me

Hello! This is Shwetha Sivakumar. I am a Data Engineer with experience building end-to-end data pipelines across transit operations, IoT telemetry, and ML analytics. I have consistently improved system performance and dashboard responsiveness through data modeling, query optimization, and data quality automation. I am strong in Python, SQL, PySpark, Denodo, Informatica IICS, Snowflake, Azure Data Explorer, and Power BI, with AWS Solutions Architect and Denodo Developer certifications. I have partnered effectively with operations and BI teams to translate technical insights into business decisions.


Experience

Aug 2025

Associate Data Engineer, IQZ Systems, Atlanta, GA | Aug 2025 - Present

• Built and maintained an enterprise Denodo data virtualization layer integrating 400K+ daily records from Amazon Redshift, SQL Server, Databricks, and MongoDB for a passenger rail network, enabling unified analytics across ridership, revenue, and operational performance.

• Designed and optimized Denodo base, derived, and summary views, applying cached views with scheduled refreshes (daily, monthly, yearly), cost-based optimization, and query tuning to improve Power BI dashboard load times by 30%.

• Developed Power BI dashboards consuming Denodo views for revenue, ridership, route performance, and ticket journey analytics; automated metadata propagation by modifying TMDL (.SemanticModel) files.

• Implemented RBAC and global security policies across Denodo and Power BI, managing multiple user roles and enabling GitHub-based version control for Fabric workspaces to support secure and reproducible BI development.

Aug 2024

Software Developer, MyUI.AI (Clemson University), Clemson, SC | Aug 2024 - Dec 2024

• Built scalable PySpark pipelines on AWS S3 to process 75K monthly accessibility sessions for a B2B mobile/kiosk application, powering reinforcement learning models generating personalized UI adaptations for users with visual, motor, and cognitive impairments.

• Designed incremental data processing and partitioned data models (by event date and impairment type) enabling efficient trend analysis across accessibility metrics using Spark window functions.

• Optimized downstream analytics and model-training workloads via partition pruning, reducing feature read times by 40%.

• Orchestrated end-to-end Spark workflows using Apache Airflow, ensuring reliable daily execution, reproducibility, and backfill support.

Jun 2024

Data Engineer, BPM BI, Tallahassee, FL | Jun 2024 - Aug 2024

• Built end-to-end Informatica IICS ETL pipelines ingesting 500K+ GPS, GTFS, and fare records daily into Snowflake.

• Engineered transformations in Informatica Mapping Designer, calculating route-level delay metrics and operational KPIs for 200+ transit routes; embedded data quality checks reducing reporting errors by 25%.

• Automated near real-time and batch orchestration with Taskflows, ensuring data availability within 5 minutes of ingestion.

• Enabled Power BI dashboards providing actionable insights on route delays, ridership trends, and revenue metrics.

Jan 2023

Data Engineer, Caterpillar Inc., India | Jan 2023 - Jun 2023

• Designed 15+ real-time operational dashboards in Azure Data Explorer (ADX) to visualize telemetry from 500+ autonomous haul trucks, providing fleet health, utilization, and data quality insights.

• Built IoT telemetry ingestion pipelines into ADX by defining table schemas and JSON ingestion mappings to structure high-frequency GPS and operational data at scale.

• Developed data transformations using ADX update policies to convert epoch timestamps, normalize vehicle identifiers, and derive operational states; created materialized views that pre-aggregated metrics and improved dashboard performance by 40%.

• Implemented ingestion monitoring using KQL queries and Azure Monitor alert rules to detect missing or delayed telemetry, ensuring pipeline reliability.

May 2022

Machine Learning Intern, C3iHub, Indian Institute of Technology, Kanpur, India | May 2022 - Jul 2022

• Led a team of 3 to develop a cyber-attack detection system using ensemble and deep learning models, achieving 97% accuracy in classifying normal traffic, automated attacks, and manual intrusions.

• Built robust data preprocessing pipelines using Python and regex to parse Apache web logs into structured pandas DataFrames for threat analysis.

• Conducted global cyber-attack pattern analysis across 87 countries using protocol analysis, attack categorization, and subnet tracking.

• Implemented XGBoost and LSTM/GRU neural networks for feature engineering, model training, and evaluation; visualized insights using seaborn and Plotly.

Education

2023

Master of Science in Computer and Information Sciences, Clemson University, USA

Coursework: Deep Learning in Computer Vision, Applied Data Science, Cloud Computing, Statistics

2019

Bachelor of Technology in Computer Science Engineering, Amrita University, India

Coursework: Data Structures and Algorithms, Operating Systems, Computer Networks, Computer Architecture

Accomplishments

My Projects

Honeypot Weblogs Cyber Attack Detection, Classification and Analysis using Machine Learning

MACHINE LEARNING, DATA EXTRACTION, DATA CLEANING AND DATA ANALYSIS

Breast Cancer Survival Analysis by Clustering Gene Expression Data

MACHINE LEARNING, DATA VISUALIZATION, MODELLING AND DATA ANALYSIS

Digital Payment Wallet Mobile App

FLUTTER MOBILE APPLICATION

Habit Tracker Android App

ANDROID APPLICATION

Sentiment Analysis of IMDb Movie Reviews using NLP

DATA SCIENCE, MACHINE LEARNING (FEATURE ENGINEERING AND HYPOTHESIS TESTING)
Skills

My Skills

Programming & Processing: Python, SQL, PySpark, Java, Pandas, NumPy, scikit-learn, Bash

Data Engineering: Apache Airflow, dbt, Denodo, Informatica IICS, Data Modeling

Databases & Warehouses: Snowflake, Redshift, PostgreSQL, SQL Server, MongoDB, Firebase

Cloud & Infrastructure: AWS (Certified), Microsoft Azure, GCP, Docker, Git

Visualization & BI: Power BI (DAX, TMDL), KQL, Matplotlib, Seaborn, Plotly

Tools & Technologies: Jira, Postman, Agile/Scrum, REST APIs, Unit Testing

SQL: Enterprise-Ready

Python: Enterprise-Ready

Apache Spark / PySpark: Advanced

Snowflake: Advanced

ETL / Informatica IICS: Proficient

AWS: Proficient

Denodo: Enterprise-Ready

Apache Airflow: Proficient

Power BI: Proficient
PAPERS PUBLISHED

My Published Papers in International Journals and Conferences

CERTIFICATIONS AND APPRENTICESHIPS

My Certifications