Hello! I’m Sreenath Vemireddy.
I am a Senior Big Data and Azure Data Engineer with over 10 years of experience designing, building, and managing end-to-end data engineering solutions across finance, healthcare, insurance, banking, and geospatial mapping.

Professional Summary
I am a Senior Big Data and Azure Data Engineer with over 10 years of progressive experience in designing, building, and managing end-to-end data engineering solutions across diverse industries including finance, healthcare, insurance, banking, and geospatial mapping. My core strength lies in building scalable, high-performance data pipelines using cutting-edge technologies like Hadoop, Spark, PySpark, and Hive, and integrating them seamlessly with cloud platforms such as Microsoft Azure and AWS.
I have a proven track record in delivering enterprise-level data migrations—moving terabytes of data from legacy systems to cloud environments—while ensuring data quality, lineage tracking, regulatory compliance, and performance optimization. I specialize in metadata-driven automation frameworks for schema validation, ingestion, transformation, audit logging, and alerting, significantly reducing manual effort and enhancing scalability.
My expertise includes tools like Azure Data Factory (ADF), Azure Blob Storage, Azure SQL Database, Azure Key Vault, and orchestration tools like Apache Airflow, Zena, and Autosys. I have implemented complex DAGs for ETL pipelines, automated failure handling, and ensured secure cloud integration with strict governance practices.
I’ve collaborated with top-tier global organizations such as DBS Bank, PayPal, ICICI Bank, AbbVie, Country Financial, Apple, and Nokia. My work has contributed to mission-critical systems including MAS 637 and PILLAR3 regulatory reporting, SAS exit transformations, CRM cloud migrations, HANA-to-Hadoop reporting, and spatial data integration for Apple and Nokia Maps.
I am well-versed in data governance practices, using tools like Collibra to manage metadata and ensure compliance. I regularly optimize the performance of Spark jobs, Hive queries, and ADF pipelines, and have utilized Power BI, Adobe Analytics, and Apache Superset for data visualization and reporting.
My approach is guided by Agile methodologies and DevOps principles, including CI/CD automation with Git and Azure DevOps. I’ve also taken on leadership roles, mentoring junior engineers, conducting code reviews, and guiding teams in the delivery of scalable and robust solutions.
I am passionate about transforming raw data into actionable intelligence and continuously strive to enhance data value, accessibility, and governance across organizations. For me, data engineering is not just about infrastructure—it's about enabling smarter, data-driven decisions that drive business success.
My Skills
Experience
Hadoop Lead
Led the development of a metadata-driven data migration solution from Guidewire S3 to Azure SQL, implementing SCD2 methodology and optimizing data pipelines using Azure Data Factory and Hadoop ecosystems.
Consultant (Big Data Engineer)
Worked on the MAS 637 and PILLAR3 Reporting projects for regulatory compliance and risk evaluation. Led the SAS Exit project to migrate legacy SAS scripts onto the ADA platform using PySpark and Airflow.
Consultant – Big Data Migration
Delivered end-to-end CRM data migration to Azure and HANA-to-Hadoop report migrations, improving system performance and data delivery pipelines across cloud and big data platforms.
Software Engineer – Data Analytics
Developed Spark-based applications for centralized patient data analytics on the Hadoop platform to support data-driven decision-making in the healthcare domain.
Engineer – Geospatial Data Solutions
Ingested and processed large volumes of geospatial data for Apple’s Maps platform using Hadoop and Shell, enhancing spatial data accuracy and performance.
Trainee Engineer – Data Engineering
Assisted in building data ingestion pipelines for large-scale geospatial data integration into Hadoop, supporting Nokia's navigation and mapping initiatives.
Latest Projects
Comm-Agg End-to-End Data Migration Solution
Description: This project focused on developing a complete end-to-end solution for migrating data from Guidewire S3 to Azure SQL, using Blob Storage as an intermediary. The goal was to ensure high-quality, scalable data pipelines that adhered to the Slowly Changing Dimension Type 2 (SCD2) methodology for historical tracking. Automation and metadata-driven processes played a vital role in minimizing manual efforts while ensuring high data integrity. Python scripts were created for robust error handling, email notifications, and data comparison reports between Hive and ODS tables, ensuring precise validation. The final solution involved integrating Azure SQL data into Hadoop and processing it through multiple layers to prepare it for business consumption.
Technologies: Azure Data Factory (ADF), Blob Storage, Azure SQL, Python, Hadoop, Hive, Impala, Zena, Shell Scripting
Responsibilities:
• Developed and optimized data pipelines in ADF to extract data from Guidewire S3 and load into Azure SQL.
• Implemented SCD2 for effective historical tracking of data changes (a simplified PySpark merge sketch follows this list).
• Designed and deployed automated incremental data load processes to reduce manual efforts.
• Automated schema management using metadata-driven frameworks.
• Created Python scripts for error detection, email alerts, and data validation reports.
• Migrated data from Azure SQL to Hadoop ecosystem and maintained structured zones: landing, core, and active views.
• Orchestrated pipeline execution and stored procedures using Zena for improved automation and monitoring.
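As an illustration of the SCD2 pattern used in this project, here is a minimal PySpark sketch. The table names (stg.policy_incoming, ods.policy_dim), the key, and the tracked columns are placeholders; the production pipeline drove the same logic from metadata rather than hard-coded column lists.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("scd2_dimension_merge")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical tables: ods.policy_dim is the SCD2 target, stg.policy_incoming is
# the latest extract landed from Guidewire S3 via ADF. The extract is assumed to
# carry the same business columns as the dimension.
current = spark.table("ods.policy_dim").filter("is_current = true")
incoming = spark.table("stg.policy_incoming")

key = "policy_id"
tracked = ["status", "premium", "agent_id"]  # attributes whose changes open a new version

# Rows whose tracked attributes differ from the current version
changed = (incoming.alias("n")
           .join(current.alias("c"), key)
           .where(" OR ".join(f"n.{c} <> c.{c}" for c in tracked))
           .select("n.*"))

# 1) Close out superseded versions
closed = (current.join(changed.select(key), key, "left_semi")
          .withColumn("end_date", F.current_date())
          .withColumn("is_current", F.lit(False)))

# 2) Open new versions for changed keys and brand-new keys
new_keys = incoming.join(current, key, "left_anti")
opened = (changed.unionByName(new_keys)
          .withColumn("start_date", F.current_date())
          .withColumn("end_date", F.lit(None).cast("date"))
          .withColumn("is_current", F.lit(True)))

# 3) Current rows with no changes carry over as-is
untouched = current.join(changed.select(key), key, "left_anti")

# Only the current slice is rebuilt here; in the real pipeline this staging
# output was merged back into the dimension alongside its historical rows.
result = untouched.unionByName(closed).unionByName(opened)
result.write.mode("overwrite").saveAsTable("ods.policy_dim_scd2_staging")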
MAS 637 and PILLAR3 Reporting
Description: The MAS 637 project involved generating reports for the Monetary Authority of Singapore (MAS) to ensure regulatory compliance. This project analyzed pool statuses across various user types and tracked the changes over a 12-month period. PILLAR3 reporting focused on risk management, using predictive analytics to evaluate user behavior and default risks based on historical data.
Technologies: ADA (In-house framework), PySpark, Spark SQL, Presto, Hive, Hadoop, Airflow, Jupyter, Collibra
Responsibilities:
• Developed and optimized PySpark scripts within the ADA (Advanced DBS Analytics) framework to analyze user pool statuses across different time periods (see the sketch after this list).
• Automated trend analysis and default risk evaluations for users, improving regulatory compliance and decision-making.
• Provided insights into financial transactions and default status for users by integrating various data sources within the bank’s system.
• Delivered high-quality, compliant reports to the Monetary Authority of Singapore, ensuring that data and analysis met regulatory standards.
• Applied predictive analytics techniques to evaluate potential default risks, improving the bank's risk management strategies.
• Enhanced the reporting process by implementing Presto for faster data retrieval and analysis.
• Created and managed metadata structures within Collibra to support proper data governance, lineage tracking, and regulatory compliance.
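A condensed PySpark sketch of the month-over-month pool-status analysis is shown below. The table ada.pool_status_monthly and its columns are hypothetical stand-ins for the actual ADA datasets.

from pyspark.sql import SparkSession, functions as F, Window

spark = (SparkSession.builder
         .appName("mas637_pool_trend")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical source: one row per user per month-end snapshot with a pool status
snap = spark.table("ada.pool_status_monthly").select(
    "user_id", "user_type", "snapshot_month", "pool_status")

w = Window.partitionBy("user_id").orderBy("snapshot_month")

trend = (snap
         .withColumn("prev_status", F.lag("pool_status").over(w))
         .withColumn("status_changed",
                     (F.col("pool_status") != F.col("prev_status")).cast("int"))
         .withColumn("moved_to_default",
                     ((F.col("pool_status") == "DEFAULT") &
                      (F.col("prev_status") != "DEFAULT")).cast("int")))

# Aggregate 12 months of movement per user type for the regulatory report
report = (trend
          .filter(F.col("snapshot_month") >= F.add_months(F.current_date(), -12))
          .groupBy("user_type", "snapshot_month")
          .agg(F.sum("status_changed").alias("status_changes"),
               F.sum("moved_to_default").alias("new_defaults"),
               F.countDistinct("user_id").alias("users")))

report.write.mode("overwrite").saveAsTable("ada.mas637_pool_trend_12m")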
SAS Exit
Description: The SAS Exit project was designed to migrate legacy SAS scripts into DBS Bank’s ADA platform. The migration involved converting complex SAS reports into PySpark-based solutions, improving the performance and scalability of the reporting system.
Technologies: SAS, ADA (In-house framework), PySpark, Airflow, Collibra, Python, Spark SQL, Presto, Hive, Hadoop
Responsibilities:
• Led the migration of complex SAS reporting scripts to PySpark within the ADA framework, significantly improving the performance of data processing (an example conversion is sketched after this list).
• Designed a robust architecture for the ADA platform that met the requirements of the SAS reports and ensured seamless integration with existing systems.
• Developed and maintained metadata structures in Collibra, improving data governance and ensuring compliance.
• Automated the execution of PySpark jobs using Apache Airflow, ensuring timely report generation and reducing manual intervention.
• Collaborated with key stakeholders to understand business requirements and ensure the migration of reports aligned with their expectations.
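To show the shape of the conversion work, below is a small example of a typical SAS aggregation rewritten in PySpark; the report, tables, and columns are invented for illustration.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("sas_exit_loan_summary")
         .enableHiveSupport()
         .getOrCreate())

# Original SAS (simplified, hypothetical report):
#   proc sql;
#     create table loan_summary as
#     select region, product,
#            sum(outstanding_amt) as total_outstanding,
#            count(distinct account_id) as accounts
#     from bank.loan_book
#     where status = 'ACTIVE'
#     group by region, product;
#   quit;

# Equivalent PySpark over the platform's Hive tables
loan_book = spark.table("bank.loan_book").filter(F.col("status") == "ACTIVE")

loan_summary = (loan_book
                .groupBy("region", "product")
                .agg(F.sum("outstanding_amt").alias("total_outstanding"),
                     F.countDistinct("account_id").alias("accounts")))

loan_summary.write.mode("overwrite").saveAsTable("ada.loan_summary")

Each converted report was then scheduled as an Airflow task (for example, a spark-submit wrapped in an operator) so that retries and failure handling stayed consistent across the migrated estate.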
CRM Data Migration
Description: This project involved migrating CRM data from multiple on-premises and cloud environments to an Azure-based platform. The goal was to ensure smooth data transfer while generating insightful reports for the business.
Technologies: Azure Data Factory (ADF), Blob Storage, Azure SQL Database, Oracle, Azure Key Vault, Vertica, Azure Logic Apps
Responsibilities:
• Led the migration of CRM data to Azure SQL using Azure Data Factory (ADF), ensuring reliable and efficient data processing.
• Developed scalable ADF pipelines to handle data ingestion from multiple sources, including Vertica and Oracle databases (see the sketch after this list).
• Managed a team of junior engineers, providing guidance on best practices for cloud migration and data governance.
• Automated the scheduling of data transfer tasks using Azure Logic Apps, improving overall efficiency and reducing delays in data processing.
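The ADF pipelines themselves are JSON definitions, but the surrounding automation can be pictured with a short Python sketch that triggers and monitors a pipeline run through the Azure SDK. The subscription, resource group, factory, pipeline, and parameter names below are placeholders, and the production setup used Azure Logic Apps and Key Vault for scheduling and secrets rather than this standalone script.

import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical names; in the real setup secrets were held in Azure Key Vault
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "crm-migration-rg"
FACTORY_NAME = "crm-adf"
PIPELINE_NAME = "pl_crm_vertica_to_azuresql"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Kick off the copy pipeline for one source table
run = client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME,
    parameters={"source_table": "CRM.ACCOUNTS", "load_date": "2024-01-31"},
)

# Poll until the run finishes, then surface the outcome
while True:
    status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)

print(f"Pipeline {PIPELINE_NAME} finished with status: {status}")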
H2H (HANA to Hadoop)
Description: The H2H project involved migrating SAP HANA reports to a Hadoop-based environment, ensuring improved performance and scalability for reporting and analytics.
Technologies: PySpark, HDFS, Hive, MongoDB, Git, Shell, Custom PayPal Frameworks
Responsibilities:
• Led the design and migration of SAP HANA reports to the Hadoop ecosystem, focusing on Spark and Hive for data processing.
• Developed Spark and Hive-based reports to improve data accessibility and reporting performance (see the sketch after this list).
• Utilized Hadoop, HDFS, and MongoDB to store and manage large volumes of data, improving data storage and processing efficiency.
• Collaborated with business teams to deliver reports through various channels, including email, web UI, and dashboards.
• Optimized the data migration process to ensure accurate and timely delivery of reports to end-users.
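A simplified PySpark sketch of how a HANA report translates into a Spark join and aggregation over Hive tables is shown below; the schemas, tables, and measures are hypothetical.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("h2h_revenue_report")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical inputs: extracts previously served by a HANA calculation view,
# now landed as Hive tables on HDFS by the ingestion layer
txn = spark.table("pp_core.transactions")
merchants = spark.table("pp_core.merchants")

# Rebuild the report logic as a Spark join plus aggregation
daily_revenue = (txn
                 .join(merchants, "merchant_id")
                 .groupBy("txn_date", "region", "merchant_segment")
                 .agg(F.sum("amount_usd").alias("gross_revenue"),
                      F.count("*").alias("txn_count")))

# Persist as a partitioned Hive table consumed by email and dashboard delivery
(daily_revenue.write
 .mode("overwrite")
 .partitionBy("txn_date")
 .saveAsTable("pp_reports.daily_revenue"))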
IAP (Integrated Analytics Platform)
Description: The IAP project was designed to centralize patient data from various applications into Hadoop for analysis. The goal was to process and analyze patient activity data, allowing healthcare providers to make informed decisions.
Technologies: Apache Spark, Hadoop, Hive, Scala, Impala, Shell Scripting, Autosys
Responsibilities:
• Ingested data from various source applications into Hadoop, ensuring seamless integration and data processing.
• Developed Spark applications to analyze patient activity data, providing a comprehensive view of patient behavior and outcomes (see the sketch after this list).
• Built efficient Hive and Impala queries to transform and analyze the data, enabling the creation of actionable insights for healthcare providers.
• Automated batch data processing using Autosys, ensuring timely updates and reducing manual interventions.
• Managed the end-to-end pipeline for data ingestion, processing, and reporting.
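The production jobs were written in Spark with Scala; for brevity, the sketch below shows equivalent patient-activity logic in PySpark, with hypothetical table and column names.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("iap_patient_activity")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical event feed: one row per patient interaction landed from source apps
events = spark.table("iap_landing.patient_events").select(
    "patient_id", "event_type", "event_ts", "source_app")

# Derive per-patient activity metrics used by downstream Hive/Impala reports
activity = (events
            .groupBy("patient_id")
            .agg(F.countDistinct("source_app").alias("apps_used"),
                 F.count("*").alias("total_events"),
                 F.max("event_ts").alias("last_seen"),
                 F.sum(F.when(F.col("event_type") == "REFILL_REQUEST", 1).otherwise(0))
                  .alias("refill_requests")))

activity.write.mode("overwrite").saveAsTable("iap_core.patient_activity_summary")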
SDC (Spatial Data Collaborator)
Description: This project involved working with geospatial data sources to support the development of Apple’s map features. Data from multiple sources was processed and integrated to create accurate and up-to-date maps.
Technologies: Hadoop, Hive, HDFS, Python, Shell Scripting, MapReduce, USGS, TIGER Data Sources
Responsibilities:
• Ingested geospatial data from multiple sources (USGS, TIGER) into Hadoop for processing and integration into Apple’s mapping systems.
• Developed Hadoop-based pipelines to process large geospatial datasets, enabling more accurate and comprehensive map features.
• Worked with Apple’s mapping team to ensure the geospatial data was accurately merged, improving map development.
• Automated the data ingestion process from SFTP servers, reducing manual data processing time (sketched below).
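A minimal Python sketch of the SFTP-to-HDFS ingestion step is shown below; the host, paths, and credentials are placeholders for the example.

import os
import posixpath
import subprocess
import paramiko

# Hypothetical endpoints and paths for the periodic TIGER/USGS pulls
SFTP_HOST, SFTP_USER = "sftp.example.com", "sdc_ingest"
REMOTE_DIR = "/outbound/tiger/latest"
LOCAL_DIR = "/data/staging/tiger"
HDFS_DIR = "/data/raw/geospatial"

os.makedirs(LOCAL_DIR, exist_ok=True)

# 1) Pull the latest extracts from the provider's SFTP server
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(SFTP_HOST, username=SFTP_USER, key_filename="/home/sdc/.ssh/id_rsa")
sftp = ssh.open_sftp()
for name in sftp.listdir(REMOTE_DIR):
    if name.endswith(".zip"):
        sftp.get(posixpath.join(REMOTE_DIR, name), f"{LOCAL_DIR}/{name}")
sftp.close()
ssh.close()

# 2) Land the staged directory in HDFS for downstream Hive/MapReduce processing
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR], check=True)
subprocess.run(["hdfs", "dfs", "-put", "-f", LOCAL_DIR, HDFS_DIR], check=True)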
Mapping Data Integration
Description: This project focused on integrating Nokia's geospatial data into Hadoop for large-scale analytics, enabling the creation of advanced mapping and navigation solutions.
Technologies: Hadoop, Hive, HDFS, Python, Shell, Spark
Responsibilities:
• Assisted in the integration of Nokia's geospatial data into Hadoop for large-scale analysis.
• Developed data ingestion pipelines to process geospatial data from various sources, improving data quality and consistency (see the sketch after this list).
• Contributed to developing tools to automate the ingestion process, reducing manual effort and improving operational efficiency.
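As a closing illustration, here is a small PySpark sketch of the kind of cleansing applied to raw geospatial feeds before exposing them for analytics; the paths, schema, and provider names are assumptions for the example.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("nokia_poi_ingest")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical raw feed: delimited POI extracts from multiple map data providers
raw = (spark.read
       .option("header", "true")
       .option("delimiter", "|")
       .csv("hdfs:///data/raw/nokia/poi/"))

# Standardize and validate coordinates, then drop duplicate provider records
clean = (raw
         .withColumn("latitude", F.col("latitude").cast("double"))
         .withColumn("longitude", F.col("longitude").cast("double"))
         .filter(F.col("latitude").between(-90, 90) &
                 F.col("longitude").between(-180, 180))
         .dropDuplicates(["poi_id", "provider"]))

# Publish as a partitioned Hive table for downstream mapping analytics
(clean.write
 .mode("overwrite")
 .partitionBy("provider")
 .saveAsTable("maps_core.poi_clean"))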