
bio.

I am a full-stack Data Engineer with over 10 years of industry experience.

I have worked in a variety of settings to build systems that collect, manage, and convert raw data into usable information for data scientists and business analysts to interpret. The ultimate goal is to make data accessible so that organizations can use it to evaluate and optimize their performance.

Data allows organizations to measure the effectiveness of a given strategy: when a strategy is put in place to overcome a challenge, the data you collect shows how well the solution is performing and whether the approach needs to be tweaked or changed over the long term.

We provide Data Engineering services to do the heavy lifting of filtering and refining data, so that its stakeholders, such as Data Scientists and Data Analysts, can focus on their areas of expertise.

  • LinkedIn

experience

2021-Present

Encore Data Intelligence Services

Data Transformation Service: Python (Amazon EC2, Amazon ECS, AWS Lambda), PySpark (Amazon EMR, AWS Glue); a representative sketch follows this list

Data Storage Service: Amazon Elastic Block Store (EBS), Amazon Elastic File System (EFS), Amazon Simple Storage Service (S3)

Analytics Service: AWS Glue, Amazon Athena, Delta Lake, Amazon EMR, Amazon QuickSight

Databases Service: Amazon RDS, Amazon Redshift, Amazon Aurora, Amazon DynamoDB

Orchestration Service: Apache Airflow, Amazon MWAA (Managed Workflows for Apache Airflow)
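
A representative sketch of the transformation work these services cover, assuming a PySpark job that curates raw JSON events into partitioned Parquet; the bucket, paths, and column names (example-bucket, event_id, event_ts) are hypothetical placeholders, not production values.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-transform").getOrCreate()

# Read raw JSON events landed in S3 by an upstream ingestion job.
raw = spark.read.json("s3://example-bucket/raw/events/")

# Drop malformed rows and derive a date column for partitioning.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Write columnar Parquet, partitioned by date, for downstream analytics.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)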

2021

Amazon Web Services (AWS)

• Handled, maintained, and supported ETL data flowing into and out of Redshift (a legacy system migrating to a Data Lake architecture)
• Offloaded a subset of data from Redshift in Parquet format, transformed newer data into Parquet using Apache Spark running on EMR (moving towards Glue), and loaded it into Redshift Spectrum (the Data Lake implementation of the Data Warehouse), as sketched below
• Worked on an in-progress, in-house AWS Glue catalog-sharing infrastructure that minimizes data duplication and increases data consistency, using AWS Glue, Lake Formation, Resource Access Manager (RAM), and Redshift Spectrum.
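
A hedged sketch of the Parquet offload pattern described above; the paths, table names, and IAM role are illustrative placeholders, not the actual production systems.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-offload").getOrCreate()

# Historical data previously UNLOADed from Redshift as delimited text.
hist = spark.read.option("header", "true").csv("s3://example-bucket/unload/orders/")

# Rewrite it as Parquet so Redshift Spectrum can scan it efficiently.
hist.write.mode("overwrite").parquet("s3://example-bucket/spectrum/orders/")

# Once the location is cataloged (e.g. by a Glue crawler), Spectrum can
# query it through an external schema, roughly:
#   CREATE EXTERNAL SCHEMA spectrum FROM DATA CATALOG DATABASE 'example_db'
#   IAM_ROLE 'arn:aws:iam::123456789012:role/example-spectrum-role';
#   SELECT count(*) FROM spectrum.orders;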

2018-2021

Meredith Corp. (Time Inc.)

Data Engineer:
• Designed and developed a data workflow model to create a Data Lake on AWS infrastructure using AWS Glue Jobs, development endpoints, Glue Crawlers, the Data Catalog, Athena, and EMR
• Developed PySpark code with RDDs and UDFs to perform the necessary transformations
• Orchestrated computational workflows and data-processing pipelines using Apache Airflow (a minimal DAG sketch follows this list)
• Built an in-house data deletion framework to remove data from the Data Lake for CCPA/GDPR compliance use cases.
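
A minimal Airflow DAG sketch of this kind of orchestration, assuming the amazon provider package is installed; the DAG id, schedule, and Glue job name are hypothetical.

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="nightly_data_lake_refresh",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Trigger a pre-registered AWS Glue job and wait for it to complete.
    transform = GlueJobOperator(
        task_id="run_glue_transform",
        job_name="example-curation-job",  # hypothetical Glue job name
    )
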
Senior Software Development Engineer in Test:
• Big Data/Automation Testing: Verified and validated ETL Big Data in databases, Data Warehouses, and AWS S3 using Python and PySpark with AWS Glue, EMR, Zeppelin notebooks, and Jupyter notebooks.
• Manual/Functional/Automation Testing: Tested data pipelines built on AWS infrastructure using Python and pytest, building the test framework from the ground up, as sketched below.
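
A simplified pytest sketch of the kind of pipeline validation such a framework performs; the S3 paths and key column (example-bucket, event_id) are hypothetical placeholders.

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # One Spark session shared across the whole test run.
    return SparkSession.builder.appName("etl-tests").getOrCreate()

def test_no_rows_lost(spark):
    # Row counts should match between the landed source and curated target.
    source = spark.read.json("s3://example-bucket/raw/events/")
    target = spark.read.parquet("s3://example-bucket/curated/events/")
    assert source.count() == target.count()

def test_keys_unique(spark):
    # Primary keys in the curated table must be unique.
    target = spark.read.parquet("s3://example-bucket/curated/events/")
    assert target.count() == target.select("event_id").distinct().count()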

2014-2018

Sonos Inc.

• Developed and deployed data pipelines using Python, PySpark, MySQL, AWS services, Ganglia, Ansible, NiFi, Docker, and GitHub
• Big Data/Automation Testing: Tested Spark processing with PySpark, using Zeppelin, PyCharm, Jenkins, Perforce, AWS services, and Interana.
• Player Automation Testing: Tested the data flow from speakers to the back-end system using object-oriented Python, with Eclipse, Jenkins, Perforce, and a Sonos player testbed.
• Manual/Functional/Automation Testing: Tested Big Data using Python as a scripting language, with Jupyter notebooks, Jenkins, Perforce, Splunk, and the AWS CLI.
• Performance Testing: Tested WebSocket connections using Java, NetBeans, JMeter, PerfMon, and Nginx.
• Pitched, implemented, and presented five new ideas at internal hackathons, spanning testing tools, Android, and voice integration built with Android, Python, and PySpark.

2010-2012

CGI

Performance Testing Engineer - Bell Client at CGI Group Inc. (Bangalore, India)
• Involved in the Requirements Gathering and Test Plan Preparation stages of the SDLC.
• Developed scripts using Web (HTTP/HTML), Web (Click and Script), and Ajax (Click and Script) protocols in HP LoadRunner, embedding C and Java code for error handling.
• Collected performance metrics by setting up monitors on the application and web servers using SiteScope.
• Conducted load, stress, spike, and endurance testing, depending on the requirement.
• Analyzed the collected metrics and test results, and prepared reports.
• Performed tuning activities, finding and eliminating bottlenecks to enhance performance.
• Prepared the final report and presented it to the client.
• Conducted first-round technical interviews and led training sessions for new joiners.

education

2012-2014

Worcester Polytechnic Institute (WPI)

Master of Science (MS), Learning Sciences & Technologies (Major in Computer Science)

2005-2009

Visvesvaraya Technological University (VTU)

Bachelor of Engineering (BE), Computer Science
