About
- Collaborative data engineering and software development professional with substantial knowledge and experience in the analysis, design, development, implementation, migration, convergence, management, and support of large-scale databases, data warehouses, and big data systems. I create intuitive architectures and frameworks that help organizations effectively capture, store, process, visualize, and analyze huge volumes of structured, semi-structured, unstructured, and streaming heterogeneous data.
- Proven talent for aligning business strategy and objectives with established analytical paradigms to achieve maximum operational impact with minimum resource expenditure. Results-focused leader with expertise spanning data engineering, software development, business analytics, cross-functional team leadership, and complex problem-solving.
- I am currently pursuing my Master's in Computer Science at the University of Texas at Dallas, specializing in Intelligent Systems.
- I have interned at Amazon as a Data Engineer, where I gained hands-on experience designing and developing streaming data pipelines.
- I have previously worked as a Data Engineer at Onward Technologies, a global IT service provider in domains such as data analytics, data science, Artificial Intelligence (AI), and Machine Learning (ML). Before that, I worked with Cognizant on its flagship Core Banking and Insurance customer, Suncorp.
- I am a Microsoft Certified Azure Data Engineer and Databricks Certified Data Engineer Associate.
- I am interested in Big Data Engineering, Cloud Data Warehousing, DevOps and Full Stack Development.
Education
Master of Science, Computer Science
2021 - 2023
University of Texas at Dallas, Richardson, TX
Relevant Courses: Database Design, Machine Learning, Artificial Intelligence, Natural Language Processing, Big Data Management and Analytics, Design and Analysis of Algorithms, Information Retrieval, Human Computer Interaction
Bachelor of Technology, Mechanical Engineering
2014 - 2018
Motilal Nehru National Institute of Technology, Prayagraj, India
Skills
Professional Experience
Data Engineer Intern
May 2023 - Present
Trinity Industries, Dallas, Texas, United States
- Working on building a change data capture (CDC) solution for migrating data from MS SQL Server to AWS using Databricks, Spark Streaming, Debezium, and Kafka.
- Designed and implemented streaming data pipelines using Delta Lake to perform upsert (merge) operations on incoming data streams, ensuring real-time data updates and maintaining data consistency (see the sketch below this entry).
- Leveraged Azure Data Factory to seamlessly integrate data across various Azure services, such as Azure Databricks, Azure SQL Database, Azure Blob Storage, and Azure Synapse Analytics.
- Connected Tableau to AWS Athena and created reports and dashboards for railroad mileage comparison.
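A minimal sketch of the Delta Lake streaming upsert pattern mentioned above, assuming a Databricks/PySpark environment; the Kafka topic, table name, schema, and checkpoint path are hypothetical placeholders, not the actual project configuration.

```python
# Illustrative streaming upsert (merge) into a Delta table via foreachBatch.
# Topic, table, schema, and paths below are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.getOrCreate()

schema = T.StructType([
    T.StructField("order_id", T.LongType()),
    T.StructField("status", T.StringType()),
    T.StructField("updated_at", T.TimestampType()),
])

# Read CDC events (e.g., flattened Debezium messages) from Kafka.
changes = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "cdc.orders")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

def upsert_to_delta(batch_df, batch_id):
    """Merge each micro-batch into the target Delta table on the primary key."""
    target = DeltaTable.forName(spark, "lakehouse.orders")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.order_id = s.order_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(changes.writeStream
    .foreachBatch(upsert_to_delta)
    .option("checkpointLocation", "/mnt/checkpoints/orders")
    .start())
```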
Data Engineer Intern
May 2022 - Aug 2022
Amazon, Boulder, Colorado, United States
- Created data producers and consumers for Kinesis Streams, ensuring uninterrupted end-to-end data flow (see the sketch below this entry).
- Authored ETL scripts using PySpark and AWS Glue APIs, implementing complex transformations and business logic to cleanse and enrich data during the ETL process.
- Created Jupyter notebooks on AWS EMR clusters to enable exploratory data analysis of large-scale datasets.
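A minimal boto3 sketch of the kind of Kinesis producer and consumer referenced above; the stream name, region, record fields, and shard ID are hypothetical placeholders, not the actual project details.

```python
# Illustrative Kinesis producer and consumer using boto3.
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
STREAM = "clickstream-events"  # hypothetical stream name

def produce(records):
    """Publish a batch of dict records to the Kinesis stream."""
    for rec in records:
        kinesis.put_record(
            StreamName=STREAM,
            Data=json.dumps(rec).encode("utf-8"),
            PartitionKey=str(rec["user_id"]),  # hypothetical partition key field
        )

def consume(shard_id="shardId-000000000000"):
    """Read records from one shard, starting at the latest offset."""
    it = kinesis.get_shard_iterator(
        StreamName=STREAM, ShardId=shard_id, ShardIteratorType="LATEST"
    )["ShardIterator"]
    while it:
        resp = kinesis.get_records(ShardIterator=it, Limit=100)
        for r in resp["Records"]:
            print(json.loads(r["Data"]))
        it = resp.get("NextShardIterator")
        time.sleep(1)
```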
Data Engineer
Jan 2021 - Aug 2021
Onward Technologies Limited, Chennai, India
- Migrated 250 Spark jobs from on-premises Hadoop to Google Cloud Platform, reducing processing time and increasing compute capacity by more than 60%.
- Designed and implemented a scalable data pipeline to process structured and semi-structured data by integrating 550 million raw records from different data sources using Kafka and PySpark and storing processed data in MongoDB.
- Authored Airflow DAGs for daily data ingestion and processing from Google Cloud Storage to BigQuery (see the sketch below this entry).
- Wrote Hive queries to parse the raw data and store the refined data in partitioned and bucketed tables.
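A minimal sketch of a daily GCS-to-BigQuery ingestion DAG of this kind, assuming the Airflow Google provider package is installed; the bucket, dataset, and table names are hypothetical placeholders rather than the project's actual configuration.

```python
# Illustrative daily ingestion DAG: load raw files from GCS, then aggregate in BigQuery.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="daily_gcs_to_bq",
    schedule_interval="@daily",
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:

    load_raw = GCSToBigQueryOperator(
        task_id="load_raw",
        bucket="raw-events-bucket",                       # hypothetical bucket
        source_objects=["events/{{ ds }}/*.json"],
        destination_project_dataset_table="analytics.raw_events",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )

    transform = BigQueryInsertJobOperator(
        task_id="transform",
        configuration={
            "query": {
                "query": (
                    "INSERT INTO analytics.daily_summary "
                    "SELECT event_date, COUNT(*) AS events "
                    "FROM analytics.raw_events "
                    "WHERE event_date = '{{ ds }}' "
                    "GROUP BY event_date"
                ),
                "useLegacySql": False,
            }
        },
    )

    load_raw >> transform
```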
Data Engineer
Nov 2018 - Jan 2021
Cognizant Technology Solutions India Pvt Ltd, Chennai, India
- Handled Sqoop parallelism and incremental data loads from Oracle to HDFS and Hive tables to keep up with daily data growth.
- Designed NiFi workflows for data ingestion from various sources such as RDBMS, REST APIs, Kafka topics, etc.
- Improved the runtime of slow-running Spark jobs by 60% by optimizing Spark SQL joins (see the sketch below this entry).
- Developed a notification-based system using SNS, SQS, Lambda, and DynamoDB and automated its deployment to AWS via GitLab.
- Stored data from Spark as wide tables in Elasticsearch for real-time aggregation and visualization in Kibana.
- Integrated Node.js back-end systems with dashboards created using React.
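The join optimization itself is not detailed above; broadcasting the small side of a join is one common technique for this kind of speed-up, sketched below with hypothetical table and column names.

```python
# Illustrative broadcast-join optimization; table and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

transactions = spark.table("warehouse.transactions")  # large fact table
branches = spark.table("warehouse.branches")          # small lookup table

# Broadcasting the small side avoids shuffling the large table,
# which is one common way to speed up shuffle-heavy joins.
joined = transactions.join(broadcast(branches), "branch_id")

# The same hint can be expressed directly in Spark SQL:
spark.sql("""
    SELECT /*+ BROADCAST(b) */ t.*, b.branch_name
    FROM warehouse.transactions t
    JOIN warehouse.branches b ON t.branch_id = b.branch_id
""")
```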
Portfolio Projects
- Developed a search engine for desserts (sweets) that crawls and indexes 100,000+ web pages from the internet and builds a web graph.
- The index and the web graph feed two relevance models - PageRank and HITS - which rank the search results (see the PageRank sketch below this project).
- Clustered web pages to improve the search results using flat clustering and two agglomerative clustering methods.
- Implemented query expansion through pseudo-relevance feedback using the Rocchio algorithm.
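A minimal PageRank sketch via power iteration over an in-memory adjacency list; the project's actual crawler, index, and web graph are not shown, and the tiny example graph at the end is hypothetical.

```python
# Illustrative PageRank by power iteration.
# Assumes every linked page appears as a key of the graph dict.
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each page to the list of pages it links to."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:                      # dangling node: spread rank evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Example on a tiny hypothetical web graph:
print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```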
- Created Databricks notebooks to ingest, transform, analyze, and create reports on Formula 1 racing data.
- Wrote Spark SQL queries to find the dominant drivers and teams for visualization (see the sketch below this project).
- Scheduled the pipeline using Azure Data Factory (ADF) for monitoring and alerts.
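An illustrative Spark SQL query of the "dominant drivers" kind, assuming a Databricks notebook where `spark` is predefined; the table and column names (f1_presentation.results, driver_name, points, position) are hypothetical placeholders, not the real schema.

```python
# Hypothetical schema: results(driver_name, points, position).
dominant_drivers = spark.sql("""
    SELECT driver_name,
           SUM(points) AS total_points,
           COUNT(CASE WHEN position = 1 THEN 1 END) AS wins
    FROM f1_presentation.results
    GROUP BY driver_name
    ORDER BY total_points DESC
    LIMIT 10
""")
dominant_drivers.show()
```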
- Built a Python script to load songs and logs data from an S3 bucket.
- Transformed the data into fact and dimension tables and stored them in Redshift.
- Orchestrated the data pipeline using Airflow DAGs and enforced data quality checks.
- Designed and implemented a real-time streaming and classification system for sentiment analysis on Twitter data.
- Pulled live tweets using NiFi (Twitter API) into a Kafka topic for cleaning, parsing, and filtering with Spark.
- Applied Stanford NLP to compute a sentiment score for each tweet and visualized the results using Elasticsearch and Kibana.
- Created a data pipeline to unify and consolidate real-time customer web events, weblogs, and profile data into a Hive warehouse for ad-hoc analysis.
- Created a data pipeline for a retail store, ABC-Stores, using Hadoop for storage and Spark for data processing to produce reports for analytics in Power BI.
- Scheduled the pipeline for daily batch data using Airflow.
- Batch ETL pipeline project on GCP to load and transform daily flight data using Spark to update tables in BigQuery.
- Scheduled the pipeline for daily batch data using Airflow.
- Read a list of words from a file and stored them in a hash table.
- Generated a random 2-D array based on user input for rows and columns.
- Traversed the 2-D puzzle in all 8 directions and checked whether each formed word exists in the hash table (see the sketch below this project).
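A minimal Python sketch of the lookup idea described above (the project's original implementation language is not stated here); the file path, grid size, and word-length cap are placeholders.

```python
# Illustrative word search: hash-table lookups while walking the grid in 8 directions.
import random
import string

def load_words(path):
    """Read the word list into a set (Python's built-in hash table)."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def make_grid(rows, cols):
    """Generate a random rows x cols grid of lowercase letters."""
    return [[random.choice(string.ascii_lowercase) for _ in range(cols)]
            for _ in range(rows)]

DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def find_words(grid, words, max_len=10):
    """Walk from every cell in all 8 directions and collect words found in the set."""
    rows, cols = len(grid), len(grid[0])
    found = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                letters, rr, cc = [], r, c
                while 0 <= rr < rows and 0 <= cc < cols and len(letters) < max_len:
                    letters.append(grid[rr][cc])
                    candidate = "".join(letters)
                    if candidate in words:
                        found.add(candidate)
                    rr, cc = rr + dr, cc + dc
    return found
```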
- Created a simple Unix shell using C programming.
- Implemented built-in commands, parallel commands, and redirection of output in the shell.
- Created a 2-D maze and solved it using disjoint-set (union-find) operations in Java.
- Built a graph of Texas cities from a file.
- Found the minimum spanning tree of the graph by applying Kruskal's algorithm (see the sketch below this project).
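A Python sketch of the union-find plus Kruskal idea (the original project was implemented in Java); the small edge list at the end is a hypothetical example, not the Texas city data.

```python
# Illustrative Kruskal's MST using a simple union-find with path halving.
def kruskal(n, edges):
    """n: number of vertices (0..n-1); edges: list of (weight, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for weight, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # adding this edge creates no cycle
            parent[ru] = rv                 # union the two components
            mst.append((u, v, weight))
    return mst

# Example: 4 cities with weighted roads between them.
print(kruskal(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]))
```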
- Read a text file using MIPS and produced statistics such as counts of uppercase letters, lowercase letters, digits, other symbols, lines of text, and signed numbers.
- Wrote an assembly program that reads a number in binary, decimal, or hexadecimal format and converts it into any required format based on user input.
- Replaced the round-robin scheduler in xv6 with a lottery scheduler.
- Assigned each running process a slice of the processor in proportion to the number of tickets it holds; the more tickets a process has, the more it runs. Each time slice, a randomized lottery determines the winning process, which then runs for that slice.
- Coded a solution for the classic seeking-tutor synchronization problem.
- Implemented the solution using concurrent programming in C.
- Wrote a C program to read a file system image and check its consistency against a set of 12 rules.
- When the image is not consistent, the checker outputs an appropriate error message.
- Data modeling project involving the design of a database for ebay.com.
- Created an Entity-Relationship model for the Ebay database by identifying and analyzing data requirements.
- Converted the Entity-Relationship model to a relational model by applying mapping and normalization techniques.
- Wrote DDL and DML statements for the relational model and defined relevant stored procedures and triggers.
- Created an Android application to help new parents manage their children and personal hobbies.
- Utilized Redux, a library for managing application state, to improve the scalability and maintainability of the app.
- Utilized third-party libraries and APIs to add features such as screen routing, calendar, image upload and date-picker.
Recommendations
These are a few recommendations from people I have worked with, such as colleagues, mentors, and friends.
Mehroos is an amazing person to work with. His patience and his way of solving complex scenarios motivated me to follow his way. Good team player and such a cool guy. I would like to work with him again if I get a chance. All the best for your future endeavors.
Karthickeyan Sundarajam
Tech Lead
I have known Mehroos for the past 2 years, since he joined Cognizant as a fresh graduate out of college, after which I was his senior colleague. My personal experience in supervising and mentoring him has been quite satisfactory. Coming from a non-CS background, Mehroos has worked hard from day one to acquire the necessary skill set to transition into an IT professional. He is a quick learner and, with limited training experience, he was able to learn and implement various technologies such as Hadoop, Spark, Hive, etc. as required by the projects.
Sheikh Mohammed
Sr Developer
I have known Mr. Mehroos Ali for the past one year as an enthusiastic and self-motivated person. He is one of the most creative, intelligent, and capable individuals I have had the good fortune to work with. I have been his Iteration Manager for the various projects he has been part of at Cognizant and have witnessed his career progress from a junior resource to a good developer in a limited time period.
Saravana Prabhu
Manager
Mehroos is a sincere student, working tirelessly towards his goal of learning. He has a good understanding of the subjects, which is reflected in both his theoretical and practical knowledge. He is articulate and can express himself very well, both in formal teaching contexts and in informal discussions. Through all my interactions with him, I have had many opportunities to evaluate him.