
Naveed_Khan_Portfolio

Data Scientist and Enthusiastic Machine Learning Professional

BACKGROUND

6+ years of broad-based experience in Data Analytics with an emphasis on Data Visualization.
Data scientist capable of building data-driven machine learning models using scripting languages such as Python and R.
25+ years’ experience in Quality Management, Product Engineering and Program Management.
Passionate and influential leader with 10+ years’ experience in people management.
Technical mentor able to coach small to large teams on a variety of topics.
Delivered substantial projects from concept to complete deployments.

DATA SCIENCE EXPERIENCE

BIG BANG DATA SCIENCE INSTITUTE

Data Mining for Feature Selection, Feature Engineering and visualization using Python, R, Power BI & Tableau.
Extensive data analysis with Python libraries including NumPy, Pandas, Seaborn, Scikit-Learn, Matplotlib, TensorFlow and others.
Completed various Supervised Learning projects on Classification and Regression utilizing Decision Tree, Logistic Regression, and Simple and Multiple Regression.
Applied Ensemble Methods, including Bagging and Boosting (e.g. AdaBoost).
Trained in Natural Language Processing, Time Series and Deep Learning with a focus on Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN).
Prepared a formal lecture and conducted training on Visual Data Sensemaking principles.

VISUAL DATA SENSEMAKING

An interesting journey into the exciting world of data visualization. It’s an art that needs to be learned and perfected through continuous practice. You need to be a visual thinker to imagine how data speaks to you, and to apply scientific techniques to interpret it and transform it into easy-to-follow visuals. I have read widely on this topic, and one of my recent favorite reads was by a wonderful author, Stephen Few. His new book, Now You See It, explains the principal concepts behind data visualization, using modern graphs to find and examine the meaningful patterns and relationships that reside in quantitative data. I was also fortunate to connect with Stephen during my review of his book and to exchange thoughts on his excellent work.

Thanks to Mo Medwani, my mentor in Data Science, I was invited to present a summary of this book at the Big Bang Data Science Institute in February 2022, during their Bootcamp week for Data Visualization foundation training. It was an honor, and a remarkable gesture by Mo Medwani, to be allowed to share my experience and understanding of this topic with his class.

DATA SCIENCE PROJECTS

1. AAA Northeast Member Analysis (in-progress)

Develop an ML model to provide a market segmentation of 21,000 AAA members.
Applied data visualization and statistics to reduce feature dimensionality. Achieved a 49% reduction in features.
Apply One-Hot encoding to transform categorical features.
Develop multiple algorithms to identify member clusters using K-Means.
Code a series of “look-alike” predictive models: Logistic Regression, K Nearest Neighbors (KNN), Random Forest, Bagging.
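
A minimal sketch of the One-Hot encoding and K-Means steps listed above. It is not the project code: the member records, the column names (tenure_years, plan) and the choice of two clusters are hypothetical, and it uses only the Python standard library so it runs anywhere. In practice Scikit-Learn's OneHotEncoder and KMeans would be the natural choice, with numeric features scaled first.

```python
import random

# Hypothetical member records (illustrative only, not AAA data).
members = [
    {"tenure_years": 1.0, "plan": "basic"},
    {"tenure_years": 2.0, "plan": "basic"},
    {"tenure_years": 1.5, "plan": "basic"},
    {"tenure_years": 20.0, "plan": "premier"},
    {"tenure_years": 21.0, "plan": "premier"},
    {"tenure_years": 19.0, "plan": "premier"},
]


def one_hot(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        numeric = [value for key, value in row.items() if key != column]
        flags = [1.0 if row[column] == c else 0.0 for c in categories]
        encoded.append(numeric + flags)
    return encoded, categories


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        for j in range(k):
            cluster = [p for p, label in zip(points, labels) if label == j]
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(cluster) for col in zip(*cluster)]
    return labels, centroids


encoded, categories = one_hot(members, "plan")
labels, centroids = k_means(encoded, k=2)
print(categories)
print(labels)
```

The cluster numbers themselves are arbitrary; only which members share a label is meaningful.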

2. Credit Card ML Anomaly Detection

Architected an Agile Unsupervised anomaly detection model. Able to flag 20% anomalous transactions.
Developed algorithms that prevented up to 15% of credit card holders from exceeding their credit limits.
Performed Feature Engineering and Feature Selection using Geocoder APIs.
Analyzed several anomaly detection models, Isolation Forest, Local Outlier Factor (LOF) and Support Vector Machine (SVM), and compared their accuracy. Achieved 91% accuracy with Isolation Forest.
Presented the optimal anomaly detection technique to the client, along with key recommendations to enhance fraud detection capabilities and to collect additional data features for future model enhancements.
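
To illustrate one of the detectors compared above, here is a small sketch of Local Outlier Factor scoring and of an accuracy check against known labels. It is an assumption-laden illustration rather than the project code: the points, the neighbour count k=3, the 1.5 score threshold and the labels are all hypothetical, and it is written with the standard library only. Scikit-Learn's LocalOutlierFactor and IsolationForest would normally be used instead.

```python
import math


def k_nearest(points, i, k):
    """Return sorted (distance, index) pairs for the k nearest neighbours of points[i]."""
    distances = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return distances[:k]


def local_outlier_factors(points, k=3):
    """LOF score per point; values well above 1 suggest an outlier.

    Assumes all points are distinct (duplicates would give zero distances).
    """
    neighbours = [k_nearest(points, i, k) for i in range(len(points))]
    k_distance = [nbrs[-1][0] for nbrs in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for nbrs in neighbours:
        reach = [max(k_distance[j], d) for d, j in nbrs]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for _, j in nbrs) / (len(nbrs) * lrd[i])
        for i, nbrs in enumerate(neighbours)
    ]


def accuracy(predicted, actual):
    """Fraction of points whose predicted flag matches the known label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: a tight 3x3 grid of normal points plus one distant point.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
known_anomaly = [False] * 9 + [True]

scores = local_outlier_factors(points, k=3)
flags = [score > 1.5 for score in scores]  # 1.5 is an illustrative threshold
print(accuracy(flags, known_anomaly))
```

Running the same accuracy function over the flags produced by each detector gives a like-for-like comparison, provided labelled examples are available for evaluation.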

3. Bitcoin Time Series Analysis

Developed an ML model to predict Bitcoin price growth of 2%, based only on changes from the previous day.
Built time series predictive models using Facebook (FB) Prophet, Regression for Time Series, SARIMA & RNN.
Compared the performance of several models and recommended FB Prophet, along with data insights, to the client for deployment.
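
One possible reading of the previous-day setup above, sketched as a supervised learning problem: the feature is yesterday's daily return and the label is whether the next day's return reaches 2%. This framing is an assumption, as are the prices and the function names; it is not the project code, and it stops at preparing the data rather than fitting Prophet, SARIMA or an RNN.

```python
def daily_returns(prices):
    """Fractional change from each day to the next."""
    return [(today - prev) / prev for prev, today in zip(prices, prices[1:])]


def make_lagged_dataset(prices, threshold=0.02):
    """Pair each day's return (feature) with whether the next day's return
    reaches the threshold (label)."""
    returns = daily_returns(prices)
    features = returns[:-1]
    labels = [1 if nxt >= threshold else 0 for nxt in returns[1:]]
    return features, labels


def chronological_split(features, labels, train_fraction=0.8):
    """Split without shuffling so the test period always follows the training period."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], labels[:cut]), (features[cut:], labels[cut:])


# Hypothetical closing prices, for illustration only.
example_prices = [100.0, 102.0, 101.0, 104.0, 104.0]
X, y = make_lagged_dataset(example_prices)
train, test = chronological_split(X, y, train_fraction=0.67)
print(X, y)
```

Keeping the split chronological matters for time series: shuffling would let the model train on days that come after the ones it is tested on.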

4. Secondary Car Market ML Prediction

Developed and optimized an ML Regression model to predict secondary car market pricing.
Trained, tested and compared various regression models, including Multiple Linear Regression, Decision Tree, Random Forest, Ridge, Lasso, KNN and Elastic Net.
Obtained 68 to 78% accuracy on the test data set.
Summarized and presented the optimal Regression model and a plan for future model improvements.
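
A short sketch of how regression models can be compared on a held-out test set, as in the project above. The data (car age versus price in thousands), the single-feature linear fit and the use of R² as the score are assumptions for illustration; the accuracy metric actually used in the project is not stated here, and this is not the project code.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: car age in years and price in thousands.
train_age = [1, 2, 3, 4, 5, 6]
train_price = [20.0, 18.0, 16.5, 14.0, 12.5, 10.0]
test_age = [7, 8, 9]
test_price = [8.5, 6.0, 4.5]

slope, intercept = fit_line(train_age, train_price)
line_predictions = [slope * age + intercept for age in test_age]

# Baseline for comparison: always predict the mean training price.
mean_train_price = sum(train_price) / len(train_price)
baseline_predictions = [mean_train_price] * len(test_age)

r2_line = r_squared(test_price, line_predictions)
r2_baseline = r_squared(test_price, baseline_predictions)
print(r2_line, r2_baseline)
```

Scoring every candidate on the same test set, alongside a trivial baseline, makes it clear how much each model adds beyond predicting the average price.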