Modern Data Mining with Python

A risk-managed approach to developing and deploying explainable and efficient algorithms using ModelOps (English Edition)

Table of Contents

- Cover
- Title Page
- Copyright Page
- Dedication Page
- Foreword
- About the Authors
- About the Reviewers
- Acknowledgement
- Preface
- Table of Contents
- 1. Understanding Data Mining in a Nutshell
- Introduction
- Structure
- Objectives
- What defines modern data mining
- The lifecycle: Data to insights consumption
- Understanding pattern recognition
- Significance of the human learning process
- The human learning process and mental models
- Data: The key ingredient for meaningful patterns and relationships

- How machines leverage data to build models
- Machine learning process
- Two dominant strategies: Classification and regression
- Biases and learning shortfalls
- Measuring learning accuracy and balancing trade-offs
- Can data size and sample impact learning

- How do humans benefit from data and learning
- Modern-day data mining challenges and possible remediation

- Machine learning process
- Conclusion
- Points to remember

- 2. Basic Statistics and Exploratory Data Analysis
- Introduction
- Structure
- Objectives
- Setting up Python 3.x
- Data mining and statistics
- Statistics: Foundation, key terms, needs, and types

- Descriptive statistics
- Graphical and non-graphical exploratory data analysis
- Non-graphical and graphical representation of univariate data
- Non-graphical representation of multivariate data
- Graphical representation of multivariate data

- Probability theory
- Probability distribution

- Inferential statistics
- Hypothesis testing with commonly used statistical tests

- Introduction to Time Series Data
- Exploratory data analysis: HMDA case study
- Conclusion
- Points to remember

- 3. Digging into Linear Regression
- Introduction
- Structure
- Objectives
- Linear regression
- Background
- Under the hood
- Challenges and assumptions including multi-collinearity
- Detailed EDA
- Dataset description
- Missing value treatment
- Outlier analysis
- Correlation
- Checking on the assumptions of linear regression

- Feature selection
- Regression execution and results
- Regression result interpretation

- Optimization algorithm
- Gradient descent

- Regularization
- Lasso regression
- Ridge regression
- Elastic-Net regression

- MLflow introduction: Need and implementation
- MLflow experiment tracking

- Case study
- Conclusion
- Points to remember

- 4. Exploring Logistic Regression
- Introduction
- Structure
- Objectives
- Logistic regression
- Background
- Under the hood
- Data
- Estimating probabilities
- Loss function

- Challenges and assumptions
- Logistic regression result and interpretation
- Model interpretability and explainability
- Performance metrics
- Model generalization
- K-fold cross-validation
- Ensemble learning

- Model lifecycle processes
- Model development process
- Case study: Loan repayment likelihood prediction
- Conclusion
- Points to remember

- 5. Decision Trees with Bagging and Boosting
- Introduction
- Structure
- Objectives
- Decision trees
- Background
- Under the hood
- Data
- Model
- Loss function

- Challenges and assumptions
- Decision tree result and interpretation

- Ensembling: Bagging, boosting, and stacking
- Random forest
- Gradient boosting
- Ensembling using the stacking method

- Conclusion
- Points to remember

- 6. Support Vector Machines and K-Nearest Neighbors
- Introduction
- Structure
- Objectives
- Classification algorithms with a twist
- Background
- Under the hood
- Data
- Model
- Loss function: Achieving optimal algorithmic results

- Challenges and assumptions
- Case study: Predicting customer propensity to subscribe to a term deposit
- Conclusion
- Points to remember

- 7. Putting Dimensionality Reduction into Action
- Introduction
- Structure
- Objectives
- Dimensionality reduction
- Background
- Under the dimensionality reduction hood
- Data
- Model: Reducing dimensions and variance
- Principal component analysis
- Linear discriminant analysis
- t-distributed Stochastic Neighbor Embedding

- Loss: Measuring Variance Reduction

- Challenges and assumptions
- Case study: Predicting loan repayment propensity using logistic regression, PCA, and LDA
- PCA parameters and interpretation
- LDA parameters and interpretation
- Logistic regression
- Conclusion
- Further reading
- Points to remember

- 8. Beginning with Unsupervised Models
- Introduction
- Structure
- Objectives
- Unsupervised learning
- Background
- Unsupervised learning techniques
- Data
- Model: Building meaningful clusters and profiling them
- K-means clustering
- Density-based spatial clustering of applications with noise
- Hierarchical clustering

- Loss: Efficiently achieving the optimal number of clusters

- Challenges and assumptions
- Case study: Bank customer portfolio segmentation
- Advanced unsupervised learning: A primer
- Conclusion
- Points to remember

- 9. Structured Data Classification using Artificial Neural Networks
- Introduction
- Structure
- Objectives
- Artificial neural network
- Background
- Under the hood of neural networks
- Data
- Model
- Loss function: Achieving optimal results
- Back-propagation and regularization

- Challenges and assumptions
- Case study: Explainable and Interpretable ANN Model
- Interpretable and explainable AI using SHAP and PiML

- Conclusion
- Points to remember

- 10. Language Modeling with Recurrent Neural Networks
- Introduction
- Structure
- Objectives
- Language modeling
- Background
- Under the hood of language modeling
- Data: From spoken languages to modeling datasets
- Model: The language with context
- Recurrent neural network
- Long short-term memory

- Loss: Quest for the best model

- Challenges and assumptions related to text data and model
- Case study: Customer complaint classification explained with LIME
- Rise of transformers: A primer on BERT and GPT
- Conclusion
- Further reading
- Points to remember

- 11. Image Processing with Convolutional Neural Networks
- Introduction
- Structure
- Objectives
- Deep learning for computer vision tasks
- Background
- Under the hood of CNN models
- Data
- Model
- Loss: How to achieve optimal results

- Challenges and assumptions
- The race for the best model and transfer learning: A primer
- Case study: PDF document parser
- Conclusion
- Further reading
- Points to remember

- 12. Understanding Model Risk Management for Data Mining Models
- Introduction
- Structure
- Objectives
- Data mining challenges and risks
- Why do model risks occur

- Introduction to Model Risk Management
- Key regulatory frameworks
- Pillars of Model Risk Management

- Introduction to Model Operations
- ModelOps: Product first vs. model first mindset
- How ModelOps facilitates MRM
- Case study: Regulatory requirement fulfillment using MRM and ModelOps
- Conclusion
- Points to remember

- 13. Adopting ModelOps to Manage Model Risk
- Introduction
- Structure
- Objectives
- Model risk management for fair banking
- Background
- Case study: Fair lending model lifecycle implementation - concept to inference
- Fair lending model lifecycle
- Data
- Model Operations tools primer
- Architecting the model lifecycle using ModelOps
- Fair Lending Risk Assessment: The application

- Challenges and assumptions
- Future of AI and its practitioners
- Conclusion
- Further reading
- Points to remember

- Index