Design, develop, and validate machine learning models with streaming data using the Scikit-Multiflow framework. This book is a quick start guide for data scientists and machine learning engineers looking to implement machine learning models for streaming data with Python to generate real-time insights.
You'll start with an introduction to streaming data, the challenges associated with it, some of its real-world business applications, and various windowing techniques. You'll then examine incremental and online learning algorithms and the concept of model evaluation with streaming data, and get introduced to the Scikit-Multiflow framework in Python. This is followed by a review of change detection and concept drift detection algorithms and their implementation on various datasets using Scikit-Multiflow. Supervised and unsupervised algorithms for streaming data, and their implementation on various datasets using Python, are also covered. The book concludes by briefly covering other open-source tools available for streaming data such as Spark, MOA (Massive Online Analysis), Kafka, and more.
What You'll Learn
Understand machine learning with streaming data concepts
Review incremental and online learning
Develop models for detecting concept drift
Explore techniques for classification, regression, and ensemble learning in streaming data contexts
Apply best practices for debugging and validating machine learning models in a streaming data context
Get introduced to other open-source frameworks for handling streaming data
Who This Book Is For
Machine learning engineers and data science professionals
Chapter 1: An Introduction to Streaming Data
Chapter Goal: Introduce the readers to the concept of streaming data, the challenges associated with it, some of its real-world business applications, and various windowing techniques, along with the concepts of incremental and online learning algorithms. This chapter will also help in understanding model evaluation for streaming data and provide an introduction to the Scikit-Multiflow framework in Python.
No. of pages: 35
Sub-Topics:
1. Streaming data
2. Challenges of streaming data
3. Concept drift
4. Applications of streaming data
5. Windowing techniques
6. Incremental learning and online learning
7. Illustration: Adapting batch learners into incremental learners
8. Introduction to Scikit-Multiflow framework
9. Evaluation of streaming algorithms
Chapter 2: Change Detection
Chapter Goal: Help the readers understand the various change detection and concept drift detection algorithms and their implementation on various datasets using Scikit-Multiflow.
No. of pages: 35
Sub-Topics:
1. Change detection problem
2. Concept drift detection algorithms
3. ADWIN
4. DDM
5. EDDM
6. Page-Hinkley
Chapter 3: Supervised and Unsupervised Learning for Streaming Data
Chapter Goal: Help the readers understand the various regression and classification (including ensemble learning) algorithms for streaming data and their implementation on various datasets using Scikit-Multiflow. Also, discuss some approaches for clustering with streaming data and their implementation using Python.
No. of pages: 35
Sub-Topics:
1. Regression with streaming data
2. Classification with streaming data
3. Ensemble Learning with streaming data
4. Clustering with streaming data
Chapter 4: Other Tools and the Path Forward
Chapter Goal: Introduce the readers to other open-source tools for handling streaming data, such as Spark Streaming, MOA, and more. Also, point the reader to additional reading on advanced topics within streaming data analysis.
No. of pages: 35
Sub-Topics:
1. Other tools for handling streaming data
1.1. Apache Spark
1.2. Massive Online Analysis (MOA)
1.3. Apache Kafka
2. Active research areas and breakthroughs in streaming data analysis
3. Conclusion