Hardback: £89.97
Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® presents an applied and interactive approach to data mining. Featuring hands-on applications with JMP Pro®, a statistical package from the SAS Institute, the book uses engaging, real-world examples to build a theoretical and practical understanding of key data mining methods, especially predictive models for classification and prediction. Topics include data visualization, dimension reduction techniques, clustering, linear and logistic regression, classification and regression trees, discriminant analysis, naive Bayes, neural networks, uplift modeling, ensemble models, and time series forecasting. Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® also includes:
* Detailed summaries that outline the key topics at the beginning of each chapter
* End-of-chapter examples and exercises that allow readers to expand their comprehension of the presented material
* Data-rich case studies that illustrate various applications of data mining techniques
* A companion website with over two dozen data sets, exercises and case study solutions, and slides for instructors
Data Mining for Business Analytics: Concepts, Techniques, and Applications with JMP Pro® is an excellent textbook for advanced undergraduate and graduate-level courses on data mining, predictive analytics, and business analytics. The book is also a one-of-a-kind resource for data scientists, analysts, researchers, and practitioners working with analytics in management, finance, marketing, information technology, healthcare, education, and any other data-rich field. Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and taught data mining courses since 2004 at the University of Maryland, Statistics.com, the Indian School of Business, and National Tsing Hua University, Taiwan.
Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored more than 70 journal articles, books, textbooks, and book chapters, including Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner®, Third Edition, also published by Wiley. Peter C. Bruce is President and Founder of the Institute for Statistics Education at www.statistics.com. He has written multiple journal articles and is the developer of the Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective and co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner®, Third Edition, both published by Wiley. Mia Stephens is Academic Ambassador at JMP®, a division of SAS Institute. Prior to joining SAS, she was an adjunct professor of statistics at the University of New Hampshire and a founding member of the North Haven Group LLC, a statistical training and consulting company. She is the co-author of three other books, including Visual Six Sigma: Making Data Analysis Lean, Second Edition, also published by Wiley. Nitin R. Patel, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years. He is co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications in XLMiner®, Third Edition, also published by Wiley.
Dedication i
Foreword xvii
Preface xviii
Acknowledgments xx
PART I PRELIMINARIES
CHAPTER 1 Introduction 3
1.1 What Is Business Analytics? 3
1.2 What Is Data Mining? 5
1.3 Data Mining and Related Terms 5
1.4 Big Data 6
1.5 Data Science 7
1.6 Why Are There So Many Different Methods? 8
1.7 Terminology and Notation 9
1.8 Road Maps to This Book 11
Order of Topics 12
CHAPTER 2 Overview of the Data Mining Process 15
2.1 Introduction 15
2.2 Core Ideas in Data Mining 16
2.3 The Steps in Data Mining 19
2.4 Preliminary Steps 20
2.5 Predictive Power and Overfitting 28
2.6 Building a Predictive Model with JMP Pro 33
2.7 Using JMP Pro for Data Mining 42
2.8 Automating Data Mining Solutions 42
Data Mining Software Tools (Herb Edelstein) 44
Problems 47
PART II DATA EXPLORATION AND DIMENSION REDUCTION
CHAPTER 3 Data Visualization 52
3.1 Uses of Data Visualization 52
3.2 Data Examples 54
Example 1: Boston Housing Data 54
Example 2: Ridership on Amtrak Trains 55
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots 55
Distribution Plots 58
Heatmaps: Visualizing Correlations and Missing Values 61
3.4 Multi-Dimensional Visualization 63
Adding Variables: Color, Hue, Size, Shape, Multiple Panels, Animation 63
Manipulations: Re-scaling, Aggregation and Hierarchies, Zooming and Panning, Filtering 67
Reference: Trend Line and Labels 70
Scaling Up: Large Datasets 72
Multivariate Plot: Parallel Coordinates Plot 73
Interactive Visualization 74
3.5 Specialized Visualizations 76
Visualizing Networked Data 76
Visualizing Hierarchical Data: Treemaps 77
Visualizing Geographical Data: Maps 78
3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal 80
Prediction 80
Classification 81
Time Series Forecasting 81
Unsupervised Learning 82
Problems 83
CHAPTER 4 Dimension Reduction 85
4.1 Introduction 85
4.2 Curse of Dimensionality 86
4.3 Practical Considerations 86
Example 1: House Prices in Boston 87
4.4 Data Summaries 88
4.5 Correlation Analysis 91
4.6 Reducing the Number of Categories in Categorical Variables 92
4.7 Converting a Categorical Variable to a Continuous Variable 94
4.8 Principal Components Analysis 94
Example 2: Breakfast Cereals 95
Principal Components 101
Normalizing the Data 102
Using Principal Components for Classification and Prediction 104
4.9 Dimension Reduction Using Regression Models 104
4.10 Dimension Reduction Using Classification and Regression Trees 106
Problems 107
PART III PERFORMANCE EVALUATION
CHAPTER 5 Evaluating Predictive Performance 111
5.1 Introduction 111
5.2 Evaluating Predictive Performance 112
Benchmark: The Average 112
Prediction Accuracy Measures 113
5.3 Judging Classifier Performance 115
Benchmark: The Naive Rule 115
Class Separation 115
The Classification Matrix 116
Using the Validation Data 117
Accuracy Measures 117
Cutoff for Classification 118
Performance in Unequal Importance of Classes 122
Asymmetric Misclassification Costs 123
5.4 Judging Ranking Performance 127
5.5 Oversampling 131
Problems 138
PART IV PREDICTION AND CLASSIFICATION METHODS
CHAPTER 6 Multiple Linear Regression 141
6.1 Introduction 141
6.2 Explanatory vs. Predictive Modeling 142
6.3 Estimating the Regression Equation and Prediction 143
Example: Predicting the Price of Used Toyota Corolla Automobiles 144
6.4 Variable Selection in Linear Regression 149
Reducing the Number of Predictors 149
How to Reduce the Number of Predictors 150
Manual Variable Selection 151
Automated Variable Selection 151
Problems 160
CHAPTER 7 k-Nearest Neighbors (k-NN) 165
7.1 The k-NN Classifier (Categorical Outcome) 165
Determining Neighbors 165
Classification Rule 166
Example: Riding Mowers 166
Choosing k 167
Setting the Cutoff Value 169
7.2 k-NN for a Numerical Response 171
7.3 Advantages and Shortcomings of k-NN Algorithms 172
Problems 174
CHAPTER 8 The Naive Bayes Classifier 176
8.1 Introduction 176
Example 1: Predicting Fraudulent Financial Reporting 177
8.2 Applying the Full (Exact) Bayesian Classifier 178
8.3 Advantages and Shortcomings of the Naive Bayes Classifier 187
Problems 191
CHAPTER 9 Classification and Regression Trees 194
9.1 Introduction 194
9.2 Classification Trees 195
Example 1: Riding Mowers 196
9.3 Growing a Tree 198
Growing a Tree Example 198
Growing a Tree with CART 203
9.4 Evaluating the Performance of a Classification Tree 203
Example 2: Acceptance of Personal Loan 203
9.5 Avoiding Overfitting 204
Stopping Tree Growth: CHAID 205
Pruning the Tree 207
9.6 Classification Rules from Trees 208
9.7 Classification Trees for More Than Two Classes 210
9.8 Regression Trees 210
Prediction 213
Evaluating Performance 214
9.9 Advantages and Weaknesses of a Tree 214
9.10 Improving Prediction: Multiple Trees 216
9.11 CART and Measures of Impurity 218
Measuring Impurity 218
Problems 221
CHAPTER 10 Logistic Regression 224
10.1 Introduction 224
10.2 The Logistic Regression Model 226
Example: Acceptance of Personal Loan 227
Model with a Single Predictor 229
Estimating the Logistic Model from Data: Computing Parameter Estimates 231
10.3 Evaluating Classification Performance 234
Variable Selection 236
10.4 Example of Complete Analysis: Predicting Delayed Flights 237
Data Preprocessing 240
Model Fitting, Estimation and Interpretation - A Simple Model 240
Model Fitting, Estimation and Interpretation - The Full Model 241
Model Performance 243
Variable Selection 245
10.5 Appendix: Logistic Regression for Profiling 249
Appendix A: Why Linear Regression Is Inappropriate for a Categorical Response 249
Appendix B: Evaluating Explanatory Power 250
Appendix C: Logistic Regression for More Than Two Classes 253
Problems 257
CHAPTER 11 Neural Nets 260
11.1 Introduction 260
11.2 Concept and Structure of a Neural Network 261
11.3 Fitting a Network to Data 261
Example 1: Tiny Dataset 262
Computing Output of Nodes 263
Preprocessing the Data 266
Training the Model 267
Using the Output for Prediction and Classification 272
Example 2: Classifying Accident Severity 273
Avoiding Overfitting 275
11.4 User Input in JMP Pro 277
11.5 Exploring the Relationship Between Predictors and Response 280
11.6 Advantages and Weaknesses of Neural Networks 281
Problems 282
CHAPTER 12 Discriminant Analysis 284
12.1 Introduction 284
Example 1: Riding Mowers 285
Example 2: Personal Loan Acceptance 285
12.2 Distance of an Observation from a Class 286
12.3 From Distances to Propensities and Classifications 288
12.4 Classification Performance of Discriminant Analysis 292
12.5 Prior Probabilities 293
12.6 Classifying More Than Two Classes 294
Example 3: Medical Dispatch to Accident Scenes 294
12.7 Advantages and Weaknesses 296
Problems 299
CHAPTER 13 Combining Methods: Ensembles and Uplift Modeling 302
13.1 Ensembles 303
Why Ensembles Can Improve Predictive Power 303
Simple Averaging 305
Bagging 306
Boosting 306
Advantages and Weaknesses of Ensembles 307
13.2 Uplift (Persuasion) Modeling 308
A-B Testing 308
Uplift 308
Gathering the Data 309
A Simple Model 310
Modeling Individual Uplift 311
Using the Results of an Uplift Model 312
Creating Uplift Models in JMP Pro 313
13.3 Summary 315
Problems 316
PART V MINING RELATIONSHIPS AMONG RECORDS
CHAPTER 14 Cluster Analysis 320
14.1 Introduction 320
Example: Public Utilities 322
14.2 Measuring Distance Between Two Observations 324
Euclidean Distance 324
Normalizing Numerical Measurements 324
Other Distance Measures for Numerical Data 326
Distance Measures for Categorical Data 327
Distance Measures for Mixed Data 327
14.3 Measuring Distance Between Two Clusters 328
14.4 Hierarchical (Agglomerative) Clustering 330
Single Linkage 332
Complete Linkage 332
Average Linkage 333
Centroid Linkage 333
Dendrograms: Displaying Clustering Process and Results 334
Validating Clusters 335
Limitations of Hierarchical Clustering 339
14.5 Nonhierarchical Clustering: The k-Means Algorithm 340
Initial Partition into k Clusters 342
Problems 350
PART VI FORECASTING TIME SERIES
CHAPTER 15 Handling Time Series 355
15.1 Introduction 355
15.2 Descriptive vs. Predictive Modeling 356
15.3 Popular Forecasting Methods in Business 357
Combining Methods 357
15.4 Time Series Components 358
Example: Ridership on Amtrak Trains 358
15.5 Data Partitioning and Performance Evaluation 362
Benchmark Performance: Naive Forecasts 362
Generating Future Forecasts 363
Problems 365
CHAPTER 16 Regression-Based Forecasting 368
16.1 A Model with Trend 368
Linear Trend 368
Exponential Trend 372
Polynomial Trend 374
16.2 A Model with Seasonality 375
16.3 A Model with Trend and Seasonality 378
16.4 Autocorrelation and ARIMA Models 378
Computing Autocorrelation 380
Improving Forecasts by Integrating Autocorrelation Information 383
Fitting AR Models to Residuals 384
Evaluating Predictability 387
Problems 389
CHAPTER 17 Smoothing Methods 399
17.1 Introduction 399
17.2 Moving Average 400
Centered Moving Average for Visualization 400
Trailing Moving Average for Forecasting 401
Choosing Window Width (w) 404
17.3 Simple Exponential Smoothing 405
Choosing Smoothing Parameter 406
Relation Between Moving Average and Simple Exponential Smoothing 408
17.4 Advanced Exponential Smoothing 409
Series with a Trend 409
Series with a Trend and Seasonality 410
Problems 414
PART VII CASES
CHAPTER 18 Cases 425
18.1 Charles Book Club 425
18.2 German Credit 434
Background 434
Data 434
18.3 Tayko Software Cataloger 439
18.4 Political Persuasion 442
Background 442
Predictive Analytics Arrives in US Politics 442
Political Targeting 442
Uplift 443
Data 444
Assignment 444
18.5 Taxi Cancellations 446
Business Situation 446
Assignment 446
18.6 Segmenting Consumers of Bath Soap 448
Appendix 451
18.7 Direct-Mail Fundraising 452
18.8 Predicting Bankruptcy 455
18.9 Time Series Case: Forecasting Public Transportation Demand 458
References 460
Data Files Used in the Book 461
Index 463
FOREWORD xvii
PREFACE xix
ACKNOWLEDGMENTS xxi
PART I PRELIMINARIES
1 Introduction 3
1.1 What Is Business Analytics? 3
Who Uses Predictive Analytics? 4
1.2 What Is Data Mining? 5
1.3 Data Mining and Related Terms 5
1.4 Big Data 6
1.5 Data Science 7
1.6 Why Are There So Many Different Methods? 7
1.7 Terminology and Notation 8
1.8 Roadmap to This Book 10
Order of Topics 11
Using JMP Pro, Statistical Discovery Software from SAS 11
2 Overview of the Data Mining Process 14
2.1 Introduction 14
2.2 Core Ideas in Data Mining 15
Classification 15
Prediction 15
Association Rules and Recommendation Systems 15
Predictive Analytics 16
Data Reduction and Dimension Reduction 16
Data Exploration and Visualization 16
Supervised and Unsupervised Learning 16
2.3 The Steps in Data Mining 17
2.4 Preliminary Steps 19
Organization of Datasets 19
Sampling from a Database 19
Oversampling Rare Events in Classification Tasks 19
Preprocessing and Cleaning the Data 20
Changing Modeling Types in JMP 20
Standardizing Data in JMP 25
2.5 Predictive Power and Overfitting 25
Creation and Use of Data Partitions 25
Partitioning Data for Crossvalidation in JMP Pro 27
Overfitting 27
2.6 Building a Predictive Model with JMP Pro 29
Predicting Home Values in a Boston Neighborhood 29
Modeling Process 30
Setting the Random Seed in JMP 34
2.7 Using JMP Pro for Data Mining 38
2.8 Automating Data Mining Solutions 40
Data Mining Software Tools: The State of the Market by Herb Edelstein 41
Problems 44
PART II DATA EXPLORATION AND DIMENSION REDUCTION
3 Data Visualization 51
3.1 Uses of Data Visualization 51
3.2 Data Examples 52
Example 1: Boston Housing Data 53
Example 2: Ridership on Amtrak Trains 53
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots 54
Using The JMP Graph Builder 54
Distribution Plots: Boxplots and Histograms 56
Tools for Data Visualization in JMP 59
Heatmaps (Color Maps and Cell Plots): Visualizing Correlations and Missing Values 59
3.4 Multidimensional Visualization 61
Adding Variables: Color, Size, Shape, Multiple Panels, and Animation 62
Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering 65
Reference: Trend Lines and Labels 68
Adding Trendlines in the Graph Builder 69
Scaling Up: Large Datasets 70
Multivariate Plot: Parallel Coordinates Plot 71
Interactive Visualization 72
3.5 Specialized Visualizations 73
Visualizing Networked Data 74
Visualizing Hierarchical Data: More on Treemaps 75
Visualizing Geographical Data: Maps 76
3.6 Summary of Major Visualizations and Operations, According to Data
Mining Goal 77
Prediction 77
Classification 78
Time Series Forecasting 78
Unsupervised Learning 79
Problems 79
4 Dimension Reduction 81
4.1 Introduction 81
4.2 Curse of Dimensionality 82
4.3 Practical Considerations 82
Example 1: House Prices in Boston 82
4.4 Data Summaries 83
Summary Statistics 83
Tabulating Data (Pivot Tables) 85
4.5 Correlation Analysis 87
4.6 Reducing the Number of Categories in Categorical Variables 87
4.7 Converting a Categorical Variable to a Continuous Variable 90
4.8 Principal Components Analysis 90
Example 2: Breakfast Cereals 91
Principal Components 95
Normalizing the Data 97
Using Principal Components for Classification and Prediction 100
4.9 Dimension Reduction Using Regression Models 100
4.10 Dimension Reduction Using Classification and Regression Trees 100
Problems 101
PART III PERFORMANCE EVALUATION
5 Evaluating Predictive Performance 105
5.1 Introduction 105
5.2 Evaluating Predictive Performance 106
Benchmark: The Average 106
Prediction Accuracy Measures 107
Comparing Training and Validation Performance 108
5.3 Judging Classifier Performance 109
Benchmark: The Naive Rule 109
Class Separation 109
The Classification Matrix 109
Using the Validation Data 111
Accuracy Measures 111
Propensities and Cutoff for Classification 112
Cutoff Values for Triage 112
Changing the Cutoff Values for a Confusion Matrix in JMP 114
Performance in Unequal Importance of Classes 115
False-Positive and False-Negative Rates 116
Asymmetric Misclassification Costs 116
Asymmetric Misclassification Costs in JMP 119
Generalization to More Than Two Classes 120
5.4 Judging Ranking Performance 120
Lift Curves 120
Beyond Two Classes 122
Lift Curves Incorporating Costs and Benefits 122
5.5 Oversampling 123
Oversampling the Training Set 126
Stratified Sampling and Oversampling in JMP 126
Evaluating Model Performance Using a Nonoversampled Validation Set 126
Evaluating Model Performance If Only Oversampled Validation Set Exists 127
Applying Sampling Weights in JMP 128
Problems 129
PART IV PREDICTION AND CLASSIFICATION METHODS
6 Multiple Linear Regression 133
6.1 Introduction 133
6.2 Explanatory versus Predictive Modeling 134
6.3 Estimating the Regression Equation and Prediction 135
Example: Predicting the Price of Used Toyota Corolla Automobiles 136
Coding of Categorical Variables in Regression 138
Additional Options for Regression Models in JMP 140
6.4 Variable Selection in Linear Regression 141
Reducing the Number of Predictors 141
How to Reduce the Number of Predictors 142
Manual Variable Selection 142
Automated Variable Selection 142
Coding of Categorical Variables in Stepwise Regression 143
Working with the All Possible Models Output 145
When Using a Stopping Algorithm in JMP 147
Other Regression Procedures in JMP Pro—Generalized Regression 149
Problems 150
7 k-Nearest Neighbors (k-NN) 155
7.1 The 𝑘-NN Classifier (Categorical Outcome) 155
Determining Neighbors 155
Classification Rule 156
Example: Riding Mowers 156
Choosing 𝑘 157
𝑘 Nearest Neighbors in JMP Pro 158
The Cutoff Value for Classification 159
𝑘-NN Predictions and Prediction Formulas in JMP Pro 161
𝑘-NN with More Than Two Classes 161
7.2 𝑘-NN for a Numerical Response 161
Pandora 161
7.3 Advantages and Shortcomings of 𝑘-NN Algorithms 163
Problems 164
8 The Naive Bayes Classifier 167
8.1 Introduction 167
Naive Bayes Method 167
Cutoff Probability Method 168
Conditional Probability 168
Example 1: Predicting Fraudulent Financial Reporting 168
8.2 Applying the Full (Exact) Bayesian Classifier 169
Using the "Assign to the Most Probable Class" Method 169
Using the Cutoff Probability Method 169
Practical Difficulty with the Complete (Exact) Bayes Procedure 170
Solution: Naive Bayes 170
Example 2: Predicting Fraudulent Financial Reports, Two Predictors 172
Using the JMP Naive Bayes Add-in 174
Example 3: Predicting Delayed Flights 174
8.3 Advantages and Shortcomings of the Naive Bayes Classifier 179
Spam Filtering 179
Problems 180
9 Classification and Regression Trees 183
9.1 Introduction 183
9.2 Classification Trees 184
Recursive Partitioning 184
Example 1: Riding Mowers 185
Categorical Predictors 186
9.3 Growing a Tree 187
Growing a Tree Example 187
Classifying a New Observation 188
Fitting Classification Trees in JMP Pro 191
Growing a Tree with CART 192
9.4 Evaluating the Performance of a Classification Tree 192
Example 2: Acceptance of Personal Loan 192
9.5 Avoiding Overfitting 193
Stopping Tree Growth: CHAID 194
Growing a Full Tree and Pruning It Back 194
How JMP Limits Tree Size 196
9.6 Classification Rules from Trees 196
9.7 Classification Trees for More Than Two Classes 198
9.8 Regression Trees 199
Prediction 199
Evaluating Performance 200
9.9 Advantages and Weaknesses of a Tree 200
9.10 Improving Prediction: Multiple Trees 204
Fitting Ensemble Tree Models in JMP Pro 206
9.11 CART and Measures of Impurity 207
Problems 207
10 Logistic Regression 211
10.1 Introduction 211
Logistic Regression and Consumer Choice Theory 212
10.2 The Logistic Regression Model 213
Example: Acceptance of Personal Loan (Universal Bank) 214
Indicator (Dummy) Variables in JMP 216
Model with a Single Predictor 216
Fitting One Predictor Logistic Models in JMP 218
Estimating the Logistic Model from Data: Multiple Predictors 218
Fitting Logistic Models in JMP with More Than One Predictor 221
10.3 Evaluating Classification Performance 221
Variable Selection 222
10.4 Example of Complete Analysis: Predicting Delayed Flights 223
Data Preprocessing 225
Model Fitting, Estimation, and Interpretation: A Simple Model 226
Model Fitting, Estimation, and Interpretation: The Full Model 227
Model Performance 229
Variable Selection 230
Regrouping and Recoding Variables in JMP 232
10.5 Appendixes: Logistic Regression for Profiling 234
Appendix A: Why Linear Regression Is Problematic for a Categorical Response 234
Appendix B: Evaluating Explanatory Power 236
Appendix C: Logistic Regression for More Than Two Classes 238
Nominal Classes 238
Problems 241
11 Neural Nets 245
11.1 Introduction 245
11.2 Concept and Structure of a Neural Network 246
11.3 Fitting a Network to Data 246
Example 1: Tiny Dataset 246
Computing Output of Nodes 248
Preprocessing the Data 251
Activation Functions and Data Processing Features in JMP Pro 251
Training the Model 251
Fitting a Neural Network in JMP Pro 254
Using the Output for Prediction and Classification 256
Example 2: Classifying Accident Severity 258
Avoiding Overfitting 259
11.4 User Input in JMP Pro 260
Unsupervised Feature Extraction and Deep Learning 263
11.5 Exploring the Relationship between Predictors and Response 264
Understanding Neural Models in JMP Pro 264
11.6 Advantages and Weaknesses of Neural Networks 264
Problems 265
12 Discriminant Analysis 268
12.1 Introduction 268
Example 1: Riding Mowers 269
Example 2: Personal Loan Acceptance (Universal Bank) 269
12.2 Distance of an Observation from a Class 270
12.3 From Distances to Propensities and Classifications 272
Linear Discriminant Analysis in JMP 275
12.4 Classification Performance of Discriminant Analysis 275
12.5 Prior Probabilities 277
12.6 Classifying More Than Two Classes 278
Example 3: Medical Dispatch to Accident Scenes 278
Using Categorical Predictors in Discriminant Analysis in JMP 279
12.7 Advantages and Weaknesses 280
Problems 282
13 Combining Methods: Ensembles and Uplift Modeling 285
13.1 Ensembles 285
Why Ensembles Can Improve Predictive Power 286
The Wisdom of Crowds 287
Simple Averaging 287
Bagging 288
Boosting 288
Creating Ensemble Models in JMP Pro 289
Advantages and Weaknesses of Ensembles 289
13.2 Uplift (Persuasion) Modeling 290
A-B Testing 290
Uplift 290
Gathering the Data 291
A Simple Model 292
Modeling Individual Uplift 293
Using the Results of an Uplift Model 294
Creating Uplift Models in JMP Pro 294
Using the Uplift Platform in JMP Pro 295
13.3 Summary 295
Problems 297
PART V MINING RELATIONSHIPS AMONG RECORDS
14 Cluster Analysis 301
14.1 Introduction 301
Example: Public Utilities 302
14.2 Measuring Distance between Two Observations 305
Euclidean Distance 305
Normalizing Numerical Measurements 305
Other Distance Measures for Numerical Data 306
Distance Measures for Categorical Data 308
Distance Measures for Mixed Data 308
14.3 Measuring Distance between Two Clusters 309
Minimum Distance 309
Maximum Distance 309
Average Distance 309
Centroid Distance 309
14.4 Hierarchical (Agglomerative) Clustering 311
Hierarchical Clustering in JMP and JMP Pro 311
Hierarchical Agglomerative Clustering Algorithm 312
Single Linkage 312
Complete Linkage 313
Average Linkage 313
Centroid Linkage 313
Ward’s Method 314
Dendrograms: Displaying Clustering Process and Results 314
Validating Clusters 316
Two-Way Clustering 318
Limitations of Hierarchical Clustering 319
14.5 Nonhierarchical Clustering: The 𝑘-Means Algorithm 320
𝑘-Means Clustering Algorithm 321
Initial Partition into 𝐾 Clusters 322
𝑘-Means Clustering in JMP 322
Problems 329
PART VI FORECASTING TIME SERIES
15 Handling Time Series 335
15.1 Introduction 335
15.2 Descriptive versus Predictive Modeling 336
15.3 Popular Forecasting Methods in Business 337
Combining Methods 337
15.4 Time Series Components 337
Example: Ridership on Amtrak Trains 337
15.5 Data Partitioning and Performance Evaluation 341
Benchmark Performance: Naive Forecasts 342
Generating Future Forecasts 342
Partitioning Time Series Data in JMP and Validating Time Series Models 342
Problems 343
16 Regression-Based Forecasting 346
16.1 A Model with Trend 346
Linear Trend 346
Fitting a Model with Linear Trend in JMP 348
Creating Actual versus Predicted Plots and Residual Plots in JMP 350
Exponential Trend 350
Computing Forecast Errors for Exponential Trend Models 352
Polynomial Trend 352
Fitting a Polynomial Trend in JMP 353
16.2 A Model with Seasonality 353
16.3 A Model with Trend and Seasonality 356
16.4 Autocorrelation and ARIMA Models 356
Computing Autocorrelation 356
Improving Forecasts by Integrating Autocorrelation Information 360
Fitting AR (Autoregression) Models in the JMP Time Series Platform 361
Fitting AR Models to Residuals 361
Evaluating Predictability 363
Summary: Fitting Regression-Based Time Series Models in JMP 365
Problems 366
17 Smoothing Methods 377
17.1 Introduction 377
17.2 Moving Average 378
Centered Moving Average for Visualization 378
Trailing Moving Average for Forecasting 379
Computing a Trailing Moving Average Forecast in JMP 380
Choosing Window Width (𝑤) 382
17.3 Simple Exponential Smoothing 382
Choosing Smoothing Parameter 𝛼 383
Fitting Simple Exponential Smoothing Models in JMP 384
Creating Plots for Actual versus Forecasted Series and Residuals Series Using the Graph Builder 386
Relation between Moving Average and Simple Exponential Smoothing 386
17.4 Advanced Exponential Smoothing 387
Series with a Trend 387
Series with a Trend and Seasonality 388
Problems 390
PART VII CASES
18 Cases 401
18.1 Charles Book Club 401
The Book Industry 401
Database Marketing at Charles 402
Data Mining Techniques 403
Assignment 405
18.2 German Credit 409
Background 409
Data 409
Assignment 409
18.3 Tayko Software Cataloger 410
Background 410
The Mailing Experiment 413
Data 413
Assignment 413
18.4 Political Persuasion 415
Background 415
Predictive Analytics Arrives in US Politics 415
Political Targeting 416
Uplift 416
Data 417
Assignment 417
18.5 Taxi Cancellations 419
Business Situation 419
Assignment 419
18.6 Segmenting Consumers of Bath Soap 420
Business Situation 420
Key Problems 421
Data 421
Measuring Brand Loyalty 421
Assignment 421
18.7 Direct-Mail Fundraising 423
Background 423
Data 424
Assignment 425
18.8 Predicting Bankruptcy 425
Predicting Corporate Bankruptcy 426
Assignment 428
18.9 Time Series Case: Forecasting Public Transportation Demand 428
Background 428
Problem Description 428
Available Data 428
Assignment Goal 429
Assignment 429
Tips and Suggested Steps 429
References 431
Data Files Used in the Book 433
Index 435
Galit Shmueli, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 70 journal articles, books, textbooks, and book chapters, including Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, also published by Wiley.
Peter C. Bruce is President and Founder of the Institute for Statistics Education at www.statistics.com. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective and co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, both published by Wiley.
Mia Stephens is Academic Ambassador at JMP®, a division of SAS Institute. Prior to joining SAS, she was an adjunct professor of statistics at the University of New Hampshire and a founding member of the North Haven Group LLC, a statistical training and consulting company. She is the co-author of three other books, including Visual Six Sigma: Making Data Analysis Lean, Second Edition, also published by Wiley.
Nitin R. Patel, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years. He is co-author of Data Mining for Business Analytics: Concepts, Techniques, and Applications with XLMiner®, Third Edition, also published by Wiley.