Hardback : £92.19
Wide coverage of traditional unsupervised and supervised methods and newer contemporary approaches that help researchers handle the rapid growth of classification methods in DNA microarray studies
Proliferating classification methods in DNA microarray studies have resulted in a body of information scattered throughout the literature, conference proceedings, and elsewhere. This book unites many of these classification methods in a single volume. In addition to traditional statistical methods, it covers newer machine-learning approaches such as fuzzy methods, artificial neural networks, evolutionary-based genetic algorithms, support vector machines, swarm intelligence involving particle swarm optimization, and more.
Classification Analysis of DNA Microarrays provides highly detailed pseudo-code and rich, graphical programming features, plus ready-to-run source code. Along with primary methods that include traditional and contemporary classification, it offers supplementary tools and data preparation routines for standardization and fuzzification; dimensional reduction via crisp and fuzzy c-means, PCA, and non-linear manifold learning; computational linguistics via text analytics and n-gram analysis; recursive feature extraction during ANN; kernel-based methods; and ensemble classifier fusion.
This powerful new resource:
Classification Analysis of DNA Microarrays is useful for professionals and graduate students in computer science, bioinformatics, biostatistics, systems biology, and many related fields.
Preface
Abbreviations
1 Introduction
1.1 Class Discovery
1.2 Dimensional Reduction
1.3 Class Prediction
1.4 Classification Rules of Thumb
1.5 DNA Microarray Datasets Used
References
PART I CLASS DISCOVERY
2 Crisp K-Means Cluster Analysis
2.1 Introduction
2.2 Algorithm
2.3 Implementation
2.4 Distance Metrics
2.5 Cluster Validity
2.6 V-Fold Cross-Validation
2.7 Cluster Initialization
2.8 Cluster Outliers
2.9 Summary
References
3 Fuzzy K-Means Cluster Analysis
3.1 Introduction
3.2 Fuzzy K-Means Algorithm
3.3 Implementation
3.4 Summary
References
4 Self-Organizing Maps
4.1 Introduction
4.2 Algorithm
4.3 Implementation
4.4 Cluster Visualization
4.5 Unified Distance Matrix (U Matrix)
4.6 Component Map
4.7 Map Quality
4.8 Nonlinear Dimension Reduction
References
5 Unsupervised Neural Gas
5.1 Introduction
5.2 Algorithm
5.3 Implementation
5.4 Nonlinear Dimension Reduction
5.5 Summary
References
6 Hierarchical Cluster Analysis
6.1 Introduction
6.2 Methods
6.3 Algorithm
6.4 Implementation
References
7 Model-Based Clustering
7.1 Introduction
7.2 Algorithm
7.3 Implementation
7.4 Summary
References
8 Text Mining: Document Clustering
8.1 Introduction
8.2 Duo-Mining
8.3 Streams and Documents
8.4 Lexical Analysis
8.5 Stemming
8.6 Term Weighting
8.7 Concept Vectors
8.8 Main Terms Representing Concept Vectors
8.9 Algorithm
8.10 Preprocessing
8.11 Summary
References
9 Text Mining: N-Gram Analysis
9.1 Introduction
9.2 Algorithm
9.3 Implementation
9.4 Summary
References
PART II DIMENSION REDUCTION
10 Principal Components Analysis
10.1 Introduction
10.2 Multivariate Statistical Theory
10.3 Algorithm
10.4 When to Use Loadings and PC Scores
10.5 Implementation
10.6 Rules of Thumb For PCA
10.7 Summary
References
11 Nonlinear Manifold Learning
11.1 Introduction
11.2 Correlation-Based PCA
11.3 Kernel PCA
11.4 Diffusion Maps
11.5 Laplacian Eigenmaps
11.6 Local Linear Embedding
11.7 Locality Preserving Projections
11.8 Sammon Mapping
11.9 NLML Prior to Classification Analysis
11.10 Classification Results
11.11 Summary
References
PART III CLASS PREDICTION
12 Feature Selection
12.1 Introduction
12.2 Filtering versus Wrapping
12.3 Data
12.4 Data Arrangement
12.5 Filtering
12.6 Selection Methods
12.7 Multicollinearity
12.8 Summary
References
13 Classifier Performance
13.1 Introduction
13.2 Input-Output, Speed, and Efficiency
13.3 Training, Testing, and Validation
13.4 Ensemble Classifier Fusion
13.5 Sensitivity and Specificity
13.6 Bias
13.7 Variance
13.8 Receiver-Operator Characteristic (ROC) Curves
References
14 Linear Regression
14.1 Introduction
14.2 Algorithm
14.3 Implementation
14.4 Cross-Validation Results
14.5 Bootstrap Bias
14.6 Multiclass ROC Curves
14.7 Decision Boundaries
14.8 Summary
References
15 Decision Tree Classification
15.1 Introduction
15.2 Features Used
15.3 Terminal Nodes and Stopping Criteria
15.4 Algorithm
15.5 Implementation
15.6 Cross-Validation Results
15.7 Decision Boundaries
15.8 Summary
References
16 Random Forests
16.1 Introduction
16.2 Algorithm
16.3 Importance Scores
16.4 Strength and Correlation
16.5 Proximity and Supervised Clustering
16.6 Unsupervised Clustering
16.7 Class Outlier Detection
16.8 Implementation
16.9 Parameter Effects
16.10 Summary
References
17 K Nearest Neighbor
17.1 Introduction
17.2 Algorithm
17.3 Implementation
17.4 Cross-Validation Results
17.5 Bootstrap Bias
17.6 Multiclass ROC Curves
17.7 Decision Boundaries
17.8 Summary
References
18 Naïve Bayes Classifier
18.1 Introduction
18.2 Algorithm
18.3 Cross-Validation Results
18.4 Bootstrap Bias
18.5 Multiclass ROC Curves
18.6 Decision Boundaries
18.7 Summary
References
19 Linear Discriminant Analysis
19.1 Introduction
19.2 Multivariate Matrix Definitions
19.3 Linear Discriminant Analysis
19.4 Quadratic Discriminant Analysis
19.5 Fisher's Discriminant Analysis
19.6 Summary
References
20 Learning Vector Quantization
20.1 Introduction
20.2 Cross-Validation Results
20.3 Bootstrap Bias
20.4 Multiclass ROC Curves
20.5 Decision Boundaries
20.6 Summary
References
21 Logistic Regression
21.1 Introduction
21.2 Binary Logistic Regression
21.3 Polytomous Logistic Regression
21.4 Cross-Validation Results
21.5 Decision Boundaries
21.6 Summary
References
22 Support Vector Machines
22.1 Introduction
22.2 Hard-Margin SVM for Linearly Separable Classes
22.3 Kernel Mapping into Nonlinear Feature Space
22.4 Soft-Margin SVM for Nonlinearly Separable Classes
22.5 Gradient Ascent Soft-Margin SVM
22.6 Least-Squares Soft-Margin SVM
22.7 Summary
References
23 Artificial Neural Networks
23.1 Introduction
23.2 ANN Architecture
23.3 Basics of ANN Training
23.4 ANN Training Methods
23.5 Algorithm
23.6 Batch versus Online Training
23.7 ANN Testing
23.8 Cross-Validation Results
23.9 Bootstrap Bias
23.10 Multiclass ROC Curves
23.11 Decision Boundaries
23.12 RPROP versus Backpropagation
23.13 Summary
References
24 Kernel Regression
24.1 Introduction
24.2 Algorithm
24.3 Cross-Validation Results
24.4 Bootstrap Bias
24.5 Multiclass ROC Curves
24.6 Decision Boundaries
24.7 Summary
References
25 Neural Adaptive Learning with Metaheuristics
25.1 Multilayer Perceptrons
25.2 Genetic Algorithms
25.3 Covariance Matrix Self-Adaptation-Evolution Strategies
25.4 Particle Swarm Optimization
25.5 Ant Colony Optimization
25.6 Summary
References
26 Supervised Neural Gas
26.1 Introduction
26.2 Algorithm
26.3 Cross-Validation Results
26.4 Bootstrap Bias
26.5 Multiclass ROC Curves
26.6 Class Decision Boundaries
26.7 Summary
References
27 Mixture of Experts
27.1 Introduction
27.2 Algorithm
27.3 Cross-Validation Results
27.4 Decision Boundaries
27.5 Summary
References
28 Covariance Matrix Filtering
28.1 Introduction
28.2 Covariance and Correlation Matrices
28.3 Random Matrices
28.4 Component Subtraction
28.5 Covariance Matrix Shrinkage
28.6 Covariance Matrix Filtering
28.7 Summary
References
APPENDIXES
A Probability Primer
B Matrix Algebra
C Mathematical Functions
D Statistical Primitives
E Probability Distributions
F Symbols And Notation
Index
LEIF E. PETERSON, PhD, is Associate Professor of Public Health, Weill Cornell Medical College, Cornell University, and is with the Center for Biostatistics, The Methodist Hospital Research Institute (Houston). He is a member of the IEEE Computational Intelligence Society, and Editor-in-Chief of the BioMed Central Source Code for Biology and Medicine.