(a) typically assume an underlying distribution for the data,
(b) are better able to deal with missing and noisy data,
(c) are not able to explain their behavior, and
(d) have trouble with large-sized data sets.
17. Explain the difference between sensitivity and specificity.
18. When do you need to use a separate validation set, in addition to training and test sets?
19. In this question we will consider learning problems where each instance x is some integer in the set X = {1, 2, … , 127}, and where each hypothesis h ∈ H is an interval of the form a ≤ x ≤ b, where a and b can be any integers between 1 and 127 (inclusive), so long as a ≤ b. A hypothesis a ≤ x ≤ b labels instance x positive if x falls into the interval defined by a and b, and labels the instance negative otherwise. Assume throughout this question that the teacher is only interested in teaching concepts that can be represented by some hypothesis in H.
(a) How many distinct hypotheses are there in H?
(b) Suppose the teacher is trying to teach the specific target concept 32 ≤ x ≤ 84. What is the minimum number of training examples the teacher must present to guarantee that any consistent learner will learn this concept exactly?
20. Is it true that the SVM learning algorithm is guaranteed to find the globally optimal hypothesis with respect to its objective function? Discuss your answer.
4.11 REFERENCES FOR FURTHER STUDY
Alpaydin, E., Introduction to Machine Learning, 2nd edition, The MIT Press, Cambridge, MA, 2010.
The goal of machine learning is to program computers to use example data or past experience to solve a given problem. Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. Introduction to Machine Learning is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine-learning texts. In order to present a unified treatment of machine-learning problems and solutions, it discusses many methods from different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining. All learning algorithms are explained so that the student can easily move from the equations in the book to a computer program.
Berthold, M., D. J. Hand, eds., Intelligent Data Analysis—An Introduction, Springer, Berlin, Germany, 1999.
The book is a detailed, introductory presentation of the key classes of intelligent data-analysis methods including all common data-mining techniques. The first half of the book is devoted to the discussion of classical statistical issues, ranging from basic concepts of probability and inference to advanced multivariate analyses and Bayesian methods. The second part of the book covers theoretical explanations of data-mining techniques that have their roots in disciplines other than statistics. Numerous illustrations and examples enhance the readers’ knowledge about theory and practical evaluations of data-mining techniques.
Cherkassky, V., F. Mulier, Learning from Data: Concepts, Theory and Methods, 2nd edition, John Wiley, New York, 2007.
The book provides a unified treatment of the principles and methods for learning dependencies from data. It establishes a general conceptual framework in which various learning methods from statistics, machine learning, and other disciplines can be applied, showing that a few fundamental principles underlie most new methods being proposed today. An additional strength of this primarily theoretical book is the large number of case studies and examples that make the concepts of statistical learning theory (SLT) easier to understand.
Engel, A., C. Van den Broeck, Statistical Mechanics of Learning, Cambridge University Press, Cambridge, UK, 2001.
The subject of this book is the contribution made to machine learning over the last decade by researchers applying the techniques of statistical mechanics. The authors provide a coherent account of various important concepts and techniques that are currently found only scattered across papers. They include many examples and exercises, making this a book that can be used for courses, for self-study, or as a handy reference.
Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, 1999.
The book provides a comprehensive foundation for the study of artificial neural networks, recognizing the multidisciplinary nature of the subject. The introductory part explains the basic principles of SLT and the concept of VC dimension. The main part of the book classifies and explains artificial neural networks as learning machines with and without a teacher. The material presented in the book is supported with a large number of examples, problems, and computer-oriented experiments.
5
STATISTICAL METHODS
Chapter Objectives
Explain methods of statistical inference commonly used in data-mining applications.
Identify different statistical parameters for assessing differences in data sets.
Describe the components and the basic principles of the Naïve Bayesian classifier and the logistic regression method.
Introduce log-linear models using correspondence analysis of contingency tables.
Discuss the concepts of analysis of variance (ANOVA) and linear discriminant analysis (LDA) of multidimensional samples.
Statistics is the science of collecting and organizing data and drawing conclusions from data sets. The organization and description of the general characteristics of data sets is the subject area of descriptive statistics. How to draw conclusions from data is the subject of statistical inference. In this chapter, the emphasis is on the basic principles of statistical inference; other related topics are described only briefly, enough to convey the underlying concepts.
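To make the distinction concrete, the following short Python sketch (not from the book; the sample values and the use of the normal approximation with z = 1.96 are illustrative assumptions) first computes descriptive summaries of a small data set and then makes an inferential statement about the population from which the data were drawn.

```python
import math

# Hypothetical sample drawn from a larger population (illustrative values only)
sample = [4.2, 5.1, 3.8, 4.9, 5.3, 4.4, 4.7, 5.0]

# Descriptive statistics: summarize the observed data set itself
n = len(sample)
mean = sum(sample) / n
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
std_dev = math.sqrt(variance)
print(f"Descriptive: n={n}, mean={mean:.2f}, std={std_dev:.2f}")

# Statistical inference: draw a conclusion about the unseen population,
# here a 95% confidence interval for the population mean
# (normal approximation with z = 1.96; a t critical value is more exact for small n)
margin = 1.96 * std_dev / math.sqrt(n)
print(f"Inference: 95% CI for the population mean = "
      f"({mean - margin:.2f}, {mean + margin:.2f})")
```

The descriptive numbers characterize only the observed values; the confidence interval is a statement about the unseen population, which is the concern of statistical inference discussed in this chapter.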
Statistical data analysis is the most well-established set of methodologies for data mining. Historically, the first computer-based applications of data analysis were developed with the support of statisticians. Ranging from one-dimensional data analysis to multivariate data analysis, statistics offered a variety of methods for data mining, including different types of regression and discriminant analysis. In this short overview of statistical methods that support the data-mining process, we will not cover all approaches and methodologies; a selection has been made of the techniques used most often in real-world data-mining applications.
5.1 STATISTICAL INFERENCE
The totality of the observations with which we are concerned in statistical analysis, whether their number is finite or infinite, constitutes what we call a population. The term refers