ISBN:#9780596529321
Just ordered it. Does anyone happen to have it at home already?
The reviews are very good, and so are the sample chapters. The topic appears to be machine learning/data mining, with a focus on social software as the data source. I found the subtitle a bit off-putting (Web 2.0 in a book title...), but rumor has it that the book is very practice-oriented and also entertaining to read. The code examples are in Python; among other things, they implement web crawlers, Bayesian filters, support vector machines, and genetic algorithms.
Table of Contents
Chapter 1: Introduction to Collective Intelligence
- What Is Collective Intelligence?
- What Is Machine Learning?
- Limits of Machine Learning
- Real-Life Examples
- Other Uses for Learning Algorithms
Chapter 2: Making Recommendations
- Collaborative Filtering
- Collecting Preferences
- Finding Similar Users
- Recommending Items
- Matching Products
- Building a del.icio.us Link Recommender
- Item-Based Filtering
- Using the MovieLens Dataset
- User-Based or Item-Based Filtering?
- Exercises
Chapter 3: Discovering Groups
- Supervised versus Unsupervised Learning
- Word Vectors
- Hierarchical Clustering
- Drawing the Dendrogram
- Column Clustering
- K-Means Clustering
- Clusters of Preferences
- Viewing Data in Two Dimensions
- Other Things to Cluster
- Exercises
Chapter 4: Searching and Ranking
- What's in a Search Engine?
- A Simple Crawler
- Building the Index
- Querying
- Content-Based Ranking
- Using Inbound Links
- Learning from Clicks
- Exercises
Chapter 5: Optimization
- Group Travel
- Representing Solutions
- The Cost Function
- Random Searching
- Hill Climbing
- Simulated Annealing
- Genetic Algorithms
- Real Flight Searches
- Optimizing for Preferences
- Network Visualization
- Other Possibilities
- Exercises
Chapter 6: Document Filtering
- Filtering Spam
- Documents and Words
- Training the Classifier
- Calculating Probabilities
- A Naïve Classifier
- The Fisher Method
- Persisting the Trained Classifiers
- Filtering Blog Feeds
- Improving Feature Detection
- Using Akismet
- Alternative Methods
- Exercises
Chapter 7: Modeling with Decision Trees
- Predicting Signups
- Introducing Decision Trees
- Training the Tree
- Choosing the Best Split
- Recursive Tree Building
- Displaying the Tree
- Classifying New Observations
- Pruning the Tree
- Dealing with Missing Data
- Dealing with Numerical Outcomes
- Modeling Home Prices
- Modeling "Hotness"
- When to Use Decision Trees
- Exercises
Chapter 8: Building Price Models
- Building a Sample Dataset
- k-Nearest Neighbors
- Weighted Neighbors
- Cross-Validation
- Heterogeneous Variables
- Optimizing the Scale
- Uneven Distributions
- Using Real Data—the eBay API
- When to Use k-Nearest Neighbors
- Exercises
Chapter 9: Advanced Classification: Kernel Methods and SVMs
- Matchmaker Dataset
- Difficulties with the Data
- Basic Linear Classification
- Categorical Features
- Scaling the Data
- Understanding Kernel Methods
- Support-Vector Machines
- Using LIBSVM
- Matching on Facebook
- Exercises
Chapter 10: Finding Independent Features
- A Corpus of News
- Previous Approaches
- Non-Negative Matrix Factorization
- Displaying the Results
- Using Stock Market Data
- Exercises
Chapter 11: Evolving Intelligence
- What Is Genetic Programming?
- Programs As Trees
- Creating the Initial Population
- Testing a Solution
- Mutating Programs
- Crossover
- Building the Environment
- A Simple Game
- Further Possibilities
- Exercises
Chapter 12: Algorithm Summary
- Bayesian Classifier
- Decision Tree Classifier
- Neural Networks
- Support-Vector Machines
- k-Nearest Neighbors
- Clustering
- Multidimensional Scaling
- Non-Negative Matrix Factorization
- Optimization
Appendix: Third-Party Libraries
- Universal Feed Parser
- Python Imaging Library
- Beautiful Soup
- pysqlite
- NumPy
- matplotlib
- pydelicious
Appendix: Mathematical Formulas
- Euclidean Distance
- Pearson Correlation Coefficient
- Weighted Mean
- Tanimoto Coefficient
- Conditional Probability
- Gini Impurity
- Entropy
- Variance
- Gaussian Function
- Dot-Products
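
To give a rough idea of what a few of the formulas listed in that last appendix look like in Python, here is a minimal sketch. It is not taken from the book: it only uses the standard textbook definitions of Euclidean distance, the Pearson correlation coefficient, and the Tanimoto coefficient, and the function names and the return values for degenerate inputs are my own assumptions.

```python
from math import sqrt


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations, and the sums of squared deviations.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Assumption: report 0.0 when one vector is constant (undefined case).
        return 0.0
    return cov / sqrt(var_a * var_b)


def tanimoto(a, b):
    """Tanimoto coefficient of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are given a similarity of 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(pearson([1, 2, 3], [2, 4, 6]))
    print(tanimoto(["a", "b", "c"], ["b", "c", "d"]))
```

Judging by the chapter list, measures like these would be the building blocks for the "Finding Similar Users" and clustering sections, but I can't confirm how the book itself implements them until my copy arrives.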