This book introduces the reader to methods of data mining on the web, including uncovering patterns in web content (classification, clustering, language processing), structure (graphs, hubs, metrics), and usage (modeling, sequence analysis, performance).
Learn How To Convert Web Data Into Web Knowledge
This text demonstrates how to extract knowledge by finding meaningful connections among data spread throughout the Web. Readers learn methods and algorithms from the fields of information retrieval, machine learning, and data mining which, when combined, provide a solid framework for mining the Web. The authors walk readers through the algorithms with the aid of examples and exercises.
This text is divided into three parts:
* Part One, Web Structure, presents basic concepts and techniques for extracting information from the Web. Readers learn how to collect and index Web documents as well as search and rank Web pages according to their textual content and hyperlink structure.
* Part Two, Web Content Management, offers two approaches, clustering and classification, for organizing Web content. For both approaches, the authors set forth specific algorithms that enable readers to convert Web data into knowledge.
* Part Three, Web Usage Mining, demonstrates the application of data mining methods to uncover meaningful patterns of Internet usage.
Methods and algorithms are illustrated by simple examples. More than 100 exercises help readers assess their grasp of the material. Further, thirty-four hands-on analysis problems ask readers to use their new data mining expertise to solve real problems, working with large data sets. All the data sets needed for the examples, exercises, and analysis problems are available on the companion Web site.
The extensive use of examples, along with the opportunity to test and apply data mining skills, makes this text ideal for graduate students and upper-level undergraduates in computer science and engineering. Web designers and researchers will find that this text gives them a new set of tools to further mine the Web for knowledge and move well beyond the capabilities of standard search engines.