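Search Engines: Information Retrieval in Practice (English edition) / Classic Original Editions Library (搜索引擎信息检索实践(英文版)/经典原版书库), Croft (USA), China Machine Press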

1 Search Engines and Information Retrieval

　1.1 What Is Information Retrieval?

　1.2 The Big Issues

　1.3 Search Engines

　1.4 Search Engineers

2 Architecture of a Search Engine

　2.1 What Is an Architecture?

　2.2 Basic Building Blocks

　2.3 Breaking It Down

　　2.3.1 Text Acquisition

　　2.3.2 Text Transformation

　　2.3.3 Index Creation

　　2.3.4 User Interaction

　　2.3.5 Ranking

　　2.3.6 Evaluation

　2.4 How Does It Really Work?

3 Crawls and Feeds

　3.1 Deciding What to Search

　3.2 Crawling the Web

　　3.2.1 Retrieving Web Pages

　　3.2.2 The Web Crawler

　　3.2.3 Freshness

　　3.2.4 Focused Crawling

　　3.2.5 Deep Web

　　3.2.6 Sitemaps

　　3.2.7 Distributed Crawling

　3.3 Crawling Documents and Email

　3.4 Document Feeds

　3.5 The Conversion Problem

　　3.5.1 Character Encodings

　3.6 Storing the Documents

　　3.6.1 Using a Database System

　　3.6.2 Random Access

　　3.6.3 Compression and Large Files

　　3.6.4 Update

　　3.6.5 BigTable

　3.7 Detecting Duplicates

　3.8 Removing Noise

4 Processing Text

　4.1 From Words to Terms

　4.2 Text Statistics

　　4.2.1 Vocabulary Growth

　　4.2.2 Estimating Collection and Result Set Sizes

　4.3 Document Parsing

　　4.3.1 Overview

　　4.3.2 Tokenizing

　　4.3.3 Stopping

　　4.3.4 Stemming

　　4.3.5 Phrases and N-grams

　4.4 Document Structure and Markup

　4.5 Link Analysis

　　4.5.1 Anchor Text

　　4.5.2 PageRank

　　4.5.3 Link Quality

　4.6 Information Extraction

　　4.6.1 Hidden Markov Models for Extraction

　4.7 Internationalization

5 Ranking with Indexes

　5.1 Overview

　5.2 Abstract Model of Ranking

　5.3 Inverted Indexes

　　5.3.1 Documents

　　5.3.2 Counts

　　5.3.3 Positions

　　5.3.4 Fields and Extents

　　5.3.5 Scores

　　5.3.6 Ordering

　5.4 Compression

　　5.4.1 Entropy and Ambiguity

　　5.4.2 Delta Encoding

　　5.4.3 Bit-Aligned Codes

　　5.4.4 Byte-Aligned Codes

　　5.4.5 Compression in Practice

　　5.4.6 Looking Ahead

　　5.4.7 Skipping and Skip Pointers

　5.5 Auxiliary Structures

　5.6 Index Construction

　　5.6.1 Simple Construction

　　5.6.2 Merging

　　5.6.3 Parallelism and Distribution

　　5.6.4 Update

　5.7 Query Processing

　　5.7.1 Document-at-a-time Evaluation

　　5.7.2 Term-at-a-time Evaluation

　　5.7.3 Optimization Techniques

　　5.7.4 Structured Queries

　　5.7.5 Distributed Evaluation

　　5.7.6 Caching

6 Queries and Interfaces

　6.1 Information Needs and Queries

　6.2 Query Transformation and Refinement

　　6.2.1 Stopping and Stemming Revisited

　　6.2.2 Spell Checking and Suggestions

　　6.2.3 Query Expansion

　　6.2.4 Relevance Feedback

　　6.2.5 Context and Personalization

　6.3 Showing the Results

　　6.3.1 Result Pages and Snippets

　　6.3.2 Advertising and Search

　　6.3.3 Clustering the Results

　6.4 Cross-Language Search

7 Retrieval Models

　7.1 Overview of Retrieval Models

　　7.1.1 Boolean Retrieval

　　7.1.2 The Vector Space Model

　7.2 Probabilistic Models

　　7.2.1 Information Retrieval as Classification

　　7.2.2 The BM25 Ranking Algorithm

　7.3 Ranking Based on Language Models

　　7.3.1 Query Likelihood Ranking

　　7.3.2 Relevance Models and Pseudo-Relevance Feedback

　7.4 Complex Queries and Combining Evidence

　　7.4.1 The Inference Network Model

　　7.4.2 The Galago Query Language

　7.5 Web Search

　7.6 Machine Learning and Information Retrieval

　　7.6.1 Learning to Rank

　　7.6.2 Topic Models and Vocabulary Mismatch

　7.7 Application-Based Models

8 Evaluating Search Engines

　8.1 Why Evaluate?

　8.2 The Evaluation Corpus

　8.3 Logging

　8.4 Effectiveness Metrics

　　8.4.1 Recall and Precision

　　8.4.2 Averaging and Interpolation

　　8.4.3 Focusing on the Top Documents

　　8.4.4 Using Preferences

　8.5 Efficiency Metrics

　8.6 Training, Testing, and Statistics

　　8.6.1 Significance Tests

　　8.6.2 Setting Parameter Values

　　8.6.3 Online Testing

　8.7 The Bottom Line

9 Classification and Clustering

　9.1 Classification and Categorization

　　9.1.1 Naive Bayes

　　9.1.2 Support Vector Machines

　　9.1.3 Evaluation

　　9.1.4 Classifier and Feature Selection

　　9.1.5 Spam, Sentiment, and Online Advertising

　9.2 Clustering

　　9.2.1 Hierarchical and K-Means Clustering

　　9.2.2 K Nearest Neighbor Clustering

　　9.2.3 Evaluation

　　9.2.4 How to Choose K

　　9.2.5 Clustering and Search

10 Social Search

　10.1 What Is Social Search?

　10.2 User Tags and Manual Indexing

　　10.2.1 Searching Tags

　　10.2.2 Inferring Missing Tags

　　10.2.3 Browsing and Tag Clouds

　10.3 Searching with Communities

　　10.3.1 What Is a Community?

　　10.3.2 Finding Communities

　　10.3.3 Community-Based Question Answering

　　10.3.4 Collaborative Searching

　10.4 Filtering and Recommending

　　10.4.1 Document Filtering

　　10.4.2 Collaborative Filtering

　10.5 Peer-to-Peer and Metasearch

　　10.5.1 Distributed Search

　　10.5.2 P2P Networks

11 Beyond Bag of Words

　11.1 Overview

　11.2 Feature-Based Retrieval Models

　11.3 Term Dependence Models

　11.4 Structure Revisited

　　11.4.1 XML Retrieval

　　11.4.2 Entity Search

　11.5 Longer Questions, Better Answers

　11.6 Words, Pictures, and Music

　11.7 One Search Fits All?

References

Index

Book	Search Engines: Information Retrieval in Practice (English edition) / Classic Original Editions Library
Content	Editor's recommendation: This is an English-language text on information retrieval. It covers 11 key topics in information retrieval (IR) and how they affect the design and implementation of search engines, and it reinforces important concepts with mathematical models. The book is rich in content, focused, and practical, and is suitable as a textbook for undergraduate and graduate students in computer science or computer engineering.
Summary: This book introduces the key issues in information retrieval (IR) and how they affect the design and implementation of search engines, and it reinforces important concepts with mathematical models. On the important topic of web search engines, it mainly covers the search techniques widely used on the web. It is intended for undergraduate and graduate students in computer science or computer engineering, and it also serves as a good introductory text for professionals.
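Title	Search Engines: Information Retrieval in Practice (English edition) / Classic Original Editions Library
Author	Croft (USA)
Publisher	China Machine Press
ISBN	9787111282471
Format	16mo (16开)
Pages	520
Edition	1
Binding	Paperback
Publication date	2009-10-01
First edition date	2009-10-01
Printing date	2010-11-01
Text language	English
Readership	Youth (ages 14-20), researchers, general adults
Distribution scope	Public
Distribution mode	Physical book
Main category	Humanities and Social Sciences - Social Sciences - General Social Sciences
Weight	0.534
Chinese Library Classification	G354.4
Printed sheets	16.75
Impression	1
Place of publication	Beijing
Length	213
Width	147
Height	22
Medium	Book
Paper	Ordinary paper
Phonetic annotation	No
Reprint edition	Original edition
Publisher country	CN
Set or single volume	Single volume
Copyright contract registration number	图字01-2009-4966
Copyright provider	Pearson Education Asia Ltd.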