Computerized Information Retrieval
(Master of Engineering)
2005.3
Information retrieval vocabulary (terms)
Information retrieval (IR)
Information access (obtain)
Information search (look for)
Information searching (look for)
Information seeking (focuses on users; active)
locate
hit
1. Stages in the development of information retrieval
● Manual
● Computerized
● Networked
● Intelligent
● Cognitive
What’s searching like?
“Finding a needle in a haystack”
2. Main types of retrieval systems
Online search
Offline search
CD-ROM search (CD search)
Internet/Web search
Global digital library system
2.1 Online search
The process of locating specific pieces of information in one or more databases that reside on remote computers (hosts). The search is a true interaction between you and the search system.
Features: interactive, real-time, remote
[Figure: search terminal (microcomputer) → communication network (WAN or direct Internet) → online retrieval center (host computer) → databases]
Online databases
Features
● A test-bed for early IR experiments and development
● The showcase of IR technology (e.g., relevance retrieval) until the Internet and the Web became popular
● A laboratory for acquiring information retrieval skills
IR capability
● Covers virtually every type of database structure
● Implements all the different retrieval models and techniques
(Source: Hahn)
OPACs
Features
● An extension of MARC records
● A product of library automation
● A bibliographic database of an institution's library resources at various levels, e.g., local, regional, national
IR capability
● Easy to browse (resembles the shelf arrangement)
● Based on a well-established hierarchical database structure
● May provide sophisticated searching capabilities, but users may not benefit from them
2.2 CD-ROM search (CD search)
Standalone CD-ROM search
[Figure: a microcomputer search terminal with a CD-ROM drive]
Networked CD-ROM search (CD online)
[Figure: microcomputer search terminals on a LAN connected to a server with CD-ROM drives (tower or jukebox)]
CD-ROMs
Features
● Online databases on CD-ROM media
● Portability
● Cheaper and more convenient access, until the Web became the gateway for CD-ROM access
● Noticeable problems in updating the database
● Hypertext and hypermedia are heavily used on CD-ROMs
● Becoming an obsolete technology?
IR capabilities
● Introduces the browsing feature
● Other IR techniques similar to those of online databases
● Slower
2.3 Internet information retrieval
Features
● The initial intention was resource sharing rather than IR
● Rapid development and constant improvement
● Wide usage
● Good for presenting information, less so for organizing information
IR capability
● Retrieval capability is an add-on, not part of the system's original design, except in the case of WAIS
● Quality (uneven retrieval performance) and quantity vary from one tool to another
● Replacing online systems and becoming the lab and showcase for new, advanced and sophisticated IR techniques
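Comparison of traditional online retrieval and Internet retrieval
System / network architecture
  Traditional online retrieval: centralized, closed, homogeneous machines; master/slave mode (since modernized)
  Internet retrieval: distributed, open, heterogeneous machines; client/server and browser/server modes
Information resources
  Traditional online retrieval: high-quality academic information; manually selected; fee-based services
  Internet retrieval: large volume with no quality control; automatically discovered and harvested; mostly free services
Search mode
  Traditional online retrieval: expert (or intermediary) searching; complete command languages; mainly controlled-vocabulary searching
  Internet retrieval: individual end-user searching; WIMP (browsing + searching); mainly natural-language searching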
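3. Basic concepts of databases
3.1 Definition of a database
A collection of data, consisting of at least one file, that serves a specific purpose or meets the needs of a specific data-processing system.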
3.2 Types of databases
1. Reference databases
Bibliographic databases
Catalog databases
Abstract/index databases
Directory databases
2. Source databases (data banks)
Numeric databases
Text-numeric databases
Property databases
Terminology banks
Full-text databases
Graphic (image) databases
Multimedia databases

3.3 Structure of a bibliographic database
File
The basic form of database organization
Record
The unit that makes up a file
(corresponds to one bibliographic entry)
Field
The unit that makes up a record
(one information item in the entry)
△ Subfield
The unit that makes up a field
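The file → record → field → subfield hierarchy can be sketched as nested Python data classes. This is a minimal illustration, not a real bibliographic format: the class names, the field tags ("title", "author") and the single-letter subfield codes are hypothetical, loosely inspired by MARC-style subfields, and the sample record is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """A field: one information item in a bibliographic entry.

    Subfields are kept as (code, value) pairs; the codes are hypothetical,
    loosely modelled on MARC-style subfield codes.
    """
    tag: str
    subfields: list[tuple[str, str]] = field(default_factory=list)

    def value(self) -> str:
        # Join the subfield values into a single display string.
        return " ".join(v for _, v in self.subfields)


@dataclass
class Record:
    """A record: one bibliographic entry, made up of fields."""
    fields: list[Field] = field(default_factory=list)

    def get(self, tag: str) -> list[str]:
        # Return the value of every field carrying the given tag.
        return [f.value() for f in self.fields if f.tag == tag]


@dataclass
class File:
    """A file: the basic unit of database organization, made up of records."""
    name: str
    records: list[Record] = field(default_factory=list)


# Hypothetical sample data, for illustration only.
rec = Record([
    Field("title", [("a", "Solar energy"), ("b", "an introduction")]),
    Field("author", [("a", "Smith, J.")]),
])
db_file = File("demo", [rec])
print(db_file.records[0].get("title"))  # ['Solar energy an introduction']
```

A record may hold repeated fields with the same tag (e.g. several authors), which is why `get` returns a list.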
4. Basic methods of computer retrieval
4.1 Search strategy: the systematic planning of the search steps
4.2 Search steps
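[Flowchart: search steps]
User's search topic → subject analysis → select retrieval system → select database → determine access points/terms (consulting the index vocabulary or thesaurus) → formulate the search statement → computer processing → check results: Y (satisfactory) ends the search; N revises the search and repeats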
The 4C principles for selecting a database:
Content
Coverage
Currency
Cost
Access points:
Subject
Classification
Author
Title
Number (code, CODEN, ...)
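One way to support searching by access point is to index each record under every field that serves as an access point. The sketch below assumes records stored as plain Python dictionaries keyed by access-point name; the records, the classification numbers and the exact-match, case-insensitive lookup are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical sample records; each key is an access point.
records = {
    1: {"title": "Solar energy", "author": "Smith, J.",
        "subject": ["solar energy", "renewable energy"], "classification": "TK51"},
    2: {"title": "Wind power", "author": "Lee, K.",
        "subject": ["wind energy", "renewable energy"], "classification": "TK81"},
}


def build_access_index(recs):
    """Map (access point, normalized value) -> set of record ids."""
    index = defaultdict(set)
    for rid, rec in recs.items():
        for point, value in rec.items():
            # Multi-valued fields (e.g. several subjects) are indexed value by value.
            values = value if isinstance(value, list) else [value]
            for v in values:
                index[(point, v.lower())].add(rid)
    return index


def lookup(index, point, value):
    # Exact-match lookup on a single access point (case-insensitive).
    return sorted(index.get((point, value.lower()), set()))


idx = build_access_index(records)
print(lookup(idx, "subject", "Renewable energy"))  # [1, 2]
print(lookup(idx, "author", "Lee, K."))            # [2]
```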
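Search terms / indexing language:
Uncontrolled terms (free terms): uncontrolled, free-term
Controlled terms: controlled
e.g., Chinese Thesaurus (for Chinese)
LCSH, Library of Congress Subject Headings (for English)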
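In a controlled-vocabulary system, a free term entered by the user is mapped to the preferred (controlled) term before searching, typically through USE references in the thesaurus. The sketch below assumes a tiny hypothetical thesaurus held in two Python collections; its entries are invented and are not taken from the Chinese Thesaurus or LCSH.

```python
from typing import Optional

# Tiny hypothetical thesaurus: entry (non-preferred) terms point to the
# preferred descriptor, in the spirit of "USE" references. Invented entries.
USE = {
    "solar power": "solar energy",
    "sun energy": "solar energy",
    "wind power": "wind energy",
}

# Preferred (controlled) terms in the same hypothetical thesaurus.
DESCRIPTORS = {"solar energy", "wind energy"}


def to_controlled(term: str) -> Optional[str]:
    """Map a free (uncontrolled) term to a controlled descriptor.

    Returns None when the thesaurus has no equivalent, in which case the
    term would have to be searched as free text.
    """
    t = term.strip().lower()
    if t in DESCRIPTORS:
        return t            # already a preferred term
    return USE.get(t)       # follow the USE reference, if any


print(to_controlled("Sun energy"))     # solar energy
print(to_controlled("photovoltaics"))  # None
```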
Stop words:
Words that occur too frequently in records to be useful for searching.
e.g.:
A ARE FOR OF THE WITH
AN AS FROM ON THIS WOULD
AND BY IN THAT TO, …
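Stop-word removal can be sketched as a filter applied while splitting text into index terms. This assumes simple lowercase alphanumeric tokenization; the stop list is adapted from the slide's examples, and real retrieval systems each define their own list.

```python
import re

# Stop list adapted from the slide's examples (real systems differ).
STOP_WORDS = {
    "a", "are", "for", "of", "the", "with",
    "an", "as", "from", "on", "this", "would",
    "and", "by", "in", "that", "to",
}


def index_terms(text: str) -> list[str]:
    """Split text into lowercase words and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


print(index_terms("The Use of Solar Energy in the Home"))
# ['use', 'solar', 'energy', 'home']
```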
4.3 Search methods
1. Command search
Operators
Search statements (query, statement, formula, profile)
2. Menu search
Prompts, options, fill-in-the-blank forms
3. Browse search (browse, WIMP)
Hypertext, hyperlinks
4. Web-based search (combines the above methods)
Search modes:
Basic search (basic, easy, simple, quick)
Advanced search (advanced, expanded, guided)
Expert search (advanced, expert)
4.4 Expressing search statements
1. Common operators
Boolean (logical) operators
Positional (proximity) operators *
Truncation symbols
Field operators
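Truncation and positional operators can be illustrated on a few word-tokenized documents. The `*` suffix for right truncation and the hypothetical `within` function with its `max_gap` parameter are assumptions for illustration only; actual truncation symbols and proximity syntax differ between retrieval systems, and the sample documents are invented.

```python
import re

# Hypothetical sample documents (record number -> text), for illustration only.
DOCS = {
    1: "solar energy for computer networks",
    2: "energy from the sun and solar panels",
    3: "computing with solar cells",
}


def tokens(text: str) -> list[str]:
    # Lowercase word tokenization (an assumption; real systems differ).
    return re.findall(r"[a-z0-9]+", text.lower())


def truncate_search(pattern: str) -> set[int]:
    """Right truncation: 'comput*' matches any word starting with 'comput'."""
    prefix = pattern.rstrip("*")
    return {d for d, text in DOCS.items()
            if any(w.startswith(prefix) for w in tokens(text))}


def within(term1: str, term2: str, max_gap: int) -> set[int]:
    """Ordered proximity: term2 follows term1 with at most max_gap
    intervening words (max_gap=0 means the two words are adjacent)."""
    hits = set()
    for d, text in DOCS.items():
        words = tokens(text)
        pos1 = [i for i, w in enumerate(words) if w == term1]
        pos2 = [j for j, w in enumerate(words) if w == term2]
        # j - i - 1 is the number of words between the two terms.
        if any(0 <= j - i - 1 <= max_gap for i in pos1 for j in pos2):
            hits.add(d)
    return hits


print(sorted(truncate_search("comput*")))    # [1, 3]
print(sorted(within("solar", "energy", 0)))  # [1]
```

Word positions are what make proximity searching possible: a plain term-to-record index can answer Boolean queries, but not "A next to B".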
Logical AND: and
solar and energy (retrieves records containing both terms)
Logical OR: or
solar or energy (retrieves records containing either term, or both)
Logical NOT: not
solar not energy (retrieves records containing solar but not energy)
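The three Boolean operators map directly onto set operations over an inverted index (term → set of record numbers): AND is intersection, OR is union, NOT is difference. A minimal sketch, assuming simple lowercase word tokenization and invented sample records:

```python
import re
from collections import defaultdict

# Hypothetical sample records, for illustration only.
DOCS = {
    1: "solar energy systems",
    2: "solar cells and panels",
    3: "wind energy",
    4: "energy policy",
}


def build_inverted_index(docs):
    """Map each word to the set of record numbers that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


index = build_inverted_index(DOCS)
solar, energy = index["solar"], index["energy"]

print(sorted(solar & energy))  # AND: [1]
print(sorted(solar | energy))  # OR:  [1, 2, 3, 4]
print(sorted(solar - energy))  # NOT: [2]
```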