NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY
FACULTY FOR PHYSICS, INFORMATICS AND MATHEMATICS
DIPLOMA THESIS
Name of Candidate: Bjørn Christian Tørrissen
Student at the Information Systems Group
The Title of the Thesis: Dewey Goes Surfing: Agent-Based Information
Retrieval and Classification Support
« The Internet and the World Wide Web (WWW) are becoming major
factors in the world of information, education, trade and entertainment.
The number of resources offered by various sites on the WWW is growing
fast, and they all scream for attention. Due to the vast amounts of information
available, it is difficult both for the sites to make themselves visible
to the users they want to attract, and for the users to locate the sites
they really want to find.
The thesis should aim to come up with suggestions for how a search tool
can be created to improve the situation for both service providers and
Internet users. »
Thesis started: August 25, 1997
Thesis delivered: February 9, 1998
Thesis written at: Department of Computer and Information Science
Supervisor: Babak Amin Farshchian
Trondheim, February 9, 1998
Table of Contents
FRONT PAGE
ABSTRACT
PREFACE
TABLE OF CONTENTS
LIST OF FIGURES AND LIST OF TABLES
1 INTRODUCTION
THE STRUCTURE OF THE REPORT
2 SEARCHING ON THE INTERNET - BACKGROUND
2.1 THE TOOLS OF TODAY
2.1.1 Clouds on the Horizon
2.2 PROVIDING CONTEXT TO INFORMATION
2.2.1 The Dublin Core
2.2.2 The Meta Content Framework (MCF)
2.3 HOW IT USED TO BE DONE
2.3.1 The MARCs
2.3.2 The Dewey Decimal Classification
2.4 PROBLEM SUMMARY
3 INTRODUCING LIBRARIANS TO THE LIBRARY
3.1 THE LIBRARIAN’S TASKS
3.2 BUILDING A LIBRARY INDEX
3.3 THE MAIN CATEGORIES OF WEB PAGES
3.4 MULTILINGUAL WEBRARIES
3.5 WEB PAGE RATINGS
3.6 BUILDING AND MAINTAINING THE WEBRARY
3.7 AUTOMATING THE WEBRARY MAINTENANCE
3.8 SEARCH SYSTEM SCENARIO
4 INTELLIGENT AGENTS – NEW TECHNOLOGY TO THE RESCUE
4.1 INTELLIGENT AGENTS
4.2 INTELLIGENCE
4.3 WHY AGENTS ARE USED
4.4 AN AGENT EXAMPLE: E-MAIL INFORMATION AGENT
4.5 HOW AGENTS TRAVEL AND TALK
4.6 A LOOK IN THE REARVIEW MIRROR: LETIZIA
5 A LOOK AHEAD: CONTEXT COMPUTING AND AUTOMATIC INDEXING
5.1 HOW IT ALL CONNECTS - AN INDEXING SYSTEM OVERVIEW
5.1.1 URL Analysis
5.1.2 Header Analysis
5.1.3 Automatic Indexing Agents
5.1.4 Agent-Supported Manual Indexing
5.2 IDENTIFYING KEYWORDS
5.3 RESOLVING THE CONTEXT FROM THE KEYS
5.3.1 Ten Reasons For Choosing Dewey
5.3.2 Dewey – Status Quo
5.3.3 Suggestions for Techniques to Improve Classification Efficiency
6 THE EDDIC CODE FORMAT
6.1 CRITICAL FACTORS FOR THE CODE DESIGN
6.2 IMPORTANT DOCUMENT PROPERTIES
6.2.1 Unique Identifier
6.2.2 Topic / Category
6.2.3 Web Page Class
6.2.4 Language
6.2.5 Contents Ratings
6.2.6 Graphics Use
6.2.7 Periodicity
6.2.8 Keywords
6.2.9 Future Extensions
6.2.10 Index Maintenance Data
6.3 AUTOMATIC AND ANALYTIC META INFORMATION
6.4 EDDIC – THE CODE
6.4.1 Meeting the Requirements
7 ANOTHER LOOK AHEAD: AGENT-SUPPORTED CONTEXT SEARCH
7.1 THE GOAL
7.2 A TYPOLOGY OF SEARCHING
7.3 NAVIGATING THE INDEX
7.3.1 Navigation For Novice Users
7.3.2 Navigation For Expert Users
7.4 QUERYING THE INDEX
7.4.1 Querying For Novice Users
7.4.2 Querying For Expert Users
7.4.3 Supporting Web Page Submitting
7.5 PERSONALIZED WEBRARIANS
8 IMPLEMENTATION ASPECTS
8.1 WEBRARIANS ARE AGENTS
8.1.1 Retrieval and Pre-classification Agents
8.1.2 Search Interface Agents
8.1.3 Choosing A Webrarian Language
8.2 CENTRALIZED VS. DISTRIBUTED INDEXING
8.3 PREPARATIONS FOR SYSTEM START-UP
8.3.1 The Codes
8.3.2 The Co-workers
8.3.3 The Contents
8.3.4 The Comments
8.4 FINANCING THE SERVICE
8.5 DISSECTING THE MONSTER
9 CONCLUDING REMARKS
9.1 THE PROBLEMS SOLVED
9.2 SUMMARY: THE NOVELTIES
9.3 THE NEXT STEPS TO TAKE
9.4 THE CONCLUSION
APPENDIX A: GLOSSARY
APPENDIX B: REFERENCES
Author's homepage: http://www.pvv.org/~bct/
Author's e-mail (Bjørn Christian Tørrissen): bct@pvv.org