WWW Search Engine Software
ht://Dig © 1995-1998 Andrew Scherpbier <andrew@contigo.com>
Please see the file COPYING for
license information.
Introduction
The ht://Dig system is a complete World Wide Web indexing and
searching system for a small domain or intranet. It is not
meant to replace powerful internet-wide search systems like
Lycos, Infoseek, Webcrawler and AltaVista. Instead, it is meant
to cover the search needs of a single company, campus, or even
a particular subsection of a web site.
Unlike some WAIS-based or web-server-based search engines,
ht://Dig can span several web servers at a site. The type of
these web servers doesn't matter as long as they understand
the HTTP/1.0 protocol.
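As a rough illustration of why the server type does not matter, the sketch below builds the same plain HTTP/1.0 GET request for pages on two different hosts. It is not ht://Dig's own code: the function name, the user-agent string, and the example.edu hosts are assumptions made for this example, and the Host header is included only because many servers expect it, not because HTTP/1.0 requires it.

```python
from typing import Tuple
from urllib.parse import urlsplit


def build_http10_get(url: str, user_agent: str = "example-indexer/0.1") -> Tuple[str, int, bytes]:
    """Return (host, port, request_bytes) for a plain HTTP/1.0 GET of `url`."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname is None:
        raise ValueError(f"only http:// URLs are handled in this sketch: {url}")
    port = parts.port or 80          # default HTTP port when none is given
    path = parts.path or "/"         # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {parts.netloc}\r\n"
        f"User-Agent: {user_agent}\r\n"
        "\r\n"                       # blank line ends the request headers
    )
    return parts.hostname, port, request.encode("ascii")


# The same request shape works for any server that speaks HTTP/1.0.
for page in ("http://www.example.edu/docs/index.html",
             "http://library.example.edu:8080"):
    host, port, request = build_http10_get(page)
    print(host, port, request.splitlines()[0].decode("ascii"))
```

An indexer would send these bytes over a TCP connection to each host and port in turn; the network step is left out so the sketch runs without any server.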
ht://Dig was developed at San
Diego State University as a way to search the various web
servers on the campus network. Here are some examples of the
application of ht://Dig on the SDSU network:
Many different types of searches can be set up using only a
single search database. For example, the online documentation
search above uses the same database as the campus main
search. The difference between the searches is that the
documentation search will only show results related to the
online documentation.
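The idea of several searches sharing one database can be sketched as a filter on result URLs. This is only an illustration under stated assumptions: the `Hit` record, the `restrict_hits` function, the substring matching rule, and the example.edu URLs are all hypothetical and are not taken from ht://Dig itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """One search result; the field names are illustrative only."""
    url: str
    title: str


def restrict_hits(hits: Iterable[Hit], patterns: List[str]) -> List[Hit]:
    """Keep hits whose URL contains at least one of the patterns.

    An empty pattern list means "no restriction" and keeps every hit.
    """
    if not patterns:
        return list(hits)
    return [hit for hit in hits if any(p in hit.url for p in patterns)]


# Hypothetical results from one shared database.
all_hits = [
    Hit("http://www.example.edu/index.html", "Campus home page"),
    Hit("http://www.example.edu/docs/install.html", "Installation guide"),
    Hit("http://library.example.edu/hours.html", "Library hours"),
]

campus_search = restrict_hits(all_hits, [])        # main search: everything
docs_search = restrict_hits(all_hits, ["/docs/"])  # documentation search only
```

Both searches read the same indexed data; only the URL restriction differs, so no second database has to be built or kept up to date.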
Features
Here are some of the major features of ht://Dig. They are in
no particular order.
- Intranet searching
  ht://Dig has the ability to search through many servers
  on a network by acting as a WWW browser.
- It is free
  The whole system is released under the GNU General Public
  License.
- Robot exclusion is supported
  The Standard for Robot Exclusion is supported by ht://Dig.
- Boolean expression searching
  Searches can be arbitrarily complex using boolean
  expressions.
- Configurable search results
  The output of a search can easily be tailored to your
  needs by means of providing HTML templates.
- Fuzzy searching
  Searches can be performed using various configurable
  algorithms. Currently the following algorithms are
  supported (in any combination):
  - exact
  - soundex
  - metaphone
  - common word endings
  - synonyms
- Searching of HTML and text files
  Both HTML documents and plain text files can be
  searched. Searching of other file types will be
  supported in future versions.
- Keywords can be added to HTML documents
  Any number of keywords can be added to an HTML document;
  they do not show up when the document is viewed. This
  makes a document more likely to be found and also makes
  it appear higher in the list of matches.
- Email notification of expired documents
  Special meta information can be added to HTML documents
  which can be used to notify the maintainer of those
  documents at a certain time. It is handy to get
  reminded when to remove the "New" images from a certain
  page, for example.
- A protected server can be indexed
  ht://Dig can be told to use a specific username and
  password when it retrieves documents. This can be used
  to index a server, or parts of a server, that are
  protected by a username and password.
- Searches on subsections of the database
  It is easy to set up a search which only returns
  documents whose URL matches a certain pattern. This
  becomes very useful for people who want to make their
  own data searchable without having to use a separate
  search engine or database.
- Full source code included
  The search engine comes with full source code. The
  whole system is released under the terms and conditions
  of the GNU General Public License, version 2.
- The depth of the search can be limited
  Instead of limiting the search to a set of machines, it
  can also be restricted to documents that are a certain
  number of "mouse-clicks" away from the start document.
- Full support for the ISO-Latin-1 character set
  Both SGML entities like '&agrave;' and ISO-Latin-1
  characters can be indexed and searched.
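To illustrate the last feature, the sketch below shows one way an entity spelling and a raw ISO-Latin-1 spelling of the same word can end up as the same index term. It is an assumption-laden example, not ht://Dig's implementation: the `normalize_for_index` function and the lowercasing step are hypothetical, and Python's standard `html.unescape` stands in for whatever entity handling the real indexer uses.

```python
import html


def normalize_for_index(text: str) -> str:
    """Decode character entities, then lowercase, so both spellings index alike."""
    return html.unescape(text).lower()


# The same phrase written with entities and as raw ISO-Latin-1 bytes.
entity_form = "D&eacute;j&agrave; vu"
latin1_form = b"D\xe9j\xe0 vu".decode("iso-8859-1")

# Both forms reduce to the same term, so a query in either form can match.
assert normalize_for_index(entity_form) == normalize_for_index(latin1_form)
```

Normalizing at index time and again at query time is what lets either spelling find the document.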