System requirements

ht://Dig © 1995-1998 Andrew Scherpbier
Please see the file COPYING for license information.


Requirements to build ht://Dig

ht://Dig was developed under Unix using C++.

For this reason, you will need a Unix machine, a C compiler, and a C++ compiler. (The C compiler is needed to compile some of the GNU libraries.)

Unfortunately, I have access to only a couple of different Unix machines. ht://Dig has been tested on these machines:

Other people have compiled it under Linux as well.

libg++

If you plan to use g++ to compile ht://Dig, you have to make sure that libg++ has been installed. Unfortunately, libg++ is a separate package from gcc/g++. You can get libg++ from the GNU software archive.

Berkeley 'make'

The build relies heavily on the make program. The problem with this is that not all make programs are the same. ht://Dig requires a make program that understands the 'include' statement, as in
include somefile
The Berkeley 4.4 make program doesn't use this syntax; instead it wants
.include "somefile"
and hence it cannot be used to build ht://Dig.

If your make program doesn't understand the right 'include' syntax, it is best to get and install GNU make before you try to compile everything. The alternative is to change all the Makefiles.
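One way to find out which camp your make falls into is to feed it a tiny test makefile that uses the 'include' statement. The sketch below is illustrative (the /tmp file names are arbitrary, not part of ht://Dig):

```shell
# Probe whether the system 'make' understands the GNU-style
# 'include' directive.  File names are illustrative.
cat > /tmp/htdig-inc.mk <<'EOF'
all: ; @echo included-ok
EOF
printf 'include /tmp/htdig-inc.mk\n' > /tmp/htdig-test.mk

if make -f /tmp/htdig-test.mk >/dev/null 2>&1; then
    echo "make understands 'include'; proceed with the build"
else
    echo "get GNU make before building ht://Dig"
fi
rm -f /tmp/htdig-test.mk /tmp/htdig-inc.mk
```

A Berkeley 4.4 make will fail on the test makefile (it expects .include "somefile"), while GNU make will run the included target and succeed.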


Disk space requirements

The search engine will require lots of disk space to store its databases. Unfortunately, there is no exact formula to compute the space requirements. It depends on the number of documents you are going to index, but also on the various options you use. To give you an idea of the space requirements, here is what I have deduced from our own database size at San Diego State University.

If you keep the wordlist database around (for update digging instead of initial digging), I found that multiplying the number of documents covered by 12,000 comes pretty close to the number of bytes required.

We have about 13,000 documents:

         13,000
         12,000 x
    -----------
    156,000,000
or about 150 MB.

Without the wordlist database, the factor drops to about 7,500:

         13,000
          7,500 x
     ----------
     97,500,000
or about 93 MB.
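The two estimates above boil down to one multiplication each, so the rule of thumb is easy to script. This sketch just restates the rough factors from the text (12,000 and 7,500 bytes per document); they are ballpark figures, not exact values:

```shell
# Rule-of-thumb disk space estimate for an ht://Dig database.
# Factors are the approximate bytes-per-document figures above.
docs=13000
with_wordlist=$((docs * 12000))      # keeping the wordlist database
without_wordlist=$((docs * 7500))    # without the wordlist database

# Convert to MB (rounded to the nearest megabyte).
mb() { echo $(( ($1 + 524288) / 1048576 )); }

echo "with wordlist:    $with_wordlist bytes (~$(mb $with_wordlist) MB)"
echo "without wordlist: $without_wordlist bytes (~$(mb $without_wordlist) MB)"
```

For the 13,000-document example this prints roughly 149 MB and 93 MB, matching the figures worked out above.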

Keep in mind that we keep at most 50,000 bytes of each document. This may seem like a lot, but most documents aren't very big, and it gives us a big enough chunk to almost always show an excerpt of the matches.


Andrew Scherpbier <andrew@contigo.com>
Last modified: Sat Dec 14 18:37:40 PST