SWISH++ Changes
===============

*******************************************************************************
3.0.3
*******************************************************************************

NEW FEATURES
------------

* A -H option has been added to 'index' to dump the built-in set of recognized
  HTML elements to standard output (so you can check to see if a certain tag is
  recognized or not).

  (This feature will be known as feature OPTH.)

* Boolean configuration file variables now accept "on" and "off" values.

  (This feature will be known as feature ON_OFF.)


BUG FIXES
---------

* There was a small memory leak when indexing META names.

  (This bug fix will be known as bug fix ML1.)

* Reporting errors in a configuration file says what line number the error is
  on.  However, the same error-reporting code is also used to print errors when
  command-line arguments are invalid.  The line number variable wasn't cleared
  so it would print an erroneous line number for an invalid command-line
  option.

  (This bug fix will be known as bug fix CLN.)

* Parsing of Boolean values in configuration files was completely broken.

  (This bug fix will be known as bug fix PBV.)

* WWW::extract_description() did it wrong for ALT attributes with an empty
  value, i.e., ALT="".

  (This bug fix will be known as bug fix ADE.)


CHANGES, file-by-file
---------------------

* conf_bool.c

	1. In parse_value(), added code to accept "on" and "off" for
	   feature ON_OFF.

	2. In parse_value(), added '!' characters before ::strcmp()
	   calls for bug fix PBV.
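
The two conf_bool.c changes can be pictured with a small sketch.  strcmp()
returns 0 when two strings match, so a comparison written without the '!'
succeeds for every value except the intended one.  The function name
parse_bool_value(), its signature, and the "true"/"false" words are
assumptions made for illustration; only "on" and "off" are named in this
file, and the real parse_value() is not reproduced here.

```cpp
#include <cstring>

// Hypothetical stand-in for conf_bool's parse_value(); the real signature
// and the full list of accepted words are not shown in the change log.
// Returns true and sets 'result' if 'value' is a recognized Boolean word.
bool parse_bool_value( char const *value, bool &result ) {
	// strcmp() returns 0 on a match, hence the '!' (bug fix PBV).
	if ( !std::strcmp( value, "on" ) || !std::strcmp( value, "true" ) ) {
		result = true;
		return true;
	}
	if ( !std::strcmp( value, "off" ) || !std::strcmp( value, "false" ) ) {
		result = false;
		return true;
	}
	return false;			// unrecognized value
}
```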

* conf_bool.h
* conf_int.h
* conf_string.h

	1. Made assignment operators protected since (1) they're not
	   inherited and (2) it's an abstract class.

* conf_int.c
* conf_string.c

	1. Performed following substitution:

		s/cerr/error()/

* conf_var.c

	1. Corresponding change to conf_var.h #1.

	2. In parse_line(), added:

		current_config_file_line_no_ = 0;

	   for bug fix CLN.

* conf_var.h

	1. Made msg() accept an ostream& to write to.

	2. Performed following substitution:

		s/string/std::string/

* do_file.c

	1. Corresponding change to my_set.h #1.

* elements.c

	1. Added element_map::instance() for feature OPTH.

	2. Added explicit case for element::forbidden.

* elements.h

	1. Corresponding change for elements.c item #1.

	2. Made element_map::element_map() private for feature OPTH.

	3. Added operator<<( ostream&, element_map::value_type const& )
	   for feature OPTH.

* extract.c

	1. Corresponding change to my_set.h #1.

	2. In usage(), performed following substitution:

		s/Dump default stop-words/Dump stop-words/

	   since it dumps whatever stop-words are being used, not just
	   the built-in default set.

* filter.h

	1. Performed following substitution:

		s/string/std::string/

* FilterExtension.c

	1. Performed following substitution:

		s/cerr/error()/

* html.c

	1. In convert_entity(), changed access to the char_entity_map
	   for feature OPTH.

	2. In parse_html_tag(), corresponding change to my_set.h #1.

	3. In parse_html_tag(), corresponding change to elements.c #1.

	4. In parse_html_tag(), changed the way META names are looked
	   up for bug fix ML1.  Specifically, we no longer
	   unconditionally do a strdup(): this was the source of the
	   memory leak.

* index.c

	1. Added #include "elements.h" for feature OPTH.

	2. In main() and usage(), added code for feature OPTH.

	3. Corresponding change to my_set.h #1.

	4. Corresponding change to extract.c #2.

* less.h

	1. Started using binary_function's first_argument_type,
	   second_argument_type, and result_type typedefs.

* Makefile

	1. Added dependency for index.c on elements.h for feature OPTH.

* man/man1/index.1

	1. Added description for new -H option for feature OPTH.

	2. Mentioned which verbosity level is the default.

	3. Added a reference to the "Index of Elements" in the HTML 4.0
	   specification.

* man/man4/swish++.conf.4

	1. Added "on" and "off" for feature ON_OFF.

* my_set.h

	1. Performed following substitution:

		s/find/contains/

	   to distinguish it from STL find() functions that return
	   iterators.
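
The distinction can be sketched as follows.  This is a hypothetical reduction
of my_set.h, not its actual contents: a set whose membership test returns a
bool under its own name, leaving the inherited find() with its usual
iterator-returning meaning.

```cpp
#include <set>
#include <string>

// Hypothetical reduction of my_set.h.
template< class T >
class my_set : public std::set< T > {
public:
	// Boolean membership test, distinct from the iterator-returning find().
	bool contains( T const &key ) const {
		return this->find( key ) != this->end();
	}
};

typedef my_set< std::string > string_set;
```

A call site then reads "if ( extensions.contains( ext ) )" rather than
comparing an iterator against end().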

* search.c

	1. Corresponding change to my_set.h #1.

	2. In usage(), removed "standard out" verbiage.

* stem_word.c

	1. In stem_word(), removed use of char_buffer_pool.

* stem_word.h

	1. Corresponding change to less.h item #1.

* util.c

	1. Performed following substitution:

		s/string/std::string/

* util.h

	1. Used S_ISxxx() macros for file tests rather than S_IFxxx.
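
This file does not say why the switch was made.  One common reason is that
the S_ISxxx() macros compare the whole file-type field, whereas testing a
single S_IFxxx bit pattern with '&' can misfire because the file-type values
share bits.  The functions below are hypothetical versions written only to
show the macros in use; they are not the actual util.h code.

```cpp
#include <sys/stat.h>

// Hypothetical file tests.  Each returns false if the path cannot be
// stat(2)'d at all.
inline bool is_directory( char const *path ) {
	struct stat buf;
	return ::stat( path, &buf ) == 0 && S_ISDIR( buf.st_mode );
}

inline bool is_plain_file( char const *path ) {
	struct stat buf;
	return ::stat( path, &buf ) == 0 && S_ISREG( buf.st_mode );
}
```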

* version.h

	1. Updated version to "3.0.3".


* WWW.pm

	1. Changed lines 103 and 104 from:

		$s =~ s/<[^>]+?ALT\s*=\s*(['"])([^>]+)\1[^>]*?>/$2/gi;
		$s =~ s/<[^>]+?ALT\s*=\s*(['"])([^'"]+)\1?\s*$/$2/i;
	   to:
		$s =~ s/<[^>]+?ALT\s*=\s*(['"])([^>]*?)\1[^>]*?>/$2/gi;
		$s =~ s/<[^>]+?ALT\s*=\s*(['"])([^'"]*)\1?\s*$/$2/i;

	   for bug fix ADE.
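
The effect of changing '+' to '*' can be tried outside Perl.  The sketch
below is a C++ std::regex transliteration of the first of the two
substitutions, not the WWW.pm code itself, and the function names are
made up.  With '+', the second group must capture at least one character,
so for ALT="" the pattern either fails to match or captures text beyond the
attribute; with '*?', it can capture the empty value.

```cpp
#include <regex>
#include <string>

// Approximation of the substitution before bug fix ADE.
std::string replace_alt_old( std::string const &s ) {
	static std::regex const re(
		R"re(<[^>]+?ALT\s*=\s*(['"])([^>]+)\1[^>]*?>)re",
		std::regex::ECMAScript | std::regex::icase
	);
	return std::regex_replace( s, re, "$2" );
}

// Approximation of the substitution after bug fix ADE: the ALT value may
// now be empty.
std::string replace_alt_new( std::string const &s ) {
	static std::regex const re(
		R"re(<[^>]+?ALT\s*=\s*(['"])([^>]*?)\1[^>]*?>)re",
		std::regex::ECMAScript | std::regex::icase
	);
	return std::regex_replace( s, re, "$2" );
}
```

For a tag whose ALT value is empty, replace_alt_new() reduces the whole tag
to an empty string, while replace_alt_old() does not.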


*******************************************************************************
3.0.2
*******************************************************************************

BUG FIXES
---------

* The -r option for index and extract was broken by release 3.0; it's fixed
  now.

  (This bug fix will be known as bug fix DASHR.)


CHANGES, file-by-file
---------------------

* directory.c

	1. On line 104, reversed the order of the conditions to now be:

		if ( is_directory( path ) && recurse_subdirectories )

	   for bug fix DASHR.  For directories, a stat(2) wasn't being
	   performed so the is_plain_file() call in do_file() didn't
	   work.
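
Why the order matters can be sketched as follows, under the assumption
(suggested by the description above but not confirmed by it) that
is_directory() is what fills a shared stat_buf and that is_plain_file()
merely reads it.  The helper should_recurse() is hypothetical.

```cpp
#include <sys/stat.h>

// Assumed model: one shared buffer, filled by is_directory().
static struct stat stat_buf;
static bool recurse_subdirectories = false;

bool is_directory( char const *path ) {
	return ::stat( path, &stat_buf ) == 0 && S_ISDIR( stat_buf.st_mode );
}

bool is_plain_file() {			// relies on an earlier stat(2)
	return S_ISREG( stat_buf.st_mode );
}

// Because && evaluates left to right and stops at the first false operand,
// is_directory() has to come first so that stat_buf is filled in even when
// recursion is switched off.
bool should_recurse( char const *path ) {
	return is_directory( path ) && recurse_subdirectories;
}
```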

* extract.c
* index.c

	1. In main(), performed following substitutions for command
	   line argument variables:

		s/char*/char const*/

* search.c

	1. In main(), performed following substitutions for command
	   line argument variables:

		s/char*/char const*/

	2. Performed following substitutions:

		s/dump_match/dump_match_arg/
		s/dump_window_size/dump_window_size_arg/
		s/skip_results/skip_results_arg/

* version.h

	1. Updated version to "3.0.2".


*******************************************************************************
3.0.1
*******************************************************************************

BUG FIXES
---------

* The code failed to compile under g++ 2.95 because it caught errors that
  previous versions of g++ allowed to compile.

  (This bug fix will be known as bug fix GCC2.95.)

* The section 1 manual pages contained a few mistakes in covering the changes
  made in version 3.0.

  (This bug fix will be known as bug fix MAN3.)


CHANGES, file-by-file
---------------------

* elements.c

	1. On line 276, added an intermediate cast to int to get rid of
	   an error trying to convert directly from a char* to an enum
	   for bug fix GCC2.95.

* index.c

	1. In rank_full_index(), added another local scope for bug fix
	   GCC2.95.

* man/man1/extract.1
* man/man1/index.1
* man/man1/search.1

	1. Performed following substitution:

		s/the.index/swish++.index/

	   for bug fix MAN3.

	2. Fixed some formatting errors.

* man/man4/swish++.conf.4

	1. Fixed some formatting errors.

* search.c

	1. Performed following substitution:

		s/result_type/results_type/
		s/sorted_result_type/sorted_results_type/

	   and added new result_type type for bug fix GCC2.95.

	2. In main(), performed following substitution:

		s/typedef vector< result_type::value_type > sorted_result_type;
		 /typedef vector< result_type > sorted_results_type;/

	   for bug fix GCC2.95.

* util.h

	1. Rewrote is_directory() and is_plain_file() in terms of
	   file_exists().

* version.h

	1. Updated version to "3.0.1".

* word_index.h

	1. Added definitions for:

		word_index::const_iterator::operator+=()
		word_index::const_iterator::operator-=()

	   for bug fix GCC2.95.


*******************************************************************************
3.0
*******************************************************************************

NEW FEATURES
------------

* SWISH++ now allows flexible file filtering for extraction and indexing.

  (This feature will be known as feature FFF.)

* SWISH++ now allows configuration files since they were necessary for feature
  FFF.  If I had to add them, I might as well do it right.

  (This feature will be known as feature CONF.)

* SWISH++ now compiles and runs under Windows (95/98/NT).

  (This feature will be known as feature WIN32.)

* 'index' now accepts a -T option that allows the directory to use for
  temporary files to be specified.

  (This feature will be known as feature TEMP.)

* 'index' and 'extract' now report the number of files examined in addition to
  the number indexed or extracted, respectively.

  (This feature will be known as feature EXAM.)


BUG FIXES
---------

* In the admittedly rare case of a malformed HTML file ending in a '<' character
  (without a newline, i.e., '<' is the *VERY* last character in the file),
  'index' would core-dump.

  (This bug fix will be known as bug fix EGT.)


CHANGES, file-by-file
---------------------

* conf_bool.c
* conf_bool.h
* conf_int.c
* conf_int.h
* conf_set.c
* conf_set.h
* conf_string.c
* conf_string.h
* conf_var.c
* conf_var.h
* ExcludeClass.h
* ExcludeExtension.h
* ExcludeMeta.h
* FilesReserve.h
* filter.c
* filter.h
* FilterExtension.c
* FilterExtension.h
* FollowLinks.h
* IncludeExtension.h
* IncludeMeta.h
* IndexFile.h
* man/man4/swish++.conf.4
* RecurseSubdirs.h
* ResultsMax.h
* StemWords.h
* StopWordFile.h
* TitleLines.h
* Verbosity.h
* WordFilesMax.h
* WordPercentMax.h

	1. New files for feature CONF.

* config.h

	1. Added Config_Filename_Default for feature CONF.

	2. Performed following substitution:

		s/the.index/swish++.index/

* config/config.mk

	1. Added -DWIN32 to CCFLAGS for feature WIN32.

	2. Added more comments to CCFLAGS.

	3. Added CCLINK for feature WIN32.

	4. Added a "You shouldn't have to change anything below this
	   line" line.

	5. Added more comments for the "Manual pages" section and the
	   DISTILL variable.

	6. Added .SUFFIXES at bottom.

* config/config-sh

	1. Renamed from config.sh so some versions of make don't get
	   confused with the .sh suffix and try to build it.

	2. Define PJL_NO_SYMBOLIC_LINKS if WIN32 is defined for feature
	   WIN32.

* config/Makefile

	1. Removed test for bool type: bool is now a requirement of the
	   C++ compiler.  This was necessary for feature CONF since it
	   specializes a template on bool.

	2. Performed following substitution:

		s/config.sh/config-sh/

	   corresponding to config/config-sh item #1.

* config/src/bool.c

	1. This file was removed corresponding to config/Makefile item
	   #1.

* directory.c

	1. Performed following substitutions:

	    s/bool recurse_subdirectories/RecurseSubdirs recurse_subdirectories/
	    s/int verbosity/Verbosity verbosity/

	   for feature CONF.

	2. Added PJL_NO_SYMBOLIC_LINKS for WIN32.

	3. Moved definition of stat_buf to util.c.

* directory.h

	1. Include platform.h for new PJL_NO_SYMBOLIC_LINKS symbol.

	2. Moved stat_buf and file test functions to util.h.

* do_file.c

	1. The common code between 'index' and 'extract' was moved
	   here.

	2. The increment of "num_examined_files" was added for feature
	   EXAM.

* exit_codes.h

	1. New header file.

* extract.c

	1. Added explicit definition of MAXNAMLEN under Windows for
	   feature WIN32.

	2. Performed following substitutions:

	    s/string_set exclude_extensions/ExcludeExtension exclude_extensions/
	    s/string_set include_extensions/IncludeExtension include_extensions/

	   for feature CONF.

	3. Corresponding change to directory.c item #1.

	4. Added extract_words() function to parallel index.c's
	   index_words() function.

	5. In main(), redid the way in which command line options are
	   processed such that they take precedence over configuration
	   file variables for feature CONF.

	6. In main(), made -l option conditional on whether we're
	   compiling under Windows or not for feature WIN32.

	7. In main(), added -c option for feature CONF.

	8. In main(), added code to test whether a file or directory
	   actually exists before calling do_directory or do_file().

	9. Moved code for do_file() to do_file.c to factor out code
	   common between extract and index.

	A. In usage(), added description of -c option for feature CONF.

	B. In usage(), made description of -l option conditional on
	   Windows for feature WIN32.

	C. Changed all calls to exit(3) to use new exit code enums.

	D. Added "num_examined_files" global variable for feature EXAM.

	E. In main(), added code to print "num_examined_files" for
	   feature EXAM.
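
The precedence rule in item 5 can be sketched as a simple merge: start from
the configuration-file values, then let anything given on the command line
overwrite them.  The settings type, the function name, and the use of
strings for values are assumptions for illustration; main() is not
necessarily organized this way.

```cpp
#include <map>
#include <string>

typedef std::map< std::string, std::string > settings;

// Hypothetical merge: command-line values win over configuration-file
// values; anything not given on the command line keeps its file value.
settings merge_settings( settings const &from_conf_file,
			 settings const &from_command_line ) {
	settings result = from_conf_file;
	for ( settings::const_iterator i = from_command_line.begin();
	      i != from_command_line.end(); ++i )
		result[ i->first ] = i->second;
	return result;
}
```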

* fake_ansi.h

	1. Removed __cplusplus test.

	2. Removed section for bool type: bool is now a requirement of
	   the C++ compiler.  This was necessary for feature CONF since
	   it specializes a template on bool.

* file_index.h
* file_index.c

	1. Removed #include "fake_ansi.h" since bool is now required.

* file_list.c

	1. Added #include "fake_ansi.h".

	2. Removed erroneous #include "html.h".

* file_vector.h

	1. Added #include's for Windows for feature WIN32.

	2. Added conditional compilation for file_vector_base's
	   size_type and fd_ for Windows for feature WIN32.

* file_vector.c

	1. Removed #include "fake_ansi.h" since bool is now required.

	2. Added conditional compilation for Windows for feature WIN32.

* html.c

	1. Performed following substitutions:

		s/no_index_class_count/exclude_class_count/
		s/no_index_class_names/exclude_class_names/

	2. In parse_html_tag(), added:

		if ( c == end )
			return;

	   for bug fix EGT.
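
The guard added for bug fix EGT follows a general pattern: compare against
'end' before dereferencing.  The function below is a hypothetical fragment
in the spirit of parse_html_tag(), not the real code; 'c' is assumed to
point just past a '<' and 'end' one past the last character of the file.

```cpp
// Returns the first character of the tag name, or '\0' if the '<' was the
// very last character of the file.
char first_tag_char( char const *c, char const *end ) {
	if ( c == end )			// nothing follows the '<'
		return '\0';		// so do not dereference 'c'
	return *c;
}
```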

* html.h

	1. Performed following substitutions:

		s/no_meta_id/No_Meta_ID/
		s/meta_id_not_found/Meta_ID_Not_Found/

	   to make all enum's have capital letters.

* index.c

	1. Corresponding change to extract.c item #1.

	2. Corresponding change to extract.c item #2.

	3. Corresponding change to extract.c item #5.

	4. Corresponding change to extract.c item #6.

	5. Corresponding change to extract.c item #7.

	6. Corresponding change to extract.c item #8.

	7. Corresponding change to extract.c item #9.

	8. Corresponding change to extract.c item #A.

	9. Corresponding change to extract.c item #B.

	A. Corresponding change to extract.c item #C.

	B. Performed following substitutions:

		s/no_index_class_count/exclude_class_count/
		s/no_index_class_names/exclude_class_names/

	C. Performed following substitutions:

		s/int num_files_reserve/FilesReserve num_files_reserve/
		s/int num_title_lines/TitleLines num_title_lines/
		s/int word_file_file_max/WordFilesMax word_file_max/
		s/int word_file_percent_max/WordPercentMax word_percent_max/

	   for feature CONF.

	D. In main() and write_partial_index(), added "ios::binary" to
	   "out" ofstream for feature WIN32.

	E. In main(), added code for -T option for feature TEMP.

	F. Corresponding change to extract.c item #D.

	G. Corresponding change to extract.c item #E.
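
Item D matters because a stream opened in text mode under Windows translates
each '\n' written into a carriage-return/line-feed pair, which would shift
every byte offset stored in an index file.  A minimal sketch of opening an
output file in binary mode follows; the function name and parameters are
made up for illustration.

```cpp
#include <fstream>
#include <string>

// Writes 'bytes' to 'path' without any newline translation.
bool write_binary( char const *path, std::string const &bytes ) {
	std::ofstream out( path, std::ios::out | std::ios::binary );
	if ( !out )
		return false;
	out.write( bytes.data(),
		   static_cast<std::streamsize>( bytes.size() ) );
	return out.good();
}
```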

* index.h

	1. Corresponding change as html.h #1.

* INSTALL.unix

	1. Renamed from INSTALL due to introduction of INSTALL.win32.

* INSTALL.win32

	1. New file for feature WIN32.

* Makefile

	1. Added more comments for DEBUG options.

	2. Added new targets for feature CONF.

	3. Redid a lot of dependencies as a result.

* man/man1/index.1

	1. Added descriptions of configuration file variable for
	   feature CONF.

	2. Added Filters subsection to DESCRIPTION for feature FFF.

	3. Added description of -c option for feature CONF.

	4. Added caveat that the -l option is not available under
	   Windows for feature WIN32.

	5. Added description of -T option for feature TEMP.

	6. Added CONFIGURATION FILE section for feature CONF.

	7. Added Filters subsection to EXAMPLES for feature FFF.

	8. Expanded EXIT STATUS section to list specific exit codes.

	9. Added compress(1), gunzip(1), gzip(1), uncompress(1), and
	   swish++.conf(4) to SEE ALSO section.

* man/man1/extract.1

	1. Added descriptions of configuration file variable for
	   feature CONF.

	2. Added caveat that the -l option is not available under
	   Windows for feature WIN32.

	3. Expanded EXIT STATUS section to list specific exit codes.

	4. Added swish++.conf to FILES section for feature CONF.

	5. Performed following substitution:

		s/the.index/swish++.index/

	6. Added swish++.conf(4) to SEE ALSO section for feature CONF.

* man/man1/search.1

	1. Added description of -c option for feature CONF.

	2. Added CONFIGURATION FILE section for feature CONF.

	3. Expanded EXIT STATUS section to list specific exit codes.

	4. Added swish++.conf to FILES section for feature CONF.

	5. Performed following substitution:

		s/the.index/swish++.index/

	6. Added swish++.conf(4) to SEE ALSO section for feature CONF.

* man/man4/Makefile

	1. Corresponding change to swish++.index.4 item #1.

	2. Added swish++.conf.4 for feature CONF.

* man/man4/swish++.index.4

	1. This file was renamed from swish++.4.

* search.c

	1. Performed following substitution:

		s/bool stem_words/StemWords stem_words/

	   for feature CONF.

	2. Corresponding change as html.h #1.

	3. Corresponding change to extract.c item #C.

	4. Corresponding change to extract.c item #5.

	5. Corresponding change to extract.c item #A.

* stop_words.c

	1. Added local static variable to constructor.

	2. Corresponding change to extract.c item #C.

* stop_words.h

	1. Removed private static data member.

* swish++.conf

	1. Added template configuration for feature FFF.

* token.c

	1. Performed following substitution:

		s/fake_ansi.h/platform.h/

	   since bool is now required.

* util.c
	
	1. Moved stat_buf here from directory.h.

	2. Added parse_config_file() for feature CONF.

* util.h

	1. Corresponding change to util.c item #1.

	2. Moved file test functions here from directory.h.

	3. Corresponding change to util.c item #2.

* version.h

	1. Updated version to "3.0".

* word_index.c

	1. Removed #include "fake_ansi.h" since bool is now required.

* word_index.h

	1. Removed #include "fake_ansi.h" since bool is now required.

* word_info.h

	1. Corresponding change as html.h #1.


*******************************************************************************
2.0.1
*******************************************************************************

BUG FIXES
---------

* The code parsed HTML attributes inside HTML comments.  This is (obviously)
  the wrong thing to do.  HTML comment declarations are now really, really
  ignored.  Honest.

  (This bug fix will be known as bug fix ACP.)

* The code parsed HTML attributes inside <!DOCTYPE ...> declarations.  This is
  also (obviously) the wrong thing to do.  <!DOCTYPE...> declarations are now
  also ignored.

  (This bug fix will be known as bug fix EXP.)

* The set of HTML end tags that close some HTML elements was incomplete.

  (This bug fix will be known as bug fix HC1.)


CHANGES, file-by-file
---------------------

* elements.c

	1. For the <colgroup> element, added <colgroup> for bug fix
	   HC1.

	2. For the <td> element, added <tbody>, </tbody>, </td>,
	   <tfoot>, </tfoot>, <tr>, and </tr> for bug fix HC1.

	3. For the <tfoot> element, added <tbody> and <thead> for bug
	   fix HC1.

	4. For the <th> element, added <tbody>, </tbody>, <tfoot>,
	   </tfoot>, </th>, <tr>, and </tr> for bug fix HC1.

	5. For the <thead> element, added <tbody> and <tfoot> for bug
	   fix HC1.

	6. For the <tr> element, added <tbody>, </tbody>, <tfoot>,
	   </tfoot>, and </thead> for bug fix HC1.

* html.c

	1. In parse_html_tag(), added "if ( ... ) return;" around call
	   to skip_html_tag() for bug fix ACP.

	2. In parse_html_tag(), added check to see if first character
	   of an HTML tag is '!' for bug fix EXP.

	3. In skip_html_tag(), changed return type to "bool" and added
	   "return" statements for bug fix ACP.

* version.h

	1. Updated version to "2.0.1".


*******************************************************************************
2.0
*******************************************************************************

NEW FEATURES
------------

* SWISH++ can now selectively not index text in HTML files within HTML elements
  that are members of specified classes.

  (This feature will be known as feature CLASS.)

* The 'search' command now offers optional stemming.  Indexing is unaffected.

  (This feature will be known as feature STEM.)

* In all earlier versions, the number of total words reported was actually the
  total number of words indexed; now, it is the total number of words parsed
  and the former "total words" is now reported as the number of words indexed.

  (This feature will be known as feature NTW.)

* The 'search' command now outputs an additional comment "results" followed by
  the total number of search results.  Additionally, there is a new -R command-
  line option to print this alone.

  (This feature will be known as feature PRC.)


CHANGES, file-by-file
---------------------

* elements.c
* elements.h

	1. Added these files for feature CLASS.

* html.c

	1. Added #include "elements.h" for feature CLASS.

	2. Added extern references to no_index_class_names and
	   no_index_class_count corresponding to index.c #1.

	3. Performed the following substitution:

		s/to_upper/to_lower/

	   to eliminate the to_upper() function entirely.

	4. In grep_title(), performed the following substitution:

		s/TITLE/title/

	   so we can eliminate the to_upper() function entirely.

	5. In parse_html_tag(), corresponding change for html.h #1.

	6. In parse_html_tag(), added code for feature CLASS.

* html.h

	1. For parse_html_tag() function, added:

		bool is_new_file = false

	   for feature CLASS.

* index.c

	1. Added global variables:

		string_set	no_index_class_names;
		int		no_index_class_count;

	   for feature CLASS.

	2. Added global variable:

		long	num_indexed_words;

	   for feature NTW.

	3. In main(), added -C option for feature CLASS.

	4. In main(), added code to print num_indexed_words for feature
	   NTW.

	5. In index_word(), performed following substitution:

		s/num_total_words/num_indexed_words/

	   for feature NTW.

	6. In index_word(), added new:

		++num_total_words;

	   for feature NTW.

	7. In index_word(), added code to test no_index_class_count for
	   feature CLASS.

	8. In index_words(), added:

		static bool new_file;

	   variable for feature CLASS.

	9. In usage(), added description for -C option for feature CLASS.

	A. In merge_indicies(), changed write-header code to neither
	   allocate nor write the offsets for stop words or meta names
	   if there are zero of them.

	B. In rank_full_index(), added check to see if there are no
	   indexed words: if so, return.

	C. In write_full_index(), added check to see if there are no
	   indexed words: if so, return.

	D. In write_full_index(), changed write-header code to neither
	   allocate nor write the offsets for stop words or meta names
	   if there are zero of them.

* Makefile

	1. Added -DDEBUG_parse_class for feature CLASS.

	2. Added elements.o object for feature CLASS.

	3. Added -DDEBUG_stem_word for feature STEM.

	4. Added target for stem_word.o for feature STEM.

* man/man1/index.1

	1. Added description for -C option and examples for feature
	   CLASS.

* man/man1/search.1
 
	1. Added description of new stemming option for feature STEM.

	2. Added description of new -R option for feature PRC.

* search.c

	1. Added global variable:

		bool	stem_words;

	   for feature STEM.

	2. In main(), performed following substitution:

		s/dDi:m:Ms:SVw:/dDi:m:Mr:RsSVw:/

	   for features PRC and STEM.

	3. In main(), changed what was option 's' to option 'r' and
	   added a new option 's' for feature STEM.

	4. In main(), added a new -R option for feature PRC.

	5. In parse_primary(), added "less_stem" object to word_token
	   case as well as having two exclusive calls to
	   binary_search() and equal_range() depending upon stem_words
	   for feature STEM.

	6. In usage(), corresponding changes to items #3 and #4.

* stem_word.c
* stem_word.h
 
	1. Added these files for feature STEM.

* postscript.h

	1. Added more comments.

* util.c

	1. Moved is_vowel() function to util.h and made it so that it
	   does not call tolower().

	2. In is_ok_word(), performed following substitution:

		s/is_vowel( *c )/is_vowel( tolower( *c ) )/

	   corresponding to item #1.

	3. In ltoa() and to_lower(), made use of new char_buffer_pool
	   class.

* util.h

	1. Added char_buffer_pool class since its functionality is
	   being used 3 times now.

	2. Moved is_vowel() function here from util.c.

	3. Added lots more comments.

* version.h

	1. Updated version to "2.0".


*******************************************************************************
1.7
*******************************************************************************

NEW FEATURES
------------

* Since version 1.4, SWISH++ indexed the text in the ALT attributes of AREA and
  IMG elements.  SWISH++ now adds a few attributes.  The complete set is:

	Attribute	Element
	---------	-------
	TITLE		any
	ALT		AREA, IMG, INPUT
	STANDBY		OBJECT
	SUMMARY		TABLE

  (This feature will be known as feature IEA.)

* Added Word_Min_Vowels to config.h so vowel checks can be disabled (or made
  more stringent).

  (This feature will be known as feature WMV.)


BUG FIXES
---------

* When a given word appeared through many files, its ranks came out rather
  "flat" in the search results.  This has been fixed.

  (This bug fix will be known as bug fix 10K.)


CHANGES, file-by-file
---------------------

* config.h

	1. Added Word_Min_Vowels definition for feature WMV.

* extract.c

	1. Split out function extract_word() from do_file() to parallel
	   changes in index.c.

	2. Moved 'in_postscript' variable to be at file scope due to
	   #1.

	3. In do_file(), added missing 'const' to declaration:

		static ext_proc_map const ext_procs;

	   It should have been there all along.

* html.c

	1. Added declarations for find_attribute(), skip_html_comment()
	   and skip_html_tag() to the top of the file.  They should
	   have been there all along.

	2. In convert_entity(), added missing 'const' to declaration:

		static char_entity_map const char_entities;

	   It should have been there all along.

	3. Modified find_attribute() so that the 'begin' and 'end'
	   iterators are touched only if the attribute is found.

	4. Split out function skip_html_tag() from parse_html_tag()
	   because it's cleaner that way.

	5. In parse_html_tag(), was able to eliminate the
	   'parse_elements' parameter due to #4.

	6. In parse_html_tag(), added code for feature IEA.

* html.h

	1. Corresponding change for html.c #5.

* index.c

	1. Split out function index_word() from index_words() because
	   it's cleaner that way.

	2. Performed following substitution:

		s/1000.0/10000.0/

	   for bug fix 10K.

	3. In usage(), performed following substitution for the -M
	   option:

		s/in index/to index/

* man/man1/index.1

	1. Additions for feature IEA.

* util.c

	1. In is_ok_word(), added Word_Min_Vowels for feature WMV.

	2. In is_ok_word(), deleted 'consonants' variable since it
	   wasn't being used.

	3. Redid to_lower() function to use multiple buffers.

	4. Overloaded to_lower() function to take a pair of iterators.

* util.h

	1. Corresponding change for util.c #4.

* version.h

	1. Updated version to "1.7".


*******************************************************************************
1.6
*******************************************************************************

NEW FEATURES
------------

* The value of the CONTENT attribute for META elements can now selectively be
  indexed based on the value of the NAME attribute, either by explicit
  inclusion or exclusion.

  (This feature will be known as feature MIE.)

* The WWW Perl library has a new function, extract_meta(), that can extract the
  value of the CONTENT attribute from a META element having a given NAME
  attribute from an HTML file.

  This can be used to display meta information in search results, e.g., for
  a given search result, also display its author, publication date, etc.

  (This feature will be known as feature EMC.)


BUG FIXES
---------

* If parentheses were used in conjunction with 'not' in a query involving meta
  names, it didn't work, e.g.:

	search author = not hawking

  worked as expected, but:

	search author = not ( hawking )

  didn't even though it is (supposed to be) equivalent.

  (This bug fix will be known as bug fix MNP.)


CHANGES, file-by-file
---------------------

* html.c

	1. Added #include "my_set.h" for feature MIE.

	2. At global scope, added declarations:

		extern string_set exclude_meta_name, include_meta_names;

	   for feature MIE.

	3. In function parse_html_tag(), added code for feature MIE.

* index.c

	1. Added declarations:

		string_set exclude_meta_name;
		string_set include_meta_names;

	   for feature MIE.

	2. In main(), added "m:M:" to opts[] and cases for 'm' and 'M'
	   command-line options for feature MIE.

	3. In usage(), added explanation of -m and -M options for
	   feature MIE.

* Makefile

	1. Added dependency of my_set.h to html.o for feature MIE.

* man/man1/index.1

	1. Added description of new -m and -M command-line options for
	   feature MIE.

* man/man3/WWW.3

	1. Added description for extract_meta() function for feature
	   EMC.

* search.c

	1. Added "int = no_meta_id" to declarations and definitions of
	   parse_meta() and parse_query() functions for bug fix MNP.

	2. In parse_primary()'s lparen_token case, added "meta_id" to
	   recursive call of parse_query() for bug fix MNP.

* version.h

	1. Updated version to "1.6".

* WWW.pm

	1. Added extract_meta() function for feature EMC.

	2. Rewrote extract_description() in terms of extract_meta().


*******************************************************************************
1.5.1
*******************************************************************************

NEW FEATURES
------------

* Both 'index' and 'extract' now have a new verbosity level 4 that prints
  filenames that are not indexed or extracted, respectively, and why.  (This
  feature was added in support of bug fix HTH.)

  (This feature will be known as feature IEV4.)

* The 'httpindex' script's -v option now works exactly like that of 'index'.

  (This feature will be known as feature HTV.)


BUG FIXES
---------

* META attribute name parsing had a bug where the find_attribute() function
  could occasionally run past the 'end' of where it was supposed to look.

  (This bug fix will be known as bug fix FAE.)

* The 'httpindex' script would hang if it told 'index' to index a file and,
  for whatever reason, 'index' couldn't since 'index' would silently skip the
  file.

  (This bug fix will be known as bug fix HTH.)

* The WWW::extract_description() function returned the first
  $description::chars characters of a file untouched if the file did not end
  with one of the filename extensions matched by the pattern
  /\.(?:[a-z]?html?|txt)$/i.  What it should do is return a null description.

  (This bug fix will be known as bug fix EDN.)

* The 'httpindex' script didn't test the extracted description to see if it is
  null: if it is, it should not attempt to overwrite the original file with the
  description and instead just delete the file.

  (This bug fix will be known as bug fix HTND.)


CHANGES, file-by-file
---------------------

* extract.c

	1. In main(), changed upper-bound for verbosity to 4 for
	   feature IEV4.

	2. In do_file(), added additional print statements for feature
	   IEV4.

	3. In usage(), changed message to show verbosity range as 0-4
	   for feature IEV4.

* html.c

	1. In find_attribute(), made it correctly skip attribute names
	   that don't match for bug fix FAE.

	2. In find_attribute(), made it so that 'c' is never
	   incremented past 'end' (as it sometimes incorrectly was) for
	   bug fix FAE.

* httpindex.in

	1. Performed following substitution:

		s/-v3/-v4/

	   for bug fix HTH.

	2. Added code to test the extracted description to see if it is
	   null for bug fix HTND.

	3. If a file cannot be overwritten with its description (using
	   the -d option), a warning is now merely issued rather than
	   dying as in version 1.5.

	4. Added code for feature HTV.

* index.c

	1. Same as extract.c #1.

	2. Same as extract.c #2.

	3. Same as extract.c #3.

* man/man1/extract.1
* man/man1/index.1

	1. Updated description for feature IEV4.

* version.h

	1. Updated version to "1.5.1".

* WWW.pm

	1. Added a "default case" to WWW::extract_description() for bug
	   fix EDN.


*******************************************************************************
1.5
*******************************************************************************

NEW FEATURES
------------

* A new command, httpindex, has been added to assist in indexing files on
  remote servers.

  (This feature will be known as feature HTTP.)


BUG FIXES
---------

* The regular expressions in extract_description() in WWW.pm had some bugs.

  (This bug fix will be known as bug fix WRE.)

* The ignore stop words feature (feature ISW) added in version 1.2 that was
  broken, fixed, and fixed again is being fixed yet again so that ignored words
  are reported even if there are no other results.

  (This bug fix will be known as bug fix ISW4.)


CHANGES, file-by-file
---------------------

* config/config.mk

	1. Added PERL variable for feature HTTP.

* config/config.pl

	1. This Perl configuration script was added for feature HTTP.

* extract.c

	1. Moved #include <dirent.h> after <sys/types.h> for BSD
	   systems.

* file_vector.c

	1. Modified the behavior of file_vector<T> not to return an
	   error if the file being mapped is of zero length for feature
	   HTTP.

* httpindex.in

	1. This Perl 5 script was added for feature HTTP.

* index.c

	1. In do_file(), added a check to skip an empty file since
	   file_vector<T> now opens empty files for feature HTTP.

	2. Moved #include <dirent.h> after <sys/types.h> for BSD
	   systems.

* INSTALLATION

	1. A third prerequisite of Perl 5 was added for feature HTTP.

	2. A fourth prerequisite of wget was added for feature HTTP.

* Makefile

	1. A target was added for httpindex for feature HTTP.

	2. The Makefile now also installs WWW.pm since it is required
	   by httpindex.

* man/man1/httpindex.1

	1. Added this manual page for feature HTTP.

* man/man1/Makefile

	1. Added targets for new httpindex.1 manual page for feature
	   HTTP.

* search.c

	1. In main(), moved:

		if ( skip_results >= results.size() )
			return 0;

	   past the code that prints the stop words for bug fix ISW4.

* version.h

	1. Updated version to "1.5".

* WWW.pm

	1. This file has been moved up from the subdirectory
	   www_example.

	2. In extract_description(), the regular expressions to extract
	   the META NAME=description descriptions had missing '?'s
	   added for bug fix WRE.

	3. In extract_description(), the regular expression to remove a
	   trailing ALT attribute was fixed for bug fix WRE.


*******************************************************************************
1.4.1
*******************************************************************************

BUG FIXES
---------

* The META names that words were associated with were completely wrong.  It
  worked in a small number of test cases (my original test cases -- figures),
  but not in the general case.

  (This bug fix will be known as bug fix MID.)

* In 1.4, a given word could be associated with at most 1 meta name per file.
  This limitation was an oversight.  It has been corrected.

  (This bug fix will be known as bug fix MMM.)


CHANGES, file-by-file
---------------------

* extract.c

	1. Performed following substitution:

		s/string_set.h/my_set.h/

* file_list.c

	1. Performed following substitution:

		s/meta_index/meta_id/

	   for bug fix MID.

	2. Added code to read multiple meta-IDs for bug fix MMM.

* file_list.h

	1. Changed declaration of file_list::value_type to be simply
	   word_info::file since the structures are the same.

* file_list.c

	1. Same as file_list.c #1.

* html.h

	1. Same as file_list.c #1.

* index.c

	1. Same as file_list.c #1.

	2. Same as extract.c #1.

	3. Added remove_tmp_files() function and set it to be called
	   via atexit() so that temporary files are removed even if
	   the program terminates prematurely.

	4. In index_words(), added code to add multiple meta-IDs for
	   bug fix MMM.

	5. In merge_indices(), added code to write multiple meta-IDs
	   for bug fix MMM.

	6. In write_meta_name_index(), added code to write out the
	   numeric ID for META name for bug fix MID.

	7. In write_word_index(), same as #4.
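
	   The atexit() arrangement in item 3 can be sketched roughly as
	   follows.  This is only an illustration: the tmp_files registry
	   and register_tmp_cleanup() are hypothetical names, and the real
	   remove_tmp_files() may track its files differently.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical registry of temporary file names created so far.
static std::vector< std::string > tmp_files;

// Deletes every registered temporary file.  Errors from std::remove()
// (e.g., the file is already gone) are deliberately ignored.
extern "C" void remove_tmp_files() {
	for ( std::vector< std::string >::size_type i = 0; i < tmp_files.size(); ++i )
		std::remove( tmp_files[ i ].c_str() );
	tmp_files.clear();
}

// Call once early in main().  Returns true if registration succeeded.
inline bool register_tmp_cleanup() {
	return std::atexit( remove_tmp_files ) == 0;
}
```

	   Handlers registered with atexit() run when main() returns or
	   exit() is called; they do not run after abort() or when the
	   process is killed by a signal.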

* index.h

	1. Same as file_list.c #1.

* less.h

	1. Added explicit default constructor since g++ 2.8.0 complains
	   if it isn't there and you try to define a "const less"
	   object.
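
	   A minimal sketch of the situation, using a hypothetical
	   my_less functor rather than the actual contents of less.h:

```cpp
#include <cstring>

// Hypothetical comparison functor in the spirit of less.h.
template< typename T > struct my_less;

template<> struct my_less< char const* > {
	my_less() { }			// explicit (user-provided) default constructor
	bool operator()( char const *a, char const *b ) const {
		return std::strcmp( a, b ) < 0;
	}
};

// This const object has no initializer.  Without the constructor above,
// some compilers reject the declaration.
static my_less< char const* > const str_less;
```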

* my_set.h

	1. This file was renamed from string_set.h.

	2. Made string_set class generic for any type T since we now
	   use a set< short > in word_info::file.

	3. Changed declaration of string_set to be simply:

		typedef my_set< char const* > string_set;

* postscript.h

	1. Same as extract.c #1.

* search.c

	1. Same as extract.c #1.

	2. Same as file_list.c #1.

	3. In dump_single_word(), performed following substitution:

		s/less< char const* >/less< char const* > const/

	   since it can be const (and everything that can be const
	   should be).

	4. In dump_word_window(), same as #3.

	5. Added get_meta_id() function for bug fix MID.

	6. In parse_meta(), performed following substitution:

		s/::distance( meta_names.begin(), found.first )
		 /get_meta_id( found.first )/
	
	   for bug fix MID.

	7. In parse_meta(), same as #3.

	8. In parse_primary(), same as #3.

	9. In parse_primary(), added code in while loop at end of
	   function to check all meta-IDs associated with a word for
	   bug fix MMM.

* stop_words.h

	1. Same as extract.c #1.

* string_set.h

	1. This file was renamed to my_set.h.  (See it for additional
	   changes.)

* version.h

	1. Updated version to "1.4.1".

* word_info.c

	1. Same as file_list.c #1.

* word_info.h

	1. Changed word_info::file struct to include a set of meta-IDs
	   for bug fix MMM.

	2. Changed word_info::file struct to use shorts for occurrences
	   and rank to conserve memory (since additional memory is now
	   being taken up by the set of meta-IDs).

	3. Gave the word_info::file struct 3 separate constructors so
	   the minimal amount of code is executed depending on how an
	   object is constructed.

*******************************************************************************
1.4
*******************************************************************************

NEW FEATURES
------------

* SWISH++ now indexes and can search META data.

  (This feature will be known as feature META.)

* SWISH++ now indexes the words in ALT attributes within AREA and IMG elements.

  (This feature will be known as feature ALT.)

* SWISH++ can now index files and directories specified via standard input
  instead of via the command line.  When doing this, extensions of files to
  index need not explicitly be specified via the -e option, i.e., 'index'
  assumes you know what you're doing when specifying filenames.

  (This feature will be known as feature ISI.)

* For both 'index' and 'extract', a new -r command line option was added to
  suppress recursively indexing files in subdirectories.  This option is most
  useful in conjunction with the new ISI feature.

  (This feature will be known as feature CLR.)

* Added an optimization option for determining whether a character is a "word
  character" by eliminating the call to strchr() in is_word_char().  This
  yields about a 10% performance improvement during indexing.

  (This feature will be known as feature WCO.)
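
  The idea can be sketched as below, assuming the optimization replaces the
  strchr() scan with a 256-entry lookup table.  The character set and the
  names word_chars, is_word_char_slow(), and word_char_table are illustrative
  only and are not the actual SWISH++ definitions.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical set of non-alphanumeric characters allowed inside a word.
static char const word_chars[] = "&'-_";

// Unoptimized form: a linear strchr() scan for every character tested.
// The explicit '\0' check is needed because strchr() matches the
// terminating null of word_chars.
inline bool is_word_char_slow( char c ) {
	unsigned char const u = static_cast<unsigned char>( c );
	return c != '\0' && ( std::isalnum( u ) || std::strchr( word_chars, c ) != 0 );
}

// Optimized form: build a 256-entry table once; each test is then a
// single array lookup.
struct word_char_table {
	bool is_word_[ 256 ];
	word_char_table() {
		for ( int i = 0; i < 256; ++i )
			is_word_[ i ] = is_word_char_slow( static_cast<char>( i ) );
	}
};

inline bool is_word_char( char c ) {
	static word_char_table const table;
	return table.is_word_[ static_cast<unsigned char>( c ) ];
}
```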

* The code for 'index' was profiled and a couple of performance tweaks were
  made yielding about a 7% performance improvement.

  (This feature will be known as feature PPT.)


BUG FIXES
---------

* A small bug whereby the last word of a file was not indexed if the last line
  didn't end in a newline (or a whitespace character in general) was fixed.

  (This bug fix will be known as bug fix ILW.)
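
  A minimal sketch of a scanning loop that handles this case.  The function
  split_words() is hypothetical and treats only alphanumeric characters as
  word characters; the real do_file() and index_words() loops differ.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits [c, end) into words of alphanumeric characters.  The final 'if'
// after the loop handles a word that runs right up to 'end' with no
// trailing whitespace, which is the case the bug description says was
// missed.
std::vector< std::string > split_words( char const *c, char const *end ) {
	std::vector< std::string > words;
	std::string word;
	while ( c != end ) {
		unsigned char const u = static_cast<unsigned char>( *c++ );
		if ( std::isalnum( u ) ) {
			word += static_cast<char>( u );
			continue;
		}
		if ( !word.empty() ) {		// a non-word character ends the word
			words.push_back( word );
			word.clear();
		}
	}
	if ( !word.empty() )			// flush a word ending exactly at 'end'
		words.push_back( word );
	return words;
}
```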


CHANGES, file-by-file
---------------------

* config.h

	1. Added OPTIMIZE_WORD_CHARS, OPTIMIZE_WORD_BEGIN_CHARS, and
	   OPTIMIZE_WORD_END_CHARS for feature WCO.

* directory.c

	1. Added a comment regarding do_file().

	2. Added check of new global "recurse_subdirectories" variable
	   for feature CLR.

* entities.h

	1. Added a comment regarding the use of "less< key_type >" with
	   the map.

* ext_proc.h

	1. Performed following substitution:

		s/map_type::const_iterator i/map_type::const_iterator const i/

	   It should have been that way all along.

* extract.c

	1. In main(), added code for feature ISI.

	2. In main(), added handling of new -r option for feature CLR.

	3. In do_file(), redid main 'while' loop and added 'if's for
	   bug fix ILW.

	4. In do_file(), replaced calls to strchr() with new
	   is_word_begin_char() and is_word_end_char() functions for
	   feature WCO.

	5. In usage(), added missing description for -E option.

	6. In usage(), added description for new -r option for feature
	   CLR.

* file_info.h

	1. Added current_file() function for feature META.

* file_list.c

	1. In operator++(), added meta-index parsing code for feature
	   META.

* file_list.h

	1. Added value_type::meta_index data member for feature META.

* html.c

	1. Added #include "index.h" and #include "meta_map.h" for
	   features ALT and META.

	2. Throughout the entire file, improved the SEE ALSO
	   references, added URLs.

	3. Added find_attribute() for features ALT and META.

	4. Performed following substitution:

		s/skip_html_tag/parse_html_tag/

	   for features ALT and META.

	5. Added parse_elements parameter to parse_html_tag() and code
	   to parse ALT attributes and META elements for features ALT
	   and META.

* html.h

	1. Added definitions of no_meta_index and meta_index_not_found
	   for feature META.

	2. Performed following substitution:

		s/skip_html_tag/parse_html_tag/

	   for feature META.

* index.c

	1. Added #include "index.h" and #include "meta_map.h" for
	   feature META.

	2. Added definition of meta_names for feature META.

	3. In main(), added code for feature ISI.

	4. In main(), added handling of new -r option for feature CLR.

	5. Refactored do_file() by splitting out the actual word
	   indexing part into a new function index_words() for feature
	   META.  The index_words() function is now also called by
	   parse_html_tag() to index the words in the CONTENT attribute
	   of META elements.

	6. In do_file(), replaced 3 function calls to strcmp() to see
	   if a file is an HTML file with a call to a new, inlined
	   is_html_ext() function for feature PPT.

	7. In index_words(), redid main 'while' loop and added 'if's
	   for bug fix ILW.

	8. In index_words(), added 'if' so as not to parse '<' as the
	   start of an HTML tag if meta_index >= 0 for feature META.

	9. In index_words(), replaced calls to strchr() with new
	   is_word_begin_char() and is_word_end_char() functions for
	   feature WCO.

	A. In merge_indices(), added code to write meta index for
	   feature META.

	B. In merge_indices(), redid code for writing the word index to
	   use low ASCII characters as separators for feature META.

	C. Replaced a lot of 'for' loops iterating over an entire
	   sequence with a new FOR_EACH or TRANSFORM_EACH macro.  (I
	   got tired of typing.)

	D. In rank_full_index(), moved the code to compute the ranks
	   AFTER the tests to see whether a word occurs too frequently.
	   It was originally placed before since file_count needed to
	   be calculated, but I realized this is known ahead of time as
	   simply info.files_.size().

	E. Added write_meta_name_index() for feature META.

	F. In write_full_index(), added call to write_meta_name_index()
	   for feature META.

	G. In write_word_index(), redid code for writing the word index
	   to use low ASCII characters as separators for feature META.

	H. In usage(), added description for new -r option for feature
	   CLR.

* index.h

	1. Added this new file for feature META.

* Makefile

	1. Added new dependencies for feature META.

* man/Makefile

	1. Added missing "pdf" target.

* man/man1/extract.1

	1. Added description of new -r option for feature CLR.

	2. Added description of feature ISI.

* man/man1/index.1

	1. Added description of META element indexing for feature META.

	2. Improved references, added URL.

	3. Added description of new -r option for feature CLR.

	4. Added description of feature ISI.

* man/man1/search.1

	1. Added description and examples of META element searching for
	   feature META.

* man/man4/swish++.4

	1. Modified description of index file format for feature META.

* meta_map.h

	1. Added this file for feature META.

* search.c

	1. Added #include "html.h" for feature META.

	2. Added definition of meta_names for feature META.

	3. In main(), added dump_meta_names and new -M command line
	   option for feature META.

	4. In main(), used new enum for calls to
	   word_index::set_index_file().

	5. Replaced a lot of 'for' loops iterating over an entire
	   sequence with a new FOR_EACH or TRANSFORM_EACH macro.  (I
	   got tired of typing.)

	6. In dump_word_window(), added missing description for 'match'
	   parameter.

	7. In parse_query(), performed following substitution:

		s/parse_primary/parse_meta/

	   for feature META.

	8. Added parse_meta() function for feature META.

	9. In parse_primary(), added meta_index parameter for feature
	   META.

	A. In parse_primary(), added code to add words to result only
	   if the meta-name matches for feature META.

	B. In usage(), added description of new -M option for feature
	   META.

* stop_words.c

	1. In stop_word_set::stop_word_set(), redid main 'while' loop
	   and added 'if's for bug fix ILW.

* token.c

	1. Reworked token::hold() to accommodate more than one
	   put_back() in a row for feature META since parse_meta()
	   requires two look-ahead tokens.

	2. Added new case for the '=' token for feature META.

* token.h

	1. Added equal_token for feature META.

	2. Corresponding change to token.c item #1.

* util.c

	1. In is_ok_word(), performed following substitution:

		s/int const len = ::strlen( word )/int const len = c - word/

	   for feature PPT.

	2. In to_lower(), replaced call to transform() with simple
	   while loop for feature PPT.

* util.h

	1. Added new FOR_EACH and TRANSFORM_EACH macros since I got
	   tired of typing.

	2. Added new is_html_ext() function for feature PPT.

	3. Redid is_word_char() function for feature WCO.

	4. Added is_word_begin_char() and is_word_end_char() functions
	   for feature WCO.

* version.h

	1. Updated version to "1.4".

* word_index.h

	1. Added enum for word indices to word_index class.

* word_info.h

	1. Added #include "html.h" for feature META.

	2. Added word_info::file::meta_index_ data member and modified
	   constructor accordingly for feature META.


*******************************************************************************
1.3.2
*******************************************************************************

BUG FIXES
---------

* The ignore stop words feature (feature ISW) added in version 1.2 was slightly
  broken in 1.2.1; it was "fixed" in 1.2.2 (bug fix ISW2), but not quite in
  that if the left-hand side of a query was ignored, the whole thing was.

  (This bug fix will be known as bug fix ISW3.)

* In 'index', the check for whether filename extensions were supplied was too
  early in the code so the -S option didn't work.

  (This bug fix will be known as bug fix CFE.)


CHANGES, file-by-file
---------------------

* config/man.mk

	1. Made "make dist" make the manual pages in PDF format in
	   addition to text format.

* index.c

	1. Relocated code to check whether filename extensions were
	   supplied for bug fix CFE.

	2. In main(), used an ostream_iterator() to dump stop words.

	3. In do_file(), split tests for stop-words into two separate
	   'if' statements so to_lower() isn't called unless absolutely
	   necessary.

* search.c

	1. In parse_query(), redid ignore-handling code for bug fix
	   ISW3.

	2. In main(), used an ostream_iterator() to dump stop words.

* stop_words.c

	1. Added stop-words: billions, eighteen, fifteen, fourteen,
	   millions, nineteen, second, seconds, seventeen, sixteen,
	   tens, third, thirteen, trillions.

* util.c

	1. In is_ok_word() on line 192, changed floating point
	   calculation to integer by multiplying LHS by 100 to increase
	   performance.

	2. On the same line, performed the following substitution:

		s/>=/>/

	   so the code matches the documentation that says, "...
	   contains more than a third capital letters ..."
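
	   The two changes together amount to something like the sketch
	   below.  The names and the 33 percent threshold are assumptions
	   for illustration; the actual expression in is_ok_word() is not
	   reproduced here.

```cpp
// Hypothetical threshold: the percentage of capital letters above which
// a word is rejected.  33 only approximates "a third".
int const max_caps_percent = 33;

// Floating-point form of the test (len must be > 0).
inline bool too_many_caps_fp( int caps, int len ) {
	return static_cast<double>( caps ) / len > max_caps_percent / 100.0;
}

// Integer form: multiply the left-hand side by 100 rather than dividing,
// so no floating-point arithmetic is needed.  Note the strict '>'.
inline bool too_many_caps( int caps, int len ) {
	return caps * 100 > len * max_caps_percent;
}
```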

* version.h

	1. Updated version to "1.3.2".

* www_example/search.cgi

	1. Removed extraneous 'o' (optimize) options from regular
	   expressions.

* www_example/WWW.pm

	1. Added GNU General Public License notice at top.

	2. In trim_whitespace(), used map() rather than a for loop.

	3. Removed extraneous 'o' (optimize) options from regular
	   expressions.


*******************************************************************************
1.3.1
*******************************************************************************

BUG FIXES
---------

* Unbeknownst to me, I introduced a bug in 1.2.2 that broke wildcard searches.
  (Doh!)  This has been fixed.

  (This bug fix will be known as bug fix WCF.)


CHANGES, file-by-file
---------------------

* man/man1/search.1

	1. Made it explicitly clear that wildcards are not permitted
	   for the -d and -w options.

* token.c

	1. Moved the line:

		::strcpy( t.lower_buf_, to_lower( t.buf_ ) );

	   before:

		if ( t.type_ )
			return in;

	   for bug fix WCF.

* version.h

	1. Updated version to "1.3.1".


*******************************************************************************
1.3
*******************************************************************************

NEW FEATURES
------------

* In "search," a "window" of words can be dumped around the query words.

  (This feature will be known as feature DWW.)

* In "search," the -d option to dump the index for a word now dumps all the
  query words instead of a single word.  Additionally, a stop-word used to
  print "stop-word"; now it prints "# ignored: " followed by the word.

  (This feature will be known as feature DQW.)

* In "search," the -d option to dump the index for a word now prints the
  comment:

	# not found: word

  if 'word' is not found in the index.

  (This feature will be known as feature NFW.)


CHANGES, file-by-file
---------------------

* directory.c

	1. Changed order of #include's putting dirent.h last so that it
	   compiles OK under FreeBSD 2.2.7.

* man/man1/search.1

	1. Corresponding changes for features DWW, DQW, and NFW.

* search.c

	1. In main(), performed the following substitutions:

		s/char const *dump_word/bool dump_word_index/
		s/dump_word/dump_word_index/
		s/d:Di:m:s:SV/dDi:m:s:SV/

	   for feature DQW.

	2. In main(), performed the following substitution:

		s/dDi:m:s:SV/dDi:m:s:SVw:/

	3. In main(), performed the following substitution:

		s/dump_entire_index || dump_stop_words || dump_word
		 /dump_entire_index || dump_stop_words/

	   for feature DQW since the -d option no longer takes an
	   argument.

	4. In main(), added code to handle new -w option for feature
	   DWW.

	5. In main(), added 'while' loop to code to dump multiple words
	   for feature DQW.

	6. In dump_single_word(), performed following substitution:

		s/"stop-word"/"# ignored: " << word/

	   for feature DQW.

	7. In dump_single_word(), added printing of new "not found"
	   comment key for feature NFW.

	8. Added function dump_word_window() for feature DWW.

	9. In usage(), performed following substitution:

		s/-d word/-d/

	   for feature DQW.

	A. In usage(), added text for new -w option for feature DWW.

* version.h

	1. Updated version to "1.3".


*******************************************************************************
1.2.2
*******************************************************************************

NEW FEATURES
------------

* A heuristic was added not to index a word if it contains more than a
  threshold number of consecutive punctuation characters.

  (This feature will be known as feature MCP.)
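
  A minimal sketch of such a heuristic.  The function name and the threshold
  value of 1 are assumptions; the real threshold is the Word_Max_Consec_Puncts
  parameter in config.h, whose value is not given here.

```cpp
#include <cctype>

// Hypothetical threshold value, for illustration only.
int const max_consec_puncts = 1;

// Returns true if 'word' contains a run of more than max_consec_puncts
// punctuation characters in a row.
inline bool too_many_consec_puncts( char const *word ) {
	int run = 0;
	for ( char const *c = word; *c; ++c ) {
		if ( std::ispunct( static_cast<unsigned char>( *c ) ) ) {
			if ( ++run > max_consec_puncts )
				return true;
		} else
			run = 0;		// a non-punctuation character resets the run
	}
	return false;
}
```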

* Files can now be indexed by exclusion of filename extensions rather than by
  inclusion via a new -E command-line option.

  (This feature will be known as feature EFE.)
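
  How inclusion and exclusion might interact can be sketched as follows.  The
  policy shown (exclusion wins, and an empty include set means "everything
  not excluded") and the function should_index() are assumptions, not a
  description of the actual do_file() logic.

```cpp
#include <cstring>
#include <set>
#include <string>

typedef std::set< std::string > extension_set;

// Decides whether to index a file based on its filename extension.
// 'file_name' is assumed to be a base name with no directory part.
inline bool should_index( char const *file_name,
                          extension_set const &include_extensions,
                          extension_set const &exclude_extensions ) {
	char const *const dot = std::strrchr( file_name, '.' );
	if ( !dot || !dot[1] )
		return false;			// no extension at all (assumed policy)
	std::string const ext( dot + 1 );
	if ( exclude_extensions.count( ext ) )
		return false;			// an excluded extension always loses
	return include_extensions.empty() || include_extensions.count( ext ) > 0;
}
```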


BUG FIXES
---------

* The ignore stop words feature (feature ISW) added in version 1.2 was slightly
  broken in 1.2.1 in that the list of ignored words was no longer reported.

  (This bug fix will be known as bug fix ISW2.)


CHANGES, file-by-file
---------------------

* config.h

	1. Performed following substitution:

		s/Word_Hex_Min_Size/Word_Hex_Max_Size/

	   The original name was inconsistent with the other parameters.

	2. Added "Word_Max_Consec_Puncts" for feature MCP.

* config.mk

	1. Performed following substitution:

		s/install.sh/install-sh/

	   so some versions of make don't get confused with the .sh
	   suffix and try to build it.

* extproc.c

	1. Added definitions for WEXITSTATUS and WIFEXITED if not
	   defined on a particular system.

* extract.c

	1. Corresponding change for config.h item 1.

	2. Performed following variable substitution:

		s/extensions/include_extensions/

	   for feature EFE.

	3. Added variable exclude_extensions for feature EFE.

	4. In main(), added code to handle new -E option for feature
	   EFE.

	5. In do_file(), added check against new exclude_extensions
	   variable for feature EFE.

	6. In usage(), added text for new -E option for feature EFE.

* index.c

	1. Performed following variable substitution:

		s/extensions/include_extensions/

	   for feature EFE.

	2. Added variable exclude_extensions for feature EFE.

	3. In main(), added code to handle new -E option for feature
	   EFE.

	4. In do_file(), added check against new exclude_extensions
	   variable for feature EFE.

	5. In usage(), added text for new -E option for feature EFE.

* man/man1/extract.1

	1. Changed description for feature EFE.

* man/man1/index.1

	1. Changed description for feature MCP.

	2. Changed description for feature EFE.

* search.c

	1. Deleted is_stop_word() function for bug fix ISW2.

	2. In dump_single_word(), added code formerly in is_stop_word()
	   here for bug fix ISW2.

	3. In parse_primary(), added code formerly in is_stop_word()
	   here for bug fix ISW2.

* token.c

	1. Changed token so that it is not converted to all lower-case
	   for bug fix ISW2.  Previously, acronyms were not recognized
	   in lower case and keywords ("and," "or," and "not") were not
	   recognized in upper case.

	2. Added code to make a copy of the token string in all lower
	   case.  This is still needed for stop-word determination.

* token.h

	1. Added second buffer to hold all-lower-case version of token
	   text for bug fix ISW2.

* util.c

	1. In is_ok_word(), added code for feature MCP.

* version.h

	1. Updated version to "1.2.2".


*******************************************************************************
1.2.1
*******************************************************************************

NEW FEATURES
------------

* In "search," the original -d option that used to dump the entire index now
  dumps the index entry for a single word.  Correspondingly, a new -D option
  now does what -d used to do.

  (This feature will be known as feature DSW.)

* In "search," the dump of the index entries now includes the rank.

  (This feature will be known as feature DIR.)


BUG FIXES
---------

* Numeric entity references were not converted to their ASCII equivalents.  (I
  don't know how I missed this.)

  (This bug fix will be known as bug fix NER.)
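
  The conversion can be sketched as below.  The helper ascii_equivalent() is a
  hypothetical stand-in for a 256-entry table such as num_entities[] and
  covers only a few ISO 8859-1 codes; convert_numeric_entity() is likewise
  illustrative and handles decimal references only.

```cpp
#include <cstdlib>

// Hypothetical stand-in for a 256-entry table: maps an ISO 8859-1
// character code to a plain ASCII equivalent, or ' ' if none is listed.
inline char ascii_equivalent( int code ) {
	if ( code >= 32 && code < 127 )
		return static_cast<char>( code );	// already printable ASCII
	switch ( code ) {
		case 192: case 193: case 194: case 195: case 196: case 197:
			return 'A';			// capital A with an accent
		case 224: case 225: case 226: case 227: case 228: case 229:
			return 'a';			// small a with an accent
		case 232: case 233: case 234: case 235:
			return 'e';			// small e with an accent
		default:
			return ' ';
	}
}

// Converts the text of a decimal numeric reference such as "&#233;" to
// a single ASCII character.  Returns ' ' if the text is malformed or
// the code is out of range.
inline char convert_numeric_entity( char const *ref ) {
	if ( ref[0] != '&' || ref[1] != '#' )
		return ' ';
	char *end;
	long const code = std::strtol( ref + 2, &end, 10 );
	if ( end == ref + 2 || *end != ';' || code < 0 || code > 255 )
		return ' ';
	return ascii_equivalent( static_cast<int>( code ) );
}
```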

* A search query that contained only stop-words returned all files (up to the
  specified limit or default maximum).

  (This bug fix will be known as bug fix RSW.)


CHANGES, file-by-file
---------------------

* config/config.mk

	1. Added comment at top to remind people that they must do a
	   "make distclean" before recompiling if they change any
	   definitions.

	2. Performed the following substitution:

		s!/usr/ucb/install!$(ROOT)/install.sh!

* entities.c

	1. Added num_entities[] for bug fix NER.

	2. Performed following substitutions:

		s/entity_map/char_entity_map/
		s/entity/char_entity/
		s/entity_name/name/

	   so as to distinguish them from the newly-added num_entities[].

	3. Added:

		"ETH", 'D', "eth", 'd',

	   to char_entity_table[].

* entities.h

	1. Added:

		extern char const num_entities[ 256 ];

	   for bug fix NER.

	2. Corresponding changes for entities.c item 2.

* html.c

	1. Corresponding changes for entities.c item 2.

	2. Made use of new num_entities[] for bug fix NER.

* index.c

	1. On line 400, performed the following substitution:

		s/*lower_word/*const lower_word/

	   It should have been that originally.

* install.sh

	1. Created this shell script to use for installs instead of
	   having to rely on the OS having a "Berkeley-esque" install
	   command.

* man/man1/search.1

	1. Changed description for feature DSW.

* search.c

	1. Added new find_result_type typedef.

	2. Added a new dump_single_word() function for feature DSW.

	3. Added "bool &ignore" parameter to parse_query() and
	   parse_primary() functions for bug fix RSW.

	4. Factored out code that determines whether a word was indexed
	   or not into a new function is_stop_word().

	5. In main(), added code for feature DSW.

	6. In main(), added code for feature DIR.

	7. In parse_query(), added code to ignore stop-words properly
	   for bug fix RSW.

	8. On line 423, performed following substitution:

		s/iterator/const_iterator/

	   It should have been that originally for "const correctness."

	9. In parse_primary(), made use of new is_stop_word() function
	   corresponding to item 4.

	A. In parse_primary(), added code under "not" case to check to
	   see whether the primary should be ignored for bug fix RSW.

	B. In usage(), added text to usage message for feature DSW.

* util.c

	1. On line 65, performed following substitution:

		s/STATIC_CAST/REINTERPRET_CAST/

	   It should have been that originally.

* version.h

	1. Updated version to "1.2.1".

* www_example/search.cgi

	1. Added "&'" characters to those that are not stripped from
	   the query.

	2. Added:

		next if /^#/;

	   so as to ignore comments we know nothing about that future
	   releases of SWISH++ may emit.

*******************************************************************************
1.2
*******************************************************************************

NEW FEATURES
------------

* SWISH++ now stores the list of stop-words in the generated index file so they
  can be ignored on searches later.  Previously, using a stop-word in a query
  would always yield 0 results since the stop-word isn't in the index.  After
  thinking about it, this is just plain stupid.

  (This feature will be referred to as feature ISW.)

* You can now specify the number of files to reserve space for on the command
  line for "index" overriding the default.

  (This feature will be referred to as feature ICF.)

* You can now specify the number of lines to look into a file for HTML <TITLE>
  tags on the command line for "index" overriding the default.

  (This feature will be referred to as feature ICt.)

* Added default values to usage messages.

  (This feature will be referred to as feature UDV.)


BUG FIXES
---------

* The detection of malformed queries was completely broken.  I don't see how
  this went undetected for this long.

  (This bug fix will be referred to as bug fix DMQ.)

* In the example WWW.pm Perl library, not all the "Unix-unfriendly" characters
  were stripped from filenames upon upload.

  (This bug fix will be referred to as bug fix UUC.)


CHANGES, file-by-file
---------------------

* config.h

	1. Performed following substitution:

		s/Title_Lines/Title_Lines_Default/

	   for feature ICt.

	2. Performed following substitution:

		s/Files_Default/Files_Reserve_Default/

	   for feature ICF.

	3. Added: Index_Filename_Default

* extract.c

	1. Added code to usage() for feature UDV.

	2. In do_file(), added code to check whether a word is a
	   stop-word explicitly for feature ISW.  This corresponds to
	   the change for util.c item 2.

* file_index.c

	1. Moved index file header parsing code into a new function
	   get_index_info() for feature ISW.

	2. Added: #include "util.h"

* file_info.c

	1. Added:

		extern int num_files_reserve;

	   for feature ICF.

	2. Performed following substitution:

		s/Files_Default/num_files_reserve/

	   for feature ICF.

* html.c

	1. Added:

		extern int num_title_lines;

	   for feature ICt.

	2. Performed following substitution:

		s/Title_Lines/num_title_lines/

* index.c

	1. Added write_stop_word_index() function for feature ISW.

	2. Added:

		int num_files_reserve = Files_Reserve_Default;

	   for feature ICF.

	3. Added:

		int num_title_lines = Title_Lines_Default;

	   for feature ICt.

	4. Performed following substitutions:

		s/total_words/num_total_words/
		s/unique_words/num_unique_words/

	5. Performed following substitution:

		s/"the.index"/Index_Filename_Default/

	6. Added 'F' option to command line parsing code and usage
	   message for feature ICF.

	7. Added 't' option to command line parsing code and usage
	   message for feature ICt.

	8. In do_file(), added code to check whether a word is a
	   stop-word explicitly for feature ISW.  This corresponds to
	   the change for util.c item 2.

	9. In merge_indices(), removed extra_stop_words and am now
	   using stop_words since they all have to be written to the
	   index file together.  This was done for feature ISW.

	A. In merge_indices(), added code to write additional header
	   information for the stop-words.

	B. In rank_full_index(), now add computed stop-words to global
	   set so they can all be written to the index file together.
	   This was done for feature ISW.

	C. In write_full_index(), added code to write additional header
	   information for the stop-words.

	D. Added code to usage() for feature UDV.

* Makefile

	1. Added new dependencies for feature ISW.

* man/man1/index.1

	1. Added description of new option for feature ICF.

	2. Added description of new option for feature ICt.

* man/man1/search.1

	1. Added description of comments "search" outputs for feature ISW.

	2. Added description of new -S option for feature ISW.

* man/man4/swish++.4

	1. Added description of new index file format for feature ISW.

* search.c

	1. Added definitions:

		word_index	stop_words;
		string_set	stop_words_found;

	   for feature ISW.

	2. Performed following substitution:

		s/"the.index"/Index_Filename_Default/

	3. In main(), added code for new -S option to dump the stop-
	   words from an index file.  This was done for feature ISW.

	4. In main(), added test of EOF for the query_stream to ensure
	   the entire query is parsed successfully for bug fix DMQ.

	5. In main(), added code to output stop-words ignored in the
	   query for feature ISW.

	6. In parse_query(), changed:

		if ( !parse_primary( query, temp1 ) )
			break;

	   to:

		if ( !parse_primary( query, temp1 ) )
			return false;

	   for bug fix DMQ.

	7. In parse_optional_relop(), changed code in default case by
	   adding a check for a ')' token for bug fix DMQ.

	8. In parse_primary(), added code to search stop-words for a
	   word in a query and ignore it for feature ISW.

	9. In parse_primary() for the lparen_token case, performed
	   following substitution:

		s/lparen_token/rparen_token/

	   for bug fix DMQ.

	A. Added code to usage() for feature UDV.

* stop_words.c

	1. Added "let's".

	2. In constructor, changed use of "new" to "strdup".

	3. Change corresponding to util.c item 3.

* util.c

	1. Added function get_index_info() to extract number and offset
	   information of an index file for feature ISW.

	2. In is_ok_word(), removed check for stop-words for feature
	   ISW.  The calling code must now check for stop-words itself.
	   This was necessary because "search" checks for stop-words
	   differently than either "index" or "extract" does.

	3. Added function:

		char const *to_lower( char const *s )

	   for feature ISW.

	4. Added missing include:

		#include <cstring>
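
	Item 3 above gives only the signature of to_lower().  Because it
	returns a char const*, the result has to live in storage that
	outlives the call.  One possible shape, shown purely as an
	assumption and not as the util.c implementation, is a single
	static buffer; the buffer size and the truncation behavior are
	choices made for this sketch.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical implementation: lower-cases s into one static buffer.
// The result is overwritten by the next call and truncated to fit.
char const *to_lower( char const *s ) {
	static char buf[ 128 ];		// size is an arbitrary choice for this sketch
	std::size_t i = 0;
	for ( ; s[ i ] && i < sizeof buf - 1; ++i )
		buf[ i ] = static_cast<char>(
			std::tolower( static_cast<unsigned char>( s[ i ] ) )
		);
	buf[ i ] = '\0';
	return buf;
}
```

	With a single static buffer, a caller that needs two lower-cased
	strings at the same time must copy the first result before
	calling the function again.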

* util.h

	1. Change corresponding to util.c item 1.

	2. Change corresponding to util.c item 2.

* version.h

	1. Updated version to "1.2".

* word_index.c

	1. Added int parameter to both constructor and set_file_index()
	   since a word_index is now used for both the regular word
	   index (0) and the new stop-word index (1).

	2. Moved index file header parsing code into a new function
	   get_index_info() in util.c for feature ISW.

* word_index.h

	1. Added int parameter to both constructor and set_file_index()
	   since a word_index is now used for both the regular word
	   index (0) and the new stop-word index (1).

* www_example/WWW.pm

	1. In parse_multipart(), added $'()*/\ characters to those
	   stripped from filenames for bug fix UUC.

* www_example/search.cgi

	1. Added code to handle ignored words returned by "search" for
	   feature ISW.


*******************************************************************************
1.1
*******************************************************************************

NEW FEATURES
------------

* SWISH++ is now out of beta test.  (Nobody has submitted a bug report in a
  while.)

* From "index," you can now dump the built-in default set of stop-words to a
  file to edit and then use to index.

  (This feature will be referred to as feature ESW.)

* Some example Perl 5 code for interfacing SWISH++ to a web-based search form
  has been provided.

  (This feature will be referred to as feature W3E.)


BUG FIXES
---------

* The definition of the THIS macro in fake_ansi.h was just wrong and there is
  no way to fix it; so it and all references to it have been deleted.

  (This bug fix will be referred to as bug fix XTHIS.)


CHANGES, file-by-file
---------------------

* extract.c

	1. In main(), added code to process the new command-line
	   option -s for feature ESW.

	2. In usage(), augmented message for feature ESW.


* fake_ansi.h

	1. Deleted definition of THIS macro for bug fix XTHIS.

* file_list.c

	1. Deleted references to THIS macro formerly defined in
	   fake_ansi.h and defined a local version instead for bug fix
	   XTHIS.

* index.c

	1. In main(), added code to process the two new command-line
	   options of -s and -S for feature ESW.

	2. In usage(), augmented message for feature ESW.

* Makefile

	1. Added specific build rules for stop_words.c for feature ESW.

	2. Added dependency on stop_words.h to index.c for feature ESW.

	3. Cleaned up rules for "clean," "dist," and "distclean."

* man/Makefile

	1. Added provision to build man3 subdirectory for feature W3E.

* man/man1/index.1

	1. Added descriptions of new command-line options for feature
	   ESW.

	2. Added missing description of additional processing done for
	   HTML files.

* man/man3/Makefile
* man/man3/www.3

	1. New files for W3E.

* stop_words.c

	1. Added global pointer to stop-word set for feature ESW.

	2. Added constructor for stop_word_set to initialize the set of
	   stop-words either from the built-in default set or from a
	   file.

* stop_words.h

	1. New file for ESW.

* string_set.h

	1. Changed definition of string_set to be derived from rather
	   than contain a std::set for feature ESW.
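
	The difference between the two designs can be sketched as
	follows.  Both class names, the contains() member, and the use of
	std::string as the element type are assumptions made for
	illustration; the real string_set may differ.

```cpp
#include <set>
#include <string>

// Containment (hypothetical): every operation callers need has to be
// forwarded to the wrapped set by hand.
class string_set_containing {
public:
	void insert( std::string const &s ) { set_.insert( s ); }
	bool contains( std::string const &s ) const {
		return set_.find( s ) != set_.end();
	}
private:
	std::set<std::string> set_;
};

// Derivation (hypothetical): insert(), size(), iteration, etc. are all
// inherited from std::set; only the extra member is written out.
class string_set_derived : public std::set<std::string> {
public:
	bool contains( std::string const &s ) const {
		return find( s ) != end();
	}
};
```

	Deriving exposes the entire std::set interface to callers, which
	is convenient when the set also has to be filled and iterated
	over from several places.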

* util.c

	1. Moved stop_word_set definitions to stop_words.c for feature
	   ESW.

* version.h

	1. Updated version to "1.1".

* www_example/WWW.pm

	1. Added form data parsing library in Perl 5 for feature W3E.

* www_example/search.cgi
* www_example/search.html

	1. Added example code for feature W3E.


*******************************************************************************
1.1b3
*******************************************************************************

BUG FIXES
---------

* Fixed a bug where unbalanced quotes inside comments would cause a core dump.
  A rereading of the HTML 4.0 specification showed that quotes inside comments
  are not to be balanced or otherwise treated specially.

  (This bug fix will be referred to as bug fix CQU.)
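
  The rule can be illustrated with a minimal sketch, which is not the html.c
  code: an ordinary tag ends at the first '>' outside a quoted attribute
  value, while a comment simply ends at "-->" no matter what quotes it
  contains.  The function name skip_tag() and its index-based interface are
  hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: given the position just past a '<' (pos must not
// exceed s.size()), returns the position just past the end of that tag
// or comment, or s.size() if it is never closed.
std::size_t skip_tag( std::string const &s, std::size_t pos ) {
	if ( s.compare( pos, 3, "!--" ) == 0 ) {
		// Comment: quotes mean nothing here; just look for "-->".
		std::size_t const end = s.find( "-->", pos + 3 );
		return end == std::string::npos ? s.size() : end + 3;
	}
	// Ordinary tag: a '>' inside a quoted attribute value does not end it.
	char quote = '\0';
	for ( ; pos < s.size(); ++pos ) {
		char const c = s[ pos ];
		if ( quote ) {
			if ( c == quote )
				quote = '\0';
		} else if ( c == '"' || c == '\'' ) {
			quote = c;
		} else if ( c == '>' ) {
			return pos + 1;
		}
	}
	return s.size();
}
```

  A comment such as <!-- don't panic --> contains a lone apostrophe; treating
  it as an opening quote would send the scanner looking for a closing quote
  that never comes.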


CHANGES, file-by-file
---------------------

* ext_proc.c

	1. In process_file(), made pid_error static as it should have
	   been all along.

* html.c

	1. Added inclusion of util.h to access to_upper() function for
	   bug fix CQU.

	2. Added following functions for bug fix CQU:

		is_html_comment()
		skip_html_comment()
		tag_cmp()

	3. In grep_title(), changed for loop to while loop to have more
	   precise control over when the iterator is advanced for bug
	   fix CQU.

	4. In grep_title(), now check to see if an HTML tag is a
	   comment.

	5. In grep_title(), replaced code to check title tag by a call
	   to the new tag_cmp() function.

	6. In skip_html_tag(), added calls to is_html_comment() and
	   skip_html_comment() since comments must be skipped
	   differently.  (For bug fix CQU.)

* Makefile

	1. Added util.h to html.o dependencies for bug fix CQU.

	2. Added "the.index" to the $(RM) line for the clean target.

	3. Deleted the second erroneous dist target.

* itoa.c

	1. Deleted this extraneous file.

* util.c

	1. In ltoa(), made Buf_Size and Num_Buffers static as they
	   should have been all along.

* util.h

	1. Added to_upper() inline function for bug fix CQU.

* version.h

	1. Updated version to "1.1b3".


*******************************************************************************
1.1b2
*******************************************************************************

NEW FEATURES
------------

* For HTML files having titles longer than Title_Max_Size in length, the last
  three characters are replaced by an ellipsis ("...").

  (This feature will be referred to as feature ELL.)
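
  As a rough illustration of the behavior described above (not the html.c
  code), a title that is too long can be cut to the limit with the last three
  characters of the cut title replaced by "...".  The function name
  truncate_title() and its max_size parameter are hypothetical; max_size
  plays the role of Title_Max_Size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the title cut to at most max_size
// characters.  If the title is too long, the last three characters of
// the cut title are replaced by "...".
std::string truncate_title( std::string const &title, std::size_t max_size ) {
	assert( max_size >= 3 );	// need room for the ellipsis
	if ( title.size() <= max_size )
		return title;
	return title.substr( 0, max_size - 3 ) + "...";
}
```

  The result never exceeds max_size characters, so a fixed-size title buffer
  cannot be overrun by the ellipsis itself.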


BUG FIXES
---------

* Fixed a core dump in grep_title() for HTML files having titles that exceed
  Title_Max_Size in length.

  (This bug fix will be referred to as bug fix GT1.)


CHANGES, file-by-file
---------------------

* file_vector.c

	1. Performed following substitution:

		s/sysent.h/unistd.h/

	   for portability.

* html.c

	1. Added code for feature ELL.

	2. Fixed grep_title() for bug fix GT1.

* version.h

	1. Updated version to "1.1b2".


*******************************************************************************
1.1b1
*******************************************************************************

NEW FEATURES
------------

* The search command has a new -s option to specify the number of initial
  results to skip.  Used in conjunction with -m, results can be returned in
  "pages."

  (This feature will be referred to as feature SSR.)
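
  The paging arithmetic can be sketched as follows, assuming that -m caps the
  number of results returned.  The function name page_of() and its parameters
  are hypothetical and are not taken from search.c.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: from already-ranked results, skips the first
// `skip` entries and returns at most `max_results` of those remaining.
template<typename T>
std::vector<T> page_of( std::vector<T> const &results,
			std::size_t skip, std::size_t max_results ) {
	std::size_t const first = std::min( skip, results.size() );
	std::size_t const last =
		first + std::min( max_results, results.size() - first );
	return std::vector<T>( results.begin() + first, results.begin() + last );
}
```

  With 10 results per page, page n (counting from 0) corresponds to skipping
  10 * n results and returning at most 10.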


CHANGES, file-by-file
---------------------

* search.c

	1. Added comment for sort_by_rank struct.  This was an omission.

	2. Added -s option in main() for feature SSR.

	3. Added skip_results variable in main() for feature SSR.

	4. Added -s option in usage() for feature SSR.

	5. Removed extra semicolon in usage() that caused only part of
	   the usage message to print.

* version.h

	1. Updated version to "1.1b1".

* man/man1/search.1

	1. Added description of -s option for feature SSR.
