Product FastTextSearch/IB 1.4 from 18 Mar 2001
Description Full text search engine for Interbase.
Author Copyright (c) 1998-2001, SoftLab MIL-TEC Ltd
Web:   http://www.softcomplete.com
Email: support@softcomplete.com
Status Shareware $50. Online registration is available.
Limitation not index each even word (2,4,...)


This is a legal Agreement between you (either an single individual or an entity, herein referred to as Licensee) and SoftLab (herein after referred to as Licensor). By installing this software (herein after referred to as Product), you indicate your acceptance of this Agreement.

SOFTLAB EVALUATION LICENSE AGREEMENT

  1.GRANT OF LICENSE.

    1.1 Licensor grants you a non-exclusive royalty-free license to make as many copies of the Product accompanying this agreement as you want, solely to test and to evaluate the Product before you make the decision to purchase the Product.

    1.2 You shall not use, copy, rent, lease, sell, modify, decompile, disassemble, otherwise reverse engineer, or transfer the Product except as provided in this Agreement. Any such unauthorized use shall result in immediate and automatic termination of this Agreement.

    1.3 You may transfer and distribute the Product in its unmodified form via electronic means to another persons so long as their use of the Product will be strictly for the purpose of evaluation.

    1.4 You are specifically prohibited from charging, or requesting donations for copies of the Product, distributing the Product with any other products without Licensors prior written permission.

  2. WARRANTY AND LIMITATION OF LIABILITY.

The Licensor hereby disclaims all warranties of any kind relating to the Product, whether express or implied, including without limitation any implied warranties of merchantability or fitness for a particular purpose. The Licensee must assume the entire risk of using the Product.

    FastTextSearch/IB is unique solution for the searching textual information in the Interbase database. Allow indexing an unlimited amount string and textual memo fields. Implements some idea from text retrivial systems - stop-word list, word stemming. Server side quick searching (without scan of all record). Possibility of use SoundEx for query. Independence from client software.

Setup complete ... or how to install FastTextSearch/IB.

Install of FastTextSearch is very simple. You must place this files in "interbase\lib" folder.

FileName
Short description
xFTS.dll FTS core
xFTS.ini FTS settings
xLibUDF.dll string function library
xFTS_skip.lst stop-word list
xFTS_tkn.lst list of unreducible words
stem_eng.dll english stemming engine
xFTS_RTF.dll Blob RTF filter
xFTS.lic license file. only in registered version

Step by step ... or how to add FastTextSearch/IB for existing database.

Open file "script\FTSCore.sql" in notepad. Change database name, user name and password to required. Execute script. FTS core will be add to your database.

Customer TableBy example, we add FTS to EMPLOYEE.GDB (from Interbase examples). For add FTS capabilities in some table we must select fields for indexing and define 3 triggers. We will be working with CUSTOMERS table. First step - add field for FTS object identifier.

alter table CUSTOMER add FTS_ID INTEGER;
create index CUSTOMER_IDX_FTS on CUSTOMER(FTS_ID);

 

Second step - define next triggers:

create trigger t_bi_customer_fts for CUSTOMER active before insert position 0 as
declare variable Parser integer;
declare variable i integer;
begin
  if (exists(select * from TS$OPT where Enable>0)) then begin
    Parser=Parser_Create();
    i=Parser_Add(Parser,new.CUSTOMER);
    i=Parser_Add(Parser,new.CONTACT_FIRST);
    i=Parser_Add(Parser,new.CONTACT_LAST);
    i=Parser_Add(Parser,new.ADDRESS_LINE1);
    i=Parser_Add(Parser,new.ADDRESS_LINE2);
    i=Parser_Add(Parser,new.CITY);
    i=Parser_Add(Parser,new.COUNTRY);
    execute procedure TS$UPDATE(new.FTS_ID, 'CUSTOMER', :Parser)
      RETURNING_VALUES new.FTS_ID;
    i=Parser_Free(Parser);
  end
end

create trigger t_bu_customer_fts for CUSTOMER active before update position 0 as
/* same as trigger t_bi_customer_fts */
declare variable Parser integer;
declare variable i integer;
begin
  if (exists(select * from TS$OPT where Enable>0)) then begin
    Parser=Parser_Create();
    i=Parser_Add(Parser,new.CUSTOMER);
    i=Parser_Add(Parser,new.CONTACT_FIRST);
    i=Parser_Add(Parser,new.CONTACT_LAST);
    i=Parser_Add(Parser,new.ADDRESS_LINE1);
    i=Parser_Add(Parser,new.ADDRESS_LINE2);
    i=Parser_Add(Parser,new.CITY);
    i=Parser_Add(Parser,new.COUNTRY);
    execute procedure TS$UPDATE(old.FTS_ID, 'CUSTOMER', :Parser)
      RETURNING_VALUES new.FTS_ID;
    i=Parser_Free(Parser);
  end
end

create trigger t_bd_customer_fts for CUSTOMER active before delete position 0 as
begin
  delete from TS$OBJ where ID = old.FTS_ID;
end

Third step (optional) - build FTS index, if table contains data. For this necessary to execute trigger "Before Update" for each record in table.

update CUSTOMER set CUSTOMER=CUSTOMER;
(click here for see performance this query)

All done ! Test FTS index:

select *
from
TS$Select_OR("customer","tech corp",0) TS
left join customer c on c.fts_id = ts.obj_id

This query select all customer, which contain words "tech" or "corp":

Query result

Attention: FastTextSearch/IB break all string and text memo to words, and built index, which contain this word. Searching for index is realized on coinciding beginning of word with sample. Hereupon, records, contain word "incorporated", will not be found at searching for word "corp".

Second test:

select *
from TS$Select_And("customer","hong kong",0) TS
left join customer c on c.fts_id = ts.obj_id

This query select all customer, which contain words "hong" and "kong":

Query result

SoundEX test:

select *
from
TS$Select_OR("customer","jon",1) TS
left join customer c on c.fts_id = ts.obj_id

Select all customer, which contain words, with SoundEX equal to SoundEX("jon") :

Query result

Quick and fast ... or analysis of performance.

Query
Performance

"OR" Query with 2 word

select *
from
TS$Select_OR("customer", "tech corp", 0) TS
left join customer c on c.fts_id = ts.obj_id

graph 1

"AND" Query with 2 word

select *
from TS$Select_And("customer", "hong kong", 0) TS
left join customer c on c.fts_id = ts.obj_id

graph 2

"OR" SoundEX Query with 1 word

select *
from
TS$Select_OR("customer", "jon", 1) TS
left join customer c on c.fts_id = ts.obj_id

graph 3

Brought diagrams show that only indexed access to data is used.

Some tips for increasing performance:

You don't speak english ? ... or international settings.

This is the sample xFTS.ini file for work with russian and english language (file xfts.rus):

Russian ini file

Parameter
Description
MinWordLen Disable indexing for short words (by example "at", "is", "or", etc.)
WordSet Define set of chars, from which consist words. All rest symbols will be perceive as separators of words
NationSet
Translate
Is kept table for transformation of word in english equivalent (char by char), on which, subsequently, is calculate function SoundEX
[Filters] Additional filters, which apply for any indexing word. In next section we show, how to write your own filter.

Find possible all !... more powers.

Indexing blob fields.

Use function Parser_AddBlob. By example, in table JOB from EMPLOYEE.GDB we built FTS index on fields JOB_COUNTRY varchar(15), JOB_TITLE varchar(25) and JOB_REQUIREMENT BLOB:

create trigger t_bi_job_fts for JOB active before insert position 0 as
declare variable Parser integer;
declare variable i integer;
begin
  if (exists(select * from TS$OPT where Enable>0)) then begin
    Parser=Parser_Create();
    i=Parser_Add(Parser,new.JOB_COUNTRY);
    i=Parser_Add(Parser,new.JOB_TITLE);
    i=Parser_AddBlob(Parser,new.JOB_REQUIREMENT);
    execute procedure TS$UPDATE(new.FTS_ID,'JOB',:Parser)
      RETURNING_VALUES new.FTS_ID;
    i=Parser_Free(Parser);
  end
end

Indexing blob fields, contains text in specific format.

If BLOB field contain RTF text then use function Parser_AddBlobRTF from xFTS_RTF.dll. Source code for this dll found in folder "RtfSrc". In future we planned add support for HTML.

Indexing unreducible words.

If it is necessary in the forced order to put in an index some words (irrespective of their length and outcome of application of filters), they should be enumerated in the file xFTS_tkn.lst. For example, if we want indexing field STATE_PROVINCE in CUSTOMER table we should in the file xFTS_tkn.lst add words "CA", "TX", "MA", "ON", "HI". Otherwise they will not hit in an index as their length less minimum valid word length.
Note: unreducible words can contain non-word char. By example: "B+W", "ABC!", "http://".

Write your own filter.

You can write filter in any programming language, which allows to create Win32 dll module. You is necessary export one function:

procedure ProcessWord(WordBuf: PChar); stdcall;

When calling this functions, word will be place in buffer with size 64 bytes. You are to process word, and return result in this the most buffer. If word is not index then simple assign WordBuf^:=#0.


Performance results were obtained with the help QuickDesk.