Saturday, July 4, 2015

Full Text Search

Full-Text Search in SQL Server lets users and applications run full-text queries against character-based data in SQL Server tables. Before you can run full-text queries on a table, the database administrator must create a full-text index on the table. The full-text index includes one or more character-based columns in the table. These columns can have any of the following data types: charvarcharncharnvarchartextntextimagexml, or varbinary(max) and FILESTREAM.

Full-text queries can include simple words and phrases or multiple forms of a word or phrase. A full-text query returns any documents that contain at least one match (also known as a hit). A match occurs when a target document contains all the terms specified in the full-text query, and meets any other search conditions, such as the distance between the matching terms.

 We can create a full-text index on a table or indexed view in a database. Only one full-text index is allowed per table or indexed view. The index can contain up to 1024 columns. To create an Index, follow the steps:
  1. Create a Full-Text Catalog
  2. Create a Full-Text Index
  3. Populate the Index
Full-text search is applicable to a wide range of business scenarios such as e-businesses—searching for items on a web site; law firms—searching for case histories in a legal-data repository; or human resources departments—matching job descriptions with stored resumes. Full-text queries are not case-sensitive. For example, searching for "Aluminum" or "aluminum" returns the same results. Full-text queries use a small set of Transact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). 

FREETEXT ( ) is predicate used to search columns containing character-based data types. It will not match the exact word, but the meaning of the words in the search condition. When FREETEXT is used, the full-text query engine internally performs the following actions on thefreetext_string, assigns each term a weight, and then finds the matches.
  • Separates the string into individual words based on word boundaries (word-breaking).
  • Generates inflectional forms of the words (stemming).
  • Identifies a list of expansions or replacements for the terms based on matches in the thesaurus.
CONTAINS ( ) is similar to the Freetext but with the difference that it takes one keyword to match with the records, and if we want to combine other words as well in the search then we need to provide the “and” or “or” in search else it will throw an error.

SELECT candidate_name,SSN
FROM candidates
WHERE CONTAINS(candidate_resume,"SQL Server") AND candidate_division = 'DBA';

SELECT product_id
FROM products
WHERE CONTAINS(product_description, "Snap Happy 100EZ"
    OR FORMSOF(THESAURUS,'Snap Happy')
    OR '100EZ')
AND product_cost < 200 ;

The FORMSOF term performs matches by using other linguistic forms of the word. Two values are available:
  • INFLECTIONAL chooses alternative inflection forms for the match words. If the word is a verb, alternative tenses are used. If the word is a noun, the singular, plural, and possessive forms are used to detect matches.
  • THESAURUS chooses words that have the same meaning, taken from a thesaurus.

Full-text search architecture consists of the following processes:
  • The SQL Server process (sqlservr.exe).
  • The filter daemon host process (fdhost.exe).
 For security reasons, filters are loaded by separate processes called the filter daemon hosts. The fdhost.exe processes are created by an FDHOST launcher service (MSSQLFDLauncher). The FDHOST launcher service must be running for full-text indexing and full-text querying to work.

No comments:

Post a Comment