How to use Stopwords and Stoplist to improve SQL Server Full-Text Search (FTS)

Image for post
Image for post

The current article is about Stopwords and Stoplist usage. The aim is to make Full-Text Search more efficient in terms of storage and performance. Additionally, the readers of this article will get a walkthrough of implementing Full-Text Search with Stopwords and Stoplist.

The article also highlights the importance of Stopwords and Stoplist(s) to keep the Full-Text Search precise and avoid noise words.

Understanding Stopwords and Stoplist

First, let’s clarify the essence of Stopwords and Stoplist. Then we’ll proceed to use them to improve Full-Text Search.

A Stoplist

A stoplist, as the name implies, is a list of stopwords. When associated with Full-Text Search, the Stoplist can filter out meaningless words or terms, thus improving search results.

A Stopword

A stopword is a word that has a minor role in Full-Text Search, despite being important grammatically. Therefore, a stopword is not essential from the Full-Text Search perspective.

According to Microsoft documentation, a stopword can be a word with some meaning in a specific language, or it may be some token with no linguistic value. In both cases, it is useless for the Full-Text Search.

Examples of Stopwords

The following are stopwords in the (British/American) English language:

  • as
  • at
  • be
  • because
  • he
  • did

Since the above words do not contribute to Full-Text Search, they are stopwords, despite their significance in understanding a complete sentence.

System Stoplist(s)

Each supported language, including British English, has a default or system stoplist that can be modified according to particular requirements.

Custom Stoplist(s)

Users can create their own/custom stoplist, using the system stoplist, or from scratch. It is helpful to prevent noise words from becoming a part of the Full-Text Search results.

Languages and Stoplists

Full-Text search supports many different languages. Each of the supported languages has at least one system stoplist by default. However, there can be multiple stoplists for any language, including both system and custom stoplists.

List of supported languages

We can quickly check the list of all supported languages used with Full-Text Search. For this, run the following T-SQL script:

-- List of Full-Text Search supported languages
SELECT ftl.lcid,ftl.name FROM sys.fulltext_languages ftl

The results are as follows:

Image for post
Image for post

The query on the SQL Server 2016 instance returns a total of 53 languages.

Using Stopwords and Stoplist with Full-Text Search

Now, we’ll get some hands-on experience of creating a stoplist of stopwords defined by the user. Then, we’ll use it for the Full-Text Search as a walkthrough. It will be just like a real-time scenario.

Prerequisites

First of all, you need to ensure the following requirements to implement the walkthrough correctly:

  • Having a basic understanding of Full-Text Search
  • The ability to implement the Full-Text Search in SQL Server
  • The presence of the Full-Text Search option enabled/installed on the SQL instance you plan to use

To ensure those prerequisites, go through the below articles if you are already familiar with the T-SQL scripting:

Check Full-Text Search status

Run the following query to check if you have the Full-Text Search installed into your SQL instance:

-- Is Full-Text Search installed then 1 or 0
SELECT fulltextserviceproperty('IsFulltextInstalled') as [Full-Text Search]

The result of the above script should return 1 as follows:

Image for post
Image for post

If you are getting any number other than 1, refer to the articles mentioned above again.

Setup Sample Database (WatchReviewsStoplist)

First, you need to set up a sample database named WatchReviewsStoplist. Use the below script:

-- Create WatchReviewsStoplist database
CREATE DATABASE WatchReviewsStoplist;
GO
-- Connect to the sample database
USE WatchReviewsStoplist
-- (2) Create WatchReview table
CREATE TABLE [dbo].[WatchReview]
(
[ReviewId] INT NOT NULL IDENTITY ,
[Date] DATETIME2 NULL,
[Person] VARCHAR(50) NULL,
[Details] VARCHAR(1000) NULL,
CONSTRAINT [PK_WatchReview] PRIMARY KEY (ReviewId)
)
-- (3) Populate WatchReview table
SET IDENTITY_INSERT [dbo].[WatchReview] ON
INSERT INTO [dbo].[WatchReview] ([ReviewId], [Date], [Person], [Details]) VALUES (1, N'2020-01-01 00:00:00', N'Atif', N'Hi, I have just bought this Casio black digital watch which is excellent and has date, alarm, stopwatch and timer as well.')
INSERT INTO [dbo].[WatchReview] ([ReviewId], [Date], [Person], [Details]) VALUES (2, N'2020-01-02 00:00:00', N'Qasim', N'Hi, I have just bought this Casio black analog watch which is average and slightly discomforting and just got date and time.')
INSERT INTO [dbo].[WatchReview] ([ReviewId], [Date], [Person], [Details]) VALUES (3, N'2020-01-03 00:00:00', N'Martin', N'Hi, I have just purchased this Casio black digital watch which is excellent and has features like alarm, stopwatch, date and timer. ')
INSERT INTO [dbo].[WatchReview] ([ReviewId], [Date], [Person], [Details]) VALUES (4, N'2020-01-10 00:00:00', N'Brian ', N'Hi, I have just ordered this Seiko black digital watch which is excellent and has date, alarm and timer.')
INSERT INTO [dbo].[WatchReview] ([ReviewId], [Date], [Person], [Details]) VALUES (5, N'2020-01-11 00:00:00', N'Sarah', N'Hi, I have just ordered this Seiko white analog watch which is average and has alarm, date and timer.')
SET IDENTITY_INSERT [dbo].[WatchReview] OFF

Check Sample Database

Run the following T-SQL script to view the table contents in the sample database:

-- Check sample database
SELECT wr.ReviewId,wr.Person,wr.Details FROM dbo.WatchReview wr

The output is as follows:

Image for post
Image for post

Stoplist Scenario

Assume that we’ve asked the development team to improve the Full-Text Search for the customers’ reviews. They should do it by excluding noise words. This way, the Full-Text Search remains efficient and also focused.

To fulfill these requirements, we need first to set up the Full-Text Search. It means taking the following actions:

  • Create a Full-Text Catalog
  • Create an empty custom or user-defined Stoplist
  • Create a Full-Text index

Create a full-text catalog with the following script:

-- Create Full-Text catalog
CREATE FULLTEXT CATALOG [WatchReviewCatalog] AS DEFAULT;
GO

Create a blank custom stoplist with the following script:

-- Create empty custom stoplist
CREATE FULLTEXT STOPLIST [WatchReviewStoplist];
GO

A custom stoplist is now created.

Finally, create a Full-Text Index in the Details column for the British English language. It will point to the custom stoplist created earlier. Use the following T-SQL script for this action:

-- Create Full-Text index pointing to the previously created Stoplist
CREATE FULLTEXT INDEX ON dbo.WatchReview(Details LANGUAGE [British English])
KEY INDEX PK_WatchReview
WITH STOPLIST = WatchReviewStoplist;
GO

Run the following Full-Text query to find out which product (watch) got excellent marks from customers:

--Search customer reviews where the product was rated excellent
SELECT * FROM dbo.WatchReview
WHERE CONTAINS(Details,'excellent')

The results are below:

Image for post
Image for post

We can see that customers have given excellent ratings to the black digital watch. Therefore, we know the best product based on reviews, and we understand that the Full-Text Search works fine.

Still, there are few concerns that you might ignore for a table of five rows, but it can get us performance and storage benefits in a relatively large result set if addressed promptly.

Search noise word ‘Hi’ with Full-Text query

There are many noise words attached to the Full-Text index. Soon, they can become a burden. Plus, they are no use to the search.

Let’s check if the noise word Hi is present in the Full-Text Search results because it is better to exclude it.

Run the Full-Text query as follows:

--Run Full-Text query to Search for Noise word 'Hi'
SELECT * FROM dbo.WatchReview
WHERE CONTAINS(Details,'Hi')

The results are below:

Image for post
Image for post

The results returned to show that the word “Hi” is part of the Full-Text search results. Thus, we need to exclude it — it is meaningless to the search and does not help any analysis.

We will add the stopword Hi by altering our custom Stoplist so that it cannot be indexed by Full-Text search and should not be returned by Full-Text queries to save space and improve the search:

--Alter customer stoplist to add Hi noise word
ALTER FULLTEXT STOPLIST WatchReviewStoplist
ADD 'Hi' LANGUAGE 'British English';

Search for the noise word Hi after adding it to the Stoplist:

--Search Noise word 'Hi' after it has been added as a stopword
SELECT * FROM dbo.WatchReview
WHERE CONTAINS(Details,'Hi')

The output is as follows:

Image for post
Image for post

Anytime, you can check the stopwords added to the Stoplist. Run the following script for this:

-- Check added stopwords 
SELECT * FROM sys.fulltext_stopwords

The results are below:

Image for post
Image for post

Congratulations! You have successfully created a stoplist and added a stopword to it. This way, you make the Full-Text Search more efficient in both performance and storage.

Things to do

Now that you can create stoplists and add stopwords to them, try the following to further improve your skills:

  • Add words ‘I,’ ‘this,’ ‘just,’ ‘as noise words to your custom stoplist created in this walkthrough.
  • Check the added stopwords by running the script we specified at the end of the walkthrough.

Originally published on https://codingsight.com/how-to-use-stopwords-and-stoplist-to-improve-sql-server-full-text-search-fts/, a community platform for database administrators and Microsoft stack technologies specialists.

Subscribe here to get SQL Server industry insides 👋

Written by

Awesome blog focused on databases and Microsoft, .NET and cloud technologies. http://codingsight.com/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store