Tuesday, January 10, 2012

SQL Server Fulltext search updates

If you are using Fulltext search with a SQL Server implementation of IDM, be advised that there is a new installation guide that contains updated information about Fulltext search.  I was recently passed this information that you might find helpful...

There is a noiseword (2005)/stop-word(2008) list that contains words that are not indexed. Most of these makes sense (a, the, or, and, etc.) and does not affect searches in IdM. Our naming of privileges with the word "only" was slightly more unfortunate as its also considered a noiseword. This requires customization of the noiseword/stopword list for customers that want to search for the repository privilege.

 

Customizing stopwords in SQL Server 2008:



SQL Server 2008 uses stopwords stored in the database. To customize the list you need to make a copy of the system stopwords list and assign it to be used with the IdM full-text index (ftfull). This can be done using these commands or from the user interface of the SSMS.

CREATE FULLTEXT STOPLIST idmStopList FROM SYSTEM STOPLIST;
-- Remove the words you want to include in the index:
ALTER FULLTEXT STOPLIST idmStopList DROP 'only' LANGUAGE 1033;
ALTER FULLTEXT INDEX ON mxi_values SET STOPLIST idmStopList

Its also possible to view the stopwords using queries. An example listing languages blocking the word "only" follows:

-- To list all entries of 'only' stopwords in the stoplist (can be many languages):
SELECT * FROM sys.fulltext_stopwords WHERE stoplist_id = (SELECT stoplist_id FROM sys.fulltext_stoplists where name = 'idmStopList') and stopword = 'only'



-- To test a stoplist
SELECT special_term, display_term FROM sys.dm_fts_parser (' "a text like system priv ad only somethingsomething" ', 1033,(SELECT stoplist_id FROM sys.fulltext_stoplists where name = 'idmStopList'), 0)







Here we see that "a" and "like" are considered noise and not indexed, while "only" is indexed for exact matches.

Customizing noisewords in SQL Server 2005:


Modify the noiseZZZ.txt files in sql server folder, remove "only" and other things you want to include from ENG,ENU (and others) and then drop & recreate fulltext index:

drop fulltext index on mxi_values
drop fulltext catalog ftfull
CREATE FULLTEXT CATALOG ftfull WITH ACCENT_SENSITIVITY=OFF AS DEFAULT;
CREATE FULLTEXT INDEX ON mxi_values(searchvalue) KEY INDEX IX_MXI_VALUES_Value_ID;
 




Hope you found this helpful! Thanks again to the folks that passed this info along!

2 comments:

Anonymous said...

I didn't even know something like this exists. Maybe I need it eventually, and hopefully I remember where I read it then, thanks!

Matt Pollicove said...

Yeah, it's been a feature that's been around for a while and is a favorite for those running NW IDM on SQL Server (Sorry Oracle fans, no Full Text Search for you...)