Hunspell dictionaries for full-text search
Dictionaries
Below, we provide instructions on how to add an English dictionary, as an example.
Adding a dictionary
-
Connect to the database using
psql. -
To find out which languages are available, get a list of preset configurations for full-text search:
SELECT cfgname FROM pg_catalog.pg_ts_config;The result includes the configuration names represented as language names.
-
Create the
public.my_configconfiguration for full-text search:CREATE TEXT SEARCH CONFIGURATION public.my_config ( COPY = pg_catalog.<configuration> );In your SQL query, specify the language configuration obtained in the previous step.
-
Create a dictionary named
my_dictionaryin the DB:CREATE TEXT SEARCH DICTIONARY my_dictionary ( TEMPLATE = ispell, DictFile = <words_in_dictionary>, AffFile = <affixes>, Stopwords = <stopwords> );SQL query properties:
TEMPLATE: Template used to create a dictionary. You can learn more about Ispell dictionaries here .DictFile: Preset file with words organized in a dictionary.AffFile: Preset file with affixes (prefixes, suffixes, and endings) you can add to words in the dictionary.Stopwords: Stopwords to disregard in full-text searches. Such words may include articles, prepositions and interjections.
Files with dictionaries and affixes preset in Managed Service for PostgreSQL clusters:
Language DictFiledictionaryAffFileaffixesEnglish en_gb.dicten_GB.affixDanish da_dk.dictda_DK.affixSpanish es_es.dictes_ES.affixItalian it_it.dictit_IT.affixGerman de_de_frami.dictde_de_frami.affixPolish pl_pl.dictpl_PL.affixRussian ru_ru.dictru_RU.affixUkrainian uk_ua.dictuk_UA.affixCzech cs_cz.dictcs_CZ.affixThere are also examples of PostgreSQL dictionaries
:DictFiledictionaryAffFileaffixeshunspell_sample_long.dicthunspell_sample_long.affixhunspell_sample_num.dicthunspell_sample_num.affixispell_sample.dictispell_sample.affixIn the
CREATE TEXT SEARCH DICTIONARYSQL query, do not use the.dictand.affixextensions in the file names. -
Link the dictionary named
my_dictionaryand other dictionaries with thewordtoken type.Token is a word or phrase being searched for. It is specified in the search query and displayed in the full-text search results.
ALTER TEXT SEARCH CONFIGURATION public.my_config ALTER MAPPING FOR word WITH my_dictionary,<list of_dictionaries>;In the
WITHrow, specify the publicly available dictionaries with a broader range of words, e.g.,english_ispellorenglish_stem. The broader the dictionary, the further down it should be listed in theWITHrow. -
Set
public.my_configas the default configuration:SET default_text_search_config = 'public.my_config'; -
Make sure the default configuration is set to
public.my_config:SHOW default_text_search_config;Result:
default_text_search_config ---------------------------- public.my_config (1 row) -
Make sure the full-text search works:
SELECT * FROM ts_debug('<token>');Specify a word that can be found in the DB documents as the token.
Result:
alias | description | token | dictionaries | dictionary + lexemes -------+-------------------+---------+--------------------------+--------------------------------+------------ word | Word, all letters | <token> | {<used_dictionaries>} | <dictionary_with_found_lexeme> | {<lexemes>} (1 row)Here, a lexeme is a word that replaces cognate words in full-text searches. For example, if a DB document contains such words as cloud, clouds, and cloud's, PostgreSQL can recognize them as a single lexeme, "cloud".
Examples
Adding an English dictionary
-
Connect to the database using
psql. -
Create the
public.my_english_configconfiguration for full-text search:CREATE TEXT SEARCH CONFIGURATION public.my_english_config ( COPY = pg_catalog.english ); -
Create a dictionary in the DB:
CREATE TEXT SEARCH DICTIONARY english_hunspell ( TEMPLATE = ispell, DictFile = en_gb, AffFile = en_GB, Stopwords = english ); -
Link dictionaries named
english_hunspellandenglish_stemwith thewordtoken type:ALTER TEXT SEARCH CONFIGURATION public.my_english_config ALTER MAPPING FOR word WITH english_hunspell, english_stem; -
Set
public.my_english_configas the default configuration:SET default_text_search_config = 'public.my_english_config'; -
Make sure the default configuration is set to
public.my_english_config:SHOW default_text_search_config;Result:
default_text_search_config ---------------------------- public.my_english_config (1 row) -
Make sure the full-text search works:
SELECT * FROM ts_debug('<token>');Specify a word that can be found in the DB documents as the token.
Result:
alias | description | token | dictionaries | dictionary + lexemes -------+-------------------+---------+---------------------------------+------------------+------------ word | Word, all letters | <token> | {english_hunspell,english_stem} | english_hunspell | {<lexemes>} (1 row)