Hunspell dictionaries for full-text search
Dictionaries
Below, we provide instructions on how to add an English dictionary, as an example.
Adding a dictionary
-
Connect to the database using
psql
. -
To find out which languages are available, get a list of preset configurations for full-text search:
SELECT cfgname FROM pg_catalog.pg_ts_config;
The result includes the configuration names represented as language names.
-
Create the
public.my_config
configuration for full-text search:CREATE TEXT SEARCH CONFIGURATION public.my_config ( COPY = pg_catalog.<configuration> );
In your SQL query, specify the language configuration obtained in the previous step.
-
Create a dictionary named
my_dictionary
in the DB:CREATE TEXT SEARCH DICTIONARY my_dictionary ( TEMPLATE = ispell, DictFile = <words_in_dictionary>, AffFile = <affixes>, Stopwords = <stopwords> );
SQL query properties:
TEMPLATE
: Template used to create a dictionary. You can learn more about Ispell dictionaries here .DictFile
: Preset file with words organized in a dictionary.AffFile
: Preset file with affixes (prefixes, suffixes, and endings) you can add to words in the dictionary.Stopwords
: Stopwords to disregard in full-text searches. Such words may include articles, prepositions and interjections.
Files with dictionaries and affixes preset in Managed Service for PostgreSQL clusters:
Language DictFile
dictionaryAffFile
affixesEnglish en_gb.dict
en_GB.affix
Danish da_dk.dict
da_DK.affix
Spanish es_es.dict
es_ES.affix
Italian it_it.dict
it_IT.affix
German de_de_frami.dict
de_de_frami.affix
Polish pl_pl.dict
pl_PL.affix
Russian ru_ru.dict
ru_RU.affix
Ukrainian uk_ua.dict
uk_UA.affix
Czech cs_cz.dict
cs_CZ.affix
There are also examples of PostgreSQL dictionaries
:DictFile
dictionaryAffFile
affixeshunspell_sample_long.dict
hunspell_sample_long.affix
hunspell_sample_num.dict
hunspell_sample_num.affix
ispell_sample.dict
ispell_sample.affix
In the
CREATE TEXT SEARCH DICTIONARY
SQL query, do not use the.dict
and.affix
extensions in the file names. -
Link the dictionary named
my_dictionary
and other dictionaries with theword
token type.Token is a word or phrase being searched for. It is specified in the search query and displayed in the full-text search results.
ALTER TEXT SEARCH CONFIGURATION public.my_config ALTER MAPPING FOR word WITH my_dictionary,<list of_dictionaries>;
In the
WITH
row, specify the publicly available dictionaries with a broader range of words, e.g.,english_ispell
orenglish_stem
. The broader the dictionary, the further down it should be listed in theWITH
row. -
Set
public.my_config
as the default configuration:SET default_text_search_config = 'public.my_config';
-
Make sure the default configuration is set to
public.my_config
:SHOW default_text_search_config;
Result:
default_text_search_config ---------------------------- public.my_config (1 row)
-
Make sure the full-text search works:
SELECT * FROM ts_debug('<token>');
Specify a word that can be found in the DB documents as the token.
Result:
alias | description | token | dictionaries | dictionary + lexemes -------+-------------------+---------+--------------------------+--------------------------------+------------ word | Word, all letters | <token> | {<used_dictionaries>} | <dictionary_with_found_lexeme> | {<lexemes>} (1 row)
Here, a lexeme is a word that replaces cognate words in full-text searches. For example, if a DB document contains such words as cloud, clouds, and cloud's, PostgreSQL can recognize them as a single lexeme, "cloud".
Examples
Adding an English dictionary
-
Connect to the database using
psql
. -
Create the
public.my_english_config
configuration for full-text search:CREATE TEXT SEARCH CONFIGURATION public.my_english_config ( COPY = pg_catalog.english );
-
Create a dictionary in the DB:
CREATE TEXT SEARCH DICTIONARY english_hunspell ( TEMPLATE = ispell, DictFile = en_gb, AffFile = en_GB, Stopwords = english );
-
Link dictionaries named
english_hunspell
andenglish_stem
with theword
token type:ALTER TEXT SEARCH CONFIGURATION public.my_english_config ALTER MAPPING FOR word WITH english_hunspell, english_stem;
-
Set
public.my_english_config
as the default configuration:SET default_text_search_config = 'public.my_english_config';
-
Make sure the default configuration is set to
public.my_english_config
:SHOW default_text_search_config;
Result:
default_text_search_config ---------------------------- public.my_english_config (1 row)
-
Make sure the full-text search works:
SELECT * FROM ts_debug('<token>');
Specify a word that can be found in the DB documents as the token.
Result:
alias | description | token | dictionaries | dictionary + lexemes -------+-------------------+---------+---------------------------------+------------------+------------ word | Word, all letters | <token> | {english_hunspell,english_stem} | english_hunspell | {<lexemes>} (1 row)