Hunspell dictionaries for full-text search
PostgreSQL uses Dictionaries
The following examples show how to add an English dictionary.
Adding a dictionary
-
Connect to the database via
psql. -
To see which languages are available, get a list of pre-installed full-text search configurations:
SELECT cfgname FROM pg_catalog.pg_ts_config;The result will show configurations named after their respective languages.
-
Create the
public.my_configfull-text search configuration:CREATE TEXT SEARCH CONFIGURATION public.my_config ( COPY = pg_catalog.<configuration> );In your SQL query, specify the language configuration from the previous step.
-
Create a dictionary called
my_dictionaryin your database:CREATE TEXT SEARCH DICTIONARY my_dictionary ( TEMPLATE = ispell, DictFile = <dictionary_words>, AffFile = <affixes>, Stopwords = <stop_words> );SQL query parameters:
TEMPLATE: Dictionary template. Learn more about Ispell dictionaries here .DictFile: Pre-installed dictionary file.AffFile: Pre-installed affix file containing prefixes, suffixes, and endings for expanding dictionary words.Stopwords: Words to ignore in full-text searches, such as articles, prepositions, and interjections.
Dictionary and affix files pre-installed in Managed Service for PostgreSQL clusters:
Language Dictionary ( DictFile)Affixes ( AffFile)English en_gb.dicten_GB.affixDanish da_dk.dictda_DK.affixSpanish es_es.dictes_ES.affixItalian it_it.dictit_IT.affixGerman de_de_frami.dictde_de_frami.affixPolish pl_pl.dictpl_PL.affixRussian ru_ru.dictru_RU.affixUkrainian uk_ua.dictuk_UA.affixCzech cs_cz.dictcs_CZ.affixThere are also examples of PostgreSQL dictionaries
:Dictionary ( DictFile)Affixes ( AffFile)hunspell_sample_long.dicthunspell_sample_long.affixhunspell_sample_num.dicthunspell_sample_num.affixispell_sample.dictispell_sample.affixDo not specify the
.dictand.affixfile extensions in theCREATE TEXT SEARCH DICTIONARYSQL query. -
Link
my_dictionaryand other dictionaries to thewordtoken type.A token is a search word or phrase. You specify it in the search query and it appears in the full-text search results.
ALTER TEXT SEARCH CONFIGURATION public.my_config ALTER MAPPING FOR word WITH my_dictionary,<dictionary_list>;Use the
WITHclause to specify publicly available dictionaries with more extensive vocabulary, such asenglish_ispellorenglish_stem. Dictionaries should be ordered in theWITHclause from more specific to more general. -
Set
public.my_configas the default configuration:SET default_text_search_config = 'public.my_config'; -
Verify that the default configuration is
public.my_config:SHOW default_text_search_config;Result:
default_text_search_config ---------------------------- public.my_config (1 row) -
Verify that the full-text search is working:
SELECT * FROM ts_debug('<token>');For the token, specify the word that can be found in the documents within your database.
Result:
alias | description | token | dictionaries | dictionary + lexemes -------+-------------------+---------+--------------------------+--------------------------------+------------ word | Word, all letters | <token> | {<dictionaries_used>} | <dictionary_with_the_lexem_found> | {<lexemes>} (1 row)Here, a lexeme is a word that replaces all its morphological variants in full-text searches. For example, if a document in your database contains the words
flying
,flies
, andflew
, PostgreSQL will recognize them as a single lexeme,fly
.
Examples
Adding an English dictionary
-
Connect to the database via
psql. -
Create the
public.my_english_configfull-text search configuration:CREATE TEXT SEARCH CONFIGURATION public.my_english_config ( COPY = pg_catalog.english ); -
Create a dictionary in your database:
CREATE TEXT SEARCH DICTIONARY english_hunspell ( TEMPLATE = ispell, DictFile = en_gb, AffFile = en_GB, Stopwords = english ); -
Link
english_hunspellandenglish_stemdictionaries to thewordtoken type:ALTER TEXT SEARCH CONFIGURATION public.my_english_config ALTER MAPPING FOR word WITH english_hunspell, english_stem; -
Set
public.my_english_configas the default configuration:SET default_text_search_config = 'public.my_english_config'; -
Verify that the default configuration is
public.my_english_config:SHOW default_text_search_config;Result:
default_text_search_config ---------------------------- public.my_english_config (1 row) -
Verify that the full-text search is working:
SELECT * FROM ts_debug('<token>');For the token, specify the word that can be found in the documents within your database.
Result:
alias | description | token | dictionaries | dictionary + lexemes -------+-------------------+---------+---------------------------------+------------------+------------ word | Word, all letters | <token> | {english_hunspell,english_stem} | english_hunspell | {<lexemes>} (1 row)