wefe.preprocessing.preprocess_word

wefe.preprocessing.preprocess_word(word: str, options: dict[str, str | bool | Callable] = {}, vocab_prefix: str | None = None) → str

Preprocesses a word before it is looked up in the model’s vocabulary.

Parameters:
  • word (str) – Word to be preprocessed.

  • options (Dict[str, Union[str, bool, Callable]], optional) –

    Dictionary of arguments that specifies how the word will be preprocessed. The available preprocessing options are as follows:

    • `lowercase`: bool. Indicates if the words are transformed to lowercase.

    • `uppercase`: bool. Indicates if the words are transformed to uppercase.

    • `titlecase`: bool. Indicates if the words are transformed to titlecase.

    • `strip_accents`: bool or {‘ascii’, ‘unicode’}. Specifies whether accents are removed from the word. A string value selects the stripping method; True uses ‘unicode’.

    • `preprocessor`: Callable. A function that is applied to each word. When a function is specified, it overrides the default preprocessor (i.e., the options above are ignored).

    By default, no preprocessing is applied, which is equivalent to passing {}.

  • vocab_prefix (str, optional) – Prefix that is concatenated with the word. Defaults to None.

Returns:

The word, preprocessed according to the given options.

Return type:

str
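
The option semantics described above can be illustrated with a small standalone sketch. This is not WEFE's implementation: `preprocess_word_sketch` and the two accent helpers are hypothetical names written only for this example, using the standard library. Several details are assumptions rather than documented behavior: that only one case option is applied (checked in the order lowercase, uppercase, titlecase), that accent stripping is done via NFKD normalization, and that `vocab_prefix` is prepended to the word.

```python
import unicodedata
from typing import Callable, Dict, Optional, Union


def _strip_accents_unicode(text: str) -> str:
    # Decompose characters (NFKD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def _strip_accents_ascii(text: str) -> str:
    # Decompose, then discard every character that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def preprocess_word_sketch(
    word: str,
    options: Optional[Dict[str, Union[str, bool, Callable]]] = None,
    vocab_prefix: Optional[str] = None,
) -> str:
    """Hypothetical re-creation of the documented option semantics."""
    options = options or {}
    preprocessor = options.get("preprocessor")

    if callable(preprocessor):
        # A custom preprocessor overrides every other option.
        word = preprocessor(word)
    else:
        # Assumption: at most one case transformation is applied.
        if options.get("lowercase"):
            word = word.lower()
        elif options.get("uppercase"):
            word = word.upper()
        elif options.get("titlecase"):
            word = word.title()

        strip = options.get("strip_accents", False)
        if strip is True or strip == "unicode":
            word = _strip_accents_unicode(word)
        elif strip == "ascii":
            word = _strip_accents_ascii(word)

    # Assumption: the prefix is placed in front of the word.
    if vocab_prefix is not None:
        word = vocab_prefix + word
    return word


print(preprocess_word_sketch("Café"))                       # Café
print(preprocess_word_sketch("Café", {"lowercase": True}))  # café
print(preprocess_word_sketch("Café", {"lowercase": True, "strip_accents": True}))  # cafe
```

With an empty options dictionary the word is returned unchanged, which matches the documented default. Passing a `preprocessor` callable makes the sketch ignore `lowercase`, `uppercase`, `titlecase`, and `strip_accents`, mirroring the override rule stated in the options list.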