wefe.get_embeddings_from_sets

wefe.get_embeddings_from_sets(model: wefe.word_embedding_model.WordEmbeddingModel, sets: Sequence[Sequence[str]], sets_name: Optional[str] = None, preprocessors: List[Dict[str, Union[str, bool, Callable]]] = [{}], strategy: str = 'first', normalize: bool = False, discard_incomplete_sets: bool = True, warn_lost_sets: bool = True, verbose: bool = False) → List[Dict[str, numpy.ndarray]]

Given a sequence of word sets, obtain their corresponding embeddings.

Parameters
model : WordEmbeddingModel

The word embedding model whose vocabulary is searched for the words.

sets : Sequence[Sequence[str]]

A sequence containing word sets. Example: [['woman', 'man'], ['she', 'he'], ['mother', 'father'], …].

sets_name : str, optional

The name of the set of word sets, for example "defining sets". This parameter is used only for printing. By default None.

preprocessors : List[Dict[str, Union[str, bool, Callable]]]

A list with preprocessor options.

A preprocessor is a dictionary that specifies what processing is performed on each word before it is looked up in the model vocabulary. For example, the preprocessor {'lowercase': True, 'strip_accents': True} lowercases each word and removes its accents before searching for it in the model vocabulary. Note that an empty dictionary {} indicates that no preprocessing is done.

The possible options for a preprocessor are:

  • lowercase: bool. Indicates that the words are transformed to lowercase.

  • uppercase: bool. Indicates that the words are transformed to uppercase.

  • titlecase: bool. Indicates that the words are transformed to titlecase.

  • strip_accents: bool or {'ascii', 'unicode'}. Indicates that accents are removed from the words. The stripping method can be specified; True uses 'unicode'.

  • preprocessor: Callable. A function that is applied to each word. If a function is specified, it overrides the default preprocessor (i.e., the previous options are ignored).

A list of preprocessor options allows you to search the model for several variants of each word. For example, with the preprocessors [{}, {"lowercase": True, "strip_accents": True}], {} first searches for the original words in the model vocabulary. If some of them are not found, {"lowercase": True, "strip_accents": True} is applied to those words and they are searched for again in the model vocabulary. By default [{}].
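The fallback search described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names (strip_accents_unicode, apply_preprocessor, find_in_vocab) and the toy vocabulary are hypothetical, only the 'unicode' stripping mode is imitated, and the order in which the options are applied is assumed.

```python
import unicodedata
from typing import Callable, Dict, List, Optional, Set


def strip_accents_unicode(word: str) -> str:
    """Remove combining accent marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def apply_preprocessor(word: str, options: Dict[str, object]) -> str:
    """Apply one preprocessor dictionary to a word (hypothetical helper)."""
    # A custom callable overrides every other option.
    custom = options.get("preprocessor")
    if callable(custom):
        return custom(word)
    # Assumed order: accents first, then case. The source does not specify it.
    if options.get("strip_accents"):
        word = strip_accents_unicode(word)
    if options.get("lowercase"):
        word = word.lower()
    elif options.get("uppercase"):
        word = word.upper()
    elif options.get("titlecase"):
        word = word.title()
    return word


def find_in_vocab(
    word: str, vocab: Set[str], preprocessors: List[Dict[str, object]]
) -> Optional[str]:
    """Try each preprocessor in order; return the first variant in the vocabulary."""
    for options in preprocessors:
        candidate = apply_preprocessor(word, options)
        if candidate in vocab:
            return candidate
    return None


# Toy vocabulary standing in for a real model's vocabulary.
toy_vocab = {"woman", "man", "cafe"}
preprocessors = [{}, {"lowercase": True, "strip_accents": True}]

found = [find_in_vocab(w, toy_vocab, preprocessors) for w in ["woman", "Café", "xyz"]]
print(found)
```

Here "woman" is matched by the empty preprocessor, "Café" is matched only after lowercasing and accent stripping, and "xyz" is not found under any variant.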

strategy : str, optional

Indicates how the preprocessed words are used: 'first' includes only the first transformed word found, while 'all' includes all transformed words found. By default 'first'.
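The difference between the two strategies can be illustrated with a small sketch. It is not the library's code: variants_found is a hypothetical helper, plain callables stand in for preprocessor dictionaries, and the toy vocabulary is invented for the example.

```python
from typing import Callable, List, Set


def variants_found(
    word: str,
    vocab: Set[str],
    transforms: List[Callable[[str], str]],
    strategy: str = "first",
) -> List[str]:
    """Collect the transformed variants of `word` that exist in `vocab`."""
    found: List[str] = []
    for transform in transforms:
        candidate = transform(word)
        if candidate in vocab and candidate not in found:
            found.append(candidate)
            if strategy == "first":
                # Stop at the first variant present in the vocabulary.
                break
    return found


# Both the original and the lowercased form exist in this toy vocabulary.
toy_vocab = {"Paris", "paris"}
transforms = [lambda w: w, str.lower]  # identity first, then lowercase

first_only = variants_found("Paris", toy_vocab, transforms, strategy="first")
all_found = variants_found("Paris", toy_vocab, transforms, strategy="all")
print(first_only, all_found)
```

With 'first' only the original form "Paris" is kept; with 'all' the lowercased variant is kept as well.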

normalize : bool, optional

True indicates that the embeddings will be normalized. By default False.
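The source does not say which normalization is applied. As a minimal sketch, assuming it means scaling each embedding to unit L2 norm (an assumption, not a statement about the library), the operation would look like this; l2_normalize is a hypothetical helper working on plain lists rather than numpy arrays.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction; return it unchanged.
        return list(vector)
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit, math.sqrt(sum(x * x for x in unit)))
```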

discard_incomplete_sets : bool, optional

True indicates that a set is discarded if it could not be completely converted to embeddings. By default True.
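The effect of this flag can be sketched with a toy lookup table. This is an illustration only: embed_sets and toy_vectors are hypothetical, and keeping the partially converted set when the flag is False is an assumption about the behavior, which the source does not spell out.

```python
from typing import Dict, List, Sequence


def embed_sets(
    sets: Sequence[Sequence[str]],
    vectors: Dict[str, List[float]],
    discard_incomplete_sets: bool = True,
) -> List[Dict[str, List[float]]]:
    """Map each word set to a {word: embedding} dictionary (toy version)."""
    result: List[Dict[str, List[float]]] = []
    for word_set in sets:
        embedded = {w: vectors[w] for w in word_set if w in vectors}
        incomplete = len(embedded) < len(word_set)
        if incomplete and discard_incomplete_sets:
            # At least one word is missing, so the whole set is dropped.
            continue
        result.append(embedded)
    return result


# "he" is deliberately missing from the toy lookup table.
toy_vectors = {"woman": [0.1, 0.2], "man": [0.3, 0.4], "she": [0.5, 0.6]}
word_sets = [["woman", "man"], ["she", "he"]]

strict = embed_sets(word_sets, toy_vectors)
lenient = embed_sets(word_sets, toy_vectors, discard_incomplete_sets=False)
print(len(strict), len(lenient))
```

In the strict case only the complete set ['woman', 'man'] survives; in the lenient case the second set is kept with the single word that was found.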

warn_lost_sets : bool, optional

Indicates whether the logger emits a warning for word sets that cannot be fully converted to embeddings. By default True.

verbose : bool, optional

Indicates whether the execution status of this function is printed. By default False.

Returns
List[EmbeddingDict]

A list of dictionaries, one per word set. In each dictionary, the keys are the words of the set (for example, a pair of words) and the values are their associated embeddings.