wefe.get_embeddings_from_sets
- wefe.get_embeddings_from_sets(model: wefe.word_embedding_model.WordEmbeddingModel, sets: Sequence[Sequence[str]], sets_name: Optional[str] = None, preprocessors: List[Dict[str, Union[str, bool, Callable]]] = [{}], strategy: str = 'first', normalize: bool = False, discard_incomplete_sets: bool = True, warn_lost_sets: bool = True, verbose: bool = False) -> List[Dict[str, numpy.ndarray]]
Given a sequence of word sets, obtain their corresponding embeddings.
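As a rough illustration of what this conversion does, here is a minimal sketch in plain Python and NumPy. It is not WEFE's implementation: it assumes a plain dict (the hypothetical `TOY_VOCAB`) as a stand-in for the model vocabulary, uses a hypothetical helper `embeddings_from_sets_sketch`, skips preprocessing entirely, and approximates `normalize` (assumed here to mean unit L2 norm), `discard_incomplete_sets` and `warn_lost_sets` as described in the parameter list.

```python
import logging

import numpy as np

# Hypothetical toy vocabulary standing in for a WordEmbeddingModel.
TOY_VOCAB = {
    "woman": np.array([1.0, 0.0, 1.0]),
    "man": np.array([1.0, 0.0, -1.0]),
    "she": np.array([0.0, 2.0, 1.0]),
    "he": np.array([0.0, 2.0, -1.0]),
}


def embeddings_from_sets_sketch(vocab, sets, normalize=False,
                                discard_incomplete_sets=True, warn_lost_sets=True):
    """Hypothetical sketch: convert each word set into a {word: embedding} dict."""
    result = []
    for word_set in sets:
        embeddings = {}
        for word in word_set:
            if word in vocab:
                vector = np.asarray(vocab[word], dtype=float)
                if normalize:
                    # Assumed meaning of normalize=True: scale to unit L2 norm.
                    vector = vector / np.linalg.norm(vector)
                embeddings[word] = vector
        complete = all(word in vocab for word in word_set)
        if not complete and warn_lost_sets:
            missing = [w for w in word_set if w not in vocab]
            logging.warning("Set %s is incomplete; missing: %s", list(word_set), missing)
        if complete or not discard_incomplete_sets:
            result.append(embeddings)
    return result


sets = [["woman", "man"], ["she", "he"], ["mother", "father"]]
print(len(embeddings_from_sets_sketch(TOY_VOCAB, sets)))  # 2: the third set is discarded
```

In this sketch, dropping incomplete sets means the returned list can be shorter than `sets`, so positions in the output do not necessarily line up with positions in the input.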
- Parameters
- model : WordEmbeddingModel
The word embedding model from which the embeddings are obtained.
- sets : Sequence[Sequence[str]]
A sequence containing word sets. Example: [[‘woman’, ‘man’], [‘she’, ‘he’], [‘mother’, ‘father’] …].
- sets_name : Optional[str], optional
The name of the set of word sets. Example: defining sets. This parameter is used only for printing, by default None.
- preprocessors : List[Dict[str, Union[str, bool, Callable]]]
A list with preprocessor options.
A preprocessor is a dictionary that specifies what processing(s) are performed on each word before it is looked up in the model vocabulary. For example, the preprocessor {'lowercase': True, 'strip_accents': True} lowercases each word and removes its accents before searching for it in the model vocabulary. Note that an empty dictionary {} indicates that no preprocessing is done.
The possible options for a preprocessor are:
  - lowercase: bool. Indicates that the words are transformed to lowercase.
  - uppercase: bool. Indicates that the words are transformed to uppercase.
  - titlecase: bool. Indicates that the words are transformed to titlecase.
  - strip_accents: bool, {'ascii', 'unicode'}. Specifies that the accents of the words are removed. The stripping type can be specified; True uses 'unicode' by default.
  - preprocessor: Callable. A function that operates on each word. If a function is specified, it overrides the default preprocessor (i.e., the previous options stop working).
A list of preprocessor options allows you to search for several variants of the words in the model. For example, the preprocessors [{}, {"lowercase": True, "strip_accents": True}] first search for the original words in the vocabulary of the model ({}). If some of them are not found, {"lowercase": True, "strip_accents": True} is applied to those words and they are searched for in the model vocabulary again. By default [{}].
- strategy : str, optional
The strategy indicates how the preprocessed words are used: 'first' includes only the first transformed word found, while 'all' includes all transformed words found. By default "first".
- normalize : bool, optional
True indicates that embeddings will be normalized, by default False
- discard_incomplete_sets : bool, optional
True indicates that if a set could not be completely converted, it is discarded, by default True.
- warn_lost_sets : bool, optional
Indicates whether a warning is logged for word sets that cannot be fully converted to embeddings, by default True.
- verbose : bool, optional
Indicates whether the execution status of this function is printed, by default False
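The lookup of a single word under `preprocessors` and `strategy` can be sketched as follows. This is a hypothetical re-implementation for illustration only: the helpers `strip_accents`, `apply_preprocessor` and `find_embeddings` are not part of WEFE's public API, accent stripping is loosely modeled on scikit-learn's `strip_accents_ascii`/`strip_accents_unicode`, and the order in which options are applied (accents first, then case) is an assumption.

```python
import unicodedata
from typing import Callable, Dict, Sequence, Union

Preprocessor = Dict[str, Union[str, bool, Callable[[str], str]]]


def strip_accents(word: str, kind: str = "unicode") -> str:
    """Remove accents, loosely modeled on scikit-learn's strip_accents helpers."""
    if kind == "ascii":
        # Decompose, then drop every non-ASCII character.
        return unicodedata.normalize("NFKD", word).encode("ascii", "ignore").decode("ascii")
    # 'unicode': decompose, then drop only the combining marks (the accents).
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def apply_preprocessor(word: str, options: Preprocessor) -> str:
    """Apply one preprocessor dict to a word; {} returns the word unchanged."""
    custom = options.get("preprocessor")
    if callable(custom):
        # A custom function overrides all the other options.
        return custom(word)
    accents = options.get("strip_accents", False)
    if accents:
        word = strip_accents(word, "unicode" if accents is True else accents)
    # Assumption: if several case options are set, only the first one checked here applies.
    if options.get("lowercase", False):
        word = word.lower()
    elif options.get("uppercase", False):
        word = word.upper()
    elif options.get("titlecase", False):
        word = word.title()
    return word


def find_embeddings(word: str, vocab: Dict[str, list],
                    preprocessors: Sequence[Preprocessor] = ({},),
                    strategy: str = "first") -> Dict[str, list]:
    """Try each preprocessor in order and collect the variants found in vocab."""
    found: Dict[str, list] = {}
    for options in preprocessors:
        candidate = apply_preprocessor(word, options)
        if candidate in vocab and candidate not in found:
            found[candidate] = vocab[candidate]
            if strategy == "first":
                # 'first': stop at the first variant present in the vocabulary.
                break
    return found


vocab = {"Mother": [0.1, 0.2], "mother": [0.3, 0.4], "cafe": [0.5, 0.6]}
print(find_embeddings("Café", vocab, [{}, {"lowercase": True, "strip_accents": True}]))
# {'cafe': [0.5, 0.6]}
print(find_embeddings("Mother", vocab, [{}, {"lowercase": True}], strategy="all"))
# {'Mother': [0.1, 0.2], 'mother': [0.3, 0.4]}
```

The second call shows why, under 'all', a single input word can contribute more than one key to the resulting dictionary.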
- Returns
- List[EmbeddingDict]
A list of dictionaries, one per converted word set. Each dictionary maps the words of the set (keys) to their associated embeddings (values).
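The following sketch shows one way to consume the returned list, turning each dictionary into a list of words and a matrix with one row per embedding. The input `returned` is a hand-written stand-in with toy vectors (not real model output), `sets_to_matrices` is a hypothetical helper, and relying on each dictionary preserving the word order of its input set is an assumption.

```python
from typing import Dict, List, Tuple

import numpy as np

EmbeddingDict = Dict[str, np.ndarray]

# Hand-written stand-in for a return value; the vectors are toy numbers, not model output.
returned: List[EmbeddingDict] = [
    {"woman": np.array([1.0, 0.0, 1.0]), "man": np.array([1.0, 0.0, -1.0])},
    {"she": np.array([0.0, 2.0, 1.0]), "he": np.array([0.0, 2.0, -1.0])},
]


def sets_to_matrices(embedding_dicts: List[EmbeddingDict]) -> List[Tuple[List[str], np.ndarray]]:
    """Turn each {word: embedding} dict into (words, matrix) with one row per word."""
    out = []
    for embedding_dict in embedding_dicts:
        if not embedding_dict:
            # Empty if nothing in the set was found and the set was not discarded.
            continue
        words = list(embedding_dict)  # dicts keep insertion order (Python 3.7+)
        matrix = np.vstack([embedding_dict[w] for w in words])
        out.append((words, matrix))
    return out


for words, matrix in sets_to_matrices(returned):
    print(words, matrix.shape)
# ['woman', 'man'] (2, 3)
# ['she', 'he'] (2, 3)
```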