wefe.WordEmbeddingModel

class wefe.WordEmbeddingModel(wv: gensim.models.keyedvectors.KeyedVectors, name: Optional[str] = None, vocab_prefix: Optional[str] = None)

A wrapper for pre-trained word embedding models.

It can hold a gensim KeyedVectors instance or a model loaded through gensim's api. It also stores the name of the model and, if needed, a vocabulary prefix.

__init__(wv: gensim.models.keyedvectors.KeyedVectors, name: Optional[str] = None, vocab_prefix: Optional[str] = None)

Initialize the word embedding model.

Parameters

wv (BaseKeyedVectors): An instance of a word embedding model loaded through gensim's KeyedVectors interface or gensim's api.
name (str, optional): The name of the model, by default ''.
vocab_prefix (str, optional): A prefix that will be concatenated with every word in the model's vocabulary, by default None.

Raises

TypeError: if wv is not a KeyedVectors instance.
TypeError: if name is not None and not an instance of str.
TypeError: if vocab_prefix is not None and not an instance of str.

Examples

>>> from gensim.test.utils import common_texts
>>> from gensim.models import Word2Vec
>>> from wefe.word_embedding_model import WordEmbeddingModel

>>> dummy_model = Word2Vec(common_texts, window=5,
...                        min_count=1, workers=1).wv

>>> model = WordEmbeddingModel(dummy_model, 'Dummy model dim=10',
...                            vocab_prefix='/en/')
>>> print(model.name)
Dummy model dim=10
>>> print(model.vocab_prefix)
/en/

Attributes

wv (BaseKeyedVectors): The wrapped word embedding model.
vocab (dict): The vocabulary of the model, a dict of the words that have an associated embedding in the model.
name (str): The name of the model.
vocab_prefix (str): A prefix that will be concatenated with each word of the model's vocabulary.
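A minimal sketch of how a vocabulary prefix can be applied at lookup time. It assumes, as the '/en/' example above suggests (ConceptNet Numberbatch style), that the stored vocabulary keys already carry the prefix and that the prefix is prepended to each query word. `prefixed_key` and `get_embedding` are hypothetical helpers for illustration, not part of WEFE's API, and the toy dict stands in for a real model.

```python
from typing import Dict, List, Optional


def prefixed_key(word: str, vocab_prefix: Optional[str]) -> str:
    """Build the lookup key for a word (hypothetical helper, not WEFE's API)."""
    # With no prefix configured, the word is used unchanged.
    return word if vocab_prefix is None else vocab_prefix + word


# Toy vocabulary whose keys all carry a '/en/' prefix (assumed layout).
vocab: Dict[str, List[float]] = {
    "/en/cat": [0.1, 0.2],
    "/en/dog": [0.3, 0.4],
}


def get_embedding(word: str, vocab_prefix: Optional[str] = "/en/") -> Optional[List[float]]:
    # Prepend the prefix before querying; out-of-vocabulary words yield None.
    return vocab.get(prefixed_key(word, vocab_prefix))


print(get_embedding("cat"))   # [0.1, 0.2]
print(get_embedding("bird"))  # None
```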

batch_update(words: Sequence[str], embeddings: Union[Sequence[numpy.ndarray], numpy.ndarray])

Update a batch of embeddings.

This method calls the update method on each word-embedding pair. All words must be in the vocabulary; otherwise, an exception will be raised. Both words and embeddings must also have the same number of elements; otherwise, the method will raise an exception.

Parameters

words (Sequence[str]): A sequence (list, tuple or np.array) containing the words whose representations will be updated.
embeddings (Union[Sequence[np.ndarray], np.ndarray]): Either a sequence (list or tuple) of embeddings or a single np.ndarray containing all the new embeddings. The embeddings must have the same size and data type as the model's embeddings.

Raises

TypeError: if words is not a list.
TypeError: if embeddings is not an np.ndarray.
Exception: if the words collection does not have the same length as the embeddings array.
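The pairing logic described above can be sketched with a plain dict standing in for the model's vectors. This is an illustration under stated assumptions, not WEFE's implementation: `update` and `batch_update` here are hypothetical stand-ins operating on a dict, and the sketch raises ValueError where the documentation only promises a generic exception.

```python
from typing import Dict, Sequence, Union

import numpy as np


def update(vocab: Dict[str, np.ndarray], word: str, embedding: np.ndarray) -> None:
    # Hypothetical single-word update: the word must already be in the vocabulary.
    if word not in vocab:
        raise ValueError(f"word '{word}' is not in the vocabulary")
    vocab[word] = embedding


def batch_update(
    vocab: Dict[str, np.ndarray],
    words: Sequence[str],
    embeddings: Union[Sequence[np.ndarray], np.ndarray],
) -> None:
    # Words and embeddings must pair up one-to-one before anything is written.
    if len(words) != len(embeddings):
        raise ValueError("words and embeddings must have the same number of elements")
    # Iterating over a 2-D array yields its rows, so a list of vectors and a
    # single (n_words, dim) array are both handled by the same loop.
    for word, embedding in zip(words, embeddings):
        update(vocab, word, embedding)


vocab = {"cat": np.zeros(3, dtype=np.float32), "dog": np.zeros(3, dtype=np.float32)}
batch_update(vocab, ["cat", "dog"], np.ones((2, 3), dtype=np.float32))
print(vocab["dog"])  # [1. 1. 1.]
```

Because each pair is written as soon as it is checked, a missing word partway through the batch leaves the earlier words already updated; checking every word up front would make the batch all-or-nothing.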

normalize()

Normalize word embeddings in the model by using the L2 norm.

Uses the init_sims method of gensim's KeyedVectors class. Warning: this operation is in-place; it replaces the embeddings with their L2-normalized versions.
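What in-place L2 normalization does to an embedding matrix can be sketched with NumPy alone. This is not a call into gensim's init_sims: `l2_normalize_inplace` is a hypothetical helper, and its handling of all-zero rows is a choice made only for this sketch.

```python
import numpy as np


def l2_normalize_inplace(vectors: np.ndarray) -> None:
    """Scale every row of a float matrix to unit L2 norm, overwriting it (hypothetical helper)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Leave all-zero rows untouched instead of dividing by zero (a choice of this sketch).
    norms[norms == 0] = 1.0
    vectors /= norms  # in-place: the original, unnormalized values are lost


vecs = np.array([[3.0, 4.0], [0.0, 2.0], [0.0, 0.0]])
l2_normalize_inplace(vecs)
print(vecs)
```

After the call, the first two rows have length 1 and point in their original directions, while the raw magnitudes are gone, which is why the operation cannot be undone.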

update(word: str, embedding: numpy.ndarray)

Update the value of an embedding of the model.

If the method is executed with a word that is not in the vocabulary, an exception will be raised.

Parameters

word (str): The word whose embedding will be replaced. This word must be in the model's vocabulary.
embedding (np.ndarray): An embedding representing the word. It must have the same dimensions and data type as the model's embeddings.

Raises

TypeError: if word is not a string.
TypeError: if embedding is not an np.array.
ValueError: if word is not in the model’s vocabulary.
ValueError: if the embedding is not the same size as the size of the model’s embeddings.
ValueError: if the dtype of the embedding values is not the same as the model’s embeddings.
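The checks listed under Raises can be sketched in the order they appear. This is a minimal illustration, not WEFE's code: `checked_update` is a hypothetical helper that works on a dict instead of a KeyedVectors object, and "same size" is interpreted here as "same shape" as the stored vector.

```python
from typing import Dict

import numpy as np


def checked_update(vocab: Dict[str, np.ndarray], word: str, embedding: np.ndarray) -> None:
    """Replace one embedding after running the documented checks (hypothetical helper)."""
    if not isinstance(word, str):
        raise TypeError(f"word should be a str, got {type(word).__name__}")
    if not isinstance(embedding, np.ndarray):
        raise TypeError(f"embedding should be an np.ndarray, got {type(embedding).__name__}")
    if word not in vocab:
        raise ValueError(f"word '{word}' is not in the model's vocabulary")
    current = vocab[word]
    # "Same size" is interpreted as the same shape as the stored vector (assumption).
    if embedding.shape != current.shape:
        raise ValueError(f"embedding shape {embedding.shape} != model shape {current.shape}")
    if embedding.dtype != current.dtype:
        raise ValueError(f"embedding dtype {embedding.dtype} != model dtype {current.dtype}")
    vocab[word] = embedding


vocab = {"cat": np.zeros(3, dtype=np.float32)}
checked_update(vocab, "cat", np.ones(3, dtype=np.float32))  # accepted
try:
    checked_update(vocab, "cat", np.ones(3))  # float64 does not match float32
except ValueError as err:
    print(err)
```

The dtype check matters in practice because NumPy creates float64 arrays by default, while pre-trained embeddings are often stored as float32.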
