WEFE depends on gensim’s KeyedVectors to operate on word embedding models. Therefore, any embedding model you want to experiment with must be loaded through gensim’s APIs or through a library that extends them.

In technical terms, the minimum requirement for WEFE to operate with a model is that it extends the BaseKeyedVectors class.

Next we show several options for loading models from different sources.

## Create an example query¶

In this section we create an example query (the same query used in the User Guide) to be used in the following sections.

>>> # Load the query
>>> from wefe.query import Query
>>> from wefe.word_embedding import WordEmbeddingModel
>>> from wefe.metrics.WEAT import WEAT
>>>
>>> # load the weat word sets
>>> from wefe.datasets import load_weat
>>> word_sets = load_weat()
>>>
>>> # create the query
>>> query = Query([word_sets['male_terms'], word_sets['female_terms']],
>>>               [word_sets['career'], word_sets['family']],
>>>               ['Male terms', 'Female terms'],
>>>               ['Career', 'Family'])
>>>
>>> # instantiate the metric
>>> weat = WEAT()


## Gensim¶

Gensim provides an extensive list of pre-trained models that can be used directly via its downloader API. Below we show an example of their use.

>>> import gensim.downloader as api
>>>
>>> # load glove-twitter-25 via the gensim downloader
>>> glove_25_keyed_vectors = api.load('glove-twitter-25')
>>>
>>> # The resulting object is already a BaseKeyedVectors subclass object,
>>> # so we can wrap it directly using WordEmbeddingModel.
>>> glove_25_model = WordEmbeddingModel(glove_25_keyed_vectors, 'glove-25')
>>>
>>> # Execute the query
>>> result = weat.run_query(query, glove_25_model)
>>> print(result)
{'query_name': 'Male terms and Female terms wrt Career and Family', 'result': 0.33814692}


## Word2vec and fastText¶

As we said before, any model that is loaded with gensim and extends BaseKeyedVectors can be used in WEFE to measure bias. In this section we show how to load word2vec and fastText models.

Note

Gensim is not directly compatible with the GloVe model file format. However, it provides a script (`glove2word2vec`) that transforms any GloVe model into word2vec format.

For example, let’s load word2vec from a `.bin` file. The procedure is quite simple: first we download the word2vec binary file from its source, and then we load it using the `KeyedVectors.load_word2vec_format` function.

>>> from gensim.models import KeyedVectors
>>>
>>> # load the downloaded binary file (the path may differ on your machine)
>>> w2v_embeddings = KeyedVectors.load_word2vec_format(
>>>     'GoogleNews-vectors-negative300.bin', binary=True)
>>> word2vec = WordEmbeddingModel(w2v_embeddings, 'word2vec')
>>>
>>> result = weat.run_query(query, word2vec)
>>> result
{'query_name': 'Male terms and Female terms wrt Career and Family',
'result': 0.7280304}


The same method works for fastText embeddings distributed in word2vec (`.vec`) text format.

>>> from gensim.models import KeyedVectors
>>>
>>> # load fastText vectors in word2vec text format (the path may differ)
>>> fast_embeddings = KeyedVectors.load_word2vec_format('wiki-news-300d-1M.vec')
>>> fast = WordEmbeddingModel(fast_embeddings, 'fast')
>>> result = weat.run_query(query, fast)
>>>
>>> result
{'query_name': 'Male terms and Female terms wrt Career and Family',
'result': 0.34870023}


While we load FastText here as KeyedVectors (i.e. in word2vec format), it can also be used via FastTextKeyedVectors.

## Flair¶

WEFE does not yet support Flair interfaces. However, you can use Flair’s static embeddings (Classic Word Embeddings), which are backed by gensim’s KeyedVectors, to load embedding models. The following code is an example of this:

>>> from flair.embeddings import WordEmbeddings
>>>
>>> glove_embedding = WordEmbeddings('glove') # 100 dim glove
>>>
>>> # extract KeyedVectors object
>>> glove_keyed_vectors = glove_embedding.precomputed_word_embeddings
>>> glove_100 = WordEmbeddingModel(glove_keyed_vectors, 'glove-100')
>>>
>>> result = weat.run_query(query, glove_100)
>>> print(result)
{'query_name': 'Male terms and Female terms wrt Career and Family', 'result': 1.0486683}