wefe.WEAT

class wefe.WEAT[source]

Word Embedding Association Test (WEAT).

The metric was originally proposed in [1]. Visit WEAT in Metrics Section for further information.
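
For reference, given two sets of target words X and Y and two sets of attribute words A and B, the test statistic defined in [1] is

\[
s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B),
\qquad
s(w, A, B) = \operatorname{mean}_{a \in A} \cos(\vec{w}, \vec{a}) - \operatorname{mean}_{b \in B} \cos(\vec{w}, \vec{b}),
\]

and the effect size is

\[
d = \frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}{\operatorname{std}_{w \in X \cup Y} s(w, A, B)}.
\]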

References

[1]: Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. Semantics derived
automatically from language corpora contain human-like biases.
Science, 356(6334):183–186, 2017.
__init__(*args, **kwargs)
run_query(query: Query, model: WordEmbeddingModel, return_effect_size: bool = False, calculate_p_value: bool = False, p_value_test_type: str = 'right-sided', p_value_method: str = 'approximate', p_value_iterations: int = 10000, p_value_verbose: bool = False, lost_vocabulary_threshold: float = 0.2, preprocessors: List[Dict[str, Union[str, bool, Callable]]] = [{}], strategy: str = 'first', normalize: bool = False, warn_not_found_words: bool = False, *args: Any, **kwargs: Any) → Dict[str, Any][source]

Calculate the WEAT metric over the provided parameters.

Parameters
query : Query

A Query object that contains the target and attribute sets to be tested.

model : WordEmbeddingModel

A word embedding model.

return_effect_size : bool, optional

Specifies whether the score returned in the ‘result’ field of the results dict is the WEAT effect size instead of the WEAT score, by default False.

calculate_p_value : bool, optional

Specifies whether the p-value will be calculated through a permutation test. Warning: This can increase the computing time quite a lot, by default False.

p_value_test_type : {‘left-sided’, ‘right-sided’, ‘two-sided’}, optional

When calculating the p-value, specifies the type of test to be performed. The options are ‘left-sided’, ‘right-sided’ and ‘two-sided’, by default ‘right-sided’.

p_value_method : {‘exact’, ‘approximate’}, optional

When calculating the p-value, specifies the method used to calculate it. The options are ‘exact’ and ‘approximate’, by default ‘approximate’.

p_value_iterations : int, optional

If the p-value is calculated and the chosen method is ‘approximate’, specifies the number of iterations that will be performed, by default 10000.

p_value_verbose : bool, optional

If the p-value is calculated, specifies whether notification messages will be logged during its calculation, by default False.

lost_vocabulary_threshold : float, optional

Specifies the proportional limit of words that any set of the query is allowed to lose when transforming its words into embeddings. If any set of the query loses proportionally more words than this limit, the result values will be np.nan, by default 0.2. See the last example below.

preprocessors : List[Dict[str, Union[str, bool, Callable]]]

A list with preprocessor options.

A preprocessor is a dictionary that specifies which preprocessing steps are performed on each word before it is looked up in the model vocabulary. For example, the preprocessor {'lowercase': True, 'strip_accents': True} lowercases each word and removes its accents before searching for it in the model vocabulary. Note that an empty dictionary {} indicates that no preprocessing is done.

The possible options for a preprocessor are:

  • lowercase: bool. Indicates that the words are transformed to lowercase.

  • uppercase: bool. Indicates that the words are transformed to uppercase.

  • titlecase: bool. Indicates that the words are transformed to titlecase.

  • strip_accents: bool, {'ascii', 'unicode'}: Specifies that the accents of the words are eliminated. The stripping type can be specified. True uses ‘unicode’ by default.

  • preprocessor: Callable. It receives a function that operates on each word. In the case of specifying a function, it overrides the default preprocessor (i.e., the previous options stop working).

A list of preprocessor options allows you to search for several variants of the words in the model. For example, with the preprocessors [{}, {"lowercase": True, "strip_accents": True}], the original words are searched first in the vocabulary of the model; for any words that are not found, {"lowercase": True, "strip_accents": True} is applied and the transformed words are then searched in the model vocabulary. See the preprocessors example in the Examples section below.

strategy : str, optional

The strategy indicates how the preprocessed words will be used: ‘first’ will include only the first transformed word found, while ‘all’ will include all transformed words found, by default ‘first’.

normalize : bool, optional

True indicates that embeddings will be normalized, by default False

warn_not_found_words : bool, optional

Specifies whether the function will warn (in the logger) about the words that were not found in the model’s vocabulary, by default False.

Returns
Dict[str, Any]

A dictionary with the query name, the resulting score of the metric in the ‘result’ field, the WEAT score, the effect size, and the p-value (np.nan if it was not calculated).

Examples

The following example shows how to run a query that measures gender bias using WEAT:

>>> from wefe.query import Query
>>> from wefe.utils import load_test_model
>>> from wefe.metrics import WEAT
>>>
>>> # define the query
>>> query = Query(
...     target_sets=[
...         ["female", "woman", "girl", "sister", "she", "her", "hers",
...          "daughter"],
...         ["male", "man", "boy", "brother", "he", "him", "his", "son"],
...     ],
...     attribute_sets=[
...         ["home", "parents", "children", "family", "cousins", "marriage",
...          "wedding", "relatives",
...         ],
...         ["executive", "management", "professional", "corporation", "salary",
...          "office", "business", "career",
...         ],
...     ],
...     target_sets_names=["Female terms", "Male Terms"],
...     attribute_sets_names=["Family", "Career"],
... )
>>>
>>> # load the model (in this case, the test model included in wefe)
>>> model = load_test_model()
>>>
>>> # instance the metric and run the query
>>> WEAT().run_query(query, model) 
{'query_name': 'Female terms and Male Terms wrt Family and Career',
'result': 0.4634388245467562,
'weat': 0.4634388245467562,
'effect_size': 0.45076532408312986,
'p_value': nan}

If you want to return the effect size as the result value, set the return_effect_size parameter to True when running the query.

>>> WEAT().run_query(query, model, return_effect_size=True) 
{'query_name': 'Female terms and Male Terms wrt Family and Career',
'result': 0.45076532408312986,
'weat': 0.4634388245467562,
'effect_size': 0.45076532408312986,
'p_value': nan}

If you want the embeddings to be normalized before calculating the metric, set the normalize parameter to True when executing the query.

>>> WEAT().run_query(query, model, normalize=True) 
{'query_name': 'Female terms and Male Terms wrt Family and Career',
'result': 0.4634388248814503,
'weat': 0.4634388248814503,
'effect_size': 0.4507653062895615,
'p_value': nan}
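
The preprocessors and strategy parameters control how the query words are transformed before being searched in the model vocabulary. As a minimal sketch, the following call first looks up the original words and, only for those not found, retries with lowercased and accent-stripped variants, keeping the first variant found:

>>> WEAT().run_query(
...     query,
...     model,
...     preprocessors=[{}, {"lowercase": True, "strip_accents": True}],
...     strategy="first",
... )  

For this particular query the transformations do not change any word (they are already lowercase and accent-free), so the returned scores should match those of the first example; the option becomes useful for models whose vocabularies contain cased or accented words.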

By setting the calculate_p_value parameter to True you can instruct WEAT to run the permutation test and return its p-value. The argument p_value_method=‘approximate’ indicates that the calculation of the permutation test will be approximate, i.e., not all possible permutations will be generated; instead, random permutations of the attributes to test will be generated. The argument p_value_iterations indicates the number of permutations that will be generated and tested.

>>> WEAT().run_query(
...     query,
...     model,
...     calculate_p_value=True,
...     p_value_method="approximate",
...     p_value_iterations=10000,
... )  
{
    'query_name': 'Female terms and Male Terms wrt Family and Career',
    'result': 0.46343879750929773,
    'weat': 0.46343879750929773,
    'effect_size': 0.4507652708557911,
    'p_value': 0.1865813418658134
}
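
Finally, a sketch of the vocabulary-handling parameters: lost_vocabulary_threshold bounds the proportion of words each set of the query may lose when its words are converted into embeddings, and warn_not_found_words=True logs the words that could not be found.

>>> WEAT().run_query(
...     query,
...     model,
...     lost_vocabulary_threshold=0.3,
...     warn_not_found_words=True,
... )  

The scores should match those of the first example, since the same embeddings are used; the threshold only takes effect when some set of the query loses a larger proportion of words than it allows, in which case every score in the returned dictionary is np.nan.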