wefe.ECT
- class wefe.ECT
An implementation of the Embedding Coherence Test.
The metric was originally proposed in [1] and implemented in [2].
The general steps of the test, as defined in [1], are as follows:
1. Embed all given target and attribute words with the given embedding model.
2. Calculate the mean vector of each of the two sets of target word vectors.
3. Measure the cosine similarity of each mean target vector to all of the given attribute words.
4. Calculate the Spearman r correlation between the resulting two lists of similarities.
5. Return the correlation value as the score of the metric (in the range of -1 to 1); higher is better.
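The steps above can be sketched in plain Python, as a rough illustration and not the library's actual code. The sketch assumes the words have already been embedded (step 1) and are passed in as lists of numbers. Every function name here (`mean_vector`, `cosine`, `average_ranks`, `pearson`, `ect_score`) is hypothetical, and the Spearman correlation is computed by hand as the Pearson correlation of average ranks.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ect_score(target_1, target_2, attributes):
    """Steps 2 to 5: mean vectors, cosine similarities, Spearman correlation."""
    mean_1 = mean_vector(target_1)
    mean_2 = mean_vector(target_2)
    sims_1 = [cosine(mean_1, a) for a in attributes]
    sims_2 = [cosine(mean_2, a) for a in attributes]
    return pearson(average_ranks(sims_1), average_ranks(sims_2))


# Toy 2-dimensional "embeddings", made up for illustration only.
attributes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
group_a = [[1.0, 0.0], [1.0, 0.2]]
group_b = [[0.0, 1.0], [0.2, 1.0]]

same_score = ect_score(group_a, group_a, attributes)
opposite_score = ect_score(group_a, group_b, attributes)
```

In this toy setup, two identical target sets rank the attributes in the same order, while the two opposed sets rank them in reverse order, which is what drives the score toward the upper or lower end of the range.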
References
[1]: Dev, S., & Phillips, J. (2019, April). Attenuating Bias in Word vectors.
- __init__(*args, **kwargs)
- run_query(query: wefe.query.Query, word_embedding: wefe.word_embedding_model.WordEmbeddingModel, lost_vocabulary_threshold: float = 0.2, preprocessor_args: Dict[str, Optional[Union[bool, str, Callable]]] = {'lowercase': False, 'preprocessor': None, 'strip_accents': False}, secondary_preprocessor_args: Optional[Dict[str, Optional[Union[bool, str, Callable]]]] = None, warn_not_found_words: bool = False, *args: Any, **kwargs: Any) -> Dict[str, Any]
Runs ECT on the given query with the given parameters.
- Parameters
- query : Query
A Query object that contains the target and attribute word sets to be tested.
- word_embedding : WordEmbeddingModel
An object that contains a pretrained word embedding model.
- lost_vocabulary_threshold : float, optional
Specifies the maximum proportion of words that any set of the query is allowed to lose when its words are transformed into embeddings. If any set of the query loses proportionally more words than this limit, the result values will be np.nan. By default 0.2.
- preprocessor_args : PreprocessorArgs, optional
Dictionary with the arguments that specify how the words are preprocessed. The possible arguments are:
  - lowercase: bool. Indicates whether the words are transformed to lowercase.
  - strip_accents: bool, {'ascii', 'unicode'}. Specifies whether accents are removed from the words. The stripping type can be specified; True uses 'unicode' by default.
  - preprocessor: Callable. A function that operates on each word. If a function is specified, it overrides the default preprocessor (i.e., the previous options stop working).
By default {'strip_accents': False, 'lowercase': False, 'preprocessor': None}.
- secondary_preprocessor_args : PreprocessorArgs, optional
Dictionary with the arguments that specify how the secondary preprocessing of the words is done. If a word is not found in the model's vocabulary (using the default preprocessor or the one specified in preprocessor_args), the function performs a second search for that word using the preprocessor specified in this parameter. By default None.
- warn_not_found_words : bool, optional
Specifies whether the function logs a warning for each word that was not found in the model's vocabulary. By default False.
- Returns
- Dict[str, Any]
A dictionary with the query name and the result of the query.
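
The interplay of lost_vocabulary_threshold and secondary_preprocessor_args described in the parameters can be illustrated with a small sketch. This is not the library's implementation: `find_word` and `lost_too_many` are hypothetical helpers, the vocabulary is a plain set, and applying the secondary preprocessor to the original (unprocessed) word is an assumption.

```python
def find_word(word, vocab, preprocess, secondary_preprocess=None):
    """Return the vocabulary entry matching word, or None if it is lost."""
    candidate = preprocess(word)
    if candidate in vocab:
        return candidate
    # Second search: only attempted when the first lookup fails.
    # Assumption: the secondary preprocessor sees the original word.
    if secondary_preprocess is not None:
        candidate = secondary_preprocess(word)
        if candidate in vocab:
            return candidate
    return None


def lost_too_many(words, vocab, threshold=0.2,
                  preprocess=lambda w: w, secondary_preprocess=None):
    """True if the set loses proportionally more words than the threshold."""
    lost = sum(
        1 for w in words
        if find_word(w, vocab, preprocess, secondary_preprocess) is None
    )
    return lost / len(words) > threshold


# Toy vocabulary and word set, made up for illustration only.
vocab = {"he", "she", "her", "doctor"}
words = ["He", "she", "nurse", "doctor", "Doctor", "Her"]

# Without a secondary preprocessor, 4 of 6 words are lost.
strict = lost_too_many(words, vocab)

# With a lowercasing fallback, only "nurse" is lost (1 of 6).
relaxed = lost_too_many(words, vocab, secondary_preprocess=str.lower)
```

In the first case the set exceeds the 0.2 limit, which per the description above would make the result values np.nan; in the second case the fallback recovers enough words for the query to proceed.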