wefe.RND

class wefe.RND

An implementation of Relative Norm Distance (RND).

It measures how strongly a set of neutral words (the attribute set) is associated with one of two groups (the target sets) relative to the other.
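The computation can be illustrated on raw vectors. The following is a minimal sketch, not the WEFE implementation: the function names `rnd_sketch`, `mean_vector` and `distance` are hypothetical, 'cos' is assumed here to mean cosine distance (1 minus cosine similarity), and the non-averaged score is assumed to be the sum of the per-word values. For each neutral word it takes the distance to the mean vector of the first group minus the distance to the mean vector of the second group.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(u, v, distance_type="norm"):
    """Distance between two vectors, selected by distance_type."""
    if distance_type == "norm":
        # Euclidean norm of the difference u - v.
        return math.dist(u, v)
    if distance_type == "cos":
        # Assumption: cosine distance = 1 - cosine similarity.
        dot = sum(a * b for a, b in zip(u, v))
        return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))
    raise ValueError("distance_type must be 'norm' or 'cos'")


def rnd_sketch(target_1, target_2, attributes,
               distance_type="norm", average_distances=True):
    """Illustrative RND score over raw vectors (hypothetical helper)."""
    mean_1 = mean_vector(target_1)
    mean_2 = mean_vector(target_2)
    # One relative distance per neutral (attribute) word.
    per_word = [
        distance(a, mean_1, distance_type) - distance(a, mean_2, distance_type)
        for a in attributes
    ]
    total = sum(per_word)
    # Assumption: without averaging, the score is the plain sum.
    score = total / len(per_word) if average_distances else total
    return score, per_word
```

Under this sign convention, a negative per-word value means the word lies closer to the mean of the first group, and a positive value means it lies closer to the mean of the second.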

References

Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16):E3635–E3644, 2018.

__init__(*args, **kwargs)
metric_name: str = 'Relative Norm Distance'
metric_short_name: str = 'RND'
metric_template: Tuple[Union[int, str], Union[int, str]] = (2, 1)
run_query(query: wefe.query.Query, word_embedding: wefe.word_embedding_model.WordEmbeddingModel, distance_type: str = 'norm', average_distances: bool = True, lost_vocabulary_threshold: float = 0.2, preprocessor_args: Dict[str, Optional[Union[bool, str, Callable]]] = {'lowercase': False, 'preprocessor': None, 'strip_accents': False}, secondary_preprocessor_args: Optional[Dict[str, Optional[Union[bool, str, Callable]]]] = None, warn_not_found_words: bool = False, *args: Any, **kwargs: Any) → Dict[str, Any]

Calculate the RND metric over the provided parameters.

Parameters
queryQuery

A Query object that contains the target and attribute word sets to be tested.

word_embeddingWordEmbeddingModel

An object that contains a pretrained word embedding model.

distance_typestr, optional

Specifies which type of distance will be calculated. One of {‘norm’, ‘cos’}, by default ‘norm’.

average_distancesbool, optional

Specifies whether the function averages the distances at the end of the calculations, by default True.

lost_vocabulary_thresholdfloat, optional

Specifies the maximum proportion of words that any set of the query is allowed to lose when its words are transformed into embeddings. If any set of the query loses a larger proportion of its words than this limit, the result values will be np.nan. By default 0.2.
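The threshold rule can be sketched as follows. This is an illustration only: `exceeds_lost_vocabulary_threshold` is a hypothetical helper, and `vocabulary` is assumed to be any container of known words. A set is rejected only when the proportion of lost words is strictly greater than the limit, matching the "more words than this limit" wording.

```python
def exceeds_lost_vocabulary_threshold(words, vocabulary, threshold=0.2):
    """Return True if the word set loses too many words (hypothetical helper)."""
    if not words:
        raise ValueError("the word set must not be empty")
    # Words that have no embedding in the (assumed) vocabulary container.
    missing = [word for word in words if word not in vocabulary]
    lost_proportion = len(missing) / len(words)
    # Strict comparison: losing exactly the threshold is still allowed.
    return lost_proportion > threshold
```

For instance, a five-word set with one missing word loses a proportion of 0.2 and is still accepted with the default limit, while the same set with two missing words loses 0.4 and would produce np.nan results.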

preprocessor_argsPreprocessorArgs, optional

Dictionary with the arguments that specify how the words will be pre-processed. The possible arguments are:

  • lowercase: bool. Indicates whether the words are transformed to lowercase.

  • strip_accents: bool, {‘ascii’, ‘unicode’}. Specifies whether the accents of the words are removed. The stripping type can be specified. True uses ‘unicode’ by default.

  • preprocessor: Callable. A function that operates on each word. If a function is specified, it overrides the default preprocessor (i.e., the previous options stop working).

By default {‘strip_accents’: False, ‘lowercase’: False, ‘preprocessor’: None}.

secondary_preprocessor_argsPreprocessorArgs, optional

Dictionary with the arguments that specify how the secondary pre-processing of the words will be done. If a word is not found in the model’s vocabulary using the default preprocessor or the one specified in preprocessor_args, the function performs a second search for that word using the preprocessor specified in this parameter. By default None.
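The two-stage lookup can be sketched as below. This is a simplified illustration, not the library's code: `preprocess` and `lookup` are hypothetical names, `vocabulary` is assumed to be a set of known words, and any truthy `strip_accents` value is treated as Unicode-style stripping (the ‘ascii’ variant is not modelled).

```python
import unicodedata


def preprocess(word, lowercase=False, strip_accents=False, preprocessor=None):
    """Apply the pre-processing options to one word (hypothetical helper)."""
    if preprocessor is not None:
        # A custom callable overrides the other options.
        return preprocessor(word)
    if lowercase:
        word = word.lower()
    if strip_accents:
        # Simplification: decompose characters and drop combining marks.
        decomposed = unicodedata.normalize("NFKD", word)
        word = "".join(c for c in decomposed if not unicodedata.combining(c))
    return word


def lookup(word, vocabulary, preprocessor_args, secondary_preprocessor_args=None):
    """Return the vocabulary key for a word, or None if it is not found."""
    key = preprocess(word, **preprocessor_args)
    if key in vocabulary:
        return key
    if secondary_preprocessor_args is not None:
        # Second search, starting again from the original word.
        key = preprocess(word, **secondary_preprocessor_args)
        if key in vocabulary:
            return key
    return None
```

With a vocabulary containing only "cafe", looking up "Café" fails under the default arguments but succeeds when the secondary arguments enable lowercase and strip_accents.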

warn_not_found_wordsbool, optional

Specifies whether the function logs a warning for the words that were not found in the model’s vocabulary, by default False.

Returns
Dict[str, Any]

A dictionary with the query name, the resulting score of the metric, and a dictionary with the distances of each attribute word with respect to the means of the target sets.