wefe.debias.multiclass_hard_debias.MulticlassHardDebias
- class wefe.debias.multiclass_hard_debias.MulticlassHardDebias(pca_args: dict[str, Any] = {'n_components': 10}, verbose: bool = False, criterion_name: str | None = None)
Bases: BaseDebias
Generalized version of Hard Debias that enables multiclass debiasing.
"Generalized" means that this method extends Hard Debias to support more than two social target sets within the definitional set. For example, for religion bias, it supports debiasing with words associated with Christianity, Islam, and Judaism.
Examples
Note
For more information on the use of mitigation methods, visit Bias Mitigation (Debias) in the User Guide.
The following example shows how to run an ethnicity debias based on Black, Asian and Caucasian groups.
>>> from wefe.datasets import fetch_debias_multiclass, load_weat
>>> from wefe.debias.multiclass_hard_debias import MulticlassHardDebias
>>> from wefe.utils import load_test_model
>>>
>>> model = load_test_model()  # load a reduced version of word2vec
>>>
>>> # obtain the sets of words that will be used in the debias process.
>>> multiclass_debias_wordsets = fetch_debias_multiclass()
>>> weat_wordsets = load_weat()
>>>
>>> ethnicity_definitional_sets = (
...     multiclass_debias_wordsets["ethnicity_definitional_sets"]
... )
>>> ethnicity_equalize_sets = list(
...     multiclass_debias_wordsets["ethnicity_analogy_templates"].values()
... )
>>>
>>> # instance the debias object that will perform the mitigation
>>> mhd = MulticlassHardDebias(verbose=False, criterion_name="ethnicity")
>>> # fits the transformation parameters (bias direction, etc...)
>>> mhd.fit(
...     model=model,
...     definitional_sets=ethnicity_definitional_sets,
...     equalize_sets=ethnicity_equalize_sets,
... )
>>>
>>> # perform the transformation (debiasing) on the embedding model
>>> ethnicity_debiased_model = mhd.transform(model, copy=True)
References
[1]: Manzini, T., Chong, L. Y., Black, A. W., & Tsvetkov, Y. (2019, June). Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 615-621).
- __init__(pca_args: dict[str, Any] = {'n_components': 10}, verbose: bool = False, criterion_name: str | None = None) → None
Initialize a Multiclass Hard Debias instance.
- Parameters:
pca_args (Dict[str, Any], optional) – Arguments for the PCA computed internally to identify the bias subspace, by default {"n_components": 10}.
verbose (bool, optional) – If True, print informative messages about the debiasing process, by default False.
criterion_name (Optional[str], optional) – The name of the criterion for which the debias is being executed, e.g. 'Gender'. This name is used to label the model returned by transform, by default None.
- fit(model: WordEmbeddingModel, definitional_sets: list[list[str]], equalize_sets: list[list[str]]) → BaseDebias
Compute the bias direction and obtain the embeddings of the equalization sets.
- Parameters:
model (WordEmbeddingModel) – The word embedding model to debias.
definitional_sets (List[List[str]]) – A sequence of string pairs that will be used to define the bias direction. For example, for the case of gender debias, this list could be [[‘woman’, ‘man’], [‘girl’, ‘boy’], [‘she’, ‘he’], [‘mother’, ‘father’], …]. Multiclass hard debias also accepts lists of sets of more than two words, such as religion where sets of words representing Christianity, Islam and Judaism can be used. See the example for more information.
equalize_sets (List[List[str]]) – A list of word sets (pairs, or larger sets in the multiclass case) whose embeddings will be equalized.
- Returns:
The debias method fitted.
- Return type:
BaseDebias
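As a rough illustration of how definitional sets with more than two words can feed the bias-subspace PCA, the sketch below mean-centers each set, following the general Hard Debias recipe. It is not WEFE's implementation: the helper name center_definitional_sets, the toy three-dimensional vectors, and the plain-dict embedding table are all assumptions made for the example.

```python
def center_definitional_sets(embeddings, definitional_sets):
    """Subtract each set's mean vector from its members (hypothetical helper).

    The returned rows are what a PCA would be fitted on; the leading
    principal components then span the bias subspace.
    """
    rows = []
    for words in definitional_sets:
        vectors = [embeddings[word] for word in words]
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        rows.extend([[x - m for x, m in zip(v, mean)] for v in vectors])
    return rows


# Toy embeddings (made up): the first two axes separate the three classes,
# the third axis carries meaning shared by every word in a set.
toy_embeddings = {
    "synagogue": [1.0, 0.0, 0.25],
    "church": [0.0, 1.0, 0.25],
    "mosque": [-1.0, -1.0, 0.25],
    "rabbi": [1.0, 0.0, 0.5],
    "priest": [0.0, 1.0, 0.5],
    "imam": [-1.0, -1.0, 0.5],
}
religion_sets = [["synagogue", "church", "mosque"], ["rabbi", "priest", "imam"]]

centered_rows = center_definitional_sets(toy_embeddings, religion_sets)
```

In this toy data the shared third coordinate cancels after centering, so only the class-separating axes remain for the PCA to pick up.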
- fit_transform(model: WordEmbeddingModel, target: list[str] | None = None, ignore: list[str] | None = None, copy: bool = True, **fit_params) → WordEmbeddingModel
Convenience method to execute fit and transform in a single call.
- Parameters:
model (WordEmbeddingModel) – A word embedding model object.
target (Optional[List[str]], optional) – If a set of words is specified in target, the debias method will be applied only on the word embeddings of this set, by default None.
ignore (Optional[List[str]], optional) – If target is None and a set of words is specified in ignore, the debias method will debias all words except those specified in ignore, by default None.
copy (bool, optional) – If True, the debias will be performed on a copy of the model. If False, the debias will be applied to the model passed in, mutating its vectors. WARNING: Setting copy to True requires at least twice the RAM occupied by the model; otherwise the debias may raise a MemoryError. By default True.
verbose (bool, optional) – True will print informative messages about the debiasing process, by default True.
- Returns:
The debiased word embedding model.
- Return type:
WordEmbeddingModel
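The precedence between target and ignore described above can be sketched in a few lines. The helper words_to_debias is hypothetical and only mirrors the documented rule (ignore is consulted only when target is None); it says nothing about how WEFE selects words internally, and it does not model the equalization step, which may still modify other words.

```python
def words_to_debias(vocab, target=None, ignore=None):
    """Return the words the debias would be applied to (hypothetical helper)."""
    if target is not None:
        # target takes precedence: only these words are considered.
        wanted = set(target)
        return [word for word in vocab if word in wanted]
    # No target: debias everything except the ignored words.
    ignored = set(ignore or [])
    return [word for word in vocab if word not in ignored]


vocab = ["nurse", "doctor", "he", "she"]
only_target = words_to_debias(vocab, target=["nurse"])
all_but_ignored = words_to_debias(vocab, ignore=["he", "she"])
both_given = words_to_debias(vocab, target=["nurse"], ignore=["nurse"])
```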
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- set_fit_request(*, definitional_sets: bool | None | str = '$UNCHANGED$', equalize_sets: bool | None | str = '$UNCHANGED$', model: bool | None | str = '$UNCHANGED$') → MulticlassHardDebias
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
definitional_sets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for definitional_sets parameter in fit.
equalize_sets (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for equalize_sets parameter in fit.
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in fit.
- Returns:
self – The updated object.
- Return type:
MulticlassHardDebias
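The four request options can be read as a small decision table. The sketch below is a simplified model of that table, not scikit-learn's routing code: resolve_request is a hypothetical function, and ValueError stands in for whatever error the real meta-estimator raises.

```python
def resolve_request(request, name, provided):
    """Decide which keyword arguments reach fit (simplified, hypothetical).

    request  -- True, False, None, or a string alias
    name     -- the parameter name in fit, e.g. "definitional_sets"
    provided -- metadata the user handed to the meta-estimator
    """
    if request is True:
        # Requested: forwarded when present, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and providing it is treated as an error.
        if name in provided:
            raise ValueError(f"{name} was provided but not requested")
        return {}
    # A string alias: look the metadata up under the alias name.
    return {name: provided[request]} if request in provided else {}


provided = {"definitional_sets": [["a", "b"]], "def_sets": [["c", "d"]]}
passed_directly = resolve_request(True, "definitional_sets", provided)
passed_by_alias = resolve_request("def_sets", "definitional_sets", provided)
```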
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
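To make the <component>__<parameter> naming concrete, the sketch below splits a parameter dict into the estimator's own parameters and those addressed to nested components. split_nested_params is a hypothetical helper written for illustration, and the step name "debias" is an assumed pipeline component name.

```python
def split_nested_params(params):
    """Separate top-level parameters from <component>__<parameter> ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own_params, nested_params = split_nested_params(
    {"verbose": True, "debias__criterion_name": "ethnicity"}
)
```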
- set_transform_request(*, copy: bool | None | str = '$UNCHANGED$', ignore: bool | None | str = '$UNCHANGED$', model: bool | None | str = '$UNCHANGED$', target: bool | None | str = '$UNCHANGED$') → MulticlassHardDebias
Configure whether metadata should be requested to be passed to the transform method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
copy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for copy parameter in transform.
ignore (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for ignore parameter in transform.
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in transform.
target (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for target parameter in transform.
- Returns:
self – The updated object.
- Return type:
MulticlassHardDebias
- transform(model: WordEmbeddingModel, target: list[str] | None = None, ignore: list[str] | None = None, copy: bool = True) → WordEmbeddingModel
Execute Multiclass Hard Debias over the provided model.
- Parameters:
model (WordEmbeddingModel) – The word embedding model to debias.
target (Optional[List[str]], optional) – If a set of words is specified in target, the debias method will be performed only on the word embeddings of this set. If None is provided, the debias will be performed on all words (except those specified in ignore). Note that some words that are not in target may be modified due to the equalization process. By default None.
ignore (Optional[List[str]], optional) – If target is None and a set of words is specified in ignore, the debias method will be performed on all words except those specified in this set. Note that some words that are in ignore may be modified due to the equalization process. By default None.
copy (bool, optional) – If True, the debias will be performed on a copy of the model. If False, the debias will be applied to the model passed in, mutating its vectors. WARNING: Setting copy to True requires at least twice the RAM occupied by the model; otherwise the debias may raise a MemoryError. By default True.
- Returns:
The debiased embedding model.
- Return type:
WordEmbeddingModel
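For intuition about what the transformation does to individual vectors, here is a minimal sketch of the two Hard Debias steps, neutralization and equalization, on plain Python lists. It follows the textbook formulation for unit-length vectors and an orthonormal bias basis; the function names, the one-dimensional toy basis, and the example vectors are assumptions, and WEFE's actual implementation may differ in its details.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def project(v, basis):
    """Project v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        coeff = dot(v, b)
        out = [o + coeff * x for o, x in zip(out, b)]
    return out


def neutralize(v, basis):
    """Remove the bias-subspace component of v and rescale to unit length."""
    bias_part = project(v, basis)
    return normalize([a - b for a, b in zip(v, bias_part)])


def equalize(vectors, basis):
    """Make the vectors of one equalize set differ only inside the bias subspace.

    Assumes unit-length inputs whose bias-subspace projections are not all equal.
    """
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    mean_bias = project(mean, basis)
    neutral = [m - mb for m, mb in zip(mean, mean_bias)]
    scale = math.sqrt(max(0.0, 1.0 - dot(neutral, neutral)))
    result = []
    for v in vectors:
        offset = [a - b for a, b in zip(project(v, basis), mean_bias)]
        result.append([n + scale * u for n, u in zip(neutral, normalize(offset))])
    return result


bias_basis = [[1.0, 0.0, 0.0]]  # toy one-dimensional bias subspace
neutral_word = neutralize([0.6, 0.8, 0.0], bias_basis)
pair = [normalize([1.0, 1.0, 0.2]), normalize([-0.5, 0.8, 0.4])]
equalized_pair = equalize(pair, bias_basis)
```

After equalization the two vectors share the same components outside the bias subspace and both have unit length, which is why words in an equalize set can change even when they are not listed in target.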