Measurement Framework

Below we present the main aspects of the measurement framework on which WEFE is built.

Note

If you want to see tutorials on how to apply queries, visit Bias Measurement in the User Guide.

Target set

A target word set (denoted by \(T\)) is a set of words intended to denote a particular social group, which is defined by a certain criterion. This criterion can be any characteristic, trait, or origin that distinguishes groups of people from each other, e.g., gender, social class, age, or ethnicity. For example, if the criterion is gender, we can use it to distinguish two groups, women and men. Then, a set of target words representing the social group ‘women’ could contain words like ‘she’, ‘woman’, ‘girl’, etc. Analogously, a set of target words representing the social group ‘men’ could include ‘he’, ‘man’, ‘boy’, etc.

Attribute set

An attribute word set (denoted by \(A\)) is a set of words representing some attitude, characteristic, trait, occupational field, etc. that can be associated with individuals from any social group. For example, the set of science attribute words could contain words such as ‘technology’, ‘physics’, ‘chemistry’, while the art attribute words could have words like ‘poetry’, ‘dance’, ‘literature’.

Query

Queries are the main building blocks used by fairness metrics to measure the bias of word embedding models. Formally, a query is a pair \(Q=(\mathcal{T},\mathcal{A})\) in which \(\mathcal{T}\) is a set of target word sets, and \(\mathcal{A}\) is a set of attribute word sets. For example, consider the target word sets:

\[\begin{split}\begin{eqnarray*} T_{\text{women}} & = & \{{she},{woman},{girl}, \ldots\}, \\ T_{\text{men}} & = & \{{he},{man},{boy}, \ldots\}, \end{eqnarray*}\end{split}\]

and the attribute word sets

\[\begin{split}\begin{eqnarray*} A_{\text{science}} & = & \{{math},{physics},{chemistry}, \ldots\}, \\ A_{\text{art}} & = & \{{poetry},{dance},{literature}, \ldots\}. \end{eqnarray*}\end{split}\]

Then the following is a query in our framework

\[\begin{equation} Q=(\{T_{\text{women}}, T_{\text{men}}\},\{A_{\text{science}},A_{\text{art}}\}). \end{equation}\]
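To make the structure of a query concrete, the following is a minimal sketch of the example above as a plain Python data structure. The `Query` dataclass and its field names are hypothetical and written only for illustration; they are not WEFE's own query class, and the word sets are truncated to the three words listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical container for a query Q = (T, A); not WEFE's own class."""

    target_sets: tuple      # each element is a frozenset of target words
    attribute_sets: tuple   # each element is a frozenset of attribute words


# Target word sets (only the words listed in the text above).
t_women = frozenset({"she", "woman", "girl"})
t_men = frozenset({"he", "man", "boy"})

# Attribute word sets (only the words listed in the text above).
a_science = frozenset({"math", "physics", "chemistry"})
a_art = frozenset({"poetry", "dance", "literature"})

# Q = ({T_women, T_men}, {A_science, A_art})
query = Query(target_sets=(t_women, t_men), attribute_sets=(a_science, a_art))

print(len(query.target_sets), len(query.attribute_sets))
```

Tuples are used instead of sets for the outer collections so that each word set keeps a stable position, which is convenient when a metric treats the first and second set differently.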

When a set of queries \(\mathcal{Q} = \{Q_1, Q_2, \dots, Q_n\}\) is intended to measure a single type of bias, we say that the set has a Bias Criterion. Examples of bias criteria are gender, ethnicity, religion, politics, and social class, among others.

Warning

To accurately study the biases contained in word embeddings, queries may contain words that could be offensive to certain groups or individuals. The relationships studied between these words DO NOT represent the ideas, thoughts or beliefs of the authors of this library. This warning applies to all documentation.

Query Template

A query template is simply a pair \((t,a)\in\mathbb{N}\times\mathbb{N}\). We say that query \(Q=(\mathcal{T},\mathcal{A})\) satisfies a template \((t,a)\) if \(|\mathcal{T}|=t\) and \(|\mathcal{A}|=a\).
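The definition above can be sketched in a few lines of Python. The helper names `template_of` and `satisfies` are hypothetical and chosen only for this illustration; a query is represented here simply as two lists of word sets.

```python
def template_of(target_sets, attribute_sets):
    """Return the pair (|T|, |A|) for a query given as two collections of word sets."""
    return (len(target_sets), len(attribute_sets))


def satisfies(target_sets, attribute_sets, template):
    """Check whether the query (target_sets, attribute_sets) satisfies template (t, a)."""
    return template_of(target_sets, attribute_sets) == template


# Two target sets and two attribute sets, as in the gender example.
targets = [{"she", "woman", "girl"}, {"he", "man", "boy"}]
attributes = [{"math", "physics", "chemistry"}, {"poetry", "dance", "literature"}]

print(template_of(targets, attributes))
print(satisfies(targets, attributes, (2, 2)))
print(satisfies(targets, attributes, (2, 1)))
```

Note that the template only counts how many target and attribute sets a query has; it says nothing about how many words each set contains.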

Fairness Measure

A fairness metric is a function that quantifies the degree of association between target and attribute words in a word embedding model. In our framework, every fairness metric is defined as a function that has a query and a model as input, and produces a real number as output.

Several fairness metrics have been proposed in the literature, but not all of them share a common input template for queries. Thus, we assume that every fairness metric comes with a template that essentially defines the shape of the input queries supported by the metric.

Formally, let \(F\) be a fairness metric with template \(s_F=(t_F,a_F)\). Given an embedding model \(\mathbf{M}\) and a query \(Q\) that satisfies \(s_F\), the metric produces the value \(F(\mathbf{M},Q)\in \mathbb{R}\) that quantifies the degree of bias of \(\mathbf{M}\) with respect to query \(Q\).
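As an illustration of this signature, the sketch below defines a toy function \(F(\mathbf{M},Q)\) with template \((2,2)\). It is not one of the metrics implemented in WEFE: the name `toy_association_gap`, the scoring rule (a difference of mean cosine similarities), and the two-dimensional vectors are all made up for this example. The model is assumed to be a plain dictionary from words to vectors.

```python
import math

TEMPLATE = (2, 2)  # this toy metric expects two target sets and two attribute sets


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def mean_similarity(model, words_a, words_b):
    """Average cosine similarity over all pairs (a, b) with a in words_a, b in words_b."""
    sims = [cosine(model[a], model[b]) for a in words_a for b in words_b]
    return sum(sims) / len(sims)


def toy_association_gap(model, query):
    """Toy F(M, Q): takes a model and a query, returns a real number."""
    targets, attributes = query
    if (len(targets), len(attributes)) != TEMPLATE:
        raise ValueError(f"query does not satisfy template {TEMPLATE}")
    t1, t2 = targets
    a1, a2 = attributes
    # How much more each target set is associated with a1 than with a2.
    assoc_t1 = mean_similarity(model, t1, a1) - mean_similarity(model, t1, a2)
    assoc_t2 = mean_similarity(model, t2, a1) - mean_similarity(model, t2, a2)
    return assoc_t1 - assoc_t2


# Made-up 2-d vectors, for illustration only (not taken from any real embedding).
toy_model = {
    "she": (1.0, 0.0),
    "he": (0.0, 1.0),
    "physics": (0.0, 1.0),
    "poetry": (1.0, 0.0),
}
toy_query = ((["she"], ["he"]), (["physics"], ["poetry"]))

score = toy_association_gap(toy_model, toy_query)
print(score)
```

The template check at the start of the function mirrors the requirement that \(Q\) satisfies \(s_F\): a query with a different number of target or attribute sets is rejected before any value is computed.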

Standard usage pattern of WEFE

The following flow chart shows how to perform a bias measurement using a gender query, word2vec embeddings and the WEAT metric.

To see the implementation of this query using WEFE, refer to the Quick start section.

Metrics Implemented So Far

WEFE implements the following bias measurement metrics:

Word Embedding Association Test (WEAT)
Relative Norm Distance (RND)
Relative Negative Sentiment Bias (RNSB)
Mean Average Cosine Similarity (MAC)
Embedding Coherence Test (ECT)
Relational Inner Product Association (RIPA)