wefe.utils.run_queries

wefe.utils.run_queries(metric: Type[BaseMetric], queries: List[Query], models: List[WordEmbeddingModel], queries_set_name: str = 'Unnamed queries set', lost_vocabulary_threshold: float = 0.2, metric_params: dict = {}, generate_subqueries: bool = False, aggregate_results: bool = False, aggregation_function: Union[str, Callable] = 'abs_avg', return_only_aggregation: bool = False, warn_not_found_words: bool = False) -> DataFrame

Run several queries over several word embedding models using a specific metric.
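The overall procedure can be pictured as a nested loop over models and queries that fills one table cell per (model, query) pair. The sketch below is a simplified, WEFE-free illustration, not the library's implementation: plain word sets stand in for `WordEmbeddingModel` vocabularies, a hypothetical placeholder `coverage` function stands in for a metric's `run_query`, and queries with a word set that loses more than `lost_vocabulary_threshold` of its words are assumed to be reported as NaN. `run_queries_sketch`, `lost_fraction` and `coverage` are illustrative names only.

```python
import math
from typing import Callable, Dict, List, Set

import pandas as pd

# Hypothetical stand-ins (not WEFE types): a query maps its name to a list of
# word sets, and a model maps its name to the set of words it knows.
ToyQueries = Dict[str, List[List[str]]]
ToyModels = Dict[str, Set[str]]


def lost_fraction(words: List[str], vocabulary: Set[str]) -> float:
    """Fraction of `words` missing from `vocabulary` (0.0 for an empty set)."""
    if not words:
        return 0.0
    return sum(w not in vocabulary for w in words) / len(words)


def run_queries_sketch(
    score: Callable[[List[List[str]], Set[str]], float],
    queries: ToyQueries,
    models: ToyModels,
    lost_vocabulary_threshold: float = 0.2,
) -> pd.DataFrame:
    data = []
    for vocabulary in models.values():
        row = []
        for word_sets in queries.values():
            if any(lost_fraction(s, vocabulary) > lost_vocabulary_threshold
                   for s in word_sets):
                # Assumed behaviour: too many missing words -> no score (NaN).
                row.append(math.nan)
            else:
                row.append(score(word_sets, vocabulary))
        data.append(row)
    # One row per model, one column per query.
    return pd.DataFrame(data, index=list(models), columns=list(queries))


def coverage(word_sets: List[List[str]], vocabulary: Set[str]) -> float:
    """Placeholder score (share of query words found), not a bias metric."""
    words = [w for s in word_sets for w in s]
    return sum(w in vocabulary for w in words) / len(words)


queries = {
    "gender/career": [["she", "her"], ["he", "him"], ["engineer", "nurse"]],
    "gender/family": [["she", "her"], ["he", "him"], ["home", "wedding"]],
}
models = {
    "model_a": {"she", "her", "he", "him", "engineer", "nurse", "home", "wedding"},
    "model_b": {"she", "her", "he", "him", "engineer", "nurse"},
}
print(run_queries_sketch(coverage, queries, models))
```

Here `model_b` lacks every word of the `["home", "wedding"]` set, so its "gender/family" cell is NaN under the default threshold, while every other cell gets a score.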

Parameters:
metric : Type[BaseMetric]

A metric class.

queries : List[Query]

An iterable with a set of queries.

models : List[WordEmbeddingModel]

An iterable with a set of pretrained word embedding models.

queries_set_name : str, optional

The name of the set of queries, or of the criterion that will be tested, by default ‘Unnamed queries set’.

lost_vocabulary_threshold : float, optional

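The threshold that will be passed to the run_query method of the metric, by default 0.2. It is the maximum fraction of words that any word set of a query may lose because they are not found in the model's vocabulary; if a set loses more, the result for that query is NaN.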

metric_params : dict, optional

A dict with custom parameters that will be passed to the run_query method of the respective metric, by default {}.

generate_subqueries : bool, optional

Whether, when a query has a bigger template than the metric (more target or attribute sets than the metric accepts), the function should try to generate subqueries that are compatible with the metric. Queries that already match the metric template are used unchanged. DANGER: this may make some comparisons meaningless, since it can end up comparing biases that are not compatible with each other. By default, False.

aggregate_results : bool, optional

Whether the results should be aggregated using aggregation_function, by default False.

aggregation_function : Union[str, Callable], optional

The function that will be applied row by row to aggregate the results. It must be a pandas row-compatible operation. Implemented functions: ‘sum’, ‘abs_sum’, ‘avg’ and ‘abs_avg’, by default ‘abs_avg’.

return_only_aggregation : bool, optional

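If True, only the column with the aggregated results is returned, by default False.

warn_not_found_words : bool, optional

Whether to warn about query words that are not found in a model's vocabulary, by default False.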
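The generate_subqueries option can be illustrated with `itertools.combinations`. This is a simplification, not necessarily how WEFE builds subqueries: it assumes a metric template is a pair (number of target sets, number of attribute sets) and that the subqueries are every combination of the query's sets that fits it, while a query that already matches the template is kept unchanged. `ToyQuery` and `generate_subqueries_sketch` are hypothetical names, and queries are reduced to lists of set names.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical simplified query: (target set names, attribute set names).
ToyQuery = Tuple[List[str], List[str]]


def generate_subqueries_sketch(query: ToyQuery,
                               template: Tuple[int, int]) -> List[ToyQuery]:
    """Split `query` into every subquery matching `template` = (n_targets, n_attributes)."""
    targets, attributes = query
    n_targets, n_attributes = template
    if len(targets) == n_targets and len(attributes) == n_attributes:
        # Already compatible with the metric: keep it unchanged.
        return [query]
    if len(targets) < n_targets or len(attributes) < n_attributes:
        # Smaller than the template: no compatible subquery exists.
        return []
    # Every choice of target sets paired with every choice of attribute sets.
    return [
        (list(t), list(a))
        for t in combinations(targets, n_targets)
        for a in combinations(attributes, n_attributes)
    ]


query = (["Female", "Male", "Nonbinary"], ["Career", "Family"])
for subquery in generate_subqueries_sketch(query, template=(2, 1)):
    print(subquery)
```

With three target sets and two attribute sets against a (2, 1) template, this yields 3 × 2 = 6 subqueries, each of which would then be scored as a separate query.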

Returns:
pd.DataFrame

A dataframe with the results. The index contains the word embedding model names and the columns contain the query (experiment) names. Each cell holds the result of running the metric with a specific word embedding model and query.
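When aggregate_results is enabled, each row of this dataframe (one model) is reduced to a single value. The pandas sketch below shows one way to do that, under stated assumptions: the absolute-value variant of ‘sum’ is spelled ‘abs_sum’, a callable is applied to each row through `DataFrame.apply(..., axis=1)`, and the aggregated column gets a hypothetical name built from queries_set_name. WEFE's own column naming and internals may differ; `aggregate_sketch` is an illustrative name.

```python
from typing import Callable, Union

import pandas as pd

# Assumed row-wise semantics of the named aggregations.
NAMED_AGGREGATIONS = {
    "sum": lambda df: df.sum(axis=1),
    "abs_sum": lambda df: df.abs().sum(axis=1),
    "avg": lambda df: df.mean(axis=1),
    "abs_avg": lambda df: df.abs().mean(axis=1),
}


def aggregate_sketch(
    results: pd.DataFrame,
    aggregation_function: Union[str, Callable[[pd.Series], float]] = "abs_avg",
    queries_set_name: str = "Unnamed queries set",
    return_only_aggregation: bool = False,
) -> pd.DataFrame:
    if callable(aggregation_function):
        # Apply the user's function to each row (one row per model).
        aggregated = results.apply(aggregation_function, axis=1)
        label = getattr(aggregation_function, "__name__", "custom")
    elif aggregation_function in NAMED_AGGREGATIONS:
        aggregated = NAMED_AGGREGATIONS[aggregation_function](results)
        label = aggregation_function
    else:
        raise ValueError(f"Unknown aggregation function: {aggregation_function!r}")

    # Hypothetical column name; WEFE may label this column differently.
    aggregated = aggregated.rename(f"{queries_set_name}: {label}")
    if return_only_aggregation:
        return aggregated.to_frame()
    # Keep the per-query columns and append the aggregated one.
    return pd.concat([results, aggregated], axis=1)


results = pd.DataFrame(
    {"q1": [0.5, -0.2], "q2": [-0.3, 0.4]}, index=["model_a", "model_b"]
)
print(aggregate_sketch(results, "abs_avg"))
print(aggregate_sketch(results, max, return_only_aggregation=True))
```

Taking absolute values before averaging (‘abs_avg’) keeps biases of opposite sign from cancelling out: with plain ‘sum’, both toy models above aggregate to roughly 0.2 even though their individual scores differ.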