wefe.utils.run_queries
- wefe.utils.run_queries(metric: Type[BaseMetric], queries: List[Query], models: List[WordEmbeddingModel], queries_set_name: str = 'Unnamed queries set', lost_vocabulary_threshold: float = 0.2, metric_params: dict = {}, generate_subqueries: bool = False, aggregate_results: bool = False, aggregation_function: Union[str, Callable] = 'abs_avg', return_only_aggregation: bool = False, warn_not_found_words: bool = False) → DataFrame
Run several queries over several word embedding models using a specific metric.
- Parameters:
- metric: Type[BaseMetric]
A metric class.
- queries: list
An iterable with a set of queries.
- models: list
An iterable with a set of pretrained word embedding models.
- queries_set_name: str, optional
The name of the set of queries or the criteria that will be tested, by default ‘Unnamed queries set’.
- lost_vocabulary_threshold: float, optional
The lost-vocabulary threshold that will be passed to the metric when each query is run, by default 0.2.
- metric_params: dict, optional
A dict with custom parameters that will be passed to the run_query method of the respective metric, by default {}.
- generate_subqueries: bool, optional
Indicates whether the function, on detecting queries with a larger template than the metric supports, should try to generate subqueries compatible with it. A query that is already compatible with the metric template is kept unchanged. DANGER: This may make some comparisons meaningless, because it can end up comparing biases that are not compatible with each other. By default False.
- aggregate_results: bool, optional
A boolean that indicates whether the results must be aggregated with some function, by default False.
- aggregation_function: Union[str, Callable], optional
The function that will be applied row by row to aggregate the results. It must be a pandas row-compatible operation. Implemented functions: ‘sum’, ‘abs_sub’, ‘avg’ and ‘abs_avg’, by default ‘abs_avg’.
- return_only_aggregation: bool, optional
If return_only_aggregation is True, only the column with the aggregated results is returned, by default False.
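As a rough illustration of the aggregation options above, the sketch below writes two of the named functions (‘avg’ and ‘abs_avg’) as plain Python callables that reduce one row of per-query scores to a single value. The function bodies and the sample scores are placeholders rather than WEFE code, and the sketch assumes that a custom callable receives one row at a time, which this page suggests but does not spell out.

```python
from statistics import fmean
from typing import Iterable


def avg(row: Iterable[float]) -> float:
    """Plain average of the scores in one row (one model, several queries)."""
    return fmean(row)


def abs_avg(row: Iterable[float]) -> float:
    """Average of the absolute scores, so opposite-signed scores do not cancel."""
    return fmean(abs(score) for score in row)


# Placeholder scores for a single model across three queries (not real results).
row = [0.5, -0.5, 1.5]

print(avg(row))      # 0.5
print(abs_avg(row))  # about 0.83
```

The difference between the two matters when scores have mixed signs: a plain average can report a small value even though each individual query shows a sizeable score.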
- Returns:
- pd.DataFrame
A dataframe with the results. The index contains the word embedding model names and the columns contain the query names. Each cell holds the result of running the metric with a specific word embedding model and query.
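To make the layout of the returned table concrete, here is a small stand-alone sketch of the models-by-queries grid, using plain dictionaries instead of a DataFrame. Everything in it is hypothetical: `build_results_table`, the `toy_score` callable, and the model and query names are stand-ins for illustration and are not part of WEFE.

```python
from typing import Callable, Dict, List


def build_results_table(
    model_names: List[str],
    query_names: List[str],
    score: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Return {model name: {query name: score}}: one row per model, one column per query."""
    return {
        model: {query: score(model, query) for query in query_names}
        for model in model_names
    }


def toy_score(model: str, query: str) -> float:
    # Stand-in for running a metric; it only compares string lengths.
    return float(len(model) - len(query))


table = build_results_table(
    ["model_a", "model_bb"], ["query_1", "query_22"], toy_score
)

# Rows are models, columns are queries, and each cell is one (model, query) score.
for model, row in table.items():
    print(model, row)
```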