# Benchmarking

Here are the benchmarking results for some models:

:::{toctree}
:maxdepth: 1

mmlu.md
:::