Getting started¶

This is a minimal guide on how to get started using SEB. If you feel like the documentation is lacking feel free to file an issue.

Using the CLI¶

SEB comes with a simple cli to allow you to run models. This section will show a minimal example of how to use the CLI but if you want to know more check out the CLI documentation. To get a list of available commands you can simply run:

In [1]:

Copied!

%%bash

seb --help
%%bash

seb --help

Available commands:

  run   Runs the Benchmark either on specified models or on all registered mod...

or for more on the specific command you can call seb {command} --help. To run a model using the CLI you can run it like so:

In [2]:

Copied!

%%bash
seb run -m all-MiniLM-L6-v2 --output-path model_results/
%%bash
seb run -m all-MiniLM-L6-v2 --output-path model_results/

INFO:seb.cli.run:Model registered in SEB. Loading from registry.
Running all-MiniLM-L6-v2:   0%|          | 0/1 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Angry Tweets:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on LCC:   0%|          | 0/16 [00:00<?, ?it/s]         
Running all-MiniLM-L6-v2 on Bornholm Parallel:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on DKHate:   0%|          | 0/16 [00:00<?, ?it/s]           
Running all-MiniLM-L6-v2 on Da Political Comments:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on Massive Intent:   0%|          | 0/16 [00:00<?, ?it/s]       
Running all-MiniLM-L6-v2 on Massive Scenario:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on ScaLA:   0%|          | 0/16 [00:00<?, ?it/s]           
Running all-MiniLM-L6-v2 on Language Identification:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on NoReC:   0%|          | 0/16 [00:00<?, ?it/s]                  
Running all-MiniLM-L6-v2 on Norwegian parliament:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on VGSummarizationClustering:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on SweReC:   0%|          | 0/16 [00:00<?, ?it/s]                   
Running all-MiniLM-L6-v2 on DaLAJ:   0%|          | 0/16 [00:00<?, ?it/s] 
Running all-MiniLM-L6-v2 on SweFAQ:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2 on SwednClustering:   0%|          | 0/16 [00:00<?, ?it/s]
Running all-MiniLM-L6-v2: 100%|██████████| 1/1 [00:00<00:00, 25.99it/s]            
ERROR:seb.benchmark:Error when running VGSummarizationClustering on embed-multilingual-v3.0: Cache for embed-multilingual-v3.0 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on embed-multilingual-v3.0: Cache for embed-multilingual-v3.0 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on paraphrase-multilingual-MiniLM-L12-v2: Cache for paraphrase-multilingual-MiniLM-L12-v2 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on paraphrase-multilingual-MiniLM-L12-v2: Cache for paraphrase-multilingual-MiniLM-L12-v2 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on paraphrase-multilingual-mpnet-base-v2: Cache for paraphrase-multilingual-mpnet-base-v2 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on paraphrase-multilingual-mpnet-base-v2: Cache for paraphrase-multilingual-mpnet-base-v2 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on sentence-bert-swedish-cased: Cache for sentence-bert-swedish-cased on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on sentence-bert-swedish-cased: Cache for sentence-bert-swedish-cased on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on electra-small-nordic: Cache for electra-small-nordic on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on electra-small-nordic: Cache for electra-small-nordic on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on DanskBERT: Cache for DanskBERT on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on DanskBERT: Cache for DanskBERT on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on dfm-encoder-large-v1: Cache for dfm-encoder-large-v1 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on dfm-encoder-large-v1: Cache for dfm-encoder-large-v1 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on nb-bert-large: Cache for nb-bert-large on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on nb-bert-large: Cache for nb-bert-large on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on nb-bert-base: Cache for nb-bert-base on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on nb-bert-base: Cache for nb-bert-base on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on bert-base-swedish-cased: Cache for bert-base-swedish-cased on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on bert-base-swedish-cased: Cache for bert-base-swedish-cased on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on electra-small-swedish-cased-discriminator: Cache for electra-small-swedish-cased-discriminator on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on electra-small-swedish-cased-discriminator: Cache for electra-small-swedish-cased-discriminator on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on xlm-roberta-base: Cache for xlm-roberta-base on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on xlm-roberta-base: Cache for xlm-roberta-base on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on dfm-sentence-encoder-large-1: Cache for dfm-sentence-encoder-large-1 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on dfm-sentence-encoder-large-1: Cache for dfm-sentence-encoder-large-1 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on dfm-sentence-encoder-large-exp1: Cache for dfm-sentence-encoder-large-exp1 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on dfm-sentence-encoder-large-exp1: Cache for dfm-sentence-encoder-large-exp1 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on dfm-sentence-encoder-small-v1: Cache for dfm-sentence-encoder-small-v1 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on dfm-sentence-encoder-small-v1: Cache for dfm-sentence-encoder-small-v1 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on dfm-sentence-encoder-medium-v1: Cache for dfm-sentence-encoder-medium-v1 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on dfm-sentence-encoder-medium-v1: Cache for dfm-sentence-encoder-medium-v1 on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on dfm-sentence-encoder-large-exp2-no-lang-align: Cache for dfm-sentence-encoder-large-exp2-no-lang-align on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on dfm-sentence-encoder-large-exp2-no-lang-align: Cache for dfm-sentence-encoder-large-exp2-no-lang-align on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on e5-small: Cache for e5-small on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on e5-small: Cache for e5-small on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on e5-base: Cache for e5-base on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on e5-base: Cache for e5-base on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on e5-large: Cache for e5-large on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on e5-large: Cache for e5-large on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on multilingual-e5-base: Cache for multilingual-e5-base on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on multilingual-e5-base: Cache for multilingual-e5-base on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on multilingual-e5-large: Cache for multilingual-e5-large on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on multilingual-e5-large: Cache for multilingual-e5-large on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on e5-mistral-7b-instruct: Cache for e5-mistral-7b-instruct on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on e5-mistral-7b-instruct: Cache for e5-mistral-7b-instruct on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on sonar-dan: Cache for sonar-dan on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on sonar-dan: Cache for sonar-dan on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on sonar-swe: Cache for sonar-swe on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on sonar-swe: Cache for sonar-swe on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on sonar-nob: Cache for sonar-nob on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on sonar-nob: Cache for sonar-nob on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on sonar-nno: Cache for sonar-nno on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on sonar-nno: Cache for sonar-nno on SwednClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running VGSummarizationClustering on text-embedding-ada-002: Cache for text-embedding-ada-002 on VGSummarizationClustering does not exist. Set run_model=True to run the model.
ERROR:seb.benchmark:Error when running SwednClustering on text-embedding-ada-002: Cache for text-embedding-ada-002 on SwednClustering does not exist. Set run_model=True to run the model.

                                      Benchmark Results                         
┏━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━┳━━┳━┳━━┳━┳━━┳━┳━━┳━┳━━┳━
┃      ┃                         ┃ Average ┃ Average ┃ ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃ 
┃ Rank ┃ Model                   ┃   Score ┃    Rank ┃ ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃ ┃  ┃ 
┡━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━╇━━╇━╇━━╇━╇━━╇━╇━━╇━╇━━╇━
│    1 │ multilingual-e5-small   │    0.53 │    9.72 │ │  │ │  │ │  │ │  │ │  │ 
│    2 │ NEW: all-MiniLM-L6-v2   │    0.40 │   22.12 │ │  │ │  │ │  │ │  │ │  │ 
│    3 │ embed-multilingual-v3.0 │     nan │    5.39 │ │  │ │  │ │  │ │  │ │  │ 
└──────┴─────────────────────────┴─────────┴─────────┴─┴──┴─┴──┴─┴──┴─┴──┴─┴──┴─

For how to run the benchmark on all models or only on a subset of tasks check out the documentation for the CLI.

Running a task¶

To run a task you will need to fetch the task amd a model run it.

In [3]:

Copied!

import seb

model = seb.get_model("jonfd/electra-small-nordic")
task = seb.get_task("DKHate")

# initialize benchmark with tasks
benchmark = seb.Benchmark(tasks=[task])

# benchmark the model
benchmark_result = benchmark.evaluate_model(model)
import seb

model = seb.get_model("jonfd/electra-small-nordic")
task = seb.get_task("DKHate")

# initialize benchmark with tasks
benchmark = seb.Benchmark(tasks=[task])

# benchmark the model
benchmark_result = benchmark.evaluate_model(model)

In [4]:

Copied!

benchmark_result  # examine output
benchmark_result  # examine output

Out[4]:

BenchmarkResults(meta=ModelMeta(name='electra-small-nordic', description=None, huggingface_name='jonfd/electra-small-nordic', reference='https://huggingface.co/jonfd/electra-small-nordic', languages=['da', 'nb', 'sv', 'nn'], open_source=True, embedding_size=256), task_results=[TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='1.0.3.dev0', time_of_run=datetime.datetime(2023, 7, 30, 13, 55, 38, 480327), scores={'da': {'accuracy': 0.5945288753799393, 'f1': 0.4912211182797449, 'ap': 0.8950480900418238, 'accuracy_stderr': 0.07818347662767612, 'f1_stderr': 0.05511334661624392, 'ap_stderr': 0.013877821318913264, 'main_score': 0.5945288753799393}}, main_score='accuracy')])

In [5]:

Copied!

benchmark_result[0]  # examine the results for the first task
benchmark_result[0]  # examine the results for the first task

Out[5]:

TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='1.0.3.dev0', time_of_run=datetime.datetime(2023, 7, 30, 13, 55, 38, 480327), scores={'da': {'accuracy': 0.5945288753799393, 'f1': 0.4912211182797449, 'ap': 0.8950480900418238, 'accuracy_stderr': 0.07818347662767612, 'f1_stderr': 0.05511334661624392, 'ap_stderr': 0.013877821318913264, 'main_score': 0.5945288753799393}}, main_score='accuracy')

Reproducing the Benchmark¶

Reproducing the benchmark is easy and is doable simply using the following command:

In [11]:

Copied!





models = [seb.get_model("all-MiniLM-L6-v2")]
# for simplicity, we will only run it with one model, but you could run it with multiple models:
# models = seb.get_all_models()

full_benchmark = seb.Benchmark()
results = benchmark.evaluate_models(models=models)
models = [seb.get_model("all-MiniLM-L6-v2")]
# for simplicity, we will only run it with one model, but you could run it with multiple models:
# models = seb.get_all_models()

full_benchmark = seb.Benchmark()
results = benchmark.evaluate_models(models=models)

Running all-MiniLM-L6-v2: 100%|██████████| 1/1 [00:00<00:00, 175.16it/s]

This runs the full benchmark on all the specified models as well as all the registrered datasets. Note that all benchmark results are cached as included as a part of the package, this means that you won't have to rerun results that are already run.

In [15]:

Copied!

mdl_result_on_benchmark = results[0]  # results for the first model

mdl_result_on_benchmark[0]  # results for the first task
mdl_result_on_benchmark = results[0]  # results for the first model

mdl_result_on_benchmark[0]  # results for the first task

Out[15]:

TaskResult(task_name='DKHate', task_description='Danish Tweets annotated for Hate Speech either being Offensive or not', task_version='1.1.0', time_of_run=datetime.datetime(2023, 7, 31, 15, 19, 48, 879189), scores={'da': {'accuracy': 0.5504559270516718, 'f1': 0.4487544754943351, 'ap': 0.8825715897823836, 'accuracy_stderr': 0.08179003177509295, 'f1_stderr': 0.04439449341359171, 'ap_stderr': 0.008146255235874632, 'main_score': 0.5504559270516718}}, main_score='accuracy')

Adding a model¶

The benchmark uses a registry to add models. A model in seb includes two thing. 1) a metadata object (seb.ModelMeta) describing the metadata of the model and 2) a loader for the model itself, which is an object that needs an encode methods as described by the seb.ModelInterface. Here is a minimal example of how to add a new model:

In [ ]:

Copied!





from sentence_transformers import SentenceTransformer
from typing import Any
import seb
import numpy as np


model_name = "sentence-transformers/all-MiniLM-L6-v2"


class MyEncoder(seb.Encoder):
    """
    A custom model for SEB that uses the SentenceTransformer library.
    """

    def __init__(self):
        self.model = SentenceTransformer(model_name)

    def encode(  # type: ignore
        self,
        sentences: list[str],
        *,
        task: seb.Task,
        **kwargs: Any,
    ) -> np.ndarray:
        if task.name == "DKHate":  # allow you to embed differently based on the task
            emb = self.model.encode(sentences, batch_size=32, **kwargs)
        else:
            emb = self.model.encode(sentences, batch_size=32, **kwargs)  # here we just do the same for all tasks
        return emb


@seb.models.register(model_name)  # add the model to the registry
def create_my_model() -> seb.SebModel:
    hf_name = model_name

    # create meta data
    meta = seb.ModelMeta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference="https://huggingface.co/{hf_name}",
        languages=[],
        embedding_size=384,
    )
    return seb.SebModel(
        encoder=MyEncoder(),
        meta=meta,
    )
from sentence_transformers import SentenceTransformer
from typing import Any
import seb
import numpy as np


model_name = "sentence-transformers/all-MiniLM-L6-v2"


class MyEncoder(seb.Encoder):
    """
    A custom model for SEB that uses the SentenceTransformer library.
    """

    def __init__(self):
        self.model = SentenceTransformer(model_name)

    def encode(  # type: ignore
        self,
        sentences: list[str],
        *,
        task: seb.Task,
        **kwargs: Any,
    ) -> np.ndarray:
        if task.name == "DKHate":  # allow you to embed differently based on the task
            emb = self.model.encode(sentences, batch_size=32, **kwargs)
        else:
            emb = self.model.encode(sentences, batch_size=32, **kwargs)  # here we just do the same for all tasks
        return emb


@seb.models.register(model_name)  # add the model to the registry
def create_my_model() -> seb.SebModel:
    hf_name = model_name

    # create meta data
    meta = seb.ModelMeta(
        name=hf_name.split("/")[-1],
        huggingface_name=hf_name,
        reference="https://huggingface.co/{hf_name}",
        languages=[],
        embedding_size=384,
    )
    return seb.SebModel(
        encoder=MyEncoder(),
        meta=meta,
    )

Note that if you want to use the CLI with one of your own added models you can import registrered functions from a file specified using the --code flag.