Span-based

augmenty.span.entities

augmenty.span.entities.create_ent_augmenter_v1(level: float, ent_dict: Dict[str, Iterable[List[str]]], replace_consistency: bool = True, resolve_dependencies: bool = True) → Callable[[Language, Example], Iterator[Example]]

Create an augmenter which replaces an entity based on a dictionary lookup.

Parameters:
  • level (float) – The proportion of entities to be augmented (e.g. 0.1 for 10%).

  • ent_dict (Dict[str, Iterable[List[str]]]) – A dictionary with keys corresponding to the entity type you wish to replace (e.g. "PER") and an iterable of the replacements as values. Each replacement is a list of strings making up the desired replacement entity, e.g. ["Kenneth", "Enevoldsen"].

  • replace_consistency (bool, optional) – Should an entity always be replaced with the same entity? Defaults to True.

  • resolve_dependencies (bool, optional) – Attempts to resolve the dependency tree by setting the head of the original entity as the head of the first token in the new entity. The remainder is the passed …

Returns:

The augmenter

Return type:

Callable[[Language, Example], Iterator[Example]]

Example

>>> from augmenty.span.entities import create_ent_augmenter_v1
>>> ent_dict = {"ORG": [["Google"], ["Apple"]],
...             "PERSON": [["Kenneth"], ["Lasse", "Hansen"]]}
>>> # augment 10% of entities
>>> ent_augmenter = create_ent_augmenter_v1(level=0.1, ent_dict=ent_dict)
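
The dictionary lookup and the replace_consistency flag can be illustrated without spaCy. The following is a minimal pure-Python sketch, not augmenty's implementation: the helper replace_entities, its (tokens, label) tuples, and the reading of "consistency" as "the same original entity always maps to the same replacement" are assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[List[str], str]  # (tokens, label); a hypothetical stand-in for a spaCy span


def replace_entities(
    entities: List[Entity],
    ent_dict: Dict[str, Iterable[List[str]]],
    level: float,
    rng: random.Random,
    replace_consistency: bool = True,
) -> List[Entity]:
    """Replace roughly `level` of the entities using a dictionary lookup (illustrative only)."""
    cache: Dict[Tuple[str, Tuple[str, ...]], List[str]] = {}
    out: List[Entity] = []
    for tokens, label in entities:
        candidates = [list(r) for r in ent_dict.get(label, [])]
        # Leave the entity untouched if its type has no replacements
        # or if it is not selected for augmentation.
        if not candidates or rng.random() >= level:
            out.append((tokens, label))
            continue
        key = (label, tuple(tokens))
        if replace_consistency and key in cache:
            # Assumption: reuse the replacement chosen earlier for this entity.
            new_tokens = cache[key]
        else:
            new_tokens = rng.choice(candidates)
            cache[key] = new_tokens
        out.append((new_tokens, label))
    return out


ent_dict = {"ORG": [["Google"], ["Apple"]],
            "PERSON": [["Kenneth"], ["Lasse", "Hansen"]]}
entities = [(["Microsoft"], "ORG"), (["Anna"], "PERSON"),
            (["Anna"], "PERSON"), (["Paris"], "GPE")]
augmented = replace_entities(entities, ent_dict, level=1.0, rng=random.Random(0))
```

With level=1.0 every entity whose type appears in ent_dict is replaced, both occurrences of "Anna" receive the same replacement, and "Paris" is left alone because "GPE" has no entry.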

augmenty.span.entities.create_ent_format_augmenter_v1(reordering: List[int | None], formatter: List[Callable[[Token], str] | None], level: float, ent_types: List[str] | None = None) → Callable[[Language, Example], Iterator[Example]]

Creates an augmenter which reorders and formats an entity according to the reordering and formatting functions.

Parameters:
  • reordering (List[Union[int, None]]) – A reordering consisting of the desired order of token indices, where None denotes the remainder. For instance, if this function is used solely on names, [-1, None] indicates the last name (the last token in the name) followed by the remainder of the name. Similarly, one could use the reordering [3, 1, 2], e.g. indicating last name, first name, middle name. Note that if the entity only includes two tokens, the 3 will be ignored, producing the pattern [1, 2].

  • formatter (List[Union[Callable[[Token], str], None]]) – A list of functions, each taking a spaCy Token and returning the reformatted str. E.g. the function lambda token: token.text[0] + "." would abbreviate the token and add punctuation. None corresponds to no augmentation.

  • level (float) – The probability of an entity being augmented.

  • ent_types (Optional[Iterable[str]], optional) – The entity types which should be augmented. Defaults to None, indicating all entity types.

Returns:

The augmenter

Return type:

Callable[[Language, Example], Iterator[Example]]

Example

>>> import augmenty
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> abbreviate = lambda token: token.text[0] + "."
>>> augmenter = augmenty.load("ents_format_v1", reordering=[-1, None],
...                           formatter=[None, abbreviate], level=1,
...                           ent_types=["PERSON"])
>>> texts = ["my name is Kenneth Enevoldsen"]
>>> list(augmenty.texts(texts, augmenter, nlp))
["my name is Enevoldsen K."]
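
To make the interplay of reordering and formatter concrete, here is a small pure-Python sketch that operates on plain strings rather than spaCy tokens. It is an interpretation of the description above, not the library's code: the helper reorder_and_format is hypothetical, and it assumes Python-style indices (0-based, negatives allowed), that out-of-range indices are skipped, that None expands to all tokens not named explicitly, and that formatters are applied by position after reordering.

```python
from typing import Callable, List, Optional


def reorder_and_format(
    tokens: List[str],
    reordering: List[Optional[int]],
    formatter: List[Optional[Callable[[str], str]]],
) -> List[str]:
    """Reorder entity tokens, then format them position by position (illustrative only)."""
    n = len(tokens)
    # Indices named explicitly in the reordering; out-of-range ones are ignored.
    explicit = {i % n for i in reordering if i is not None and -n <= i < n}
    remainder = [i for i in range(n) if i not in explicit]

    order: List[int] = []
    for i in reordering:
        if i is None:
            order.extend(remainder)  # None stands for "everything else"
        elif -n <= i < n:
            order.append(i % n)

    reordered = [tokens[i] for i in order]
    result = []
    for pos, tok in enumerate(reordered):
        # Assumption: positions without a formatter are left unchanged.
        fmt = formatter[pos] if pos < len(formatter) else None
        result.append(tok if fmt is None else fmt(tok))
    return result


abbreviate = lambda text: text[0] + "."
name = reorder_and_format(["Kenneth", "Enevoldsen"], [-1, None], [None, abbreviate])
# name == ["Enevoldsen", "K."]
```

Under these assumptions the sketch reproduces the "Enevoldsen K." pattern from the example: the last token is moved to the front unformatted, and the remaining token is abbreviated.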

augmenty.span.entities.create_per_replace_augmenter_v1(names: Dict[str, List[str]], patterns: List[List[str]], level: float, names_p: Dict[str, List[float]] = {}, patterns_p: List[float] | None = None, replace_consistency: bool = True, person_tag: str = 'PERSON') → Callable[[Language, Example], Iterator[Example]]

Create an augmenter which replaces a name (PER) with a new name sampled from the names dictionary.

Parameters:
  • names (Dict[str, List[str]]) – A dictionary of lists of names to sample from. These could, for example, include first names and last names.

  • patterns (List[List[str]]) – The patterns used to create the names. This should be a list of patterns, where each pattern is a list of strings and each string denotes the list in the names dictionary from which to sample.

  • level (float) – The proportion of PER entities to replace.

  • names_p (Dict[str, List[float]], optional) – The probability to sample each name. Defaults to {}, indicating equal probability for each name.

  • patterns_p (Optional[List[float]], optional) – The probability to sample each pattern. Defaults to None, indicating equal probability for each pattern.

  • replace_consistency (bool, optional) – Should the entity always be replaced with the same entity? Defaults to True.

  • person_tag (str, optional) – The tag of the person entity. Defaults to "PERSON". Note, however, that some models, such as the Danish spaCy model, use "PER" instead.

Returns:

The augmenter

Return type:

Callable[[Language, Example], Iterator[Example]]

Example

>>> from augmenty.span.entities import create_per_replace_augmenter_v1
>>> names = {"firstname": ["Kenneth", "Lasse"],
...          "lastname": ["Enevoldsen", "Hansen"]}
>>> patterns = [["firstname"], ["firstname", "lastname"],
...             ["firstname", "firstname", "lastname"]]
>>> person_tag = "PERSON"
>>> # replace 10% of names:
>>> per_augmenter = create_per_replace_augmenter_v1(names, patterns, level=0.1,
...                                                 person_tag=person_tag)
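
How names, patterns, names_p and patterns_p combine can be sketched with the standard library alone. The helper sample_name below is hypothetical and only mirrors the parameter descriptions above (pick a pattern, then pick one name per slot); it is not taken from augmenty's source and may differ from it in detail.

```python
import random
from typing import Dict, List, Optional


def sample_name(
    names: Dict[str, List[str]],
    patterns: List[List[str]],
    rng: random.Random,
    names_p: Optional[Dict[str, List[float]]] = None,
    patterns_p: Optional[List[float]] = None,
) -> List[str]:
    """Sample a pattern, then one name per slot in that pattern (illustrative only)."""
    names_p = names_p or {}
    # weights=None makes random.choices sample uniformly.
    pattern = rng.choices(patterns, weights=patterns_p, k=1)[0]
    return [
        rng.choices(names[slot], weights=names_p.get(slot), k=1)[0]
        for slot in pattern
    ]


names = {"firstname": ["Kenneth", "Lasse"],
         "lastname": ["Enevoldsen", "Hansen"]}
patterns = [["firstname"], ["firstname", "lastname"],
            ["firstname", "firstname", "lastname"]]

rng = random.Random(42)
# Force the three-token pattern by giving it all of the probability mass.
full_name = sample_name(names, patterns, rng, patterns_p=[0.0, 0.0, 1.0])
```

Here full_name always has three tokens (two first names followed by a last name), while omitting patterns_p gives each pattern the same chance of being chosen.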