Span-based
augmenty.span.entities
- augmenty.span.entities.create_ent_augmenter_v1(level: float, ent_dict: Dict[str, Iterable[List[str]]], replace_consistency: bool = True, resolve_dependencies: bool = True) → Callable[[Language, Example], Iterator[Example]]
Create an augmenter which replaces an entity based on a dictionary lookup.
- Parameters:
level (float) – The proportion of entities to augment.
ent_dict (Dict[str, Iterable[List[str]]]) – A dictionary whose keys are the entity types you wish to replace (e.g. "PER") and whose values are iterables of replacements. Each replacement is a list of strings forming the new entity, e.g. ["Kenneth", "Enevoldsen"].
replace_consistency (bool, optional) – Should an entity always be replaced with the same entity? Defaults to True.
resolve_dependencies (bool, optional) – Attempts to resolve the dependency tree by setting the head of the original entity as the head of the first token in the new entity. The remainder is then passed as ... Defaults to True.
- Returns:
The augmenter
- Return type:
Callable[[Language, Example], Iterator[Example]]
Example
>>> from augmenty.span.entities import create_ent_augmenter_v1
>>> ent_dict = {"ORG": [["Google"], ["Apple"]],
...             "PERSON": [["Kenneth"], ["Lasse", "Hansen"]]}
>>> # augment 10% of the ORG and PERSON entities
>>> ent_augmenter = create_ent_augmenter_v1(ent_dict=ent_dict, level=0.1)
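To make the dictionary lookup, the level parameter, and the replace_consistency flag concrete, here is a minimal pure-Python sketch that works on plain token lists instead of spaCy Example objects. It is not augmenty's implementation: the helper make_ent_replacer and its seed argument are hypothetical, and checking the consistency cache before the level draw is an assumption.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def make_ent_replacer(
    ent_dict: Dict[str, Iterable[List[str]]],
    level: float,
    replace_consistency: bool = True,
    seed: Optional[int] = None,
) -> Callable[[str, List[str]], List[str]]:
    """Hypothetical helper mirroring the documented ent_dict/level semantics."""
    rng = random.Random(seed)
    # Materialise the iterables so they can be sampled repeatedly.
    options = {label: [list(r) for r in reps] for label, reps in ent_dict.items()}
    # Maps (label, original text) -> replacement tokens chosen earlier.
    cache: Dict[Tuple[str, str], List[str]] = {}

    def replace(label: str, tokens: List[str]) -> List[str]:
        if label not in options:
            return tokens  # no replacements defined for this entity type
        key = (label, " ".join(tokens))
        if replace_consistency and key in cache:
            return list(cache[key])  # assumption: reuse before the level draw
        if rng.random() >= level:
            return tokens  # entity not selected for augmentation
        new = list(rng.choice(options[label]))
        if replace_consistency:
            cache[key] = new
        return new

    return replace
```

Checking the cache first means an entity that has been replaced once keeps its replacement everywhere; applying the level draw first would be an equally plausible reading of the parameter descriptions.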
- augmenty.span.entities.create_ent_format_augmenter_v1(reordering: List[int | None], formatter: List[Callable[[Token], str] | None], level: float, ent_types: List[str] | None = None) → Callable[[Language, Example], Iterator[Example]]
Create an augmenter which reorders and formats an entity according to a reordering and a list of formatting functions.
- Parameters:
reordering (List[Union[int, None]]) – The desired order of the entity's tokens, given as a list of indices, where None denotes the remaining tokens. For instance, if the augmenter is used only on names, [-1, None] indicates the last name (the last token in the name) followed by the remainder of the name. Similarly, the reordering [3, 1, 2] could indicate last name, first name, middle name. Note that if the entity includes only two tokens, the 3 will be ignored, producing the pattern [1, 2].
formatter (List[Union[Callable[[Token], str], None]]) – A list of functions, each taking a spaCy Token and returning the reformatted string, applied to the tokens at the corresponding position of the reordering. For example, the function lambda token: token.text[0] + "." abbreviates the token and adds a period. None leaves the token unchanged.
level (float) – The probability of an entity being augmented.
ent_types (Optional[Iterable[str]], optional) – The entity types which should be augmented. Defaults to None, indicating all entity types.
- Returns:
The augmenter
- Return type:
Callable[[Language, Example], Iterator[Example]]
Example
>>> import augmenty
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> abbreviate = lambda token: token.text[0] + "."
>>> augmenter = augmenty.load("ents_format_v1", reordering=[-1, None],
...                           formatter=[None, abbreviate], level=1,
...                           ent_types=["PERSON"])
>>> texts = ["my name is Kenneth Enevoldsen"]
>>> list(augmenty.texts(texts, augmenter, nlp))
['my name is Enevoldsen K.']
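The interplay of reordering and formatter can be illustrated on a plain list of token strings. This is a hypothetical sketch, not augmenty's code: it assumes 0-based Python indices (so -1 is the last token), treats None as every token not referenced by an explicit index, silently drops indices beyond the entity length, and uses formatters that take a str rather than a spaCy Token.

```python
from typing import Callable, List, Optional


def reorder_and_format(
    tokens: List[str],
    reordering: List[Optional[int]],
    formatter: List[Optional[Callable[[str], str]]],
) -> List[str]:
    """Hypothetical helper: reorder entity tokens, then format each position.

    reordering[i] selects one token (0-based, negatives allowed) or, if None,
    all tokens not referenced by an explicit index; formatter[i] is applied
    to every token selected at position i (None leaves them unchanged).
    """
    n = len(tokens)
    # Explicit in-range indices, normalised so that -1 and n - 1 coincide.
    explicit = {i % n for i in reordering if i is not None and -n <= i < n}
    remainder = [tokens[j] for j in range(n) if j not in explicit]

    result: List[str] = []
    # zip pairs each reordering slot with its formatter (extra items are ignored).
    for idx, fmt in zip(reordering, formatter):
        if idx is None:
            selected = remainder
        elif -n <= idx < n:
            selected = [tokens[idx]]
        else:
            continue  # index beyond the entity length: ignore it
        result.extend(fmt(t) if fmt is not None else t for t in selected)
    return result
```

For the tokens ["Kenneth", "Enevoldsen"] with reordering [-1, None] and formatter [None, lambda t: t[0] + "."], this returns ["Enevoldsen", "K."], consistent with the output of the example above. Under the 0-based assumption, "last, first, middle" for a three-token name would be written [2, 0, 1].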
- augmenty.span.entities.create_per_replace_augmenter_v1(names: Dict[str, List[str]], patterns: List[List[str]], level: float, names_p: Dict[str, List[float]] = {}, patterns_p: List[float] | None = None, replace_consistency: bool = True, person_tag: str = 'PERSON') → Callable[[Language, Example], Iterator[Example]]
Create an augmenter which replaces a name (PER) with a new name sampled from the names dictionary.
- Parameters:
names (Dict[str, List[str]]) – A dictionary of lists of names to sample from, for example first names and last names.
patterns (List[List[str]]) – The patterns used to create names. Each pattern is a list of strings, where each string is a key in names denoting the list to sample from.
level (float) – The proportion of PER entities to replace.
names_p (Dict[str, List[float]], optional) – The probability to sample each name. Defaults to {}, indicating equal probability for each name.
patterns_p (Optional[List[float]], optional) – The probability to sample each pattern. Defaults to None, indicating equal probability for each pattern.
replace_consistency (bool, optional) – Should the entity always be replaced with the same entity? Defaults to True.
person_tag (str, optional) – The tag of the person entity. Defaults to "PERSON". Note, however, that some models, such as the Danish spaCy models, use "PER" instead.
- Returns:
The augmenter
- Return type:
Callable[[Language, Example], Iterator[Example]]
Example
>>> from augmenty.span.entities import create_per_replace_augmenter_v1
>>> names = {"firstname": ["Kenneth", "Lasse"],
...          "lastname": ["Enevoldsen", "Hansen"]}
>>> patterns = [["firstname"], ["firstname", "lastname"],
...             ["firstname", "firstname", "lastname"]]
>>> person_tag = "PERSON"
>>> # replace 10% of names:
>>> per_augmenter = create_per_replace_augmenter_v1(names, patterns, level=0.1,
...                                                 person_tag=person_tag)
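Sampling a replacement name from names and patterns can be sketched with the standard library's random.choices, where weights=None corresponds to the documented equal-probability defaults for names_p and patterns_p. The helper make_name_sampler and its seed argument are hypothetical; augmenty's own sampling code may differ.

```python
import random
from typing import Callable, Dict, List, Optional


def make_name_sampler(
    names: Dict[str, List[str]],
    patterns: List[List[str]],
    names_p: Optional[Dict[str, List[float]]] = None,
    patterns_p: Optional[List[float]] = None,
    replace_consistency: bool = True,
    seed: Optional[int] = None,
) -> Callable[[str], List[str]]:
    """Hypothetical helper: sample a new name following the documented rules."""
    rng = random.Random(seed)
    names_p = names_p or {}
    # Maps an original name to the replacement chosen for it earlier.
    cache: Dict[str, List[str]] = {}

    def sample(original: str) -> List[str]:
        if replace_consistency and original in cache:
            return list(cache[original])
        # weights=None gives every pattern (or name) equal probability.
        pattern = rng.choices(patterns, weights=patterns_p, k=1)[0]
        new_name = [
            rng.choices(names[key], weights=names_p.get(key), k=1)[0]
            for key in pattern
        ]
        if replace_consistency:
            cache[original] = new_name
        return new_name

    return sample
```

Each key in a pattern is sampled independently, so a pattern such as ["firstname", "firstname", "lastname"] may draw the same first name twice.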