Overview of Augmenters#
The following tables list all the available augmenters in augmenty, along with a short description. It also contains list all of the labels which the augmentersrespects. For instance, if you wish to train a named entity recognition pipeline you should not use augmenters which do not respect entity labels. Similarly, a hint is also given to whether the augmenter is recommended for training or evaluation. Lastly, the package includes a list of references to any data or packages used as well as references to example application of the augmenter in practice.
Augmenter name |
Description |
Token |
Dependency parsing |
Entity |
Document |
Training |
Evaluation |
References |
---|---|---|---|---|---|---|---|---|
char_replace_random_v1 |
Creates an augmenter that replaces a character with a random character from the keyboard. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
char_replace_v1 |
Creates an augmenter that replaces a character with a random character from replace dict. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
char_swap_v1 |
Creates an augmenter that swaps two neighbouring characters in a token with a given probability. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
conditional_token_casing_v1 |
Creates an augmenter that conditionally cases the first letter of a token based on the getter. Either lower og upper needs to specifiedd as True. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
da_historical_noun_casing_v1 |
Creates an augmenter that capitalizes nouns. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
da_æøå_replace_v1 |
Creates an augmenter that augments æ, ø, and å into their spelling variants ae, oe, aa. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
duplicate_token_v1 |
Creates an augmenter that randomly duplicate a token token. |
✅ |
❌ |
✅ |
✅ |
✅ |
✅ |
|
ents_format_v1 |
Creates an augmenter which reorders and formats a entity according to reordering and formatting functions. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
ents_replace_v1 |
Create an augmenter which replaces an entity based on a dictionary lookup. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
keystroke_error_v1 |
Creates a augmenter which augments a text with plausible typos based on keyboard distance. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
letter_spacing_augmenter_v1 |
Typically casing is used to add emphasis to words, but letter spacing has also been used to add e m p h a s i s to words (e.g. by Grundtvig; Baunvig, Jarvis and Nielbo, 2020). This augmenter randomly adds letter spacing emphasis to words. This augmentation which are human readable, but which are clearly challenging for systems using a white-space centric tokenization. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
paragraph_subset_augmenter_v1 |
Create an augmenter that extracts a subset of a document. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
per_replace_v1 |
Create an augmenter which replaces a name (PER) with a news sampled from the names dictionary. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
random_casing_v1 |
Create an augment that randomly changes the casing of the document. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
random_starting_case_v1 |
Creates an augmenter which randomly cases the first letter in each token. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
random_synonym_insertion_v1 |
Creates an augmenter that randomly inserts a synonym or from the tokens context. The synonyms are based on wordnet. |
✅ |
❌ |
✅ |
✅ |
✅ |
✅ |
|
remove_spacing_v1 |
Creates an augmenter that removes spacing with a given probability. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
spacing_insertion_v1 |
Creates and augmneter that randomly adds a space after a chara cter. Tokens are kept the same. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
spacy.combined_augmenter.v1 |
Create a data augmentation callback that uses orth-variant replacement. The callback can be added to a corpus or other data iterator during training. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
spacy.lower_case.v1 |
Create a data augmentation callback that converts documents to lowercase. The callback can be added to a corpus or other data iterator during training. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
spacy.orth_variants.v1 |
Create a data augmentation callback that uses orth-variant replacement. The callback can be added to a corpus or other data iterator during training. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
spongebob_v1 |
Create an augmneter that converts documents to SpOnGeBoB casing. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
token_dict_replace_v1 |
Creates an augmenter swaps a token with its synonym based on a dictionary. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
token_insert_random_v1 |
Creates an augmenter that randomly swaps two neighbouring tokens. |
✅ |
❌ |
✅ |
✅ |
✅ |
✅ |
|
token_insert_v1 |
Creates an augmenter that randomly inserts a token generated based on a insert function. |
✅ |
❌ |
✅ |
✅ |
✅ |
✅ |
|
token_replace_v1 |
Creates an augmenter which replaces a token based on a replace function. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
|
token_swap_v1 |
Creates an augmenter that randomly swaps two neighbouring tokens. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
Usage: Wei and Zau (2019) |
upper_case_v1 |
Create an augmenter that converts documents to uppercase. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
word_embedding_v1 |
Creates an augmenter which replaces a token based on a replace function. |
✅ |
✅ |
✅ |
✅ |
✅ |
||
wordnet_synonym_v1 |
Creates an augmenter swaps a token with its synonym based on a dictionary. |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
Data: Miller (1998), Package: Steven (2006), Usage: Wei and Zau (2019) |