Document-based
Document-based augmenters operate on the entire document rather than on individual tokens or spans. They include augmenters that change the casing of a whole document and augmenters that subset a document into smaller parts.
augmenty.doc.casing
- augmenty.doc.casing.create_spongebob_augmenter_v1(level: float) → Callable[[Language, Example], Iterator[Example]]
Create an augmenter that converts documents to SpOnGeBoB casing.
- Parameters:
level (float) – The percentage of examples that will be augmented.
- Returns:
The augmenter.
- Return type:
Callable[[Language, Example], Iterator[Example]]
Example
>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> spongebob_augmenter = augmenty.load("spongebob_v1", level=1)
>>> texts = ["A sample text"]
>>> list(augmenty.texts(texts, spongebob_augmenter, nlp))
["A SaMpLe tExT"]
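The casing pattern in the example output can be reproduced in plain Python. This is a minimal sketch, not augmenty's implementation: `spongebob_case` is a hypothetical helper, and the rule it applies (uppercase at even character positions, lowercase at odd ones, counting spaces) is an assumption inferred from the single example output.

```python
def spongebob_case(text: str) -> str:
    """Alternate casing by character position (hypothetical helper).

    Assumption: characters at even positions are uppercased and characters
    at odd positions are lowercased, counting every character (spaces too).
    """
    return "".join(
        char.upper() if index % 2 == 0 else char.lower()
        for index, char in enumerate(text)
    )


result = spongebob_case("A sample text")
print(result)
```

For the input "A sample text" this sketch yields the same string as the documented example. Other inputs may differ from what the library produces if its rule is not purely positional.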
- augmenty.doc.casing.create_upper_casing_augmenter_v1(level: float) → Callable[[Language, Example], Iterator[Example]]
Create an augmenter that converts documents to uppercase.
- Parameters:
level (float) – The percentage of examples that will be augmented.
- Returns:
The augmenter.
- Return type:
Callable[[Language, Example], Iterator[Example]]
Example
>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> upper_case_augmenter = augmenty.load("upper_case_v1", level=1)
>>> texts = ["A sample text"]
>>> list(augmenty.texts(texts, upper_case_augmenter, nlp))
["A SAMPLE TEXT"]
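To illustrate how `level` might act as a per-example probability, here is a sketch in plain Python. `apply_with_level` is a hypothetical helper, and drawing one random number per example is an assumption about the mechanism rather than a description of augmenty's internals.

```python
import random
from typing import Callable, Iterator, List, Optional


def apply_with_level(
    texts: List[str],
    transform: Callable[[str], str],
    level: float,
    rng: Optional[random.Random] = None,
) -> Iterator[str]:
    """Apply `transform` to each text with probability `level` (hypothetical)."""
    rng = rng or random.Random()
    for text in texts:
        # random() returns a value in [0.0, 1.0), so level=1 always
        # transforms and level=0 never does.
        yield transform(text) if rng.random() < level else text


texts = ["A sample text"] * 1000
always = list(apply_with_level(texts, str.upper, level=1))
never = list(apply_with_level(texts, str.upper, level=0))
some = list(apply_with_level(texts, str.upper, level=0.1, rng=random.Random(0)))

# Count how many of the 1000 texts were uppercased at level=0.1.
n_changed = sum(text == "A SAMPLE TEXT" for text in some)
print(n_changed)
```

Under this reading, a `level` below 1 leaves most examples untouched, so a single-text example only shows the augmented form on some runs.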
augmenty.doc.subset
- augmenty.doc.subset.create_paragraph_subset_augmenter_v1(min_paragraph: float | int = 1, max_paragraph: float | int = 1.0, respect_sentences: bool = True) → Callable[[Language, Example], Iterator[Example]]
Create an augmenter that extracts a subset of a document.
- Parameters:
min_paragraph (Union[float, int]) – A float indicating the minimum percentage of the document to include, or an int indicating the minimum number of paragraphs to include (tokens if respect_sentences is False). Defaults to 1, indicating at least one sentence.
max_paragraph (Union[float, int]) – A float indicating the maximum percentage of the document to include, or an int indicating the maximum number of paragraphs to include (tokens if respect_sentences is False). Defaults to 1.00, indicating 100%.
respect_sentences (bool) – Whether the augmenter should respect sentence boundaries. Defaults to True.
- Returns:
The augmenter.
- Return type:
Callable[[Language, Example], Iterator[Example]]
Example
>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> nlp.add_pipe("sentencizer")
>>> subset_augmenter = augmenty.load("sent_subset_v1", level=0.7)
>>> text = (
...     "Augmenty is a wonderful tool for augmentation. "
...     "It has tons of different augmenters. "
...     " Augmenty is developed using spaCy."
... )
>>> list(augmenty.texts([text], subset_augmenter, nlp))
["Augmenty is a wonderful tool for augmentation. Augmenty is developed using spaCy."]
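The float-versus-int convention for `min_paragraph` and `max_paragraph` can be sketched in plain Python. `resolve_bound` and `subset_units` are hypothetical helpers, and picking units at random while preserving their original order is an assumption suggested by the example output, not confirmed library behaviour.

```python
import random
from typing import List, Optional, Union


def resolve_bound(value: Union[float, int], n_units: int) -> int:
    """Read a float as a fraction of n_units and an int as an absolute count."""
    if isinstance(value, float):
        return round(value * n_units)
    return value


def subset_units(
    units: List[str],
    min_units: Union[float, int] = 1,
    max_units: Union[float, int] = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Keep a random subset of units, in their original order (hypothetical)."""
    rng = rng or random.Random()
    n = len(units)
    # Clamp both bounds to [0, n] and make sure low <= high.
    low = max(0, min(resolve_bound(min_units, n), n))
    high = max(low, min(resolve_bound(max_units, n), n))
    k = rng.randint(low, high)  # number of units to keep, inclusive bounds
    keep = sorted(rng.sample(range(n), k))  # sorted indices preserve order
    return [units[i] for i in keep]


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "It has tons of different augmenters.",
    "Augmenty is developed using spaCy.",
]
two = subset_units(sentences, min_units=2, max_units=2, rng=random.Random(0))
print(" ".join(two))
```

With the defaults (`1` and `1.0`), this sketch keeps between one unit and the whole document. Passing the same int for both bounds fixes the number of units kept.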