Document-based

Document-based augmenters are augmenters that operate on the entire document. They include augmenters that change the casing of a whole document or that subset a document into smaller parts.

augmenty.doc.casing

augmenty.doc.casing.create_spongebob_augmenter_v1(level: float) -> Callable[[Language, Example], Iterator[Example]]

Create an augmenter that converts documents to SpOnGeBoB casing.

Parameters:

level (float) – The proportion of examples that will be augmented, given as a float between 0 and 1 (e.g. 0.1 for 10%).

Returns:

The augmenter.

Return type:

Callable[[Language, Example], Iterator[Example]]

Example

>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> spongebob_augmenter = augmenty.load("spongebob_v1", level=1)
>>> texts = ["A sample text"]
>>> list(augmenty.texts(texts, spongebob_augmenter, nlp))
["A SaMpLe tExT"]
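
The casing pattern in the output above can be reproduced with a few lines of plain Python. This is only a sketch, not augmenty's implementation: the helper name spongebob_case is hypothetical, and it assumes that case alternates by character position (spaces included), which happens to match the documented output.

```python
def spongebob_case(text: str) -> str:
    """Alternate upper and lower case by character position (hypothetical helper)."""
    return "".join(
        # Even positions are upper-cased, odd positions lower-cased;
        # non-letters such as spaces still count as a position.
        ch.upper() if i % 2 == 0 else ch.lower()
        for i, ch in enumerate(text)
    )


print(spongebob_case("A sample text"))  # A SaMpLe tExT
```

Because spaces count as positions, the second word starts in lower case ("tExT") rather than restarting the pattern.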

augmenty.doc.casing.create_upper_casing_augmenter_v1(level: float) -> Callable[[Language, Example], Iterator[Example]]

Create an augmenter that converts documents to uppercase.

Parameters:

level (float) – The proportion of examples that will be augmented, given as a float between 0 and 1 (e.g. 0.1 for 10%).

Returns:

The augmenter.

Return type:

Callable[[Language, Example], Iterator[Example]]

Example

>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> upper_case_augmenter = augmenty.load("upper_case_v1", level=1)
>>> texts = ["A sample text"]
>>> list(augmenty.texts(texts, upper_case_augmenter, nlp))
["A SAMPLE TEXT"]
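
The effect of level can be illustrated without augmenty or spaCy. The sketch below is an assumption about the semantics, not the library's code: it treats level as an independent per-example probability, and the helper name maybe_upper is hypothetical.

```python
import random


def maybe_upper(texts, level, rng=None):
    """Yield each text upper-cased with probability `level` (hypothetical helper)."""
    rng = rng or random.Random()
    for text in texts:
        # rng.random() is in [0, 1), so level=1 always augments and level=0 never does.
        yield text.upper() if rng.random() < level else text


texts = ["A sample text"] * 1000
augmented = list(maybe_upper(texts, level=0.1, rng=random.Random(0)))
share = sum(t.isupper() for t in augmented) / len(texts)
print(share)  # roughly 0.1; the exact value depends on the seed
```

Under this reading, an example's output is only guaranteed to be upper-cased when level is 1; with lower values most examples pass through unchanged.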

augmenty.doc.subset

augmenty.doc.subset.create_paragraph_subset_augmenter_v1(min_paragraph: float | int = 1, max_paragraph: float | int = 1.0, respect_sentences: bool = True) -> Callable[[Language, Example], Iterator[Example]]

Create an augmenter that extracts a subset of a document.

Parameters:
  • min_paragraph (Union[float, int]) – A float giving the minimum fraction of the document to include, or an int giving the minimum number of paragraphs to include (tokens if respect_sentences is False). Defaults to 1, indicating at least one sentence.

  • max_paragraph (Union[float, int]) – A float giving the maximum fraction of the document to include, or an int giving the maximum number of paragraphs to include (tokens if respect_sentences is False). Defaults to 1.0, indicating 100%.

  • respect_sentences (bool) – Whether the augmenter should respect sentence boundaries. Defaults to True.

Returns:

The augmenter.

Return type:

Callable[[Language, Example], Iterator[Example]]

Example

>>> import augmenty
>>> import spacy
>>> from augmenty.doc.subset import create_paragraph_subset_augmenter_v1
>>> nlp = spacy.blank("en")
>>> nlp.add_pipe("sentencizer")
>>> subset_augmenter = create_paragraph_subset_augmenter_v1(min_paragraph=1, max_paragraph=1.0)
>>> text = (
...     "Augmenty is a wonderful tool for augmentation. "
...     "It has tons of different augmenters. "
...     "Augmenty is developed using spaCy."
... )
>>> # The subset is sampled, so the output below is one possible result.
>>> list(augmenty.texts([text], subset_augmenter, nlp))
["Augmenty is a wonderful tool for augmentation. Augmenty is developed using spaCy."]
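
The float-versus-int convention for min_paragraph and max_paragraph can be sketched in plain Python. This is an illustration under stated assumptions, not augmenty's implementation: the helpers resolve_bound and subset_sentences are hypothetical, sentences are passed in as an already-split list, and the kept sentences are assumed to be a random sample returned in their original order.

```python
import math
import random


def resolve_bound(value, n_units):
    """Turn a float fraction or an int count into a count in 1..n_units."""
    if isinstance(value, float):
        count = math.ceil(value * n_units)  # e.g. 0.5 of 3 sentences -> 2
    else:
        count = value  # an int is taken as an absolute count
    return max(1, min(count, n_units))


def subset_sentences(sentences, min_paragraph=1, max_paragraph=1.0, rng=None):
    """Keep a random number of sentences between the two bounds, in order."""
    if not sentences:
        return []
    rng = rng or random.Random()
    lo = resolve_bound(min_paragraph, len(sentences))
    hi = max(lo, resolve_bound(max_paragraph, len(sentences)))
    k = rng.randint(lo, hi)  # how many sentences to keep
    kept = sorted(rng.sample(range(len(sentences)), k))  # preserve document order
    return [sentences[i] for i in kept]


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "It has tons of different augmenters.",
    "Augmenty is developed using spaCy.",
]
print(" ".join(subset_sentences(sentences, rng=random.Random(0))))
```

Note that 1 and 1.0 mean different things under this convention: the int 1 is "one sentence", while the float 1.0 is "100% of the document".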