Document-based

Document-based augmenters are augmenters that operate on the entire document, for example by changing the casing of the whole document or by subsetting it into smaller parts.

augmenty.doc.casing

augmenty.doc.casing.create_spongebob_augmenter_v1(level: float) → Callable[[Language, Example], Iterator[Example]]

Create an augmenter that converts documents to SpOnGeBoB casing.

Parameters:

level – The proportion of examples that will be augmented, between 0 and 1 (e.g. 1 augments every example).

Returns:

The augmenter.

Example

>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> spongebob_augmenter = augmenty.load("spongebob_v1", level=1)
>>> texts = ["A sample text"]
>>> list(augmenty.texts(texts, spongebob_augmenter, nlp))
["A SaMpLe tExT"]
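
The casing rule in the example output can be reproduced without spaCy. The sketch below is not augmenty's implementation; `spongebob_case` is a hypothetical helper that assumes the rule is "uppercase at even character positions, lowercase at odd ones, counting spaces", which is consistent with the output shown above.

```python
def spongebob_case(text: str) -> str:
    """Hypothetical helper: uppercase even character positions, lowercase odd ones.

    Assumption: positions are counted over the whole string, spaces included.
    """
    return "".join(
        ch.upper() if i % 2 == 0 else ch.lower() for i, ch in enumerate(text)
    )


print(spongebob_case("A sample text"))  # A SaMpLe tExT
```
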

augmenty.doc.casing.create_upper_casing_augmenter_v1(level: float) → Callable[[Language, Example], Iterator[Example]]

Create an augmenter that converts documents to uppercase.

Parameters:

level – The proportion of examples that will be augmented, between 0 and 1 (e.g. 0.1 augments roughly 10% of examples).

Returns:

The augmenter.

Example

>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> upper_case_augmenter = augmenty.load("upper_case_v1", level=1)
>>> texts = ["A sample text"]
>>> list(augmenty.texts(texts, upper_case_augmenter, nlp))
["A SAMPLE TEXT"]
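
Because `level` is a rate rather than a guarantee, a value below 1 leaves most examples untouched. The following is a minimal sketch of that gating, assuming each example is augmented independently with probability `level`; `apply_with_level` and its `seed` parameter are hypothetical and not part of augmenty.

```python
import random
from typing import Callable, Iterator, List


def apply_with_level(
    texts: List[str],
    transform: Callable[[str], str],
    level: float,
    seed: int = 0,
) -> Iterator[str]:
    """Hypothetical helper: apply `transform` to each text with probability `level`."""
    rng = random.Random(seed)  # seeded so repeated runs give the same draws
    for text in texts:
        # random() lies in [0, 1), so level=1 always augments and level=0 never does
        yield transform(text) if rng.random() < level else text


texts = ["A sample text"] * 10_000
augmented = list(apply_with_level(texts, str.upper, level=0.1))
share = sum(t == "A SAMPLE TEXT" for t in augmented) / len(augmented)
print(f"share uppercased at level=0.1: {share:.3f}")  # close to 0.1, not exactly 0.1
```

With `level=1` every text is transformed, which is why a deterministic example output is only meaningful at that setting.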

augmenty.doc.subset

augmenty.doc.subset.create_paragraph_subset_augmenter_v1(min_paragraph: Union[float, int] = 1, max_paragraph: Union[float, int] = 1.0, respect_sentences: bool = True) → Callable[[Language, Example], Iterator[Example]]

Create an augmenter that extracts a subset of a document.

Parameters:
  • min_paragraph – A float indicating the minimum proportion of the document to include, or an int indicating the minimum number of sentences to include (tokens if respect_sentences is False). E.g. 1 indicates at least one sentence.

  • max_paragraph – A float indicating the maximum proportion of the document to include, or an int indicating the maximum number of sentences to include (tokens if respect_sentences is False). E.g. 1.00 indicates 100%.

  • respect_sentences – Whether the augmenter should respect sentence boundaries.

Returns:

The augmenter.

Example

>>> import augmenty
>>> import spacy
>>> nlp = spacy.blank("en")
>>> nlp.add_pipe("sentencizer")
>>> augmenter = augmenty.load("sent_subset_v1", level=0.7)
>>> text = (
...     "Augmenty is a wonderful tool for augmentation. "
...     "It has tons of different augmenters. "
...     "Augmenty is developed using spaCy."
... )
>>> list(augmenty.texts([text], augmenter, nlp))
["Augmenty is a wonderful tool for augmentation. Augmenty is developed using spaCy."]
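
The float-versus-int convention for the bounds can be illustrated on a plain list of sentences. This is a sketch under stated assumptions, not augmenty's implementation: `resolve_bound` and `subset_units` are hypothetical helpers, floats are rounded to the nearest count, at least one unit is always kept, and units are sampled without replacement while preserving their original order (consistent with the example output above, where the middle sentence is dropped).

```python
import random
from typing import List, Optional, Union


def resolve_bound(bound: Union[float, int], n_units: int) -> int:
    """Floats are treated as proportions of the document, ints as absolute counts."""
    count = round(bound * n_units) if isinstance(bound, float) else bound
    return max(1, min(count, n_units))  # assumption: clamp to [1, n_units]


def subset_units(
    units: List[str],
    min_units: Union[float, int] = 1,
    max_units: Union[float, int] = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: keep a random, order-preserving subset of `units`."""
    if rng is None:
        rng = random.Random()
    n = len(units)
    if n == 0:
        return []
    lo = resolve_bound(min_units, n)
    hi = max(lo, resolve_bound(max_units, n))
    size = rng.randint(lo, hi)  # inclusive on both ends
    keep = sorted(rng.sample(range(n), size))  # sorted indices keep document order
    return [units[i] for i in keep]


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "It has tons of different augmenters.",
    "Augmenty is developed using spaCy.",
]
# min_units=1 is a count (at least one sentence); max_units=1.0 is a proportion (100%)
print(" ".join(subset_units(sentences, min_units=1, max_units=1.0)))
```

Passing a list of tokens instead of sentences gives the behaviour one would expect when sentence boundaries are not respected.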