Analysis of a speech#


This pipeline is an educational example of how one could analyse a text using Asent. For this analysis, we use a single speech by Trump.

import spacy
import asent

def load_speech():
    file_path = "trump_speech.txt"

    with open(file_path, "r") as f:
        speech = f.read()

    return speech

speech = load_speech()
print(speech[:200]) # examine the first 200 characters
Well, thank you very much. And good afternoon.

As President, my highest and most solemn duty is the defense of our nation and its citizens.

Last night, at my direction, the United States military su

To analyse the text we will need a spacy pipeline with the asent component and a sentencizer:

# create spacy pipeline:
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("asent_en_v1")
doc = nlp(speech)  # process document
sentences = [sent for sent in doc.sents]  # extract sentences

Examining a sentence#

for i in range(5):
    asent.visualize(sentences[i])

If we want to take a closer look at a specific sentence, we can use the analysis visualization:

asent.visualize(sentences[2], "analysis")

Already we get quite a lot of information, which opens up a number of questions (a small inspection sketch follows the list):

  • Should we update the model with new words?

  • Should we remove some words?

  • Could we do other things to improve the analysis? E.g. what about “United” in the United States? Should that be positive?
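To dig into these questions, we can inspect which tokens actually carry sentiment. Below is a minimal sketch that assumes asent's token-level extensions ._.valence (the raw lexicon rating) and ._.polarity (the rating after negations, intensifiers, etc. are taken into account):

# inspect the sentiment-bearing tokens in the sentence above
for token in sentences[2]:
    if token._.valence != 0:  # only tokens found in the lexicon
        print(f"{token.text:<12} valence={token._.valence:+.2f}  polarity={token._.polarity}")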

Extracting polarities#

To extract the polarity of a sentence we can simply use the ._.polarity attribute:

sentences[2]._.polarity
SpanPolarityOutput(neg=0.076, neu=0.853, pos=0.071, compound=-0.024, span=As President, my highest and most solemn duty is the defense of our nation and its citizens.

)

Then, assuming we want the compound (aggregated) polarity, we can extract it like so:

# extracting polarity from one sentence:
compound_polarity = sentences[2]._.polarity.compound
compound_polarity
-0.024005576936002263

For all the sentences we can thus do:

# Polarity per sentence
polarities = [sent._.polarity.compound for sent in sentences]
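Before plotting, it can be useful to summarise the speech as a whole. A minimal sketch, assuming asent also registers a document-level ._.polarity extension:

# summarise the whole speech
mean_compound = sum(polarities) / len(polarities)
print(f"Mean compound polarity per sentence: {mean_compound:.3f}")
print(doc._.polarity)  # document-level polarity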

And we can then plot the polarities using matplotlib:

# Plot polarities
import matplotlib.pyplot as plt
plt.plot(polarities)
plt.title("Polarity of Trump's speech")
plt.xlabel("Chronological Order (sentence)")
plt.ylabel("Polarity")
[Figure: compound polarity per sentence, in chronological order]

Analysis#

What do we see from this plot? We see that Trump's speech varies consistently between positive and negative. For example:

“Under my leadership, America’s policy is unambiguous: To terrorists who harm or intend to harm any American, we will find you; we will eliminate you. We will always protect our diplomats, service members, all Americans, and our allies.”

Does this match your subjective reading of the speech?
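One way to check is to list the sentences that the model rates as most negative and most positive. A small sketch using the polarities computed above:

# rank sentence indices by compound polarity
ranked = sorted(range(len(sentences)), key=lambda i: polarities[i])

print("Most negative sentences:")
for i in ranked[:3]:
    print(f"{polarities[i]:+.3f}  {sentences[i].text.strip()}")

print("\nMost positive sentences:")
for i in ranked[-3:]:
    print(f"{polarities[i]:+.3f}  {sentences[i].text.strip()}")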

Other analyses we could take a look at include:

  • extraction of positive/negative words (see the sketch after this list)

  • comparison of sentiment across documents

  • What is spoken about in positive or negative terms? Can we use the dependency tree?

  • Error analysis of the lexicon
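As a rough sketch of the first idea, we can count the most frequent positive and negative words in the speech, again relying on the token-level ._.valence extension (assumed to be 0 for unrated words):

from collections import Counter

# collect words with a non-zero lexicon rating across the whole document
positive = Counter(t.text.lower() for t in doc if t._.valence > 0)
negative = Counter(t.text.lower() for t in doc if t._.valence < 0)

print("Most common positive words:", positive.most_common(10))
print("Most common negative words:", negative.most_common(10))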

Improving the plot (optional)#

In the following I do a bit of work to improve the visualization.

# Add a horizontal line at y=0 to indicate neutrality
plt.axhline(y=0, color='gray', linestyle='--', linewidth=1)


# Plot polarities with a line style and color
plt.plot(polarities, color='steelblue', linestyle='-', linewidth=2)

# Improve the title and axis labels with font size adjustments and clarity
plt.title("Polarity of Trump's Speech", fontsize=14, fontweight='bold')
plt.xlabel("Chronological Order (Sentence)", fontsize=12)
plt.ylabel("Polarity", fontsize=12)

# Adding grid for better readability
plt.grid(True, which='both', linestyle='--', linewidth=0.5)

# Improve the tick marks for better readability
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)


# Show the plot with the enhancements
plt.show()
[Figure: the same polarity plot with a neutrality line at y=0, grid, and improved labels]
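A further optional refinement (not part of the original analysis): per-sentence polarities are quite noisy, so a simple moving average can make the overall trajectory of the speech easier to see. The window size of 10 is an arbitrary choice:

import numpy as np

# smooth the per-sentence polarities with a moving average
window = 10
smoothed = np.convolve(polarities, np.ones(window) / window, mode="valid")

plt.axhline(y=0, color='gray', linestyle='--', linewidth=1)
plt.plot(polarities, color='lightgray', label='per sentence')
plt.plot(range(window - 1, len(polarities)), smoothed, color='steelblue',
         label=f'moving average (window={window})')
plt.title("Polarity of Trump's Speech (smoothed)")
plt.xlabel("Chronological Order (Sentence)")
plt.ylabel("Polarity")
plt.legend()
plt.show()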