5 Best Ways to Classify Emotions Using NRC Lexicon in Python

💡 Problem Formulation: Emotion classification is the process of associating text with emotions, which is useful for sentiment analysis and human-computer interaction. The challenge is to accurately assign text to specific emotions using a lexicon such as the NRC Emotion Lexicon in Python. For instance, given the input 'I love sunny days', the desired output might label the feeling as 'Joy'.

Method 1: NRC Lexicon with TextBlob

The NRCLex class from the nrclex package wraps the NRC lexicon and relies on TextBlob to split text into words. In this method, the text is first wrapped in a TextBlob object, and NRCLex then cross-references each word with the NRC lexicon to score emotions.

Here’s an example:

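from textblob import TextBlob
from nrclex import NRCLex

text = "I am ecstatic about the concert tonight!"
# NRCLex tokenizes with TextBlob internally, so this explicit wrap is optional;
# .string gives back the blob's original text
emotion = NRCLex(TextBlob(text).string)

# List of (emotion, score) pairs tied for the highest score
print(emotion.top_emotions)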

Output: [('joy', 0.5)]

This snippet wraps the input text in a TextBlob object, passes that text to NRCLex, and prints top_emotions: a list of (emotion, score) pairs for the emotion or emotions with the highest relative frequency in the text.
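A score such as 0.5 is easier to read once you see how it could arise. The sketch below assumes, without reproducing NRCLex's own source, that each score is a label's share of all lexicon hits and that every label tied for the maximum is returned. The helper `top_emotions` and the counts are hypothetical, chosen only for illustration:

```python
from collections import Counter

def top_emotions(counts):
    """Return every label tied for the highest share of all hits (hypothetical helper)."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Relative frequency of each label among all emotion/sentiment hits
    freqs = {label: n / total for label, n in counts.items()}
    best = max(freqs.values())
    # Keep all labels that share the top frequency
    return [(label, f) for label, f in freqs.items() if f == best]

# Made-up raw counts for a short sentence
counts = Counter({'joy': 2, 'anticipation': 1, 'positive': 1})
print(top_emotions(counts))  # → [('joy', 0.5)]
```

Keeping ties rather than picking one label means a sentence can legitimately report several equally strong emotions, as in the NLTK example that follows.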

Method 2: NRC Lexicon with NLTK

NLTK, a leading platform for building Python programs to work with human language data, can also be paired with the NRC lexicon for emotion classification. Here, NLTK tokenizes the text and removes stopwords, and NRCLex then maps the remaining words to emotions from the lexicon.

Here’s an example:

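import nltk
from nltk.corpus import stopwords
from nrclex import NRCLex

# Tokenizer models and stopword list (newer NLTK releases need 'punkt_tab')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

text = "I am terribly disappointed with the service."
tokens = nltk.word_tokenize(text)
# Build the stopword set once; compare in lowercase so "I" is removed too
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in tokens if word.lower() not in stop_words]

emotion = NRCLex(' '.join(filtered_words))

print(emotion.top_emotions)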

Output: [('anger', 0.3333333333333333), ('negative', 0.3333333333333333), ('sadness', 0.3333333333333333)]

After downloading the necessary NLTK data, the snippet tokenizes the input text, removes stopwords, and passes the remaining words to NRCLex for emotion scoring.
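The filtering step is worth isolating because NLTK's English stopword list is all lowercase, so a case-sensitive check lets "I" through. This minimal sketch uses a tiny, hypothetical stopword set in place of the real NLTK list, and additionally drops punctuation tokens with `str.isalpha()` (an extra step the original snippet does not take):

```python
# Tiny, hypothetical stand-in for NLTK's English stopword list
STOPWORDS = {'i', 'am', 'with', 'the'}

def filter_tokens(tokens, stopwords=STOPWORDS):
    # Lowercase before the set lookup so capitalized stopwords are caught;
    # isalpha() also discards punctuation tokens such as "."
    return [tok for tok in tokens if tok.lower() not in stopwords and tok.isalpha()]

tokens = ['I', 'am', 'terribly', 'disappointed', 'with', 'the', 'service', '.']
print(filter_tokens(tokens))  # → ['terribly', 'disappointed', 'service']
```

A set gives constant-time membership checks, which matters when the same stopword list is consulted for every token in a large corpus.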

Method 3: Custom Function with NRC Lexicon

Building a custom function to parse the NRC lexicon can give you more control over the emotion classification process. This method requires manually loading the lexicon and writing a function that categorizes the input text.

Here’s an example:

import re
from collections import Counter

def load_nrc_emotions(filepath):
    """Read the tab-separated word/emotion/flag file into {word: {emotion: 0 or 1}}."""
    lexicon = {}
    with open(filepath, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.strip().split('\t')
            # Skip blank, header, or malformed lines
            if len(parts) != 3 or not parts[2].isdigit():
                continue
            word, emotion, value = parts
            lexicon.setdefault(word, {})[emotion] = int(value)
    return lexicon

def classify_emotions(text, lexicon):
    # Lowercase and keep only letters/apostrophes so "tonight!" matches "tonight"
    words = re.findall(r"[a-z']+", text.lower())
    emotions = Counter()
    for word in words:
        # Count every emotion flagged 1 for this word
        for emotion, value in lexicon.get(word, {}).items():
            if value > 0:
                emotions[emotion] += 1
    return emotions.most_common()

nrc_lexicon = load_nrc_emotions('nrc_emotion_lexicon.txt')
print(classify_emotions("I'm thrilled to see the fireworks tonight!", nrc_lexicon))

Output: [('anticipation', 1), ('joy', 1), ('positive', 1), ('surprise', 1)]

The function load_nrc_emotions() reads the tab-separated NRC word-level file (one word, emotion, and 0/1 flag per line) into a nested dictionary, and classify_emotions() counts how many words in the text are flagged for each emotion, returning the counts from most to least common.
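Because the parser only needs lines, it can be written to accept any iterable of strings: an open file or an in-memory `io.StringIO`. That makes the parsing testable without the real lexicon file. A minimal sketch; `parse_nrc_lines` and the three-line excerpt are hypothetical and only mimic the word<TAB>emotion<TAB>flag layout:

```python
import io

def parse_nrc_lines(lines):
    """Build {word: {emotion: 0 or 1}} from tab-separated NRC-style lines."""
    lexicon = {}
    for line in lines:
        parts = line.strip().split('\t')
        # Skip blank, header, or malformed lines
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        word, emotion, value = parts
        lexicon.setdefault(word, {})[emotion] = int(value)
    return lexicon

# Hypothetical excerpt; a real file would be opened with open(...) instead
sample = io.StringIO("thrilled\tjoy\t1\nthrilled\tfear\t0\nfireworks\tjoy\t1\n")
print(parse_nrc_lines(sample))  # → {'thrilled': {'joy': 1, 'fear': 0}, 'fireworks': {'joy': 1}}
```

With this split, load_nrc_emotions() reduces to opening the file and handing the file object to the parser.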

Method 4: NRC Lexicon with Pandas

Pandas can be employed for emotion classification by loading the NRC lexicon into a DataFrame and using it to label the text. This fits naturally when the data already lives in pandas or feeds other analysis steps; for large inputs, vectorized operations such as merges scale better than looping over words.

Here’s an example:

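import re
import pandas as pd

lexicon_df = pd.read_csv('nrc_emotion_lexicon.csv')
# Expects 'word' and 'emotion' columns. If the CSV also has a 0/1 flag column
# (named 'association' here as an assumption), keep only the flagged rows
if 'association' in lexicon_df.columns:
    lexicon_df = lexicon_df[lexicon_df['association'] == 1]

text = "The quarantine period was incredibly difficult and isolating."

def get_emotion_counts(text, lexicon_df):
    emotions = {}
    # Lowercase and drop punctuation so "isolating." matches "isolating"
    for word in re.findall(r"[a-z']+", text.lower()):
        # Every emotion row for this word (empty if the word is not in the lexicon)
        for emotion in lexicon_df.loc[lexicon_df['word'] == word, 'emotion']:
            emotions[emotion] = emotions.get(emotion, 0) + 1
    return emotions

print(get_emotion_counts(text, lexicon_df))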

Output: {'sadness': 1, 'fear': 1, 'negative': 2}

The Pandas DataFrame lexicon_df contains the NRC lexicon, and the function get_emotion_counts() computes the emotion counts for the input text based on this DataFrame.
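Filtering the DataFrame with a boolean mask scans the whole lexicon once per word. A vectorized alternative, sketched here under the same 'word'/'emotion' column assumption, joins the tokens against the lexicon with `DataFrame.merge` and tallies the result with `value_counts`. The function `emotion_counts_vectorized` and the three-row lexicon are hypothetical:

```python
import re
import pandas as pd

def emotion_counts_vectorized(text, lexicon_df):
    # One row per token, lowercased with punctuation stripped
    tokens = pd.DataFrame({'word': re.findall(r"[a-z']+", text.lower())})
    # Inner join keeps only tokens found in the lexicon, one row per (token, emotion);
    # repeated tokens produce repeated rows, so counts stay per occurrence
    hits = tokens.merge(lexicon_df[['word', 'emotion']], on='word', how='inner')
    return hits['emotion'].value_counts().to_dict()

# Hypothetical mini-lexicon: one row per associated (word, emotion) pair
mini = pd.DataFrame({
    'word': ['difficult', 'difficult', 'isolating'],
    'emotion': ['fear', 'negative', 'sadness'],
})
print(emotion_counts_vectorized("Difficult, difficult and isolating.", mini))
```

The join does the per-word lookup in one pass, and the same pattern extends to a whole column of texts by exploding tokens before merging.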

Bonus One-Liner Method 5: Compact NRCLex Usage

For quick checks, NRCLex can be called in a single expression, which is handy for interactive analysis or short scripts.

Here’s an example:

from nrclex import NRCLex

print(NRCLex('Feeling sad and miserable.').top_emotions)

Output: [('sadness', 0.5), ('negative', 0.5)]

This one-liner uses the NRCLex library to directly classify emotions of the given text, returning the most prominent emotions associated with it.
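The output above mixes an emotion ('sadness') with a sentiment label ('negative'): the NRC lexicon tags words with eight emotions plus the two sentiments 'positive' and 'negative'. If only emotions are wanted, the pairs can be filtered afterwards. A minimal sketch; `emotions_only` is a hypothetical helper that works on any list of (label, score) pairs:

```python
# The eight NRC emotions; 'positive' and 'negative' are sentiment labels
NRC_EMOTIONS = {'anger', 'anticipation', 'disgust', 'fear',
                'joy', 'sadness', 'surprise', 'trust'}

def emotions_only(scored):
    """Drop sentiment labels from a list of (label, score) pairs."""
    return [(label, score) for label, score in scored if label in NRC_EMOTIONS]

print(emotions_only([('sadness', 0.5), ('negative', 0.5)]))  # → [('sadness', 0.5)]
```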

Summary/Discussion

  • Method 1: TextBlob. Simple to set up, since NRCLex already tokenizes with TextBlob. Offers little control over preprocessing such as stopword removal.
  • Method 2: NLTK. Combines NLTK’s robust language processing tools with NRC lexicon for detailed analysis. More steps required to process text.
  • Method 3: Custom Function. Provides full control and transparency. More complex and requires understanding of lexicon structure.
  • Method 4: Pandas. Ideal for data analysis-oriented projects and integrates smoothly with other data processing. May be overkill for small scripts.
  • Bonus Method 5: NRCLex One-Liner. Fast and straightforward. Good for quick checks but lacks detailed customization.