NLP Data Wrangling (continued)

1 min read · Sep 23, 2021

We have lemmatized words, but you may have asked: isn't it tedious to work this way, when you have to set the POS tag yourself for every word?

If you had the same thought, you are in the right place. If not, practice reasoning and start asking "why" to sharpen your mind, and put your phone down; it is a waste of time.

9 > The solution

The sentence first needs to be tagged with POS (part-of-speech) tags.

Code

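tagged_tokens = nltk.pos_tag(tokens)
print(tagged_tokens)

[('The', 'DT'), ('brown', 'JJ')]

Here DT is a determiner and JJ is an adjective.

These tags then need to be converted into WordNet POS tags, because the WordNet lemmatizer does not understand Penn Treebank tags such as DT or JJ.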

10 > WordNet

wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)

Once the tags are mapped, pass each word together with its WordNet tag to the lemmatizer.
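The pos_tag_wordnet helper is not defined in this story, so here is a minimal sketch of what it could look like. It assumes the input is a list of (word, Penn Treebank tag) pairs, maps tags by their first letter, and falls back to noun for everything else; that fallback is my assumption, not something from the original code. The one-letter constants are written as plain strings so the snippet runs without NLTK data, and they have the same values as nltk.corpus.wordnet.ADJ, VERB, NOUN and ADV.

```python
# WordNet POS constants as plain strings; same values as
# nltk.corpus.wordnet.ADJ, VERB, NOUN and ADV.
WN_ADJ, WN_VERB, WN_NOUN, WN_ADV = "a", "v", "n", "r"


def pos_tag_wordnet(tagged_tokens):
    """Map (word, Penn Treebank tag) pairs to (word, WordNet tag) pairs.

    Hypothetical helper: tags are mapped by their first letter, and any
    tag that is not an adjective, verb, noun or adverb falls back to noun.
    """
    prefix_map = {"J": WN_ADJ, "V": WN_VERB, "N": WN_NOUN, "R": WN_ADV}
    return [
        (word, prefix_map.get(tag[:1], WN_NOUN))
        for word, tag in tagged_tokens
    ]


# Hand-written sample in the shape that nltk.pos_tag returns.
tagged_tokens = [
    ("The", "DT"), ("brown", "JJ"), ("foxes", "NNS"),
    ("are", "VBP"), ("quickly", "RB"), ("jumping", "VBG"),
]
wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)
```

Each resulting pair can then be handed to NLTK's WordNetLemmatizer as lemmatize(word, pos), so you no longer have to set the tag by hand.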

11 > Stop word removal

Stop words are very common words that carry little meaning on their own, for example: I, me, we, you, our.

import nltk

def remove_stopwords(text, is_lower_case=False, stopwords=None):
    # Fall back to NLTK's English stop word list when none is given.
    if not stopwords:
        stopwords = nltk.corpus.stopwords.words('english')
    tokens = nltk.word_tokenize(text)
    tokens = [token.strip() for token in tokens]

    if is_lower_case:
        # Text is already lower case, so compare tokens directly.
        filtered_tokens = [token for token in tokens if token not in stopwords]
    else:
        # Lower-case each token only for the comparison; keep its original form.
        filtered_tokens = [token for token in tokens if token.lower() not in stopwords]

    filtered_text = ' '.join(filtered_tokens)
    return filtered_text
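To see the filtering step on its own, without downloading any NLTK data, here is a simplified stand-in. It is a sketch under two assumptions: whitespace splitting replaces nltk.word_tokenize, and SAMPLE_STOPWORDS is a tiny hand-picked set rather than NLTK's English list. Both names are made up for this example.

```python
# Tiny hand-picked stop word set, for illustration only (not NLTK's list).
SAMPLE_STOPWORDS = {"i", "me", "we", "you", "our", "the", "is"}


def remove_stopwords_simple(text, stopwords=SAMPLE_STOPWORDS):
    # Whitespace splitting stands in for nltk.word_tokenize here.
    tokens = text.split()
    # Compare in lower case so "We" and "we" are treated alike.
    kept = [token for token in tokens if token.lower() not in stopwords]
    return " ".join(kept)


print(remove_stopwords_simple("We know the brown fox is quick"))
# know brown fox quick
```

The real tokenizer also separates punctuation from words, which plain splitting does not, so treat this only as a way to check the idea.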

With this, your data is cleaned. For more information, check my other stories.

I have attached a Colab notebook for your reference that includes stemming, lemmatizing and stop word removal.

https://colab.research.google.com/drive/1gx5xy1fTZYs1YSybKfwoUVmRBUf--3yp#scrollTo=IZxnwuqrVIZx

Written by Adarsha Regmi