NLP Data Wrangling (continued from the previous post)

Adarsha Regmi
1 min read · Sep 23, 2021

--

We have lemmatized words, but you may have asked: isn't it tedious to work this way, when you have to set the POS tag yourself for every word?

If you thought the same, you are in the right place. If not, practice reasoning and keep asking why; it sharpens the mind. And put your phone down, it's a waste of time.

9 > Rise of a solution

Each word in the sentence needs to be tagged with its POS (part-of-speech) tag automatically, so you no longer have to set the tag by hand.

Code

nltk.pos_tag(tokens)

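[('The', 'DT'), ('brown', 'JJ')]

Here DT marks a determiner and JJ an adjective.

These Penn Treebank tags need to be converted into WordNet POS tags, because the WordNet lemmatizer only accepts its own tags (noun, verb, adjective, adverb).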

10 > WordNet

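from nltk.corpus import wordnet

# pos_tag_wordnet is a user-defined helper (not an NLTK function) that maps
# each Penn Treebank tag in tagged_tokens to the matching WordNet tag
wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)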

Map each tag accordingly, then lemmatize each word using its WordNet tag.
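
The mapping step can be sketched in plain Python. This is only an illustration under a few assumptions: the body of `pos_tag_wordnet` is my guess at such a helper, the one-letter codes are the values behind `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV`, unknown tags fall back to noun, and the sample `tagged_tokens` list is written by hand rather than produced by `nltk.pos_tag`.

```python
# Sketch of a Penn Treebank -> WordNet tag mapper (pure Python, no NLTK needed).
# The one-letter codes match nltk.corpus.wordnet.ADJ, VERB, NOUN and ADV.
TAG_MAP = {"J": "a", "V": "v", "N": "n", "R": "r"}


def pos_tag_wordnet(tagged_tokens, default="n"):
    """Convert (word, Penn Treebank tag) pairs into (word, WordNet tag) pairs."""
    converted = []
    for word, tag in tagged_tokens:
        # Only the first letter of the Penn tag is used: JJ* -> adjective, VB* -> verb, ...
        # Anything else (e.g. DT) falls back to the default, noun (an assumption).
        wordnet_tag = TAG_MAP.get(tag[:1].upper(), default)
        converted.append((word, wordnet_tag))
    return converted


# Hand-written sample in the shape that nltk.pos_tag returns
tagged_tokens = [("The", "DT"), ("brown", "JJ"), ("foxes", "NNS"),
                 ("are", "VBP"), ("quickly", "RB"), ("running", "VBG")]
wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)
```

Each (word, tag) pair can then be passed to NLTK's `WordNetLemmatizer().lemmatize(word, pos=tag)`, which needs the WordNet corpus downloaded first.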

11 > Stop word removal

Stop words are very common words that carry little meaning on their own. Examples: I, me, we, you, our.

import nltk

def remove_stopwords(text, is_lower_case=False, stopwords=None):
    # Fall back to NLTK's English stop word list when none is given
    if not stopwords:
        stopwords = nltk.corpus.stopwords.words('english')
    tokens = nltk.word_tokenize(text)
    tokens = [token.strip() for token in tokens]

    if is_lower_case:
        # Text is already lower case, so compare tokens directly
        filtered_tokens = [token for token in tokens if token not in stopwords]
    else:
        # Lower-case each token only for the comparison; original casing is kept
        filtered_tokens = [token for token in tokens if token.lower() not in stopwords]

    filtered_text = ' '.join(filtered_tokens)
    return filtered_text
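
To see the filtering idea without downloading any NLTK data, here is a dependency-free sketch of the same logic. It swaps in a regex tokenizer for `nltk.word_tokenize` and a tiny hand-picked stop word set for NLTK's English list; `SAMPLE_STOPWORDS` and `remove_stopwords_simple` are hypothetical names made up for this example.

```python
import re

# Hypothetical, hand-picked stop word set for illustration only;
# NLTK's English list is much longer.
SAMPLE_STOPWORDS = {"i", "me", "we", "you", "our", "the", "a", "is", "to"}


def remove_stopwords_simple(text, stopwords=SAMPLE_STOPWORDS):
    """Drop stop words from text, comparing case-insensitively."""
    # Regex tokenizer: runs of letters, digits and apostrophes
    # (cruder than nltk.word_tokenize, and it discards punctuation).
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    kept = [token for token in tokens if token.lower() not in stopwords]
    return " ".join(kept)


print(remove_stopwords_simple("We sent you the report to review"))
```

Only the words outside the stop word set survive, and their original casing is kept, just as in the `is_lower_case=False` branch above.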

With this, you have cleaned your data. For more information, check my other stories.

I have attached a Colab notebook for your reference that includes stemming, lemmatizing and stop word removal.

https://colab.research.google.com/drive/1gx5xy1fTZYs1YSybKfwoUVmRBUf--3yp#scrollTo=IZxnwuqrVIZx
