NLP Data Wrangling (continued from the previous story)
--
We have lemmatized words, but you may have asked: isn't it tedious to work this way, when you have to set the POS tag yourself for every word?
If you thought the same, you are in the right place. If not, try reasoning about it and start asking why. To sharpen your mind, put your phone down; it's a waste.
9>> Rise of the solution
The sentence first needs to be tagged with POS (part-of-speech) tags.
Code
tagged_tokens = nltk.pos_tag(tokens)
print(tagged_tokens)
[('The', 'DT'), ('brown', 'JJ')]
Here DT is a determiner and JJ is an adjective.
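These tags then need to be converted into WordNet POS tags, because nltk.pos_tag returns Penn Treebank tags, which the WordNet lemmatizer does not understand.
10>> WordNet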
# pos_tag_wordnet is a helper function you write yourself (it is not an NLTK built-in)
wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)
Map the tags accordingly, then lemmatize each word using its WordNet tag.
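The mapping step can be sketched roughly as below. This is a minimal sketch, not necessarily the helper used in the notebook: the body of `pos_tag_wordnet`, the `penn_to_wordnet` and `lemmatize_tagged` names, and the sample tagged sentence are all assumptions made for illustration. WordNet's POS constants are plain single characters, so the mapping itself runs without any NLTK data.

```python
# WordNet POS constants are single characters; they equal
# nltk.corpus.wordnet.ADJ ('a'), VERB ('v'), NOUN ('n') and ADV ('r').
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS character, or None if no match."""
    if tag.startswith("J"):   # JJ, JJR, JJS -> adjective
        return "a"
    if tag.startswith("V"):   # VB, VBD, VBG, ... -> verb
        return "v"
    if tag.startswith("N"):   # NN, NNS, NNP, ... -> noun
        return "n"
    if tag.startswith("RB"):  # RB, RBR, RBS -> adverb
        return "r"
    return None               # determiners, prepositions, etc.

def pos_tag_wordnet(tagged_tokens):
    """Hypothetical helper: (word, Penn tag) pairs -> (word, WordNet tag) pairs."""
    return [(word, penn_to_wordnet(tag)) for word, tag in tagged_tokens]

def lemmatize_tagged(wordnet_tokens):
    """Lemmatize words that have a WordNet tag; leave the rest unchanged."""
    # Imported here so the mapping above works without NLTK installed.
    # Needs the WordNet corpus: nltk.download('wordnet')
    from nltk.stem import WordNetLemmatizer
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(word, tag) if tag else word
            for word, tag in wordnet_tokens]

# Sample input shaped like the output of nltk.pos_tag (hand-written here).
tagged_tokens = [("The", "DT"), ("brown", "JJ"), ("foxes", "NNS"),
                 ("are", "VBP"), ("running", "VBG"), ("quickly", "RB")]
wordnet_tokens = pos_tag_wordnet(tagged_tokens)
print(wordnet_tokens)
# [('The', None), ('brown', 'a'), ('foxes', 'n'), ('are', 'v'), ('running', 'v'), ('quickly', 'r')]
```

Words whose tag maps to None (such as "The") are passed through untouched in this sketch. Another reasonable choice would be to lemmatize them as nouns, which is the lemmatizer's default.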
11>> Stop Words Removal
Stop words are very common words that carry little meaning on their own, for example: I, me, we, you, our.
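import nltk

def remove_stopwords(text, is_lower_case=False, stopwords=None):
    # Fall back to NLTK's English list (needs nltk.download('stopwords'))
    if not stopwords:
        stopwords = nltk.corpus.stopwords.words('english')
    # word_tokenize needs the Punkt tokenizer data from nltk.download
    tokens = nltk.word_tokenize(text)
    tokens = [token.strip() for token in tokens]
    if is_lower_case:
        # Text is already lower case, so compare tokens directly
        filtered_tokens = [token for token in tokens if token not in stopwords]
    else:
        # Lower-case only for the comparison; the original casing is kept
        filtered_tokens = [token for token in tokens if token.lower() not in stopwords]
    filtered_text = ' '.join(filtered_tokens)
    return filtered_text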
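To see the filtering logic in isolation, here is a small sketch that runs without downloading any NLTK data. It is a simplified stand-in, not the function above: the name `filter_stopwords`, the tiny `SAMPLE_STOPWORDS` set, and the whitespace tokenization (instead of nltk.word_tokenize) are all assumptions made for illustration.

```python
# Tiny illustrative stop-word set (an assumption); NLTK's English list is much longer.
SAMPLE_STOPWORDS = {"i", "me", "we", "you", "our", "the", "is", "a"}

def filter_stopwords(text, stopwords=SAMPLE_STOPWORDS):
    """Drop stop words using whitespace tokenization and case-insensitive matching."""
    tokens = text.split()
    # Lower-case each token only for the lookup, so the output keeps its casing.
    kept = [token for token in tokens if token.lower() not in stopwords]
    return " ".join(kept)

print(filter_stopwords("We saw the brown fox and you liked our photo"))
# saw brown fox and liked photo
```

Note that "and" survives here only because it is missing from the tiny sample set; a fuller stop-word list would typically remove it as well.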
With this, you have cleaned your data. For more information, check my other stories.
I have attached a Colab file for your reference that includes stemming, lemmatizing, and removal of stop words.
https://colab.research.google.com/drive/1gx5xy1fTZYs1YSybKfwoUVmRBUf--3yp#scrollTo=IZxnwuqrVIZx