• Magdalena Konkiewicz

Useful Python NLP Libraries That You Did Not Know About…


Image by Magic Creative from Pixabay


Introduction

Have you been playing with Natural Language Processing recently? If you have, you have probably used the spaCy or NLTK libraries to build some basic NLP algorithms and process text data. However, there are other Python libraries that can help with data preprocessing or visualization.

In this article, I will present three lesser-known libraries that I use pretty often when working with text. They are mostly small libraries, each designed to achieve a specific task.

Let's get started.



Numerizer

The first library I would like to introduce is numerizer. As its name suggests, it converts numbers written as words into their numerical representation.

You can install it using the following command:



pip install numerizer


The library is very easy to use. You just need to call the numerize function on a string that contains the written number. The example below changes ‘fifty five’ to its digit representation ‘55’.



>>> from numerizer import numerize
>>> numerize('fifty five')
'55'


It also works for fractions and less conventional representations of them. For example, you can use words like a quarter, half, etc. when referring to fractions, and they should still get converted properly.



>>> numerize('fifty and a quarter')
'50.25'


The library is very useful, and I have used it a lot when dealing with transcribed text.
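Note that numerize returns a string, as the quoted outputs above show. If you need actual numbers for further processing, one option is to pull them out of the converted text afterwards. The snippet below is only a minimal sketch using the standard library; the helper name extract_numbers and the sample sentence are made up for illustration, and the input is assumed to have already been passed through numerize.

```python
import re

# Matches integers and simple decimals such as "55" or "50.25".
NUMBER_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\b")


def extract_numbers(numerized_text):
    """Return all numbers found in already-numerized text as floats.

    Hypothetical helper: it assumes the written numbers were converted
    to digits beforehand (for example with numerize).
    """
    return [float(match) for match in NUMBER_PATTERN.findall(numerized_text)]


# Assumed output of numerize for a transcribed sentence.
numerized = "the total was 55 dollars and the tip was 50.25 percent"
numbers = extract_numbers(numerized)
print(numbers)
```

Fractions that are left in a form like '1/4' would need extra handling, which this sketch does not attempt.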



Emot

This is another useful library for dealing with text, especially text that comes from social media, where people use a lot of emojis and emoticons. The library takes emoticons and emojis and returns their text representation. As simple as that.

You can install emot using the following command:



pip install emot


Let’s see a practical example of how to use the emot library on a string that contains emojis. We will pass it ‘I love python ☮ 🙂 ❤’.



>>> import emot
>>> emot_obj = emot.emot()
>>> text = "I love python ☮ 🙂 ❤"
>>> emot_obj.emoji(text)
{'value': ['☮', '🙂', '❤'],
 'location': [[14, 15], [16, 17], [18, 19]],
 'mean': [':peace_symbol:', ':slightly_smiling_face:', ':red_heart:'],
 'flag': True}

As you can see, after importing the library we have to create an emot object (emot_obj above). Once we have done that, we can simply call the emoji() method on the text.

The result is a dictionary with detailed information about the emojis contained in the string, such as their text representation and location.

There is also a useful method called bulk_emoji() that extracts emojis from a list of strings. A lot of NLP data comes in this form, and with bulk_emoji() there is no need to loop through the list manually:
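A common next step is to swap each emoji in the text for its textual meaning. The sketch below does that using only the dictionary shape shown above (the 'location', 'mean' and 'flag' keys). The helper replace_emojis is hypothetical and not part of emot, and the result dictionary is copied by hand from the output above so the snippet runs without the library.

```python
def replace_emojis(text, result):
    """Replace every detected emoji in `text` with its textual meaning.

    `result` is assumed to look like the dictionary shown above:
    'location' holds [start, end] index pairs and 'mean' holds the
    matching descriptions, in the same order.
    """
    if not result.get("flag"):
        return text  # nothing was detected, leave the text unchanged

    # Go from right to left so earlier indices stay valid after each edit.
    pairs = sorted(
        zip(result["location"], result["mean"]),
        key=lambda pair: pair[0][0],
        reverse=True,
    )
    for (start, end), meaning in pairs:
        text = text[:start] + meaning + text[end:]
    return text


# Same string as above, written with escapes: peace symbol, smiling face, heart.
text = "I love python \u262e \U0001f642 \u2764"
result = {
    "value": ["\u262e", "\U0001f642", "\u2764"],
    "location": [[14, 15], [16, 17], [18, 19]],
    "mean": [":peace_symbol:", ":slightly_smiling_face:", ":red_heart:"],
    "flag": True,
}
cleaned = replace_emojis(text, result)
print(cleaned)
```

Replacing from the end of the string backwards avoids having to recompute the positions after each substitution.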



>>> import emot
>>> emot_obj = emot.emot()
>>> bulk_text = ["I love python ☮ 🙂 ❤", "This is so funny 😂",]
>>> emot_obj.bulk_emoji(bulk_text, multiprocessing_pool_capacity=2)
[{'value': ['☮', '🙂', '❤'],
  'location': [[14, 15], [16, 17], [18, 19]],
  'mean': [':peace_symbol:', ':slightly_smiling_face:', ':red_heart:'],
  'flag': True},
 {'value': ['😂'],
  'location': [[17, 18]],
  'mean': [':face_with_tears_of_joy:'],
  'flag': True}]

As a result, you will get a list of dictionaries as seen in the example above.
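Once you have such a list, you might want to know which emojis appear most often across the whole data set. Here is a small sketch with collections.Counter; the function name count_emoji_meanings is my own, and the input list is a hand-copied version of the output above rather than a live call to emot.

```python
from collections import Counter


def count_emoji_meanings(results):
    """Count how often each emoji meaning occurs across bulk results.

    `results` is assumed to be a list of dictionaries shaped like the
    bulk output above ('mean' and 'flag' keys).
    """
    counts = Counter()
    for result in results:
        if result.get("flag"):  # skip strings where nothing was found
            counts.update(result["mean"])
    return counts


# Hand-copied from the example output above.
bulk_results = [
    {
        "value": ["☮", "🙂", "❤"],
        "location": [[14, 15], [16, 17], [18, 19]],
        "mean": [":peace_symbol:", ":slightly_smiling_face:", ":red_heart:"],
        "flag": True,
    },
    {
        "value": ["😂"],
        "location": [[17, 18]],
        "mean": [":face_with_tears_of_joy:"],
        "flag": True,
    },
]
counts = count_emoji_meanings(bulk_results)
print(counts.most_common())
```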



WordCloud

Another very useful library is WordCloud. It helps you to visualize word frequencies in a given text.

Have you ever heard the phrase “A picture is worth a thousand words”?

I think this is the essence of what this library achieves. By looking at the word cloud of a given text you can pretty much guess what the text is about without reading it.

So how does it work?

You can install it using the following command:



pip3 install WordCloud


Once you have installed the library, it is very simple to use. Just call the generate() method on a WordCloud object and pass it some text as a parameter. Let’s try it on the following text that I have taken from CNN Health:

“As a mom of two young girls, I’ve found that teaching kids how to prepare meals — while it comes with its challenges — has also been one of my most rewarding and enjoyable experiences as a parent. My daughters have watched me in the kitchen since they were toddlers — basically since they could eat mashed avocado. That’s my first bit of advice: start early. Allowing your children to observe you cooking and involving them in simple food prep at a young age will do much more than help them become comfortable in the kitchen. It also increases the odds they will enjoy eating healthy foods.”

You can use the following code to create a word cloud for the text above.



from wordcloud import WordCloud

text = "As a mom of two young girls, I’ve found that teaching kids how to prepare meals — while it comes with its challenges — has also been one of my most rewarding and enjoyable experiences as a parent. My daughters have watched me in the kitchen since they were toddlers — basically since they could eat mashed avocado. That’s my first bit of advice: start early. Allowing your children to observe you cooking and involving them in simple food prep at a young age will do much more than help them become comfortable in the kitchen. It also increases the odds they will enjoy eating healthy foods."

wordcloud = WordCloud().generate(text)

Once you have generated the cloud data, it is very easy to display it. You just need to import the pyplot module from matplotlib (conventionally as plt) and set some display parameters.



import matplotlib.pyplot as plt

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()



Voila! You can immediately see that the text is about young kids, a mom, and the kitchen.

You can play with the library and adapt it to your needs by changing the size and colors of the cloud, as well as specifying the stopwords to leave out of the text.
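If you are curious which words end up biggest in the picture, you can approximate the underlying word frequencies yourself. The snippet below is a rough standard-library sketch and not how WordCloud tokenizes or weights words internally. The stopword set is a tiny hand-picked list for this illustration, and the sample is a shortened excerpt of the text above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration only.
STOPWORDS = {
    "my", "have", "me", "in", "the", "since", "they", "were",
    "your", "to", "you", "will", "them",
}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase words in `text`, skipping the given stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


# Shortened excerpt of the sample text used above.
sample = (
    "My daughters have watched me in the kitchen since they were toddlers. "
    "Allowing your children to observe you cooking will help them become "
    "comfortable in the kitchen."
)
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

Words with the highest counts are the ones you would expect to dominate the cloud, which is why removing stopwords matters so much for the final picture.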



Summary

In this article, I have shared three NLP libraries that I use for text processing and visualization when working with text data.

I hope you have enjoyed this article and that you will get a chance to use the libraries presented in this tutorial on some real data sets.
