Recently, one of our clients wanted to understand customer characteristics from the comments and feedback those customers had given the company. The client was a leading service provider to the logistics, manufacturing, and transportation industries. We helped them using text analytics.
So, what is text analytics?
Text analytics is the process of converting unstructured text data into a meaningful form that can be analyzed. It supports many tasks, including analyzing customer feedback and product reviews, entity modeling, and more.
Text analytics involves first retrieving unstructured data and then structuring it to find patterns and trends. Sentiment analysis, clustering, lexical analysis, predictive analysis, and categorization are also part of the process of analyzing text.
Why do we need text analytics?
Suppose you have launched a product virtually and people are reacting to it. The feedback is mixed, both positive and negative, and it arrives in huge volumes. Analyzing it manually would be rather difficult.
With text analytics, businesses can easily study these texts and present the results as charts, surveys, and other such formats. Based on the results, companies can then make an informed decision about the product concerned and its performance.
Further, companies can find out what problems customers are facing and resolve them before they negatively affect the customer base. Businesses can also understand what customers need and tweak their products and services to suit those needs better.
Case study
Aryng uses its proprietary data-to-decisions BADIR framework to resolve clients’ problems.
BADIR is a five-step, recipe-based plan that helps you follow a structured approach to finding solutions efficiently. The process takes you from determining what questions to ask, through the design of your analysis, the collection of data, and data analysis, all the way to recommendations for actions that drive impact on your business.
In this case, the client wanted to determine the differences in Net Promoter Score (NPS), identify keywords in customers’ verbatim feedback comments that indicate churn, and analyze NPS and churn behavior. NPS is a metric that measures how likely customers are to recommend a company, product, or service.
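For readers unfamiliar with how NPS is calculated, here is a minimal sketch of the standard formula: respondents rate their likelihood to recommend on a 0 to 10 scale, scores of 9 or 10 count as promoters, scores of 0 to 6 count as detractors, and NPS is the percentage of promoters minus the percentage of detractors. The function name and the sample ratings are made up for illustration and are not the client's data.

```python
def net_promoter_score(ratings):
    """Return NPS (between -100 and 100) from 0-10 'likelihood to recommend' ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= 9)   # scores of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # scores of 0 through 6
    # Passives (7 or 8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(ratings)


# Illustrative ratings only (assumed, not from the case study).
sample_ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(net_promoter_score(sample_ratings))  # 4 promoters, 3 detractors -> 10.0
```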
The first step of BADIR is to identify the business problem. In this case, it was to determine the action the leadership should take to lower churn and increase NPS. We then formed an analysis plan using a few hypotheses: smaller divisions have better NPS than larger divisions, as the former tend to offer better customer service than the latter; larger clients have lower customer satisfaction; and the likely keywords will relate to either an inability to fill positions or delays in filling them.
Based on this plan, we collected the required data from the comments and feedback of the client’s customers using text analytics.
A few examples of the comments were as follows:
- “Sending drivers to us without the experience to do the job.”
- “We do not get qualified members promptly.”
- “I HAVE HAD VERY LITTLE TO NO LUCK GETTING PEOPLE IN TO FILL MY OPEN POSITIONS.”
Insights and recommendations
Since the task at hand involved text data, the insights were based on text analytics.
With the advent of deep learning algorithms, complex language models, and advanced natural language processing techniques, there are various ways to carry out text analytics.
Models such as word2vec, GloVe, BERT (developed by Google), and GPT-3 (developed by OpenAI, whose founders include Elon Musk and Sam Altman) are popular for text analytics.
Keeping in mind the core business problem, we decided to perform three critical analyses:
- Analyze the text comments through word clouds.
- Build predictive models with sequence algorithms.
- Perform aspect-based sentiment analysis.
We chose these three analyses because each yields a different kind of insight. The resulting findings and recommendations helped our client increase NPS by 10%.
Let’s dive into each of these analyses and see what they involve.
Word Cloud
A word cloud is a way to visualize the most commonly occurring words under a particular condition or context. It shows which words are most commonly associated with a particular topic, field, or context.
In this case, the text comments came from customers who were either active or inactive. Word clouds helped us understand the most common words used, the themes that ran across both subsets of customers, and the themes that differentiated them.
The larger a word appears in the image, the greater its frequency in the subset. This, of course, comes with the disclaimer that the most common words, such as is, the, a, and of, were removed before the analysis. In text analytics, these are called stop words. They are usually removed because they would dominate in terms of frequency while adding little to no value to any meaningful analysis.
Check the images below to see the frequency of repeated words in both subsets.

Word cloud for active customers.

Word cloud for inactive customers.
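The counting step behind a word cloud can be sketched in a few lines. This is an illustration under assumptions, not our production pipeline: the stop-word list is a tiny hand-picked set, the tokenizer is a simple regular expression, and the input is just the three sample comments quoted earlier. In practice you would run it once per subset (active and inactive) and pass the counts to a plotting tool that scales each word by its frequency.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption); real projects usually
# rely on a fuller list from an NLP library.
STOP_WORDS = {
    "a", "an", "the", "is", "of", "to", "in", "i", "we", "us", "my", "do",
    "not", "no", "have", "had", "get", "getting", "very", "without",
}


def word_frequencies(comments):
    """Count non-stop-words across a list of comments, ignoring case."""
    counts = Counter()
    for comment in comments:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", comment.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


comments = [
    "Sending drivers to us without the experience to do the job.",
    "We do not get qualified members promptly.",
    "I HAVE HAD VERY LITTLE TO NO LUCK GETTING PEOPLE IN TO FILL MY OPEN POSITIONS.",
]

frequencies = word_frequencies(comments)
for word, count in frequencies.most_common(5):
    print(word, count)
```

With only three comments every remaining word appears once; the differences between subsets only become visible at the scale of a real feedback dataset.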
Predictive Analysis
In this case, we also built a sequence model to predict whether a customer was a promoter or a detractor. The customers’ comments were the input to the sequence model. Since deep neural networks only accept numerical values, each English comment was converted to a series of vectors, with each word in the comment represented by a numerical vector.
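To make the text-to-numbers step concrete, here is a minimal sketch of one common approach: build a vocabulary, map each word to an integer id, and pad every comment to a fixed length. An embedding layer in the network then turns each id into a numerical vector. The helper names, the reserved ids, and the `max_len` value are assumptions for illustration; they are not details of the model we built.

```python
import re

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary words


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(comments):
    """Assign each distinct word an integer id, starting after the reserved ids."""
    vocab = {}
    for comment in comments:
        for token in tokenize(comment):
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(comment, vocab, max_len):
    """Convert a comment to a fixed-length list of ids, truncating or padding."""
    ids = [vocab.get(token, UNK) for token in tokenize(comment)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


comments = [
    "We do not get qualified members promptly.",
    "Sending drivers to us without the experience to do the job.",
]
vocab = build_vocab(comments)

# "experienced" was never seen, so it maps to UNK (1); the tail is padded with 0.
encoded = encode("We do not get experienced drivers", vocab, max_len=8)
print(encoded)  # [2, 3, 4, 5, 1, 10, 0, 0]
```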
The sequence model was a bidirectional LSTM (long short-term memory) network, a special kind of deep neural network designed to predict an outcome based on a sequence.
These models can become very complicated and are more challenging to interpret than conventional machine learning models. However, they can accurately capture patterns in a sequence of words (in our case, text converted into numerical vectors).
This model can be used to flag comments that indicate low NPS, helping the client reach unhappy customers much more efficiently.
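For a sense of what such a network looks like in code, here is a minimal bidirectional LSTM classifier sketched in PyTorch. The framework choice, the class name, the layer sizes, and the input ids are all assumptions for illustration; the article does not describe the actual architecture or training setup. The model below is untrained, so its outputs are meaningless until it is fitted on labeled comments.

```python
import torch
from torch import nn


class CommentClassifier(nn.Module):
    """Embedding -> bidirectional LSTM -> single probability per comment."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        # padding_idx=0 keeps the vector for the padding id fixed at zeros.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states.
        # Simplification: padded positions are not masked out here.
        final = torch.cat([hidden[0], hidden[1]], dim=1)
        return torch.sigmoid(self.classifier(final)).squeeze(1)


model = CommentClassifier(vocab_size=100)
# Two hypothetical comments, already encoded as padded id sequences.
batch = torch.tensor([[2, 3, 4, 5, 1, 10, 0, 0],
                      [9, 10, 11, 12, 0, 0, 0, 0]])
with torch.no_grad():
    probabilities = model(batch)
print(probabilities.shape)  # one score per comment
```

After training, comments whose score crosses a chosen threshold could be flagged for follow-up.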
Aspect-based Sentiment Analysis
In aspect-based sentiment analysis, a model usually selects an aspect and tries to figure out the sentiment associated with it.
There are various ways to do this analysis. In this case, we used the language model provided by the spaCy library, which is one example of a word-association model. It gives us word associations through which we can figure out whether the words associated with our target words carry positive or negative sentiment.
It can also help surface possible problems associated with our target aspects. For example, we were able to extract the following possible issues related to candidates:
- ‘Poor’
- ‘Not received’
- ‘Qualified not getting’
- ‘Unqualified’
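The idea of linking sentiment words to a target aspect can be illustrated with a much simpler technique than the one we used. The sketch below does not use spaCy; it looks for sentiment words within a fixed window around the aspect and handles a preceding negator. The mini-lexicons, the window size, and the example comment are all hypothetical.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"good", "great", "qualified", "reliable"}
NEGATIVE = {"poor", "unqualified", "late", "inexperienced"}
NEGATORS = {"not", "no", "never"}


def aspect_sentiment(comment, aspect, window=5):
    """Return (phrase, sentiment) pairs for sentiment words near the aspect."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    findings = []
    for i, token in enumerate(tokens):
        if token != aspect:
            continue
        # Look at up to `window` tokens on each side of the aspect.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            word = tokens[j]
            if word not in POSITIVE and word not in NEGATIVE:
                continue
            # A negator directly before the word flips its polarity.
            negated = j > 0 and tokens[j - 1] in NEGATORS
            is_positive = (word in POSITIVE) != negated
            phrase = f"not {word}" if negated else word
            findings.append((phrase, "positive" if is_positive else "negative"))
    return findings


# Hypothetical comment, not taken from the client's data.
example = "The candidates were poor and not qualified for the role."
print(aspect_sentiment(example, "candidates"))
# [('poor', 'negative'), ('not qualified', 'negative')]
```

A dependency-aware language model can link words to aspects more reliably than a fixed window, especially in long sentences that mention several aspects.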
Conclusion
As we can see, text can be analyzed in many ways, depending on the business need. From predicting outcomes from a sequence of words to analyzing sentiment, the opportunities are endless.
The more complex models mentioned earlier in this blog are used for harder tasks such as question answering, chatbots, and conversational AI. In the past decade alone, advances in NLP and deep learning have opened up a whole new world.