What is Text Analysis?

Tirth Patel
5 min read · Nov 6, 2020

Text analysis extracts meaningful information from text, for example identifying sentiment in tweets, survey responses and product reviews. Literally, it is the process of parsing text into a machine-readable format so that facts can be identified from it. The goal of text analysis is to create structured data from unstructured text documents.

Businesses want to extract meaningful information from their unstructured text data. However, sorting through documents manually is a time-consuming and expensive process. As an alternative, machines can analyse high volumes of text automatically and provide meaningful insights from business data.

What is the difference between Text Mining, Text Analysis and Text Analytics?

Text mining is also known as text analysis, and the two terms are often used interchangeably. Text mining is the process of cleaning and transforming unstructured text data so that it is available for text analytics. Text analytics, however, applies statistical and machine learning techniques to reveal patterns or infer information across thousands of mined data points, resulting in graphs, reports and visualisations.

Conventional Methods and Techniques for Text Mining:

Basic Methods:

  • Word Frequency: It identifies the most frequent words in a set of documents.
  • Collocation: It refers to a sequence of words that commonly appear together. The most common example of a collocation is a bigram (a sequence of two adjacent words).
  • Concordance: It identifies the context in which a word or set of words appears.
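
The three basic methods above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production pipeline: the sample documents, the simple regex tokeniser and the helper names (`tokenize`, `concordance`) are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, window=2):
    """Return each occurrence of keyword with `window` words of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + [tok] + right))
    return lines

# Hypothetical sample documents, invented for this sketch.
docs = [
    "The sales report was late.",
    "I did not get the sales report.",
    "The weekly sales report is missing.",
]
token_lists = [tokenize(d) for d in docs]

# Word frequency: count every token across all documents.
freq = Counter(tok for toks in token_lists for tok in toks)

# Collocation: count adjacent word pairs (bigrams) within each document.
bigram_counts = Counter(
    pair for toks in token_lists for pair in zip(toks, toks[1:])
)

print(freq.most_common(3))
print(bigram_counts.most_common(1))            # [(('sales', 'report'), 3)]
print(concordance(token_lists[0], "report"))   # ['the sales report was late']
```

Bigrams are counted per document so that the last word of one document is never paired with the first word of the next.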

Advanced Methods:

Text Classification: Text classification is the process of categorising unstructured text. It includes:

  • Sentiment Analysis: It analyses the emotions expressed in the text.
  • Topic Analysis: It identifies the main themes of the documents. It is a primary way to organise documents by subject.

Text Extraction: Text extraction is a technique for pulling specific pieces of information out of text, such as email addresses, numbers and addresses.

  • Named Entity Recognition: It identifies and extracts named entities from the text, such as places, companies, organisations and people.
  • Clustering: It is a technique for grouping similar documents together.
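
The simplest form of text extraction, pulling out emails and numbers, can be approximated with regular expressions. The sketch below is an assumption-laden illustration: both patterns and the `extract_fields` helper are hypothetical and deliberately narrow, and real named entity recognition would normally rely on a trained model rather than hand-written patterns.

```python
import re

# Illustrative patterns only; real-world emails and phone numbers vary far more.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
NUMBER_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")  # assumes a 'ddd-dddd' format

def extract_fields(text):
    """Return the email addresses and phone-like numbers found in text."""
    return {
        "emails": EMAIL_PATTERN.findall(text),
        "numbers": NUMBER_PATTERN.findall(text),
    }

message = "Please contact jane.doe@example.com or call 555-0142 about the order."
print(extract_fields(message))
# {'emails': ['jane.doe@example.com'], 'numbers': ['555-0142']}
```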

Uses of Text Analysis in the IT Industry

Working in the IT industry, I can see many opportunities where text analytics techniques can significantly benefit organisations, including improving customer service and customer retention, and supporting the Human Resources department.

It Helps to Improve Customer Service

The advantage of conversational surveys is that customer feedback can be explored to gain deeper insights into respondents. This makes text analytics all the more vital for capturing information from unstructured data, which in turn can help to improve the business. But how can we analyse text from a company’s customer service?

Support Ticket Tagging

When a customer opens a support ticket, customer service representatives need to categorise the ticket before addressing it. This is a dull and time-consuming process, as a representative needs to read each ticket and choose an appropriate tag. Topic analysis can be applied here, so that each ticket is tagged based on automatically identified topics.

First, the model analyses the language and expressions in the ticket, for instance, ‘I didn’t get the weekly sales report’. Then, it compares the ticket with other similar conversations and automatically finds a matching tag. In this case, the ticket could fall under the Sales Report Issue tag.
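
The "compare with similar conversations" step can be sketched as a nearest-neighbour lookup. Note that this swaps in a much simpler technique than a trained topic model: it scores word overlap (Jaccard similarity) between the new ticket and a few previously tagged tickets. The tagged examples, the tag names other than Sales Report Issue, and the helper names are all hypothetical.

```python
import re

def word_set(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical tickets that a representative has already tagged by hand.
TAGGED_TICKETS = [
    ("The weekly sales report never arrived", "Sales Report Issue"),
    ("I cannot log in to my account", "Login Issue"),
    ("My invoice shows the wrong amount", "Billing Issue"),
]

def suggest_tag(ticket):
    """Return the tag of the most similar previously tagged ticket."""
    query = word_set(ticket)
    _, best_tag = max(
        TAGGED_TICKETS,
        key=lambda pair: jaccard(query, word_set(pair[0])),
    )
    return best_tag

print(suggest_tag("I didn't get the weekly sales report"))  # Sales Report Issue
```

Word overlap ignores word order and synonyms, so in practice it would serve only as a baseline to compare a real classifier against.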

Ticket Analytics

A sentiment analyser can be used to identify whether a customer is happy with the way an issue was solved or has expressed frustration with how it was handled. It makes sense of user feedback by categorising it as Positive, Negative or Neutral. Eventually, this helps to gain deeper insights into customer opinion, which can help to boost company sales and revenue.
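
A very small lexicon-based analyser illustrates the Positive, Negative and Neutral categorisation. This is a sketch under strong assumptions: the two word lists are invented for the example, and counting words this way ignores negation and sarcasm, which production sentiment models have to handle.

```python
import re

# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE_WORDS = {"thanks", "great", "happy", "resolved", "quick", "helpful"}
NEGATIVE_WORDS = {"frustrated", "slow", "unhappy", "unresolved", "terrible", "waiting"}

def classify_sentiment(feedback):
    """Label feedback by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z]+", feedback.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(classify_sentiment("Thanks, the issue was resolved quickly"))   # Positive
print(classify_sentiment("I am still waiting and very frustrated"))   # Negative
print(classify_sentiment("The ticket was closed on Monday"))          # Neutral
```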

Human Resources Department

One task of the HR team is to read and classify the CVs of promising candidates who could fill the company’s new job positions. However, in a big company, the HR department may receive hundreds of CVs per day, which makes selecting the right candidate for a position tedious. The HR team can apply a text classifier that automatically tags CVs, so that suitable candidates are found more efficiently.

Also, named entity recognition can be used to extract the main aspects of a CV, like ‘Full Name’, ‘Degree’, ‘Job Experience’ and ‘Skills’. Moreover, a more advanced method such as a summary extractor can be applied to produce a summary of the document and give a general idea of its content.

Challenges in Text Mining

Many issues can occur during the text mining process that directly affect the overall accuracy and relevance of the results. A primary issue is ambiguity in word choice, where different words have the same meaning. For instance, Australians, Britons and Americans refer to the same piece of clothing with different words: “Daks”, “Trousers” and “Pants” respectively.
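
One common way to reduce this problem is to map regional variants onto a single canonical term before counting. The sketch below assumes a hand-built mapping (`CANONICAL`) and invented review snippets; building such a mapping is exactly where domain knowledge comes in, since the same word can mean something else in another region or context.

```python
import re
from collections import Counter

# Hypothetical mapping from regional variants to one canonical term.
CANONICAL = {"daks": "trousers", "pants": "trousers"}

def normalise(text):
    """Tokenise the text and replace known variants with the canonical term."""
    words = re.findall(r"[a-z]+", text.lower())
    return [CANONICAL.get(w, w) for w in words]

# Invented review snippets for illustration.
reviews = ["These daks fit well", "The trousers fit well", "Great pants"]

counts = Counter(word for review in reviews for word in normalise(review))
print(counts["trousers"])  # 3
```

Without the normalisation step, the three reviews would be counted as mentioning three unrelated products.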

Moreover, analysing text in one language is hard enough; analysing text in multiple languages makes it more complicated. For example, the English name ‘John’ is rendered as ‘Jean’ in French and ‘Juan’ in Spanish. Additionally, an issue with topic modelling algorithms is that they can generate different topics if the order of the data is shuffled (Agrawal et al., 2018). To overcome these issues, domain-specific knowledge is required, which helps to provide the context and business understanding needed to improve the accuracy of the results.

Summary

Text analysis has become a powerful tool that helps businesses to gain actionable insights from their text data. Text analytic approaches such as sentiment analysis, clustering and topic analysis are all simple but effective methods for analysing unstructured data. They save time, help to automate tasks and increase the productivity of a company so that it can provide better services to its customers. However, domain knowledge is needed to validate the accuracy of the results obtained from the analysis.

Reference:

Agrawal, A., Fu, W., & Menzies, T. (2018). What is wrong with topic modelling? And how to fix it using search-based software engineering. Information and Software Technology, 98, 74–88. https://doi.org/10.1016/j.infsof.2018.02.005
