You have developed a survey, got hundreds of people to fill it out, and have started analysing the results. You find that most people ticked “Agree” or “Strongly Agree” for many of your questions, and you have some numerical averages, such as people visiting your store 3.6 times a month on average. But how do you get useful information from the other field, “Additional Comments”?

Many surveys ask people to provide additional comments, and many other ways of collecting information also produce text-based data. Text is harder to analyse effectively than numbers, but that doesn’t mean you should ignore it. This data can provide insight into what your customers or clients know that you don’t: incredibly useful information!

Luckily there are lots of ways to analyse this data, although you won’t find these options in Excel. The most straightforward way is to read all the responses yourself and take notes on the themes that you observe. There are even formal methods for doing this, such as Grounded Theory, which bring some rigour to this type of analysis. However, reading everything manually is time consuming, and there are ways to automate the analysis that give you a good view of the data without having to read hundreds of responses.

A great option is to perform a word count. To do this, we take each document, split the text into words, and then count the frequency of each word. There are some nuances though, so a little processing is involved:

  1. Remove common words (often called “stop words”), such as “the” and “is”.
  2. Find word “stems”, so that “read” and “reading” are counted together, rather than as separate words.

A word counter with these fixes applied takes a document and returns the frequency of each stemmed word. Note that because words are stemmed, a word like “thinking” will be counted as the word “think”.
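
As a rough illustration, the two steps above can be sketched in a few lines of Python. This is a minimal sketch and not the counter used on this page: the stop-word list is a tiny hand-picked sample, and `crude_stem` is a hypothetical helper that strips only a few suffixes. A real analysis would more likely use an established stemmer, such as the Porter stemmer, and a much longer stop-word list.

```python
import re
from collections import Counter

# A small illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {
    "the", "is", "a", "an", "and", "or", "of", "to", "in", "it",
    "i", "was", "are", "for", "on", "that", "this", "with", "but", "my",
}

# Suffixes removed by the crude stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three letters so short words such as "red" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_count(text):
    """Lowercase, split into words, drop stop words, stem, then count."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    stems = [crude_stem(w) for w in words if w not in STOP_WORDS]
    return Counter(stems)


counts = word_count(
    "I was thinking about reading. Reading is what I think about; I read daily."
)
print(counts.most_common(3))
```

Here “reading” and “read” end up under the same stem, as do “thinking” and “think”. The crude stemmer will mangle some words (for example, it turns “business” into “busines”), which is one reason to prefer a proper stemming library for real work.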

From here, you can put these counts into a word cloud builder to create a nice visualisation.

There are also more sophisticated ways to extract further information:

  • Merge frequently occurring word pairs, so that “artificial intelligence” is counted as one term rather than as “artificial” and “intelligence” separately, which can make quite a difference to your analysis!
  • Flag odd responses for manual analysis. For example, someone might enter random text, which can create odd word counts. Identifying and removing these responses will improve the overall analysis.
  • Merge words semantically, so that words with similar meanings are combined. For instance, when asked about the weather, some people might say “rainy” and others “wet”, but they really mean the same thing and could be counted together.
  • Fix misspellings, so that you don’t lose word frequencies (particularly for harder-to-spell words).
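
Two of these ideas, merging word pairs and fixing misspellings, can be sketched with Python’s standard library alone. This is a minimal sketch under stated assumptions: `merge_frequent_pairs` and `fix_spelling` are hypothetical helper names, the `min_count` and `cutoff` thresholds are arbitrary, and the vocabulary is a hand-written sample rather than a real dictionary.

```python
import difflib
from collections import Counter


def merge_frequent_pairs(words, min_count=2):
    """Join adjacent word pairs seen at least min_count times into one token."""
    pair_counts = Counter(zip(words, words[1:]))
    merged, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair_counts[pair] >= min_count:
            merged.append(" ".join(pair))
            i += 2  # skip both words of the merged pair
        else:
            merged.append(words[i])
            i += 1
    return merged


def fix_spelling(word, vocabulary, cutoff=0.8):
    """Map a word to its closest vocabulary entry, or leave it unchanged."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


words = (
    "artificial intelligence is not artificial grass "
    "but artificial intelligence"
).split()
print(Counter(merge_frequent_pairs(words)))

# Hypothetical vocabulary of correctly spelled words.
vocabulary = ["intelligence", "artificial", "weather"]
print(fix_spelling("inteligence", vocabulary))
```

In this example “artificial intelligence” is counted twice as a single term, while the “artificial” in “artificial grass” stays separate. The spelling fix relies on `difflib.get_close_matches`, which compares strings by similarity, so a cutoff that is too low may “correct” words that were actually right.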

These types of analysis can also be combined with a topic analysis, where lower-level word meanings are combined into overall topics. For example, if we were analysing news articles, we might wish to split them into “sports” or “world news” articles. We could even combine it with last week’s blog post on semantic analysis – see if your “additional comments” in your survey are positive or negative!
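
To give a feel for the news-article example, here is a deliberately simple keyword-matching sketch. It is not a real topic model, which would learn topics from the data. The `TOPIC_KEYWORDS` lists and the `tag_topic` function are hypothetical and written by hand purely for illustration.

```python
import re

# Hypothetical keyword lists; in practice these would be learned from data
# or curated by someone who knows the subject area.
TOPIC_KEYWORDS = {
    "sports": {"match", "goal", "team", "season", "coach"},
    "world news": {"election", "government", "minister", "treaty", "border"},
}


def tag_topic(text):
    """Return the topic whose keywords overlap most with the text, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: len(words & keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(tag_topic("The team scored a late goal to win the match."))
```

A text that shares no words with any keyword list is left untagged, so it can be set aside for manual review.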

Overall, text-based data can provide huge insight into your business, but it does take a bit more work to extract properly. If you need any assistance, we can help.
