Tactics for moderating User Generated Content

February 11, 2021 Anusha

A content strategist needs to formulate rules to address a large variety of errors in raw UGC.

Introduction

User Generated Content is just that – responses shared by users, in this case through a quiz form. This interaction might happen at the user's home or workplace, at a park, or even during a commute. They might be using a desktop or a mobile device. Depending on the circumstances, the user might be focused or distracted, which affects the quality of the submission and shows up as misspellings and grammatical mistakes. Sometimes, users may share inappropriate information or use unsuitable language. How might we moderate the UGC so as to make it suitable for consumption?

Context

Over 2019 and 2020, I had the opportunity to help build a product made with and fuelled by User Generated Content (UGC) at HolidayIQ.com, which was acquired by Lastminute.com. The HolidayIQ Platform was a product that collected, analysed, and distributed travel-related UGC for highly personalised marketing at scale. This post is the third in the UGC series and focuses on moderating UGC for eventual consumption.

Part 1: The unexplored power of User Generated Content
Part 2: Tactics for eliciting high quality User Generated Content
Part 4: Tactics for curating User Generated Content into content entities

How might we…

A nicely written article can be broadly described as one that scores well on:

Grammar & punctuation
Absence of misspellings
Usability
Readability

Content curated using UGC needs to compete with authored content on the very same KPIs. In addition, there is a secondary need: the UGC needs to be understandable to curators, i.e. the content professionals who select and arrange the UGC to create those content items that require manual intervention.

As Content Strategists, how might we treat the UGC to provide value to the primary audience, i.e. the end users, while also making it understandable to secondary users, i.e. curators?

Fixing misspellings

Marketing at scale requires UGC to be moderated at scale. This can be achieved by designing and implementing a system that automatically fixes most misspellings. This is easier said than done for a simple reason – context. Let's consider a few examples:

Ale:
- Correct: "We drank ale and made merry."
- Incorrect: "We ale and drank and made merry." The human mind automatically detects the error and replaces 'ale' with 'ate'. But a machine needs to be taught.
- Incorrect: "We visited the Ale Center to view the exhibition." The intended word could be 'ANE', which autocorrect may have replaced with 'Ale', depending on the device the user used to submit their response.
Excellence:
- Correct: "The excellence of the service bowled us over."
- Incorrect: "The service was excellence and we were bowled over." The correct word would be "excellent" – obvious to a human but not so obvious to a machine.
- Incorrect: "The excelleencse of the service bowled us over." Variations such as these are infinite. Would a machine automatically comprehend the root word and its correct form?

Machine Learning is key to solving such problems. To build intelligence continually:

Incorporate a process that intersperses algorithmic and manual checks
Begin with implementing a spell check that would highlight misspellings
Next, implement a manual check to review the suggested changes
Treat misspellings in batches as far as possible to help the machine learn context
If a word in a batch is correctly spelt but wrong for its context, make provision for manual correction
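
The first two steps above can be sketched roughly as follows. This is only an illustration, not the system we built: the tiny `VOCABULARY`, the `suggest_corrections` function, and the `cutoff` value are all hypothetical, and the suggestions come from Python's standard `difflib` rather than from a trained model.

```python
import difflib
import re

# Hypothetical, tiny vocabulary for illustration only;
# a real system would load a full dictionary.
VOCABULARY = {
    "we", "drank", "ale", "ate", "and", "made", "merry", "the",
    "excellence", "excellent", "of", "service", "bowled", "us",
    "over", "was", "were",
}


def suggest_corrections(text, vocabulary=VOCABULARY, cutoff=0.8):
    """Return (word, suggestions) pairs for words missing from the vocabulary.

    The suggestions are meant for a human reviewer to accept or reject;
    nothing is corrected automatically here.
    """
    candidates = sorted(vocabulary)
    flagged = []
    for word in re.findall(r"[A-Za-z']+", text):
        lower = word.lower()
        if lower in vocabulary:
            continue  # a known word, even if it is wrong in context
        suggestions = difflib.get_close_matches(lower, candidates, n=3, cutoff=cutoff)
        flagged.append((word, suggestions))
    return flagged


# Responses to be queued for manual review.
for response in [
    "The excelleencse of the service bowled us over.",
    "We ale and drank and made merry.",
]:
    print(response, "->", suggest_corrections(response))
```

A dictionary check of this kind highlights "excelleencse" and should offer "excellence" to the reviewer, but it returns nothing for "We ale and drank and made merry." because 'ale' is a valid word. That gap is exactly why the manual check and the batch-level review of context are needed.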

Ensuring Usability

Marketing at scale requires content items of various types – those that require manual curation and others that are automatically generated. That is why it is important to certify the UGC as suitable for use by assessing it against indicators including, but not limited to:

Profanity
Nudity
Drugs
Violence and threats
Meaningless content (Gibberish)
Irrelevant content

A Content Strategist will need to source or create lists of keywords and phrases that the algorithm should flag for manual inspection. To detect meaningless content, one may need to create rules such as:

Repetition of a single letter multiple times such as "tttttttttttttt"
Repetition of a series of letters multiple times such as "wuwuwuwuwuwuwuw"
A series of letters that do not make a word such as "jgkjlhuguigug"
A series of punctuation marks such as "ˆ%&%&%*&%*"
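
As a minimal sketch, the keyword list and the four rules above could be expressed as regular expressions. Everything named here is an assumption made for illustration: the `RULES` table, the `flag_for_review` function, the repetition thresholds, and the placeholder `BLOCKED_TERMS`. The "letters that do not make a word" rule is approximated by a long run of consonants, which is a crude stand-in for a real dictionary lookup.

```python
import re

# Placeholder terms; a Content Strategist would source or create the real lists.
BLOCKED_TERMS = {"badword", "anotherbadword"}

# Thresholds are guesses and would need tuning against real responses.
RULES = {
    # A single letter repeated many times, e.g. "tttttttttttttt"
    "repeated_letter": re.compile(r"([a-z])\1{4,}", re.IGNORECASE),
    # A short series of letters repeated many times, e.g. "wuwuwuwuwuwuwuw"
    "repeated_series": re.compile(r"([a-z]{2,4})\1{3,}", re.IGNORECASE),
    # Rough proxy for letters that do not make a word, e.g. "jgkjlhuguigug":
    # six or more consonants in a row (misfires on rare words like "catchphrase")
    "consonant_run": re.compile(r"[bcdfghjklmnpqrstvwxz]{6,}", re.IGNORECASE),
    # A series of punctuation marks or symbols, e.g. "%&%&%*&%*"
    "symbol_run": re.compile(r"[^\w\s]{4,}"),
}


def flag_for_review(text, blocked_terms=BLOCKED_TERMS):
    """Return the names of every rule the text trips; an empty list means no flag."""
    reasons = [name for name, pattern in RULES.items() if pattern.search(text)]
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & set(blocked_terms):
        reasons.append("blocked_term")
    return reasons


for sample in ["wuwuwuwuwuwuwuw", "jgkjlhuguigug", "The weather was pleasant."]:
    print(repr(sample), "->", flag_for_review(sample))
```

The function only flags a response and records why. Deciding what to do with a flagged response stays with the manual inspection step.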

Over time, the machine would learn what kind of language to watch out for and correct automatically, with much lower levels of manual supervision. Detecting irrelevant content – such as a response describing the food when the question was about the weather – would require much higher levels of learning.

Takeaway: It can't be perfect

Automated moderation of UGC at scale is unlikely to be perfect because the possible errors – whether intentional or unintentional – are not finite. Manual intervention serves three purposes. First, to identify errors undetected by the algorithm. Second, to create new rules to improve the algorithm. Third, to modify existing rules to incorporate context.