StrainTek
WELCOME TO THE ML2P PROJECT

Machine Learning Profanity Filtering

This site hosts the ML2P profanity-filtering project, which is currently funded by Google's Digital News Initiative.


Our system is applied to comments from YouTube, Reddit, and portals using DISQUS, yielding one score per comment.

The higher the score, the more suspicious the comment!


The fact is that most systems use only lists of bad words. Why the #$@&%*! is that a problem? 76% of profane messages contain no bad words. We can filter more than 80% of the profane messages!


MACHINE LEARNING

Natural language processing


How it works


PARTICIPANTS

Brands & Vendors


Ensemble learning

Several machine learning models each cast a vote on the profanity of every comment. This lets models that disagree with each other settle on a common outcome.
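The voting idea can be sketched as simple soft voting: average the scores of several models. This is a minimal illustration, not the project's actual combination rule; the three toy "models" and their scores are made up.

```python
def ensemble_score(comment, models):
    """Average the profanity scores of several models (soft voting)."""
    scores = [model(comment) for model in models]
    return sum(scores) / len(scores)

# Toy stand-ins for trained models that disagree on a comment,
# each returning a profanity score in [0, 1]:
word_list_model = lambda text: 0.2   # sees no listed bad words
embedding_model = lambda text: 0.9   # detects a profane paraphrase
rnn_model       = lambda text: 0.7

score = ensemble_score("some comment", [word_list_model, embedding_model, rnn_model])
print(round(score, 2))  # 0.6
```

Even though the word-list model alone would pass the comment, the ensemble's common outcome is suspicious enough to flag.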


Word embeddings

Over 5M comments, annotated by human experts and moderators, have been used to generate vectors corresponding to words. This vector space is then used to relate words to each other, and also to relate words to senses.
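Relating words in a vector space usually comes down to a similarity measure such as cosine similarity. The tiny 3-d vectors below are invented for illustration; the real embeddings are high-dimensional and learned from the annotated corpus.

```python
import math

# Made-up 3-d word vectors; real embeddings are learned, high-dimensional.
VECS = {
    "idiot":  [0.9, 0.1, 0.2],
    "moron":  [0.8, 0.2, 0.1],
    "flower": [0.1, 0.9, 0.7],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, ~0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words with related senses end up close together in the space:
print(cosine(VECS["idiot"], VECS["moron"]) > cosine(VECS["idiot"], VECS["flower"]))  # True
```

This is what lets the system catch profanity that shares a *sense* with known abusive words, even when the surface word is new.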


Text classification

Various NLP techniques transform text into input that is meaningful (to machines, at least). Noise is removed from the vocabulary and text-cleansing techniques are applied. A bag-of-words transformation and other simple techniques are used for some of our models.
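A minimal sketch of the cleansing plus bag-of-words pipeline, using only the standard library. The actual project's cleansing rules are not documented here, so this simplified version (lowercasing and stripping non-letters) stands in for them.

```python
import re
from collections import Counter

def cleanse(text):
    """Lowercase and strip punctuation/symbols: a simplified cleansing step."""
    return re.sub(r"[^a-z\s]", "", text.lower())

def bag_of_words(text):
    """Turn a comment into a token-count vector (word -> count)."""
    return Counter(cleanse(text).split())

print(bag_of_words("You, YOU are a #$@&%*! fool!"))
```

Note how the symbol-only token disappears entirely after cleansing, while the repeated "you" is counted twice: exactly the kind of machine-friendly input simple classifiers consume.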


Neural Networks

Recurrent and Convolutional Neural Networks are used, optimized via deep learning techniques. Regularization, dropout, attention, normalization, padding, and masking are some of the nuts and bolts that make our neural networks work.
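Two of those nuts and bolts, padding and masking, can be shown in a few lines: recurrent networks process fixed-length batches, so shorter comments are padded and a mask records which positions hold real tokens. The token ids below are illustrative, not from the real vocabulary.

```python
PAD = 0  # id reserved for padding positions

def pad_batch(sequences, max_len):
    """Pad token-id sequences to max_len and build the matching 0/1 mask."""
    padded = [seq + [PAD] * (max_len - len(seq)) for seq in sequences]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return padded, mask

batch = [[12, 7, 3], [5]]  # two comments of different lengths
padded, mask = pad_batch(batch, max_len=4)
print(padded)  # [[12, 7, 3, 0], [5, 0, 0, 0]]
print(mask)    # [[1, 1, 1, 0], [1, 0, 0, 0]]
```

The mask lets the network ignore the padded zeros, so a short comment is scored only on the tokens it actually contains.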


 
The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.
— Edsger W. Dijkstra
 