Fake reviews detection and transfer learning

Enrico Alemani
7 min read · Jan 4, 2020

We apply Universal Language Model Fine-Tuning (ULMFiT) by Howard and Ruder (2018) to fake reviews detection and demonstrate that deep transfer learning outperforms both the statistical techniques previously researched by Ott, Cardie and Hancock (2013) and a standard neural architecture. Additionally, we make several theoretical contributions, including showing that our model makes predictions on the basis of meaningful deception cues. For data and workings, see https://github.com/ieriii/online_reviews

We also deployed a live app that can be used to check the authenticity of hotel reviews; see https://thefakeproject.com/

Source: Howard and Ruder (2018)


In broad terms, the methods used for the classification of fake reviews can be grouped into two categories: behavioural approaches and linguistic methods. The former aim to identify anomalous reviewer behaviour by examining the metadata associated with a review, such as ratings or timestamps. The latter focus on identifying linguistic patterns, based on the idea that deceptive and truthful statements have distinctive features. As an example, the choice of terms might suggest an unconscious need for fraudsters to keep their distance from false statements while trying to emphasize the fake ‘truth’ of their review (Mihalcea and Strapparava, 2009). Some authors also use ensemble methods, which combine behavioural and linguistic approaches.

In our project, we focus on linguistic methods and specifically on ULMFiT, a new and intuitive approach that lets us evaluate how well transfer learning performs in the classification of fake and genuine online reviews. The idea is to see whether the model can capture and classify deceptive signals, even in the presence of a limited amount of labelled data.


We collected online reviews for restaurants and hotels from two main sources. Mukherjee et al. (2013) includes 67,395 labelled Yelp reviews for hotels and restaurants in Chicago and New York. Also, we obtained an additional sample of 1,600 labelled hotel reviews from Ott, Cardie and Hancock (2013). This corpus contains information from several consumer review websites, booking systems and Amazon Mechanical Turk (AMT).

We considered the Ott, Cardie and Hancock (2013) data to be the most suitable data set for our classification task. The main reason is that this data has reliable ground truth, which was constructed by employing Amazon Mechanical Turk workers for the explicit purpose of writing fake content. In contrast, it is unclear whether the Mukherjee et al. (2013) data meets the same standard, since its labels are based on the Yelp filtering algorithm (fake: filtered, non-fake: unfiltered), whose mechanics are unobservable. Therefore, the Mukherjee et al. (2013) data has been employed only in unsupervised tasks that do not require labels (i.e. language modelling).

Baseline and ULMFiT models

Ott, Cardie and Hancock (2013) investigate several techniques for classifying fake reviews in our data set. The authors did not implement any deep learning framework but rather statistical classifiers (e.g. SVMs) trained on psycholinguistic traits and word-frequency features. We consider these a valid baseline since they allow us to isolate the performance improvements generated by our deep learning architecture. Moreover, to determine whether transfer learning is necessary to boost the performance (or accelerate the learning) of neural networks, we compare our model against a standard multi-layer perceptron (MLP) consisting of 3 layers: 64 neurons each for the input and hidden layers and two output neurons. We use the ReLU activation function, with Softmax in the last layer. The Ott, Cardie and Hancock (2013) classifiers and the MLP are our baseline models.
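To make the MLP baseline concrete, here is a minimal forward-pass sketch in plain Python. It only mirrors the layer sizes and activations described above. The input representation (a 100-dimensional feature vector), the random weight initialisation and all function names are assumptions made for illustration, and training is omitted entirely.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    # Subtract the maximum for numerical stability.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_in, n_out, rng):
    # Small random weights; a real model would learn these by backpropagation.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer):
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(features, layers):
    hidden = features
    for layer in layers[:-1]:
        hidden = relu(dense(hidden, layer))   # ReLU on input and hidden layers
    return softmax(dense(hidden, layers[-1]))  # Softmax on the output layer


rng = random.Random(0)
N_FEATURES = 100  # assumed size of the review feature vector (not from the article)
layers = [
    make_layer(N_FEATURES, 64, rng),  # input layer, 64 neurons
    make_layer(64, 64, rng),          # hidden layer, 64 neurons
    make_layer(64, 2, rng),           # output layer, 2 neurons (genuine, fake)
]

features = [rng.random() for _ in range(N_FEATURES)]
probs = mlp_forward(features, layers)
print(probs)  # two class probabilities that sum to 1
```

Because this network starts from random weights, everything it knows must come from the small labelled set, which is the contrast with transfer learning that the comparison is designed to expose.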

Following Howard and Ruder (2018), our system architecture consists of 3 blocks, each made of an embedding layer of size 400 and 3 stacked LSTMs with 1150 hidden activations per layer. In addition, the blocks have different custom heads according to the task to be carried out, as follows:

  • (step1) general domain language model, where we use the pre-trained weights and parameters obtained by training a language model on the Wikitext long-term dependency language modelling data set;
  • (step2) target task language model fine-tuning, where we fine-tune the knowledge gained during step1 on all the online reviews data gathered. The reason for this is that Wikipedia and online reviews content probably come from different distributions and need to be aligned; and
  • (step3) target task classifier, where a custom classifier is attached to the language model to predict the label (genuine or fake) of each review.
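The three steps above, and the data available to each, can be summarised in a short sketch. This is a bookkeeping illustration rather than training code: the `Stage` class and its field names are hypothetical, the review counts come from the data description earlier in the article, and any train/test split of the labelled reviews is not modelled, so the counts are upper bounds.

```python
from dataclasses import dataclass

# Encoder sizes shared by all three blocks (from the architecture description).
EMBEDDING_SIZE = 400
LSTM_LAYERS = 3
LSTM_HIDDEN = 1150

YELP_REVIEWS = 67_395  # Mukherjee et al. (2013); labels not trusted
OTT_REVIEWS = 1_600    # Ott, Cardie and Hancock (2013); reliable labels


@dataclass(frozen=True)
class Stage:
    name: str
    corpus: str
    n_reviews: int      # reviews from our own data available at this stage
    needs_labels: bool
    head: str


stages = [
    # step1 reuses pre-trained weights, so none of our reviews are involved.
    Stage("step1", "Wikitext (pre-trained)", 0, False, "language model"),
    # step2 is unsupervised, so both review corpora can be used.
    Stage("step2", "all collected reviews", YELP_REVIEWS + OTT_REVIEWS, False, "language model"),
    # step3 is supervised, so only the reliably labelled reviews qualify.
    Stage("step3", "Ott, Cardie and Hancock (2013)", OTT_REVIEWS, True, "classifier"),
]

for stage in stages:
    print(f"{stage.name}: {stage.n_reviews:>6} reviews, "
          f"labels needed: {stage.needs_labels}, head: {stage.head}")
```

The point of the layout is that only the last stage depends on labels, which is why a classifier can be trained with as few as 1,600 labelled reviews.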


We found that a language model pre-trained on a large, comprehensive text corpus (i.e. Wikitext-103) can be successfully used to build a general knowledge of the English language which can then be recalled, fine-tuned and transferred to perform a task (i.e. classification) on a different data set. In particular, we consider this technique more effective than training from scratch either traditional statistical classifiers (e.g. SVMs), as in Ott, Cardie and Hancock (2013), or a standard neural network architecture such as an MLP. Notably, we found that our system architecture achieves the highest overall accuracy in the classification of fake reviews, outperforming Ott, Cardie and Hancock (2013) by 3.2% and standard neural networks by 33.8% (Table 1). We observed that the improved accuracy stems from better detection of patterns and deception cues in the text of the fake review class. This is confirmed by a higher recall rate for deceptive reviews, which increases by approx. 7.9% compared to Ott, Cardie and Hancock (2013).
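For readers less familiar with the two metrics used above, the following sketch shows how overall accuracy and per-class recall are computed. The labels are toy values invented purely for illustration and are unrelated to the results in Table 1; the function names are our own.

```python
def accuracy(y_true, y_pred):
    # Share of all reviews that were classified correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive):
    # Of the reviews that truly belong to `positive`, the share the model caught.
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return hits / actual if actual else 0.0


# Toy labels, invented for illustration only.
y_true = ["fake", "fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine"]
y_pred = ["fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine", "genuine"]

print(accuracy(y_true, y_pred))           # 0.875
print(recall(y_true, y_pred, "fake"))     # 0.75
print(recall(y_true, y_pred, "genuine"))  # 1.0
```

A model can score well on accuracy while still missing many deceptive reviews, which is why recall on the fake class is reported separately.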

Table 1: Accuracy of baselines and our model

We explored how our model learned to make predictions by investigating whether classification was triggered by meaningful deceptive patterns and not spurious cues that would make the model no better than random guessing. To do so, we employed the Sequential Jacobian from Graves (2012) and LIME from Ribeiro, Singh and Guestrin (2016). The former provides information about the sensitivity of the network output relative to each input word, whereas the latter quantifies the ‘importance’ of each word relative to its context.
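The intuition behind word-level explanations can be illustrated with a much simpler technique than either method above: word occlusion, where each word is removed in turn and the change in the predicted probability is recorded. This is neither LIME nor the Sequential Jacobian, only a rough stand-in for the same question ("which words move the prediction?"). The scoring function below is a toy with made-up weights, not our trained classifier.

```python
def toy_fake_probability(words):
    # Hypothetical stand-in for a classifier: made-up weights on a few words.
    cue_weights = {"i": 0.15, "myself": 0.20, "husband": 0.25}
    score = 0.2 + sum(cue_weights.get(w.lower(), 0.0) for w in words)
    return min(score, 1.0)


def occlusion_importance(words, predict):
    # Importance of a word = drop in predicted probability when it is removed.
    base = predict(words)
    importances = []
    for i, word in enumerate(words):
        reduced = words[:i] + words[i + 1:]
        importances.append((word, base - predict(reduced)))
    return importances


review = "My husband and I loved the hotel".split()
scores = occlusion_importance(review, toy_fake_probability)

for word, delta in sorted(scores, key=lambda pair: -pair[1]):
    print(f"{word:>8}: {delta:+.2f}")
```

LIME goes further by sampling many perturbed versions of the review and fitting a local linear model to the classifier's responses, so it captures how words behave in context rather than one at a time.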

The results confirmed our hypothesis that fake and genuine reviews have different traits and that the network could capture these signals. Specifically, our model tends to classify as fake those reviews that are characterised by concepts associated with the ‘self’ (e.g. ‘I’, ‘myself’), that contain terms somewhat extraneous to the hotel guests’ experience (e.g. ‘husband’), or that lack a clear description of spatial arrangements. Figure 1 shows an example of deceptive features identified using LIME.

Figure 1: LIME results for a deceptive review

We also conducted a number of robustness checks to isolate the effect of ULMFiT's multi-stage training procedure, the relevance of pre-training and the regularisation techniques. All tests confirmed the relevance of each technique in driving the accuracy of our classifier compared to the baselines. Finally, we note that we could have trained our model for a larger number of epochs or tried different combinations of hyperparameters in an attempt to achieve even better accuracy. However, we decided that our results were good enough to validate our hypotheses and that it was better to save time and resources.


We found that deep transfer learning is more effective than previously researched statistical methods or standard neural network architectures for the task of fake review identification, even in the presence of limited labelled data. In particular, our classifier appears to rely on its ability to identify meaningful deception cues hidden within the text. We consider our results surprising since the task requires the network to be an actual ‘lie detector’.

Future work could include testing the performance of different neural architectures, such as the Transformer by Vaswani et al. (2017), or training techniques like BERT by Devlin et al. (2018). Most importantly, to improve the industry application of fake review classifiers, further research is required to gain a better understanding of how neural models make predictions when solving natural language processing tasks. For instance, studying the hidden-layer representations learned during training could provide additional evidence of the model's understanding of a language.


Devlin, J. et al. (2018) ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’. Available at: http://arxiv.org/abs/1810.04805

Graves, A. (2012). ‘Sequential Jacobian’ in Supervised Sequence Labelling with Recurrent Neural Networks. Berlin: Springer Berlin, pp. 23–25

Howard, J., Ruder, S. (2018) ‘Universal language model fine-tuning for text classification’, ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers). Association for Computational Linguistics (ACL), pp. 328–339. doi:10.18653/v1/P18-1031

Mihalcea and Strapparava (2009)

Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. (2013) ‘What Yelp fake review filter might be doing?’, Proceedings of the 7th International Conference on Weblogs and Social Media, ICWSM 2013. AAAI Press, pp. 409–418.

Ott, M., Cardie, C. and Hancock, J.T. (2013) ‘Negative deceptive opinion spam’, NAACL HLT 2013 - 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Main Conference. Association for Computational Linguistics (ACL), pp. 497–501.

Ribeiro, M.T., Singh, S., Guestrin, C. (2016) ‘“Why should I trust you?” Explaining the predictions of any classifier’, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, pp. 1135–1144. doi:10.1145/2939672.2939778

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I. (2017) ‘Attention is all you need’, Advances in Neural Information Processing Systems, Neural information processing systems foundation, pp. 5999–6009.