The University of Sheffield
Department of Computer Science

Xiaxin Li MSc Dissertation 2015/16

Community Question Answering

Supervised by A. Vlachos

Abstract

In Community Question Answering (CQA) systems, a user usually has to wait some time before receiving an answer to a question. However, a CQA system may already contain good answers to other questions that are similar to the user's question. It is of practical value to present the user with good answers from similar questions, ranked by how well they answer the user's question. The aim of this project is to build such a ranking system, as proposed by SemEval-2016 Task 3.

This report reviews the techniques and approaches used to solve related problems. Two ranking systems, one SVM-based and one LSTM-based, are designed and implemented in Python with related software packages. Grid search is used to find the optimal C and gamma parameters for training the SVM, and the final SVM-based ranking system achieved a good MAP (mean average precision) score, which places it 4th among all participating teams. The LSTM-based system trains very slowly because of the network architecture and the nature of the training data, so no comparable result could be obtained for it within the time limit of the project and with the available computational resources. Finally, the two systems are compared in several respects, and future work is proposed with the aim of accelerating the training of the LSTM network.
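
As an illustration of the MAP score mentioned above, the following is a minimal sketch of how mean average precision can be computed over ranked answer lists. It is not the official SemEval scorer, which may differ in details such as rank cut-offs and the handling of questions with no relevant answers. The function names and the boolean-list input format are assumptions made for this example.

```python
from typing import List


def average_precision(relevance: List[bool]) -> float:
    """Average precision for one ranked list.

    relevance[i] is True if the answer at rank i + 1 is a good answer.
    Returns 0.0 when the list contains no good answers (an assumption;
    other scorers may skip such questions instead).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: good answers seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[List[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranking that places good answers at ranks 1 and 3 has an average precision of (1/1 + 2/3) / 2, and the MAP of a system is the mean of this value over all test questions.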