Question Answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.
A QA implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, QA systems can pull answers from an unstructured collection of natural language documents.
Some examples of natural language document collections used for QA systems include:
- a local collection of reference texts
- internal organization documents and web pages
- compiled newswire reports
- a set of Wikipedia pages
- a subset of World Wide Web pages
QA research attempts to deal with a wide range of question types including: fact, list, definition, How, Why, hypothetical, semantically constrained, and cross-lingual questions.
- Closed-domain question answering deals with questions under a specific domain (for example, medicine or automotive maintenance), and can be seen as an easier task because NLP systems can exploit domain-specific knowledge frequently formalized in ontologies. Alternatively, closed-domain might refer to a situation where only limited types of questions are accepted, such as questions asking for descriptive rather than procedural information. QA systems in the context of machine reading applications have also been constructed in the medical domain, for instance related to Alzheimer's disease.
- Open-domain question answering deals with questions about nearly anything, and can only rely on general ontologies and world knowledge. On the other hand, these systems usually have much more data available from which to extract the answer.
Two early QA systems were BASEBALL and LUNAR. BASEBALL answered questions about the US baseball league over a period of one year. LUNAR, in turn, answered questions about the geological analysis of rocks returned by the Apollo moon missions. Both QA systems were very effective in their chosen domains. In fact, LUNAR was demonstrated at a lunar science convention in 1971 and it was able to answer 90% of the questions in its domain posed by people untrained on the system. Further restricted-domain QA systems were developed in the following years. The common feature of all these systems is that they had a core database or knowledge system that was hand-written by experts of the chosen domain. The language abilities of BASEBALL and LUNAR used techniques similar to ELIZA and DOCTOR, the first chatterbot programs.
SHRDLU was a highly successful question-answering program developed by Terry Winograd in the late 1960s and early 1970s. It simulated the operation of a robot in a toy world (the "blocks world"), and it allowed users to ask the robot questions about the state of the world. Again, the strength of this system was the choice of a very specific domain and a very simple world with rules of physics that were easy to encode in a computer program.
In the 1970s, knowledge bases were developed that targeted narrower domains of knowledge. The QA systems developed to interface with these expert systems produced more repeatable and valid responses to questions within an area of knowledge. These expert systems closely resembled modern QA systems except in their internal architecture. Expert systems rely heavily on expert-constructed and organized knowledge bases, whereas many modern QA systems rely on statistical processing of a large, unstructured, natural language text corpus.
The 1970s and 1980s saw the development of comprehensive theories in computational linguistics, which led to the development of ambitious projects in text comprehension and question answering. One example of such a system was the Unix Consultant (UC), developed by Robert Wilensky at U.C. Berkeley in the late 1980s. The system answered questions pertaining to the Unix operating system. It had a comprehensive hand-crafted knowledge base of its domain, and it aimed at phrasing the answer to accommodate various types of users. Another project was LILOG, a text-understanding system that operated on the domain of tourism information in a German city. The systems developed in the UC and LILOG projects never went past the stage of simple demonstrations, but they helped the development of theories on computational linguistics and reasoning.
Recently, specialized natural language QA systems have been developed, such as EAGLi for health and life scientists.
Most modern QA systems use natural language text documents as their underlying knowledge source. Natural language processing techniques are used to both process the question and index or process the text corpus from which answers are extracted. An increasing number of QA systems use the World Wide Web as their corpus of text and knowledge; however, many of these tools do not produce a human-like answer, but rather employ "shallow" methods (keyword-based techniques, templates, etc.) to produce a list of documents or a list of document excerpts containing the probable answer highlighted.
In an alternative QA implementation, human users assemble knowledge in a structured database, called a knowledge base, similar to those employed in the expert systems of the 1970s. It is also possible to employ a combination of structured databases and natural language text documents in a hybrid QA system. Such a hybrid system may employ data mining algorithms to populate a structured knowledge base that is also populated and edited by human contributors. An example hybrid QA system is the Wolfram Alpha QA system which employs natural language processing to transform human questions into a form that is processed by a curated knowledge base.
As of 2001, QA systems typically included a question classifier module that determines the type of question and the type of answer. After the question is analysed, the system typically uses several modules that apply increasingly complex NLP techniques to a gradually reduced amount of text; thus, a document retrieval module uses search engines to identify the documents or paragraphs in the document set that are likely to contain the answer, and a filter preselects small text fragments that contain strings of the same type as the expected answer. For example, if the question is "Who invented penicillin?", the filter returns text that contains names of people. Finally, an answer extraction module looks for further clues in the text to determine if the answer candidate can indeed answer the question.
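The staged narrowing described above can be sketched in a few lines of Python. This is only an illustrative toy, not the architecture of any particular system: the three-document corpus, the stopword list, and the "two capitalised words" pattern standing in for a named-entity recogniser are all assumptions made for the example.

```python
import re

# Hypothetical three-document corpus, for illustration only.
DOCUMENTS = [
    "Penicillin was discovered in 1928. Alexander Fleming noticed mould killing bacteria.",
    "The antibiotic penicillin is produced by fungi. It is widely used in medicine.",
    "Marie Curie studied radioactivity. Her work earned two Nobel Prizes.",
]

# Toy stand-in for a named-entity recogniser: two adjacent capitalised words.
PERSON_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

# Assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"who", "what", "the", "a", "an", "is", "was", "of", "in"}


def keywords(text):
    """Return the lower-cased content words of a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def retrieve(question, documents):
    """Stage 1: keep documents sharing at least one keyword with the question."""
    q = keywords(question)
    return [d for d in documents if q & keywords(d)]


def filter_fragments(documents, pattern):
    """Stage 2: keep sentences containing a string of the expected answer type."""
    fragments = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if pattern.search(sentence):
                fragments.append(sentence)
    return fragments


def extract(fragments, pattern):
    """Stage 3: pull the typed candidate strings out of the surviving fragments."""
    return [match for s in fragments for match in pattern.findall(s)]


question = "Who invented penicillin?"
docs = retrieve(question, DOCUMENTS)                 # 2 of the 3 documents survive
fragments = filter_fragments(docs, PERSON_PATTERN)   # 1 sentence survives
candidates = extract(fragments, PERSON_PATTERN)
print(candidates)  # ['Alexander Fleming']
```

Each stage works on less text than the one before it, which is what allows the later, more expensive analysis to stay affordable.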
A multiagent question-answering architecture has been proposed, where each domain is represented by an agent which tries to answer questions taking into account its specific knowledge; a meta-agent controls the cooperation between question answering agents and chooses the most relevant answer(s).
Question answering methods
QA depends heavily on a good search corpus: without documents containing the answer, there is little any QA system can do. It thus makes sense that larger collections generally lend themselves to better QA performance, unless the question domain is orthogonal to the collection. The notion of data redundancy in massive collections, such as the web, means that nuggets of information are likely to be phrased in many different ways in differing contexts and documents, leading to two benefits:
- By having the right information appear in many forms, the burden on the QA system to perform complex NLP techniques to understand the text is lessened.
- Correct answers can be filtered from false positives by relying on the correct answer appearing more often in the documents than incorrect ones.
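The second benefit amounts to a frequency vote over candidate answers. A minimal sketch follows, assuming the candidates have already been extracted from retrieved documents; the candidate list and the simple case-and-whitespace normalisation are hypothetical choices for illustration.

```python
from collections import Counter


def vote(candidates):
    """Return (answer, count) for the most frequent candidate after normalisation."""
    # Normalise so that trivially different surface forms are counted together.
    normalised = [c.strip().lower() for c in candidates]
    answer, count = Counter(normalised).most_common(1)[0]
    return answer, count


# Hypothetical candidates extracted from five different documents.
candidates = [
    "Alexander Fleming",
    "alexander fleming",
    "Howard Florey",
    "Alexander Fleming ",
    "Ernst Chain",
]
best, support = vote(candidates)
print(best, support)  # alexander fleming 3
```

The vote only helps when the collection is redundant enough for the correct answer to recur; on a small corpus a single wrong candidate can easily tie with the right one.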
Open domain question answering
In information retrieval, an open domain question answering system aims at returning an answer in response to the user's question. The returned answer is in the form of short texts rather than a list of relevant documents. The system uses a combination of techniques from computational linguistics, information retrieval and knowledge representation for finding answers.
The system takes a natural language question as an input rather than a set of keywords, for example, "When is the national day of China?" The sentence is then transformed into a query through its logical form. Having the input in the form of a natural language question makes the system more user-friendly, but harder to implement, as there are various question types and the system will have to identify the correct one in order to give a sensible answer. Assigning a question type to the question is a crucial task: the entire answer extraction process relies on finding the correct question type and hence the correct answer type.
Keyword extraction is the first step in identifying the input question type. In some cases, there are clear words that indicate the question type directly, for example "Who", "Where" or "How many"; these words tell the system that the answers should be of type "Person", "Location" or "Number", respectively. In the example above, the word "When" indicates that the answer should be of type "Date". POS (part-of-speech) tagging and syntactic parsing techniques can also be used to determine the answer type. In this case, the subject is "Chinese National Day", the predicate is "is" and the adverbial modifier is "when", therefore the answer type is "Date". Unfortunately, some interrogative words like "Which", "What" or "How" do not give clear answer types. Each of these words can represent more than one type. In situations like this, other words in the question need to be considered. The first step is to find the words that can indicate the meaning of the question. A lexical dictionary such as WordNet can then be used for understanding the context.
Once the question type has been identified, an information retrieval system is used to find a set of documents containing the correct keywords. A tagger and NP/Verb Group chunker can be used to verify whether the correct entities and relations are mentioned in the found documents. For questions such as "Who" or "Where", a named-entity recogniser is used to find relevant "Person" and "Location" names from the retrieved documents. Only the relevant paragraphs are selected for ranking.
A vector space model can be used as a strategy for classifying the candidate answers. Each candidate is checked to confirm that it is of the correct type, as determined in the question type analysis stage. Inference techniques can also be used to validate the candidate answers. A score is then given to each of these candidates according to the number of question words that appear in its surrounding text and how close these words are to the candidate: the more and the closer, the better. The answer is then translated into a compact and meaningful representation by parsing. In the previous example, the expected output answer is "1st Oct."
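The mapping from unambiguous interrogative words to answer types can be illustrated with a small lookup. This is a sketch only: the cue table contains just the four cues named in the text, and the function name `expected_answer_type` is hypothetical. Ambiguous openers such as "Which", "What" or plain "How" deliberately return `None`, signalling that further analysis of the remaining words is needed.

```python
import re

# Cue words taken from the text; each cue is a tuple of leading tokens.
CUES = {
    ("how", "many"): "Number",
    ("who",): "Person",
    ("where",): "Location",
    ("when",): "Date",
}


def expected_answer_type(question):
    """Return the answer type implied by the leading interrogative, or None."""
    tokens = tuple(re.findall(r"[a-z]+", question.lower()))
    for cue, label in CUES.items():
        if tokens[: len(cue)] == cue:
            return label
    # "Which", "What", "How" and others do not determine a single type.
    return None


print(expected_answer_type("When is the national day of China?"))  # Date
print(expected_answer_type("How many moons does Mars have?"))      # Number
print(expected_answer_type("What is the capital of France?"))      # None
```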
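The scoring idea in this paragraph (more question words, and closer to the candidate, is better) can be sketched as follows. The specific formula, a sum of inverse token distances, is an assumption chosen for simplicity and is not taken from the source; the two passages and the keyword set are likewise hypothetical.

```python
import re


def tokenize(text):
    """Split text into lower-cased alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_score(passage, candidate, question_words):
    """Sum 1/distance over every question word occurring outside the candidate.

    Distance is measured in tokens from the nearest edge of the candidate,
    so more matching words and closer matching words both raise the score.
    """
    tokens = tokenize(passage)
    cand = tokenize(candidate)
    n = len(cand)
    if n == 0:
        return 0.0
    # Locate the first occurrence of the candidate's token sequence.
    start = next(
        (i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == cand), None
    )
    if start is None:
        return 0.0
    end = start + n - 1
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok in question_words and not (start <= i <= end):
            distance = start - i if i < start else i - end
            total += 1.0 / distance
    return total


# Content words of "When is the national day of China?"
question_words = {"national", "day", "china"}

score_a = proximity_score(
    "The national day of China is celebrated on 1 October.", "1 October", question_words
)
score_b = proximity_score(
    "China hosted the Olympic Games, which opened on 8 August.", "8 August", question_words
)
print(score_a > score_b)  # True
```

Here the first passage wins because three question words occur near its candidate, while the second passage contains only one, and it is farther away.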
In 2002, a group of researchers presented an unpublished and largely unsourced report as a funding support document, in which they described a five-year research roadmap reflecting the state of the question answering field at that time.
QA systems have been extended in recent years to encompass additional domains of knowledge. For example, systems have been developed to automatically answer temporal and geospatial questions, questions of definition and terminology, biographical questions, multilingual questions, and questions about the content of audio, images, and video. Current QA research topics include:
- interactivity: clarification of questions or answers
- answer reuse or caching
- knowledge representation and reasoning
- social media analysis with QA systems
- sentiment analysis
- utilization of thematic roles
- semantic resolution: to bridge the gap between syntactically different questions and answer-bearing texts
- utilization of linguistic resources, such as WordNet, FrameNet, and the like
In 2011, IBM's question answering system, Watson, defeated two of the most successful Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin.
- Roser Morante, Martin Krallinger, Alfonso Valencia and Walter Daelemans. Machine Reading of Biomedical Texts about Alzheimer's Disease. CLEF 2012 Evaluation Labs and Workshop. September 17, 2012.
- Green Jr., Bert F.; et al. (1961). "Baseball: an automatic question-answerer". Western Joint IRE-AIEE-ACM Computer Conference: 219–224.
- Woods, William A.; Kaplan, R. (1977). "Lunar rocks in natural English: Explorations in natural language question answering". Linguistic Structures Processing. 5: 521–569.
- Hirschman, L. & Gaizauskas, R. (2001) Natural Language Question Answering. The View from Here. Natural Language Engineering (2001), 7:4:275-300 Cambridge University Press.
- Galitsky B, Pampapathi R. Can many agents answer questions better than one. First Monday. 2005;10. doi:10.5210/fm.v10i1.1204.
- Lin, J. (2002). The Web as a Resource for Question Answering: Perspectives and Challenges. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002).
- Galitsky, Boris (2003). Natural Language Question Answering System: Technique of Semantic Headers. International Series on Advanced Intelligence. Volume 2. Australia: Advanced Knowledge International. ISBN 978-0-86803-979-4.
- Burger, J., Cardie, C., Chaudhri, V., Gaizauskas, R., Harabagiu, S., Israel, D., Jacquemin, C., Lin, C-Y., Maiorano, S., Miller, G., Moldovan, D., Ogden, B., Prager, J., Riloff, E., Singhal, A., Shrihari, R., Strzalkowski, T., Voorhees, E., Weishedel, R., date unknown, "Tasks and Program Structures to Roadmap Research in Question Answering (QA)," at Issues [DRAFT DOCUMENT], accessed 1 January 2016.
- Here is some content taken verbatim from that roadmap (see preceding citation):
  - Question classes: Different types of questions (e.g., "What is the capital of Liechtenstein?" vs. "Why does a rainbow form?" vs. "Did Marilyn Monroe and Cary Grant ever appear in a movie together?") require the use of different strategies to find the answer. Question classes are arranged hierarchically in taxonomies.
  - Question processing: The same information request can be expressed in various ways, some interrogative ("Who is the King of Lesotho?") and some assertive ("Tell me the name of the King of Lesotho."). A semantic model of question understanding and processing would recognize equivalent questions, regardless of how they are presented. This model would enable the translation of a complex question into a series of simpler questions, would identify ambiguities and treat them in context or by interactive clarification.
  - Context and QA: Questions are usually asked within a context and answers are provided within that specific context. The context can be used to clarify a question, resolve ambiguities or keep track of an investigation performed through a series of questions. (For example, the question, "Why did Joe Biden visit Iraq in January 2010?" might be asking why Vice President Biden visited and not President Obama, why he went to Iraq and not Afghanistan or some other country, why he went in January 2010 and not before or after, or what Biden was hoping to accomplish with his visit. If the question is one of a series of related questions, the previous questions and their answers might shed light on the questioner's intent.)
  - Data sources for QA: Before a question can be answered, it must be known what knowledge sources are available and relevant. If the answer to a question is not present in the data sources, no matter how well the question processing, information retrieval and answer extraction is performed, a correct result will not be obtained.
  - Answer extraction: Answer extraction depends on the complexity of the question, on the answer type provided by question processing, on the actual data where the answer is searched, on the search method and on the question focus and context.
  - Answer formulation: The result of a QA system should be presented in a way as natural as possible. In some cases, simple extraction is sufficient. For example, when the question classification indicates that the answer type is a name (of a person, organization, shop or disease, etc.), a quantity (monetary value, length, size, distance, etc.) or a date (e.g. the answer to the question, "On what day did Christmas fall in 1989?") the extraction of a single datum is sufficient. For other cases, the presentation of the answer may require the use of fusion techniques that combine the partial answers from multiple documents.
  - Real time question answering: There is need for developing Q&A systems that are capable of extracting answers from large data sets in several seconds, regardless of the complexity of the question, the size and multitude of the data sources or the ambiguity of the question.
  - Multilingual (or cross-lingual) question answering: The ability to answer a question posed in one language using an answer corpus in another language (or even several). This allows users to consult information that they cannot use directly. (See also Machine translation.)
  - Interactive QA: It is often the case that the information need is not well captured by a QA system, as the question processing part may fail to classify properly the question or the information needed for extracting and generating the answer is not easily retrieved. In such cases, the questioner might want not only to reformulate the question, but to have a dialogue with the system. In addition, the system may also use previously answered questions. (For example, the system might ask for a clarification of what sense a word is being used, or what type of information is being asked for.)
  - Advanced reasoning for QA: More sophisticated questioners expect answers that are outside the scope of written texts or structured databases. To upgrade a QA system with such capabilities, it would be necessary to integrate reasoning components operating on a variety of knowledge bases, encoding world knowledge and common-sense reasoning mechanisms, as well as knowledge specific to a variety of domains. Evi is an example of such a system.
  - Information clustering for QA: Information clustering for question answering systems is a new trend that originated to increase the accuracy of question answering systems through search space reduction. In recent years this was widely researched through development of question answering systems which support information clustering in their basic flow of process.
  - User profiling for QA: The user profile captures data about the questioner, comprising context data, domain of interest, reasoning schemes frequently used by the questioner, common ground established within different dialogues between the system and the user, and so forth. The profile may be represented as a predefined template, where each template slot represents a different profile feature. Profile templates may be nested one within another.
  - Deep Question Answering: Deep QA complements traditional Question Answering by adding some machine learning capabilities within a standard factoid question answering pipeline. The idea is to leverage curated data repositories or knowledge bases, which can be general ones such as Wikipedia, or domain-specific (e.g. molecular biology) in order to provide more accurate answers to the end-users.
- On the subject of interactive QA, see also Perera, R. and Nand, P. (2014). "Interaction History Based Answer Formulation for Question Answering," at [DRAFT DOCUMENT], accessed 1 January 2015.
- On the subject of information clustering for QA, see also Perera, R. (2012). "IPedagogy: Question Answering System Based on Web Information Clustering," at [DRAFT DOCUMENT], accessed 1 January 2015.
- On the subject of deep question answering, see the following citation.
- Gobeill J, Gaudinat A, Pasche E, Vishnyakova D, Gaudet P, Bairoch A, Ruch P (2015). "Deep Question Answering for protein annotation". Database (Oxford). 2015. doi:10.1093/database/bav081. PMC 4572360. PMID 26384372.
- Maybury, M. T. editor. 2004. New Directions in Question Answering. AAAI/MIT Press.
- BitCrawl by Hobson Lane at the Wayback Machine (archived October 27, 2012)
- Perera, R. and Perera, U. 2012. Towards a thematic role based target identification model for question answering.
- Bahadorreza Ofoghi; John Yearwood & Liping Ma (2008). The impact of semantic class identification and semantic role labeling on natural language answer extraction. The 30th European Conference on Information Retrieval (ECIR'08). Springer Berlin Heidelberg. pp. 430–437.
- Bahadorreza Ofoghi; John Yearwood & Liping Ma (2009). "The impact of frame semantic annotation levels, frame‐alignment techniques, and fusion methods on factoid answer processing". Journal of the American Society for Information Science and Technology. 60 (2): 247–263. doi:10.1002/asi.20989.
- Dragomir R. Radev, John Prager, and Valerie Samn. Ranking suspected answers to natural language questions using predictive annotation. In Proceedings of the 6th Conference on Applied Natural Language Processing, Seattle, WA, May 2000.
- John Prager, Eric Brown, Anni Coden, and Dragomir Radev. Question-answering by predictive annotation. In Proceedings, 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, July 2000.
- Hutchins, W. John; Harold L. Somers (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.
- L. Fortnow, Steve Homer (2002/2003). A Short History of Computational Complexity. In D. van Dalen, J. Dawson, and A. Kanamori, editors, The History of Mathematical Logic. North-Holland, Amsterdam.
- Question Answering Evaluation at NTCIR
- Question Answering Evaluation at TREC
- Question Answering Evaluation at CLEF