PRESENCE is a new architecture for speech-based human-machine interaction. It is founded on the premise that future progress depends not on "bridging the gap" between speech science and speech technology, but on both communities assimilating wider research findings on the behaviour of living systems in general and the cognitive abilities of human beings in particular.
The architecture is inspired by relatively old ideas such as 'perceptual control theory' [Powers, W. T. (1973). Behavior: The Control of Perception. Hawthorne, NY: Aldine], together with relatively new discoveries such as 'mirror neurons' [Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192], coupled with contemporary theories of cortical functionality such as 'hierarchical temporal memory' [Hawkins, J. (2004). On Intelligence. Times Books] and 'emulation mechanisms' [Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460-473].
PRESENCE intentionally blurs the distinction between the core components of a traditional spoken language dialogue system. As a result, cooperative and communicative behaviour emerges as a by-product of an architecture founded on a model of co-action, in which the system has in mind the needs and intentions of a user, and the user has in mind the needs and intentions of the system.
PRESENCE is based on the premise that there are three fundamental factors that ultimately determine an organism's fitness to survive in an evolutionary framework:
These constraints, coupled with an integrated and recursive processing architecture, pave the way to a new approach to spoken language technology in which high-level interactive behaviours such as prosody and emotion emerge as essential aspects of a communicative system rather than as processing afterthoughts.

The PRESENCE architecture is organized into four layers. The top layer is the main path for motor behaviour such as speaking. A system's needs S:n, modulated by motivation, cause the selection of a communicative intention S:i that would satisfy those needs. The selection mechanism can be implemented as a search process, and this is indicated by the diagonal arrow running through the S:i module. The selected intention drives both actual motor behaviour S:m and, on the second layer, an emulation of possible motor behaviour S:E(S:m). Sensory input feeds back into this second layer, providing a check on whether the desired intention has been met. If there is a mismatch between the intended behaviour and the perceived outcome, the resulting error signal causes the system to alter its behaviour appropriately.
The third layer of the model captures the empathetic relationship between the system as a speaker and the user as a listener, which conditions the speaking behaviour of the system. U:E(S:i) represents the emulation by the user of the intentions of the system, and S:E(U:E(S:i)) represents the emulation of that function by the system. A similar arrangement applies to S:E(U:E(S:m)), the system's emulation of the user's emulation of the system's motor output.
The fourth layer represents the system's means for interpreting the needs, intentions and behaviour of a user through a process of emulating the user's needs S:E(U:n), intentions S:E(U:i) and behaviour S:E(U:m).
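The first two layers can be read as a closed feedback loop, and a minimal Python sketch of that reading is given here. It is an illustration under stated assumptions, not the PRESENCE implementation: the names `Intention`, `select_intention`, `emulate`, `environment` and `control_loop` are hypothetical, needs and outcomes are reduced to single numbers, the emulator is an identity forward model, and the "world" simply attenuates the motor output by half.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Intention:
    name: str
    target: float  # assumed: the outcome this intention aims to produce


def select_intention(need: float, motivation: float,
                     candidates: list[Intention]) -> Intention:
    """S:i as a search: pick the candidate closest to the motivated need."""
    goal = need * motivation
    return min(candidates, key=lambda c: abs(c.target - goal))


def emulate(motor: float) -> float:
    """S:E(S:m): predicted outcome of a motor command (assumed identity model)."""
    return motor


def environment(motor: float) -> float:
    """Hypothetical world: only half of the motor output is perceived."""
    return 0.5 * motor


def control_loop(intention: Intention, gain: float = 0.5, steps: int = 20):
    """Adjust the motor command until the perceived outcome matches the emulation."""
    motor = intention.target      # initial open-loop command
    expected = emulate(motor)     # what the system expects to perceive
    errors = []
    for _ in range(steps):
        perceived = environment(motor)   # sensory feedback
        error = expected - perceived     # mismatch between intent and outcome
        errors.append(error)
        motor += gain * error            # alter behaviour to reduce the error
    return motor, errors


if __name__ == "__main__":
    options = [Intention("whisper", 0.2), Intention("speak", 1.0),
               Intention("shout", 3.0)]
    chosen = select_intention(need=1.0, motivation=1.0, candidates=options)
    final_motor, errs = control_loop(chosen)
    print(chosen.name, round(final_motor, 3), round(errs[-1], 4))
```

In this toy setting the loop raises the motor command until the attenuated outcome matches the intended one, which is the kind of compensation the error signal is meant to drive. The third and fourth layers are not modelled here.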
The second, third and fourth layers are able to exploit the information embedded in the previous layers, and this is indicated by the large block arrows. This process is equivalent to parameter sharing between the different models, and thus not only represents an efficient use of information but also offers a mechanism for learning. In fact, such a process may be bi-directional, and the potential flow of information in the opposite direction is indicated by the small block arrows.
The basic communicative loop in the PRESENCE architecture contains system components that are themselves realized using similarly-structured building blocks. The PRESENCE architecture is thus inherently recursive, and hence hierarchical, in structure. As a result, further refinements in behaviour arise from the operation of the nested components.
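The nesting described above can be sketched as a recursive data structure in which every block has the same shape as the loop that contains it. The Python fragment below is a hedged illustration only: the `Block` class, its `gain` parameter and the level names "dialogue", "utterance" and "articulation" are hypothetical, and the rule that a block's output becomes the reference for its nested blocks is an assumption borrowed from hierarchical perceptual control theory rather than a detail stated in the text.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Block:
    """One control loop whose parts are themselves control loops."""
    name: str
    gain: float = 0.5
    parts: list[Block] = field(default_factory=list)
    output: float = 0.0

    def depth(self) -> int:
        """Number of nested levels, counting this block."""
        return 1 + max((p.depth() for p in self.parts), default=0)

    def step(self, reference: float, perceived: float) -> dict[str, float]:
        """Reduce this block's error, then pass its output down as a reference."""
        self.output += self.gain * (reference - perceived)
        outputs = {self.name: self.output}
        for part in self.parts:
            # Assumption: the same perception is shared by every level.
            outputs.update(part.step(self.output, perceived))
        return outputs


if __name__ == "__main__":
    system = Block("dialogue", parts=[
        Block("utterance", parts=[Block("articulation")]),
    ])
    print(system.depth())
    print(system.step(reference=1.0, perceived=0.0))
```

Because `step` calls itself on each nested block, adding a further level requires no new code, which is the practical consequence of the recursive structure.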
PRESENCE suggests a new model of speech generation that:
In 2006, I coined the term 'reactive speech synthesis' to cover such advanced behaviours.
PRESENCE suggests a new model of speech recognition that:
Moore, R. K., & Nicolao, M. (2011). Reactive speech synthesis: actively managing phonetic contrast along an H&H continuum, 17th International Congress of Phonetic Sciences (ICPhS). Hong Kong. [pdf]
Moore, R. K. (2010). Cognitive approaches to spoken language technology. In F. Chen & K. Jokinen (Eds.), Speech Technology: Theory and Applications (pp. 89-103). New York Dordrecht Heidelberg London: Springer.
Crook, N., Smith, C., Cavazza, M., Pulman, S., Moore, R. K., & Boye, J. (2010). Handling user interruptions in an embodied conversational agent, AAMAS 2010: 9th International Conference on Autonomous Agents and Multiagent Systems. Toronto.
Hofe, R., & Moore, R. K. (2008). Towards an investigation of speech energetics using 'AnTon': an animatronic model of a human tongue and vocal tract. Connection Science, 20(4), 319-336.
Moore, R. K. (2007). Towards speech-based human-robot interaction, Symposium on Language and Robotics. Aveiro, Portugal [pdf].
Zyga, L. (2007). Machines might talk with humans by putting themselves in our shoes, PhysOrg.com.
Moore, R. K. (2007). PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE Trans. Computers, 56(9), 1176-1188 [pdf].
Moore, R. K. (2007). Invited talk - PREdictive SENsorimotor Control and Emulation (PRESENCE): Implications for Future Spoken Language Technology, CONTACT International Workshop on Is a neural theory of language possible? Development of unified representations in natural and artificial systems. Lecce, Italy [video (180 Mbytes!)].
Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49, 418-435 [pdf].