Naoko Tosa, Ryohei Nakatsu: Interactive Poem

  • ©, Naoko Tosa and Ryohei Nakatsu, Interactive Poem
  • Photo from


    Interactive Poem



Artist Statement:

    Interactive Poem is a new type of poem, created by you and a computer agent, collaborating in a poetic world full of inspiration, emotion and sensitivity. The concept of this interactive poem is based on conventional poetry, but goes beyond traditional limits by introducing the capability of interaction. You and a computer agent create a dialogue by exchanging short poetic phrases, and through this exchange produce a new poetic world that integrates the poetic world of the agent with your own.

    Interaction: A computer agent called “MUSE” who has been carefully designed with a face suitable for expressing the emotion of a poetic world, appears on the screen. She will utter a short poetic phrase to you. Hearing it allows you to enter the world of the poem and, at the same time, feel an impulse to respond by uttering one of the optional phrases or by creating your own poetic phrase. Exchanging poetic phrases through this interactive processes allows you and MUSE to become collaborative poets who generate a new poem and a new poetic world. The interaction mechanism operates as follows.

    1)    When MUSE utters a phrase, the recognition process is activated. A participant then utters a phrase and it is recognized by the phrase recognition function, which uses the lexicon subset corresponding to the next set of phrases in the transition network. At the same time, emotion contained in the utterance is recognized by the emotion recognition function.

    2)    Based on information pertaining to recognition and the transition network, reaction of the system is decided. The facial expression of MUSE changes according to the results of emotion recognition, and the phrase MUSE utters is based on the results of phrase recognition and the transition network. The background scene changes as the transitions continue.

    3)    In the above stated manner, poetic phrases between MUSE and the participant are consecutively produced.

    The speech recognition unit has two different speech recognition functions: phrase recognition and emotion recognition. To recognition each phrase uttered by a participant, HMM (hidden Markov model) based speaker-independent speech recognition technology has been adopted. Each phrase to be uttered is represented in the form of a phoneme sequence and is stored in the lexicon. To simultaneously detect the emotional state of a participant, the emotion recognition function is introduced. A neural network architecture has been adopted as the basic architecture for emotion recognition. This neural network is trained by using the utterances of many speakers to express the eight emotional states of joy, happiness, anger, fear, teasing, disgust, disappointment, and emotionless. As such, speaker-independent and content-independent emotion recognition is realized.