Press or Say (Preview)

The Press or Say action lets flow builders create a voice flow in which a caller can provide input either by speaking or by entering DTMF digits on their phone’s dial pad. Whichever input they choose is then streamed directly to the IBM Watson speech recognition service (we’ll add more providers later). We’re still fine-tuning Press or Say and gathering feedback, so it has been launched in preview.



The action looks much like the Collect DTMF action and lets you configure the following items in the same way:



Termination Key

The key a caller can press on their dial pad to signify they’ve completed their input.



Timeout

The number of seconds the caller has to enter their input before the flow moves on.


Number of Digits

The number of digits that the action should expect to receive (e.g., 16 digits for a credit card number).


Audio Configuration

Embed audio clips or text-to-speech to configure an audio prompt for the action (e.g., “Please enter your 16-digit credit card number.”).


Store data in Atmosphere® Insights Data Records

When checked, DTMF inputs will be stored for analytics purposes in Atmosphere® Insights.


Note: Because callers may share sensitive or private data, DTMF inputs are not stored unless this box is checked.


Enable Speech Rec

Allow for spoken inputs to be converted to text.
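
As a rough illustration of how the Termination Key and Number of Digits settings might interact, here is a minimal Python sketch. It is not the product's implementation: the function name `collect_digits`, its parameters, and the rule that whichever condition occurs first ends collection are all assumptions made for illustration. The timeout is not modeled; a key sequence that simply runs out stands in for it.

```python
def collect_digits(key_presses, termination_key="#", number_of_digits=16):
    """Hypothetical sketch: gather DTMF keys until the caller presses the
    termination key or the expected number of digits has been received."""
    collected = []
    for key in key_presses:
        if key == termination_key:
            break  # caller signalled that their input is complete
        collected.append(key)
        if len(collected) == number_of_digits:
            break  # expected number of digits reached
    return "".join(collected)


if __name__ == "__main__":
    print(collect_digits("1234#56"))                    # stops at the termination key
    print(collect_digits("12345", number_of_digits=4))  # stops at the digit limit
```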



The Press or Say action creates a variable that contains the caller’s input (either DTMF digits or a transcription of their speech). A switch action can then route the flow based on the contents of that variable.
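
The routing described above is configured in the flow builder rather than written as code, but the logic can be sketched in Python to show the idea. The function name `route_call`, the branch names, and the accepted inputs below are hypothetical examples, not values defined by the product.

```python
def route_call(caller_input):
    """Hypothetical sketch of a switch on the Press or Say variable.

    The same branch is reached whether the caller pressed a key or
    spoke the matching word.
    """
    normalized = caller_input.strip().lower()
    if normalized in {"1", "billing"}:
        return "billing"
    if normalized in {"2", "support"}:
        return "support"
    return "fallback"  # no match: e.g. replay the prompt or transfer


if __name__ == "__main__":
    print(route_call("1"))
    print(route_call("Support"))
    print(route_call("something else"))
```

Normalizing the input before comparing it keeps one set of branches for both pressed and spoken answers.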



Press or Say improves on the previous API-based approach in several ways:

  • Lower latency: This action streams directly to speech recognition services, such as IBM Watson, for much faster speech recognition in the voice flow. The previous solution for this use case chained multiple actions together and relied on API calls to the speech recognition service.
  • Convert spoken numbers to digits: The backend improvements also allow spoken numbers to be converted to digits (e.g., spoken “Two four seven three” becomes “2473”).
  • Interrupt: Because this action streams directly to the speech recognition service, it responds quickly to speech, so a caller can interrupt the audio prompt while it is playing. Previously, with the API-based method, the caller had to wait for the audio prompt to finish before speaking.
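
To make the spoken-number conversion concrete, the following Python sketch shows one simple way a transcript of digit words could be mapped to a digit string. It only illustrates the idea and is not how the speech recognition backend works: the names `WORD_TO_DIGIT` and `spoken_to_digits`, and the choice to treat “oh” as zero, are assumptions.

```python
# Hypothetical mapping from spoken digit words to digit characters.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def spoken_to_digits(transcript):
    """Convert a transcript such as 'Two four seven three' to '2473'.

    Raises ValueError if a word is not a recognized digit word.
    """
    digits = []
    for word in transcript.lower().split():
        if word not in WORD_TO_DIGIT:
            raise ValueError(f"not a digit word: {word!r}")
        digits.append(WORD_TO_DIGIT[word])
    return "".join(digits)


if __name__ == "__main__":
    print(spoken_to_digits("Two four seven three"))  # prints 2473
```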