Microsoft Speech Technology SAPI 5.4 Text-to-Speech (TTS) Sample Application


See also: TTS (text-to-speech) engine

This article reproduces the original text of Microsoft's official TTSApp (SAPI 5.4) sample documentation.

TTSApp is an example of a text-to-speech (TTS) enabled application. This sample application is intended to demonstrate many of the features of SAPI 5 in a single coherent application. It is not a full-featured TTS-enabled application, although the foundations of many of the options are present.

TTSApp allows you to hear the resulting audio output from the TTS process for text entered in the main window. Alternatively, you can open a file and TTSApp will speak the contents of that file.

Each word is highlighted in the text window to indicate the current TTS processing position. Features include:

Element           Description
SAPI5 TTSApp      The main display window of the TTSApp sample application.
Text window       TTSApp speaks the text contained in this window using TTS.
Speak             Initiates the TTS process.
Voices            Selects the voice for the audio output.
Rate              Selects the rate of speech.
Volume            Selects the volume level of the audio output stream.
Open File         Enables TTSApp to open and speak the contents of a stored text file.
Pause             Pauses the TTSApp text phrase speaking process.
Resume            Resumes the TTSApp text phrase speaking process.
Stop              Stops the TTSApp text phrase speaking process.
About             Displays the About TTSApp information dialog box.
Format            Selects the audio format.
Skip              Specifies the number of sentences to skip in the phrase speaking process.
Speak wav         Speaks the contents of a stored wav file.
Reset             Resets TTSApp to its original configuration setting.
Save to wav       Saves the contents of the TTSApp audio output stream to a wav file.
Show all events   Displays all TTSApp SAPI events.
Process XML       Specifies that XML tags in the text are interpreted as markup instead of being spoken literally.
Mouth Position    Displays mouth shapes for phrase elements as they are spoken.

  • SAPI5 TTSApp main window
    Use the main TTSApp window to select the configuration settings that affect the TTS process. The elements of TTSApp are listed above and described individually below.

  • Text window
    All text entered in this window is processed and spoken by the selected TTSApp voice.
    By default, the text content of this window is "Enter the text you wish spoken here."

  • Speak
    Click Speak to initiate the text-to-speech process.

  • Voices
    Select a voice using the drop-down list. TTSApp uses the selected voice when speaking a wav file or the contents of the text window.

  • Rate
    Move the slide control to the right to increase the speech rate, and to the left to decrease the speech rate. The Rate level determines the number of text units spoken per minute.

  • Volume
    Move the slide control to the right to increase the volume level, and to the left to decrease the volume level.
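
    As a minimal sketch of how slider positions might be translated into values for the speech engine: SAPI 5 documents a rate range of -10 to 10 for ISpVoice::SetRate and a volume range of 0 to 100 for ISpVoice::SetVolume. The helper names below, and the assumption that each slider reports a position from 0 to 100, are hypothetical and are not taken from the TTSApp source.

```python
# SAPI 5 documented ranges: rate -10..10, volume 0..100.
RATE_MIN, RATE_MAX = -10, 10
VOLUME_MIN, VOLUME_MAX = 0, 100


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def slider_to_rate(position, slider_max=100):
    """Map a hypothetical slider position (0..slider_max) onto the rate range."""
    position = clamp(position, 0, slider_max)
    return RATE_MIN + round(position * (RATE_MAX - RATE_MIN) / slider_max)


def slider_to_volume(position, slider_max=100):
    """Map a hypothetical slider position (0..slider_max) onto the volume range."""
    position = clamp(position, 0, slider_max)
    return VOLUME_MIN + round(position * (VOLUME_MAX - VOLUME_MIN) / slider_max)
```

    With these assumptions, the leftmost slider position yields the slowest rate (-10), the midpoint yields the default rate (0), and out-of-range positions are clamped instead of being passed through.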

  • Open File
    Click Open File to access the Windows Open dialog box. Select the file, and then click Open.

  • Pause
    Click Pause to interrupt the TTS process.

  • Resume
    Click Resume to continue the TTS process.

  • Stop
    Click Stop to stop the TTS process.

  • About
    The About window displays information related to TTSApp. Click OK to close the About window.

  • Format
    Use the drop-down list in Format to select one of the following format rates.

    Selectable format rates
    Sample rate   Bit depth and channels
    8kHz          8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    11kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    12kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    16kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    22kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    24kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    32kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    44kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo
    48kHz         8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo

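
    The 36 selectable formats follow a regular pattern: nine sample rates, each with four bit-depth and channel combinations. The sketch below enumerates them and computes the uncompressed data rate of each. It assumes the names and numbering of SAPI's SPSTREAMFORMAT enumeration (SPSF_8kHz8BitMono = 4 through SPSF_48kHz16BitStereo = 39) and the conventional exact sample rates behind the 11, 22, and 44 kHz labels (11025, 22050, and 44100 Hz); the function name build_formats is hypothetical.

```python
from itertools import product

# Label shown in the drop-down list -> exact sample rate in Hz (assumed).
SAMPLE_RATES_HZ = {
    "8kHz": 8000, "11kHz": 11025, "12kHz": 12000,
    "16kHz": 16000, "22kHz": 22050, "24kHz": 24000,
    "32kHz": 32000, "44kHz": 44100, "48kHz": 48000,
}
BIT_DEPTHS = (8, 16)
CHANNELS = (("Mono", 1), ("Stereo", 2))
FIRST_ENUM_VALUE = 4  # assumed value of SPSF_8kHz8BitMono


def build_formats():
    """Return (name, enum_value, bytes_per_second) for every selectable format."""
    formats = []
    value = FIRST_ENUM_VALUE
    for rate_label, hz in SAMPLE_RATES_HZ.items():
        # Order within one rate: 8 Bit Mono, 8 Bit Stereo, 16 Bit Mono, 16 Bit Stereo.
        for bits, (chan_label, channels) in product(BIT_DEPTHS, CHANNELS):
            name = f"SPSF_{rate_label}{bits}Bit{chan_label}"
            bytes_per_second = hz * channels * bits // 8
            formats.append((name, value, bytes_per_second))
            value += 1
    return formats


if __name__ == "__main__":
    for name, value, rate in build_formats():
        print(f"{value:2d}  {name:24s} {rate:7d} bytes/s")
```

    The data rate is useful when estimating the size of a file produced by Save to wav: one minute of 22kHz 16 Bit Mono audio, for example, takes about 60 times 44100 bytes.
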
  • Skip
    Use the spin box to select the number of skipped sentences. Skip functions only while text is being spoken.

  • Speak wav
    Speak wav enables TTSApp to speak the contents of a wav file. Click Speak wav to access the Windows Open dialog box. Select a wav file from the dialog box, and then click Open.

  • Reset
    Click Reset to reset TTSApp to its original configuration state.

  • Save to wav
    Click Save to wav to save the TTSApp audio output stream to a wav file.

  • Show all events
    Select Show all events to display SAPI-related events in the event display window as the input text is processed by TTSApp.

  • Process XML
    Select Process XML to have TTSApp parse and interpret XML tags in the input text as speech markup. When this option is not selected, the tags and their contents are spoken literally as part of the audio output stream.

    For example, if the Process XML option is selected, the application pauses for the number of milliseconds specified in the SILENCE tag.

    Process XML    XML tag                    Result
    Selected       <SILENCE MSEC = "3000"/>   The application produces 3000 milliseconds of silence.
    Not selected   <SILENCE MSEC = "3000"/>   The application speaks the phrase, "less than silence msec equals quote three thousand quote slash greater than."

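
    In SAPI, the choice between interpreting and literally speaking XML is expressed through the flags passed to ISpVoice::Speak. The following is a minimal sketch of how a check box state might be turned into such flags. The numeric values mirror the SPEAKFLAGS constants SPF_ASYNC (1), SPF_IS_XML (8), and SPF_IS_NOT_XML (16); the class and function names are hypothetical, and whether TTSApp combines exactly these flags is an assumption.

```python
from enum import IntFlag


class SpeakFlags(IntFlag):
    """Subset of SAPI SPEAKFLAGS values relevant to the Process XML option."""
    ASYNC = 1        # SPF_ASYNC: return immediately, speak in the background
    IS_XML = 8       # SPF_IS_XML: interpret XML tags as speech markup
    IS_NOT_XML = 16  # SPF_IS_NOT_XML: treat the input as plain text


def flags_for_speak(process_xml):
    """Build the flag word for a speak call from the check box state."""
    flags = SpeakFlags.ASYNC
    flags |= SpeakFlags.IS_XML if process_xml else SpeakFlags.IS_NOT_XML
    return flags


if __name__ == "__main__":
    text = 'One <SILENCE MSEC = "3000"/> two'
    for checked in (True, False):
        print(checked, int(flags_for_speak(checked)), repr(text))
```

    The same input text is passed in both cases; only the flag word changes, which is why the tag either produces a pause or is read aloud character by character.
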
  • Mouth Position
    The mouth position displays the various mouth shapes and positions as TTSApp processes the input text stream.