Reference

audio

client.audio.speech(...) -> GetSpeechResponse

📝 Description

Synthesize speech audio from text or SSML. Returns the complete audio file plus billing and speech-mark metadata in a single JSON response. For low-latency playback or long-form text, use POST /v1/audio/stream.

🔌 Usage

from speechify import Speechify
from speechify.environment import SpeechifyEnvironment

client = Speechify(
    api_key="<token>",
    environment=SpeechifyEnvironment.DEFAULT,
)

client.audio.speech(
    audio_format="mp3",
    input="Hello! This is the Speechify text-to-speech API.",
    model="simba-english",
    voice_id="george",
)
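
According to the endpoint descriptions in this reference, the speech endpoint delivers its audio Base64-encoded inside the JSON response. The sketch below shows one way to decode such a payload and write it to disk. The helper name `save_base64_audio` is hypothetical, and this reference does not name the response attribute that carries the Base64 string, so check the `GetSpeechResponse` type before wiring it in.

```python
import base64
from pathlib import Path


def save_base64_audio(audio_b64: str, path: str) -> int:
    """Decode a Base64 audio payload, write it to `path`, return bytes written.

    Hypothetical helper: it only handles the decoding step and makes no
    assumption about which response attribute holds the Base64 string.
    """
    audio_bytes = base64.b64decode(audio_b64)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

Pass the Base64 string taken from the `GetSpeechResponse` as `audio_b64`, and pick a file extension that matches the `audio_format` you requested.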

⚙️ Parameters

input: str

Plain text or SSML to synthesize. See https://docs.speechify.ai/docs/api-limits for the input size limits. Emotion, pitch, and speed rate are configured through SSML in the input; see the SSML documentation for details: https://docs.speechify.ai/docs/ssml#prosody
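
Since pitch and speed rate travel inside the SSML itself, it can help to build the markup programmatically so that user text is always XML-escaped. This is a minimal sketch using the standard SSML `<speak>` and `<prosody>` elements; the helper name `build_ssml` is hypothetical, and the attribute values this API accepts should be confirmed against the SSML documentation linked above. Emotion is left out because its markup is not described in this reference.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in <speak><prosody>, escaping XML special characters.

    Hypothetical helper; `rate` and `pitch` use generic SSML values and are
    assumptions, not values confirmed by this reference.
    """
    body = escape(text)  # turns &, <, > into entities
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{body}</prosody></speak>"
    )
```

The returned string can be passed as `input` in place of plain text.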

voice_id: str — ID of the voice used to synthesize speech. See the /v1/voices endpoint for available voices.

audio_format: typing.Optional[GetSpeechRequestAudioFormat] — The format of the output audio. The current default is "wav", but it may change in the future, so we recommend always passing the format you expect explicitly.

language: typing.Optional[str]

Language of the input, given as an ISO 639-1 language code and an ISO 3166-1 region code separated by a hyphen, e.g. en-US. See the list of supported languages and the recommendations for this parameter: https://docs.speechify.ai/docs/language-support.
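
The language value has a fixed shape (two lowercase letters, a hyphen, two uppercase letters), which makes a cheap client-side check possible before sending a request. The sketch below is an assumption-level convenience with a hypothetical helper name, `is_language_tag`; it validates only the shape of the tag, not whether the language is actually supported.

```python
import re

# Two-letter ISO 639-1 language code, hyphen, two-letter ISO 3166-1 region code.
_LANGUAGE_TAG = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_language_tag(value: str) -> bool:
    """Return True if `value` has the shape of a tag such as en-US."""
    return _LANGUAGE_TAG.fullmatch(value) is not None
```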

model: typing.Optional[GetSpeechRequestModel] — Model used for audio synthesis. simba-english is optimized for English; simba-multilingual is for non-English or mixed-language input. simba-3.0 is the streaming-native model, with lower time to first byte (TTFB) and richer expressivity. simba-3.0 is currently English only, with multilingual support coming soon; until it ships, requests with non-English voices return 400.

options: typing.Optional[GetSpeechOptionsRequest]

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.audio.stream(...) -> typing.Iterator[bytes]

📝 Description

Synthesize speech and stream the audio back as it is generated, for low-latency playback. The Accept header selects the audio container; the response is raw audio bytes (HTTP chunked). For Base64-encoded audio with speech-mark metadata in a single JSON response, use POST /v1/audio/speech.

🔌 Usage

from speechify import Speechify
from speechify.environment import SpeechifyEnvironment

client = Speechify(
    api_key="<token>",
    environment=SpeechifyEnvironment.DEFAULT,
)

client.audio.stream(
    accept="audio/mpeg",
    input="input",
    voice_id="voice_id",
)
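
Because `client.audio.stream(...)` returns an iterator of byte chunks, the audio has to be consumed chunk by chunk. A minimal sketch of writing the stream to a file follows; the helper name `save_stream` is hypothetical, and it works with any iterable of bytes.

```python
from typing import Iterable


def save_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write streamed audio chunks to `path` and return the total byte count."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)  # write each chunk as soon as it arrives
            total += len(chunk)
    return total
```

For example, `save_stream(client.audio.stream(accept="audio/mpeg", input="Hello!", voice_id="george"), "hello.mp3")` would store the MP3 stream on disk without holding the whole file in memory.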

⚙️ Parameters

accept: StreamAudioRequestAccept

Selects the audio container/codec for the streamed response. The response Content-Type echoes this value, except audio/pcm returns audio/L16 with rate and channels parameters (raw 16-bit linear PCM, 24 kHz mono, little-endian).
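
Raw PCM has no header, so most players cannot open it directly. Given the format stated above (16-bit linear PCM, 24 kHz, mono, little-endian), the chunks can be wrapped in a WAV container with Python's standard `wave` module. This is a sketch under those stated parameters; the helper name `pcm_to_wav` is hypothetical.

```python
import io
import wave
from typing import Iterable


def pcm_to_wav(chunks: Iterable[bytes], sample_rate: int = 24_000) -> bytes:
    """Wrap raw 16-bit mono little-endian PCM chunks in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)  # 24 kHz per the description above
        for chunk in chunks:
            wav.writeframesraw(chunk)
        # Leaving the with-block closes the writer, which patches the
        # header with the final data length.
    return buffer.getvalue()
```

The result can be written to a `.wav` file or handed to any library that reads WAV data.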

input: str

Plain text or SSML to synthesize. See https://docs.speechify.ai/docs/api-limits for the input size limits. Emotion, pitch, and speed rate are configured through SSML in the input; see the SSML documentation for details: https://docs.speechify.ai/docs/ssml#prosody

voice_id: str — ID of the voice used to synthesize speech. See the /v1/voices endpoint for available voices.

language: typing.Optional[str]

Language of the input, given as an ISO 639-1 language code and an ISO 3166-1 region code separated by a hyphen, e.g. en-US. See the list of supported languages and the recommendations for this parameter: https://docs.speechify.ai/docs/language-support.

model: typing.Optional[GetStreamRequestModel] — Model used for audio synthesis. simba-english is optimized for English; simba-multilingual is for non-English or mixed-language input. simba-3.0 is the streaming-native model, with lower time to first byte (TTFB) and richer expressivity. simba-3.0 is currently English only, with multilingual support coming soon; until it ships, requests with non-English voices return 400.

options: typing.Optional[GetStreamOptionsRequest]

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

voices

client.voices.list() -> typing.List[GetVoice]

📝 Description

List the voices available to the user

🔌 Usage

from speechify import Speechify
from speechify.environment import SpeechifyEnvironment

client = Speechify(
    api_key="<token>",
    environment=SpeechifyEnvironment.DEFAULT,
)

client.voices.list()

⚙️ Parameters

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.voices.create(...) -> CreatedVoice

📝 Description

Create a personal (cloned) voice for the user

🔌 Usage

from speechify import Speechify
from speechify.environment import SpeechifyEnvironment

client = Speechify(
    api_key="<token>",
    environment=SpeechifyEnvironment.DEFAULT,
)

client.voices.create(
    sample="example_sample",
    avatar="example_avatar",
    name="name",
    gender="male",
    consent="consent",
)

⚙️ Parameters

name: str — Name of the personal voice

gender: CreateVoicesRequestGender

Gender marker for the personal voice. Accepted values: male (GenderMale), female (GenderFemale), notSpecified (GenderNotSpecified).

sample: core.File — Audio sample file

consent: str

A JSON-formatted string carrying the user's consent information. It should include the fullName and email of the consenting individual, for example {"fullName": "John Doe", "email": "john@example.com"}.
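
Since `consent` is a string that contains JSON, serializing a dictionary with `json.dumps` avoids hand-written quoting mistakes. A minimal sketch follows, with a hypothetical helper name `build_consent`; the two keys come from the description above.

```python
import json


def build_consent(full_name: str, email: str) -> str:
    """Serialize consent details into the JSON string expected by `consent`."""
    return json.dumps({"fullName": full_name, "email": email})
```

The return value can be passed directly as the `consent` argument of `client.voices.create(...)`.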

locale: typing.Optional[str] — Native language (locale) of the personal voice (e.g. en-US, es-ES, etc.)

avatar: typing.Optional[core.File] — Avatar image file

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.voices.delete(...)

📝 Description

Delete a personal (cloned) voice

🔌 Usage

from speechify import Speechify
from speechify.environment import SpeechifyEnvironment

client = Speechify(
    api_key="<token>",
    environment=SpeechifyEnvironment.DEFAULT,
)

client.voices.delete(
    id="id",
)

⚙️ Parameters

id: str — The ID of the voice to delete

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.voices.download_sample(...) -> typing.Iterator[bytes]

📝 Description

Download a personal (cloned) voice sample

🔌 Usage

from speechify import Speechify
from speechify.environment import SpeechifyEnvironment

client = Speechify(
    api_key="<token>",
    environment=SpeechifyEnvironment.DEFAULT,
)

client.voices.download_sample(
    id="id",
)

⚙️ Parameters

id: str — The ID of the voice whose sample to download

request_options: typing.Optional[RequestOptions] — Request-specific configuration.