ElevenLabs Text-to-Speech (TTS)

Introduction

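ElevenLabs provides natural-sounding speech synthesis software built on deep learning. Its AI audio models generate realistic, versatile, and contextually aware speech, voices, and sound effects in 32 languages. The ElevenLabs Text-to-Speech API lets users bring any book, article, PDF, newsletter, or other text to life with ultra-realistic AI narration.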

Prerequisites

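Create an ElevenLabs account and obtain an API key. You can sign up on the ElevenLabs sign-up page. After logging in, your API key is available on your profile page.
Add the spring-ai-elevenlabs dependency to your project's build file. For more information, see the Dependency Management section.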

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the ElevenLabs Text-to-Speech client. To enable it, add the following dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-elevenlabs</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-elevenlabs'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Speech Properties

Connection Properties

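The prefix spring.ai.elevenlabs is used as the property prefix for all ElevenLabs-related configuration, covering both connection and TTS-specific settings. It is defined in ElevenLabsConnectionProperties.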

Property	Description	Default
spring.ai.elevenlabs.base-url	The base URL for the ElevenLabs API.	https://api.elevenlabs.io
spring.ai.elevenlabs.api-key	Your ElevenLabs API key.	-
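
As a minimal sketch of how the connection properties might be set, the fragment below assumes an application.properties file and an environment variable named ELEVEN_LABS_API_KEY (the same name used in the manual configuration examples in this document). The base URL is left out so that its default applies.

```properties
# application.properties (sketch; assumes the key is exported as ELEVEN_LABS_API_KEY)
spring.ai.elevenlabs.api-key=${ELEVEN_LABS_API_KEY}
```

Reading the key from the environment keeps it out of source control.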

Configuration Properties

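Enabling and disabling the audio speech auto-configuration is now done through the top-level property spring.ai.model.audio.speech.

To enable, set spring.ai.model.audio.speech=elevenlabs (enabled by default).

To disable, set spring.ai.model.audio.speech=none (or any value that does not match elevenlabs).

This change allows multiple models to be configured.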

The prefix spring.ai.elevenlabs.tts is used as the property prefix for configuring the ElevenLabs Text-to-Speech client specifically. It is defined in ElevenLabsSpeechProperties.

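Property	Description	Default
spring.ai.model.audio.speech	Enables the audio speech model.	elevenlabs
spring.ai.elevenlabs.tts.options.model-id	The ID of the model to use.	eleven_turbo_v2_5
spring.ai.elevenlabs.tts.options.voice-id	The ID of the voice to use. This is the voice ID, not the voice name.	9BWtsMINqrJLrRacOk9x
spring.ai.elevenlabs.tts.options.output-format	The output format of the generated audio. See the available output formats below.	mp3_22050_32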

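The base URL and API key can also be configured specifically for TTS using spring.ai.elevenlabs.tts.base-url and spring.ai.elevenlabs.tts.api-key. For simplicity, however, it is generally recommended to use the global spring.ai.elevenlabs prefix unless you have a specific reason to use different credentials for different ElevenLabs services. The more specific tts properties override the global ones.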

All properties prefixed with spring.ai.elevenlabs.tts.options can be overridden at runtime.
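
The TTS properties described above could be combined as in the following sketch. It assumes an application.properties file. The environment variable ELEVEN_LABS_TTS_API_KEY is a hypothetical name chosen for illustration, and the last line is only needed when TTS should use credentials different from the global ones.

```properties
# application.properties (sketch)
spring.ai.model.audio.speech=elevenlabs
spring.ai.elevenlabs.tts.options.model-id=eleven_turbo_v2_5
spring.ai.elevenlabs.tts.options.voice-id=9BWtsMINqrJLrRacOk9x
spring.ai.elevenlabs.tts.options.output-format=mp3_44100_128

# Optional: TTS-specific key that overrides spring.ai.elevenlabs.api-key
# (ELEVEN_LABS_TTS_API_KEY is a hypothetical variable name)
spring.ai.elevenlabs.tts.api-key=${ELEVEN_LABS_TTS_API_KEY}
```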

Table 1. Available Output Formats
Enum value	Description
MP3_22050_32	MP3, 22.05 kHz, 32 kbps
MP3_44100_32	MP3, 44.1 kHz, 32 kbps
MP3_44100_64	MP3, 44.1 kHz, 64 kbps
MP3_44100_96	MP3, 44.1 kHz, 96 kbps
MP3_44100_128	MP3, 44.1 kHz, 128 kbps
MP3_44100_192	MP3, 44.1 kHz, 192 kbps
PCM_8000	PCM, 8 kHz
PCM_16000	PCM, 16 kHz
PCM_22050	PCM, 22.05 kHz
PCM_24000	PCM, 24 kHz
PCM_44100	PCM, 44.1 kHz
PCM_48000	PCM, 48 kHz
ULAW_8000	µ-law, 8 kHz
ALAW_8000	A-law, 8 kHz
OPUS_48000_32	Opus, 48 kHz, 32 kbps
OPUS_48000_64	Opus, 48 kHz, 64 kbps
OPUS_48000_96	Opus, 48 kHz, 96 kbps
OPUS_48000_128	Opus, 48 kHz, 128 kbps
OPUS_48000_192	Opus, 48 kHz, 192 kbps
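
The values in the table follow a codec_sampleRate[_bitrate] naming pattern. The sketch below shows one way to split such a value into its parts, for example to choose a file extension or configure an audio player. OutputFormatInfo is a hypothetical helper written for this illustration using only the JDK; it is not part of Spring AI or the ElevenLabs API.

```java
import java.util.Locale;
import java.util.OptionalInt;

// Hypothetical helper for illustration; not part of Spring AI.
final class OutputFormatInfo {

    final String codec;            // e.g. "mp3", "pcm", "ulaw", "alaw", "opus"
    final int sampleRateHz;        // e.g. 44100
    final OptionalInt bitrateKbps; // empty for formats that state no bitrate

    private OutputFormatInfo(String codec, int sampleRateHz, OptionalInt bitrateKbps) {
        this.codec = codec;
        this.sampleRateHz = sampleRateHz;
        this.bitrateKbps = bitrateKbps;
    }

    // Accepts both the enum-style name (MP3_44100_128) and the
    // lowercase property value (mp3_44100_128).
    static OutputFormatInfo parse(String value) {
        String[] parts = value.toLowerCase(Locale.ROOT).split("_");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Unexpected output format: " + value);
        }
        int sampleRate = Integer.parseInt(parts[1]);
        OptionalInt bitrate = parts.length == 3
                ? OptionalInt.of(Integer.parseInt(parts[2]))
                : OptionalInt.empty();
        return new OutputFormatInfo(parts[0], sampleRate, bitrate);
    }

    public static void main(String[] args) {
        OutputFormatInfo info = parse("MP3_44100_128");
        System.out.println(info.codec + ", " + info.sampleRateHz + " Hz, bitrate present: "
                + info.bitrateKbps.isPresent());
    }
}
```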

Runtime Options

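The ElevenLabsTextToSpeechOptions class provides the options to use when making a text-to-speech request. On startup, the options specified by spring.ai.elevenlabs.tts are used, but you can override them at runtime. The following options are available: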

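modelId: The ID of the model to use.
voiceId: The ID of the voice to use.
outputFormat: The output format of the generated audio.
voiceSettings: An object containing voice settings such as stability, similarityBoost, style, useSpeakerBoost, and speed.
enableLogging: A boolean that enables or disables logging.
languageCode: The language code of the input text (for example, "en" for English).
pronunciationDictionaryLocators: A list of pronunciation dictionary locators.
seed: A seed for random number generation, for reproducibility.
previousText: Text that comes before the main text, used as context in multi-turn conversations.
nextText: Text that comes after the main text, used as context in multi-turn conversations.
previousRequestIds: Request IDs from previous turns in the conversation.
nextRequestIds: Request IDs for subsequent turns in the conversation.
applyTextNormalization: Whether to apply text normalization ("auto", "on", or "off").
applyLanguageTextNormalization: Whether to apply language text normalization.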

For example:

ElevenLabsTextToSpeechOptions speechOptions = ElevenLabsTextToSpeechOptions.builder()
    .model("eleven_multilingual_v2")
    .voiceId("your_voice_id")
    .outputFormat(ElevenLabsApi.OutputFormat.MP3_44100_128.getValue())
    .build();

TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("Hello, this is a text-to-speech example.", speechOptions);
TextToSpeechResponse response = elevenLabsTextToSpeechModel.call(speechPrompt);

Using Voice Settings

You can customize the voice output by providing VoiceSettings in the options. This lets you control properties such as stability and similarity.

var voiceSettings = new ElevenLabsApi.SpeechRequest.VoiceSettings(0.75f, 0.75f, 0.0f, true);

ElevenLabsTextToSpeechOptions speechOptions = ElevenLabsTextToSpeechOptions.builder()
    .model("eleven_multilingual_v2")
    .voiceId("your_voice_id")
    .voiceSettings(voiceSettings)
    .build();

TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("This is a test with custom voice settings!", speechOptions);
TextToSpeechResponse response = elevenLabsTextToSpeechModel.call(speechPrompt);

Manual Configuration

Add the spring-ai-elevenlabs dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elevenlabs</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-elevenlabs'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create an ElevenLabsTextToSpeechModel:

ElevenLabsApi elevenLabsApi = ElevenLabsApi.builder()
		.apiKey(System.getenv("ELEVEN_LABS_API_KEY"))
		.build();

ElevenLabsTextToSpeechModel elevenLabsTextToSpeechModel = ElevenLabsTextToSpeechModel.builder()
	.elevenLabsApi(elevenLabsApi)
	.defaultOptions(ElevenLabsTextToSpeechOptions.builder()
		.model("eleven_turbo_v2_5")
		.voiceId("your_voice_id") // e.g. "9BWtsMINqrJLrRacOk9x"
		.outputFormat("mp3_44100_128")
		.build())
	.build();

// The call will use the default options configured above.
TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("Hello, this is a text-to-speech example.");
TextToSpeechResponse response = elevenLabsTextToSpeechModel.call(speechPrompt);

byte[] responseAsBytes = response.getResult().getOutput();
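
Once the audio is available as a byte array, a common next step is to save it to disk. The following is a minimal JDK-only sketch. AudioFileWriter is a hypothetical helper, and the placeholder bytes stand in for the array returned by response.getResult().getOutput(). The .mp3 extension assumes an MP3 output format such as mp3_44100_128.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Spring AI.
final class AudioFileWriter {

    // Writes the audio bytes to the target path, creating or replacing the file.
    static Path save(byte[] audio, Path target) {
        try {
            return Files.write(target, audio);
        }
        catch (IOException e) {
            throw new UncheckedIOException("Could not write " + target, e);
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes; in an application, use response.getResult().getOutput().
        byte[] responseAsBytes = {0x49, 0x44, 0x33};
        Path target = Path.of(System.getProperty("java.io.tmpdir"), "speech.mp3");
        Path saved = save(responseAsBytes, target);
        System.out.println("Wrote " + responseAsBytes.length + " bytes to " + saved);
    }
}
```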

Real-Time Audio Streaming

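The ElevenLabs Speech API supports real-time audio streaming using chunked transfer encoding. This allows audio playback to begin before the entire audio file has been generated.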

ElevenLabsApi elevenLabsApi = ElevenLabsApi.builder()
		.apiKey(System.getenv("ELEVEN_LABS_API_KEY"))
		.build();

ElevenLabsTextToSpeechModel elevenLabsTextToSpeechModel = ElevenLabsTextToSpeechModel.builder()
	.elevenLabsApi(elevenLabsApi)
	.build();

ElevenLabsTextToSpeechOptions streamingOptions = ElevenLabsTextToSpeechOptions.builder()
    .model("eleven_turbo_v2_5")
    .voiceId("your_voice_id")
    .outputFormat("mp3_44100_128")
    .build();

TextToSpeechPrompt speechPrompt = new TextToSpeechPrompt("Today is a wonderful day to build something people love!", streamingOptions);

Flux<TextToSpeechResponse> responseStream = elevenLabsTextToSpeechModel.stream(speechPrompt);

// Process the stream, e.g., play the audio chunks
responseStream.subscribe(speechResponse -> {
    byte[] audioChunk = speechResponse.getResult().getOutput();
    // Play the audioChunk
});
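
When the chunks should be kept rather than played immediately, they can be appended to a buffer as they arrive. The sketch below uses only the JDK. AudioChunkCollector is a hypothetical helper, and the byte arrays in main stand in for the chunks delivered by the stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Spring AI.
final class AudioChunkCollector implements Consumer<byte[]> {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Appends one chunk; synchronized because chunks may arrive on another thread.
    @Override
    public synchronized void accept(byte[] chunk) {
        buffer.write(chunk, 0, chunk.length);
    }

    // Returns a copy of everything collected so far, in arrival order.
    synchronized byte[] toByteArray() {
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        AudioChunkCollector collector = new AudioChunkCollector();
        // Stand-in chunks; with a real stream, each chunk would come from
        // speechResponse.getResult().getOutput().
        collector.accept(new byte[] {1, 2});
        collector.accept(new byte[] {3});
        System.out.println(collector.toByteArray().length + " bytes collected");
    }
}
```

With the stream shown above, such a collector could be called from the subscribe callback, for example collector.accept(speechResponse.getResult().getOutput()).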

Voices API

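The ElevenLabs Voices API lets you retrieve information about available voices, their settings, and the default voice settings. You can use this API to discover the voiceId values to use in speech requests.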

To use the Voices API, create an instance of ElevenLabsVoicesApi:

ElevenLabsVoicesApi voicesApi = ElevenLabsVoicesApi.builder()
        .apiKey(System.getenv("ELEVEN_LABS_API_KEY"))
        .build();

You can then use the following methods:

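getVoices(): Retrieves a list of all available voices.
getDefaultVoiceSettings(): Gets the default settings for voices.
getVoiceSettings(String voiceId): Returns the settings for a specific voice.
getVoice(String voiceId): Returns the metadata for a specific voice.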

Example:

// Get all voices
ResponseEntity<ElevenLabsVoicesApi.Voices> voicesResponse = voicesApi.getVoices();
List<ElevenLabsVoicesApi.Voice> voices = voicesResponse.getBody().voices();

// Get default voice settings
ResponseEntity<ElevenLabsVoicesApi.VoiceSettings> defaultSettingsResponse = voicesApi.getDefaultVoiceSettings();
ElevenLabsVoicesApi.VoiceSettings defaultSettings = defaultSettingsResponse.getBody();

// Get settings for a specific voice
ResponseEntity<ElevenLabsVoicesApi.VoiceSettings> voiceSettingsResponse = voicesApi.getVoiceSettings(voiceId);
ElevenLabsVoicesApi.VoiceSettings voiceSettings = voiceSettingsResponse.getBody();

// Get details for a specific voice
ResponseEntity<ElevenLabsVoicesApi.Voice> voiceDetailsResponse = voicesApi.getVoice(voiceId);
ElevenLabsVoicesApi.Voice voiceDetails = voiceDetailsResponse.getBody();
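
Because speech requests take a voice ID rather than a voice name, it can be handy to look up the ID from a list of voices. The sketch below illustrates the idea with JDK types only. VoiceInfo is a hypothetical stand-in for the voice metadata returned by getVoices(), and the IDs and names are made up. In an application, the list would be built from the API response instead.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; not part of Spring AI.
final class VoiceLookup {

    // Stand-in for the voice metadata returned by the Voices API (assumed shape).
    record VoiceInfo(String voiceId, String name) {}

    // Returns the ID of the first voice whose name matches, ignoring case.
    static Optional<String> findVoiceId(List<VoiceInfo> voices, String name) {
        return voices.stream()
                .filter(voice -> voice.name().equalsIgnoreCase(name))
                .map(VoiceInfo::voiceId)
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<VoiceInfo> voices = List.of(
                new VoiceInfo("voice-id-1", "Narrator"),
                new VoiceInfo("voice-id-2", "Announcer"));
        System.out.println(findVoiceId(voices, "announcer").orElse("not found"));
    }
}
```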

Example Code

The ElevenLabsTextToSpeechModelIT.java test provides some general examples of how to use the library.
The ElevenLabsApiIT.java test provides examples of using the low-level ElevenLabsApi.