Fish TTS API Integration Guide

This document describes how to integrate the Fish TTS API. The interface is compatible with the Fish Audio Official OpenAPI. To migrate existing code that calls https://api.fish.audio/v1/tts, point it at https://api.xhuoapi.ai/v1/fish/tts and replace the authentication credentials. The request body structure does not need to change.

Application Process

To use the API, first apply for the service on the Fish TTS API page by clicking the “Acquire” button. If you are not logged in, you are redirected to the login page. After you register or log in, you are returned to the API page automatically. Your first application comes with a free quota, so you can start using the API at no cost.

Differences from the Official API

This API keeps the request and response fields of the Fish Audio official API, with the following minor enhancements for integration with our platform:

Authentication method: Uses Authorization: Bearer {token}, where {token} is the key you applied for on our platform, not a Fish official key.
TTS model selection: Set via the model HTTP request header. The options are s1 and s2-pro, and the default is s2-pro, consistent with Fish official.
Default latency value: The upstream /fish/v1/tts returns an error when latency is not provided, so this interface automatically adds latency=normal if it is omitted, matching the Fish official default behavior.
Asynchronous callback (platform extension): If the request body includes an additional callback_url field, the API immediately returns {task_id, started_at}. Once upstream processing completes, the full result {audio_url, ...} is POSTed as JSON to that URL. The Fish official API does not support this field; including it switches the request to our asynchronous flow.

Apart from the above differences, all fields in the TTS request body (text, reference_id, references, prosody, format, sample_rate, mp3_bitrate, chunk_length, temperature, top_p, etc.) are transparently passed upstream, behaving exactly as documented by Fish official.
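Because only the endpoint and the credential change, migrating an existing call can be sketched as a small rewrite of the URL and headers. This is a minimal Python sketch, assuming your code already holds the URL, a header dict, and a JSON body. `migrate_call` and the constant names are hypothetical helpers, not part of either API.

```python
# Minimal sketch: retarget an existing Fish Audio TTS call at the platform endpoint.
# Assumption: per the differences listed above, only the URL and bearer token change.
# migrate_call and these constant names are hypothetical.

OFFICIAL_TTS_URL = "https://api.fish.audio/v1/tts"
PLATFORM_TTS_URL = "https://api.xhuoapi.ai/v1/fish/tts"
ALLOWED_MODELS = {"s1", "s2-pro"}


def migrate_call(url: str, headers: dict, body: dict, platform_token: str):
    """Return (url, headers, body) rewritten for the platform endpoint."""
    if url != OFFICIAL_TTS_URL:
        raise ValueError(f"not the official TTS endpoint: {url}")
    # Drop the old Authorization header (any casing) and add the platform key.
    new_headers = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    new_headers["authorization"] = f"Bearer {platform_token}"
    # The model header is optional (default s2-pro) but must be s1 or s2-pro if set.
    for key, value in new_headers.items():
        if key.lower() == "model" and value not in ALLOWED_MODELS:
            raise ValueError(f"unsupported model header: {value}")
    # The body is passed through unchanged; copy it so the caller's dict is untouched.
    return PLATFORM_TTS_URL, new_headers, dict(body)
```

For example, `migrate_call(OFFICIAL_TTS_URL, {"Authorization": "Bearer fish-key", "model": "s1"}, {"text": "Hello"}, "YOUR_PLATFORM_TOKEN")` returns the platform URL, headers carrying the platform token, and the same body.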

Basic Usage

A minimal request needs only the text field. Example cURL request:

curl -X POST 'https://api.xhuoapi.ai/v1/fish/tts' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer {token}' \
  -H 'content-type: application/json' \
  -H 'model: s2-pro' \
  -d '{
    "text": "今天天气真好，我们一起出去散散步吧。"
  }'
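The same minimal request can be issued with Python's standard library. This is a sketch, assuming a JSON reply on success. `build_tts_request` and `synthesize` are hypothetical helper names, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

TTS_URL = "https://api.xhuoapi.ai/v1/fish/tts"


def build_tts_request(token: str, body: dict, model: str = "s2-pro") -> urllib.request.Request:
    """Build the same POST as the cURL example; nothing is sent yet."""
    return urllib.request.Request(
        TTS_URL,
        # ensure_ascii=False keeps Chinese text readable; the bytes are UTF-8.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {token}",
            "content-type": "application/json",
            "model": model,  # s1 or s2-pro
        },
        method="POST",
    )


def synthesize(token: str, body: dict, model: str = "s2-pro", timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON, e.g. {"audio_url": ...}.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_tts_request(token, body, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `synthesize("YOUR_TOKEN", {"text": "今天天气真好，我们一起出去散散步吧。"})["audio_url"]`.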

Example response:

{
  "audio_url": "https://platform.r2.fish.audio/task/8a72ff9840234006a9f74cb2fa04f978.mp3"
}

The response uses the Fish official fields directly:

audio_url: The generated audio URL, which can be downloaded or played directly.
latency_ms (optional): Upstream processing time in milliseconds.
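A Python sketch of reading these fields, treating latency_ms as optional as described above. `TtsResult` and `parse_tts_response` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TtsResult:
    audio_url: str
    # Only present when upstream reports it; the numeric type is not documented.
    latency_ms: Optional[float] = None


def parse_tts_response(raw: str) -> TtsResult:
    """Parse a successful synchronous TTS response body."""
    data = json.loads(raw)
    if "audio_url" not in data:
        raise ValueError(f"response has no audio_url: {data!r}")
    return TtsResult(audio_url=data["audio_url"], latency_ms=data.get("latency_ms"))
```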

To use a cloned voice, add reference_id to the request body:

curl -X POST 'https://api.xhuoapi.ai/v1/fish/tts' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer {token}' \
  -H 'content-type: application/json' \
  -H 'model: s2-pro' \
  -d '{
    "text": "今天天气真好，我们一起出去散散步吧。",
    "reference_id": "d7900c21663f485ab63ebdb7e5905036",
    "format": "mp3",
    "sample_rate": 44100
  }'
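When optional fields vary from call to call, a small builder that includes only the fields you set lets upstream defaults apply to the rest. This is a Python sketch; `build_tts_body` is a hypothetical helper, and the reference_id comment follows this guide's description of it as a cloned-voice ID.

```python
from typing import Any, Optional


def build_tts_body(
    text: str,
    reference_id: Optional[str] = None,
    audio_format: Optional[str] = None,
    sample_rate: Optional[int] = None,
    **extra: Any,
) -> dict:
    """Assemble a TTS request body, leaving out fields that were not set.

    Extra keyword arguments (e.g. temperature, top_p) are passed through as-is.
    """
    body: dict = {"text": text}
    optional = {
        "reference_id": reference_id,  # ID of a cloned voice
        "format": audio_format,  # wire name is "format"; the parameter avoids shadowing format()
        "sample_rate": sample_rate,
        **extra,
    }
    # Omit unset fields so upstream defaults apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```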

Asynchronous Callback

Fish TTS generation can take a long time for lengthy texts, and holding a connection open for that long consumes system resources. This API therefore provides an asynchronous callback mode, an extension beyond the Fish official API.

The flow is as follows: the client adds a callback_url field to the request body, and the API immediately returns a response containing a task_id. When upstream generation completes, the final audio_url and other fields are POSTed as JSON to the callback_url, together with the same task_id so that the result can be matched to the original task. Request example:

curl -X POST 'https://api.xhuoapi.ai/v1/fish/tts' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer {token}' \
  -H 'content-type: application/json' \
  -H 'model: s2-pro' \
  -d '{
    "text": "今天天气真好，我们一起出去散散步吧。",
    "callback_url": "https://webhook.site/4815f79f-a40f-4078-ac85-1cc126b6bb34"
  }'

Immediate response:

{
  "task_id": "2725a2d3-f87e-4905-9c53-9988d5a7b2f5",
  "started_at": "2025-05-09T12:34:56.789Z"
}
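On the client side, the asynchronous mode amounts to adding one field before sending and keeping the returned task_id until the callback arrives. This is a Python sketch; `with_callback` and `parse_ack` are hypothetical names, and where you store the task_id is left to your application.

```python
import json
from datetime import datetime


def with_callback(body: dict, callback_url: str) -> dict:
    """Return a copy of a TTS body with the platform-only callback_url field added."""
    return {**body, "callback_url": callback_url}


def parse_ack(raw: str) -> tuple:
    """Parse the immediate {task_id, started_at} reply into (task_id, aware datetime)."""
    data = json.loads(raw)
    # Replace the "Z" suffix so fromisoformat also works on Python < 3.11.
    started_at = datetime.fromisoformat(data["started_at"].replace("Z", "+00:00"))
    return data["task_id"], started_at
```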

After a short wait, the callback_url will receive the complete result:

{
  "task_id": "2725a2d3-f87e-4905-9c53-9988d5a7b2f5",
  "audio_url": "https://platform.r2.fish.audio/task/b627c2f7d38a4083a837570ba6d0962f.mp3"
}
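The callback_url only needs to accept a JSON POST and file the result under its task_id. This is a minimal standard-library sketch, assuming an in-memory dict is acceptable storage and that replying 200 counts as an acknowledgement; this guide does not say what status the platform expects. `FishCallbackHandler`, `RESULTS`, and `serve` are hypothetical names.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory store: task_id -> callback payload.
RESULTS: dict = {}
RESULTS_LOCK = threading.Lock()


class FishCallbackHandler(BaseHTTPRequestHandler):
    """Accepts the JSON callback and files it under its task_id."""

    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            self._reply(400)  # body was not valid JSON
            return
        task_id = payload.get("task_id") if isinstance(payload, dict) else None
        if not task_id:
            self._reply(400)  # cannot match the result to a task
            return
        with RESULTS_LOCK:
            RESULTS[task_id] = payload  # contains audio_url and any other fields
        # Assumption: a plain 200 is treated as an acknowledgement.
        self._reply(200)

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Run the receiver until interrupted; expose it at your callback_url."""
    ThreadingHTTPServer((host, port), FishCallbackHandler).serve_forever()
```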

You can also actively poll the task status using the Fish Tasks API with the task_id.

Error Handling

This interface keeps the Fish official HTTP status codes for errors but returns them in the platform's unified response format, the same one used by the /fish/audios and /fish/voices endpoints:

400 token_mismatched: Bad request, possibly due to missing or invalid parameters.
400 api_not_implemented: Bad request, possibly due to missing or invalid parameters.
401 invalid_token: Unauthorized, invalid or missing authorization token.
429 too_many_requests: Too many requests, rate limit exceeded.
500 api_error: Internal server error.
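Of these, 429 and 500 are the statuses a client might reasonably retry, while 400 and 401 usually need a fix on the caller's side. The following Python sketch encodes that judgment with exponential backoff. The retryable set, the delays, and `call_with_retry` are assumptions of this example, not documented platform behavior.

```python
import time
from typing import Callable, Tuple

# Assumption: retrying 429 and 500 is this example's policy, not a documented rule.
RETRYABLE_STATUSES = {429, 500}


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() must return (http_status, parsed_json_body). Delays double after each
    retryable failure: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, payload = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, payload
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```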

Error Response Example

{
  "success": false,
  "error": {
    "code": "api_error",
    "message": "fetch failed"
  },
  "trace_id": "2cf86e86-22a4-46e1-ac2f-032c0f2a4e89"
}
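Since every error carries the same envelope, a client can turn it into one exception type and keep the trace_id for support requests. This is a Python sketch; `FishApiError` and `error_from_response` are hypothetical, and the fallback for non-JSON bodies (for example an HTML page from a proxy) is an assumption.

```python
import json
from typing import Optional


class FishApiError(Exception):
    """Platform error envelope: {"success": false, "error": {...}, "trace_id": ...}."""

    def __init__(self, status: int, code: str, message: str, trace_id: Optional[str]):
        super().__init__(f"HTTP {status} {code}: {message} (trace_id={trace_id})")
        self.status = status
        self.code = code
        self.message = message
        self.trace_id = trace_id


def error_from_response(status: int, raw: str) -> Optional[FishApiError]:
    """Return a FishApiError for an error reply, or None for a success reply."""
    try:
        data = json.loads(raw)
    except ValueError:  # not JSON, e.g. an HTML page from a proxy
        data = None
    if not isinstance(data, dict):
        data = {}
    if status < 400 and data.get("success") is not False:
        return None
    error = data.get("error")
    if not isinstance(error, dict):
        error = {}
    return FishApiError(
        status,
        error.get("code", "unknown"),
        error.get("message", raw[:200]),
        data.get("trace_id"),
    )
```

Typical use: `err = error_from_response(status, body_text)`, then `raise err` when it is not None.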

Conclusion

The Fish TTS API is compatible with the Fish Audio Official OpenAPI, so existing projects can migrate by changing only the endpoint and the credentials, while gaining the platform's unified authentication, usage accounting, and asynchronous callbacks. For long texts, asynchronous callbacks are recommended to avoid holding long-lived connections open.