@stawa/gtts - v0.0.0

Gemini Text-To-Speech

Transform written content into speech using Google AI (Gemini) for text generation and internet-based information retrieval.

📜 Table of Contents

How It Works
Project Note
Project Installation
Project Examples
Contributors

❓ How It Works

This project is based on an example in test/app.ts. It performs the following steps:

Fetches a voice input
Sends a request to the Google Gemini API to receive an AI-generated response
Automatically converts the response to speech using Text-To-Speech (TTS) technology
Plays the generated audio
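
The four steps above can be sketched as a small pipeline. This is a minimal illustration only: the `VoiceAssistantSteps` interface, `runOnce`, and the stub implementations are hypothetical names introduced here and are not the @stawa/gtts API. The stubs stand in for the real voice capture, Gemini, TTS, and playback calls, so the sketch runs without API keys or audio hardware.

```typescript
// Hypothetical step signatures: illustrative stand-ins, not the @stawa/gtts API.
interface VoiceAssistantSteps {
  listen: () => Promise<string>; // 1. capture voice input and return its transcript
  chat: (prompt: string) => Promise<string>; // 2. ask Gemini for a response
  synthesize: (text: string) => Promise<Uint8Array>; // 3. convert the response to speech
  play: (audio: Uint8Array) => Promise<void>; // 4. play the generated audio
}

// Runs the four steps in order and returns the text reply.
async function runOnce(steps: VoiceAssistantSteps): Promise<string> {
  const prompt = await steps.listen();
  const reply = await steps.chat(prompt);
  const audio = await steps.synthesize(reply);
  await steps.play(audio);
  return reply;
}

// Stub implementations (assumptions for demonstration only).
const played: number[] = [];
const stubSteps: VoiceAssistantSteps = {
  listen: async () => "When was Facebook launched?",
  chat: async (prompt) => `You asked: ${prompt}`,
  // Fake audio: one zero byte per character of text.
  synthesize: async (text) => new Uint8Array(text.length),
  // Record the size of each clip instead of sending it to a speaker.
  play: async (audio) => {
    played.push(audio.length);
  },
};
```

Passing the steps in as an object keeps each stage replaceable, for example swapping one transcription provider for another, without touching the orchestration.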

📌 Project Note

This project has been tested on Linux (Ubuntu 24.04 LTS x86_64). Windows users can install SoX from SourceForge. macOS-specific information is currently unavailable.

Task	Priority	Status
Implement Gemini Chat	High	✅ Completed
Develop Voice Recognition	High	✅ Completed
Implement Audio Language Detection	High	✅ Completed
Implement Text Language Detection	Medium	✅ Completed
Implement an Audio Player	Low	✅ Completed
Define Enums	Low	✅ Completed
Integrate Debugging	Low	✅ Completed

📦 Project Installation

Before using this repository, ensure the following dependencies are installed on your system:

Linux

SoX: sudo apt-get install sox
libsox-fmt-all: sudo apt-get install libsox-fmt-all
FFmpeg: sudo apt-get install ffmpeg

Windows

SoX: Download from SourceForge
FFmpeg: choco install ffmpeg (using Chocolatey) or download from the official website

macOS

macOS-specific installation instructions are not available at this time.
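
To confirm that the system dependencies listed above are reachable before running the project, a small check along these lines may help. It is a sketch under stated assumptions: `isOnPath`, `findMissingTools`, and `REQUIRED_TOOLS` are hypothetical helpers, not part of this package. It only tests whether each binary can be started, using Node's `spawnSync`, and does not verify versions or the libsox-fmt-all format handlers.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: true when the command can be started at all.
// A missing binary surfaces as `result.error` (ENOENT); the exit status is ignored.
function isOnPath(command: string, versionFlag: string): boolean {
  const result = spawnSync(command, [versionFlag], { stdio: "ignore" });
  return result.error === undefined;
}

type Probe = (command: string, versionFlag: string) => boolean;

// Each tool is paired with the flag that prints its version.
const REQUIRED_TOOLS: ReadonlyArray<[string, string]> = [
  ["sox", "--version"],
  ["ffmpeg", "-version"],
];

// Returns the names of required tools that the probe could not start.
// The probe is injectable so the logic can be exercised without the tools installed.
function findMissingTools(probe: Probe = isOnPath): string[] {
  return REQUIRED_TOOLS.filter(([command, flag]) => !probe(command, flag)).map(
    ([command]) => command,
  );
}
```

Calling `findMissingTools()` with no arguments probes the real system; an empty array means both SoX and FFmpeg were found.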

To install the package, use one of the following commands based on your preferred package manager:

# npm
$ npm install git+https://github.com/Stawa/GTTS.git --legacy-peer-deps
# Bun
$ bun install git+https://github.com/Stawa/GTTS.git --trust

📄 Project Examples

Before diving into the examples, ensure you have the following API keys and credentials:

Google Gemini API Key (lib.GoogleGemini)
- Obtain from Google Cloud Console
TikTok SessionID (lib.TextToSpeech)
- Extract from TikTok browser cookies after logging in
Google Speech API Key (lib.VoiceRecognition.fetchTranscriptGoogle)
- Generate from Google Cloud Console Credentials
Deepgram API Key (lib.VoiceRecognition.fetchTranscriptDeepgram)
- Create an account and obtain from Deepgram Console
EdenAI API Key (lib.SummarizeText)
- Sign up and retrieve from EdenAI Dashboard

Store these API keys securely and never commit them to version control. Consider using environment variables or a secure key management system.
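
One way to follow the advice above is to read every key from the environment at startup and fail fast if any is missing. The sketch below assumes a hypothetical `requireEnv` helper. Only `GEMINI_API_KEY` appears elsewhere in this document; any other variable name you pass in is your own choice.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the requested variables, or throws an error
// that lists every missing (or empty) one.
function requireEnv(env: Env, names: readonly string[]): Record<string, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const values: Record<string, string> = {};
  for (const name of names) {
    values[name] = env[name] as string; // safe: presence was checked above
  }
  return values;
}

// Demonstration with an in-memory object; a real application would pass process.env.
const demoKeys = requireEnv({ GEMINI_API_KEY: "example-key" }, ["GEMINI_API_KEY"]);
```

Reporting all missing names at once avoids fixing one variable at a time across repeated runs.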

Here's a concise example demonstrating how to generate a response using the Google Gemini API:

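import { GoogleGemini } from "@stawa/gtts";
import dotenv from "dotenv";

// Load variables from a local .env file into process.env.
dotenv.config();

// Fail early if the key is missing rather than passing undefined to the client.
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
  throw new Error("GEMINI_API_KEY is not set");
}

const gemini = new GoogleGemini({
  apiKey,
  model: "gemini-1.5-flash",
  enableLogging: true,
});

async function main() {
  try {
    const question = "When was Facebook launched?";
    console.log(`Question: ${question}`);

    // Send the prompt to Gemini and wait for the generated reply.
    const response = await gemini.chat(question);
    console.log(`Gemini's response: ${response}`);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();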

👥 Contributors

We appreciate everyone who has contributed to this project. Each contribution helps make it better, and special thanks go to all who have helped shape it!