@stawa/gtts - v0.0.0

Gemini Text-To-Speech

Transform written content into speech using Google AI (Gemini) for text generation and internet-based information retrieval.


📜 Table of Contents

  1. How It Works
  2. Project Note
  3. Project Installation
  4. Project Examples
  5. Contributors

❓ How It Works

This project is based on an example in test/app.ts. It performs the following steps:

  1. Captures voice input
  2. Sends a request to the Google Gemini API to receive an AI-generated response
  3. Automatically converts the response to speech using Text-To-Speech (TTS) technology
  4. Plays the generated audio
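
The four steps above can be sketched as a simple chain of async stages. This is only an illustration: the `Stages` interface, its member names, and `runPipeline` are hypothetical and are not part of the @stawa/gtts API. The sketch also assumes that step 1 yields a text transcript of the voice input.

```typescript
// Hypothetical stage signatures for illustration; NOT the @stawa/gtts API.
interface Stages {
  captureVoice: () => Promise<string>;               // step 1: voice input, assumed to yield a transcript
  askGemini: (prompt: string) => Promise<string>;    // step 2: transcript -> AI-generated reply
  synthesize: (text: string) => Promise<Uint8Array>; // step 3: reply -> audio bytes (TTS)
  play: (audio: Uint8Array) => Promise<void>;        // step 4: audio playback
}

// Runs the stages in order and returns the text reply for logging.
async function runPipeline(stages: Stages): Promise<string> {
  const transcript = await stages.captureVoice();
  const reply = await stages.askGemini(transcript);
  const audio = await stages.synthesize(reply);
  await stages.play(audio);
  return reply;
}
```

Passing the stages in as functions keeps each step replaceable, so a stub can stand in for the microphone or the network while testing.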

📌 Project Note

This project has been tested on Linux (Ubuntu 24.04 LTS x86_64). Windows users can install SoX from SourceForge. macOS-specific information is currently unavailable.

Task                                  Priority  Status
Implement Gemini Chat                 High      ✅ Completed
Develop Voice Recognition             High      ✅ Completed
Implement Audio Language Detection    High      ✅ Completed
Implement Text Language Detection     Medium    ✅ Completed
Implement an Audio Player             Low       ✅ Completed
Define Enums                          Low       ✅ Completed
Integrate Debugging                   Low       ✅ Completed

📦 Project Installation

Before using this repository, ensure the following dependencies are installed on your system:

Linux

  • SoX: sudo apt-get install sox
  • libsox-fmt-all: sudo apt-get install libsox-fmt-all
  • FFmpeg: sudo apt-get install ffmpeg

Windows

  • SoX: install from SourceForge

macOS

macOS-specific installation instructions are not available at this time.

To install the package, use one of the following commands based on your preferred package manager:

# npm
$ npm install git+https://github.com/Stawa/GTTS.git --legacy-peer-deps
# Bun
$ bun install git+https://github.com/Stawa/GTTS.git --trust

📄 Project Examples

Before diving into the examples, ensure you have the following API keys and credentials:

  • Google Gemini API Key (lib.GoogleGemini)
  • TikTok SessionID (lib.TextToSpeech)
    • Extract from TikTok browser cookies after logging in
  • Google Speech API Key (lib.VoiceRecognition.fetchTranscriptGoogle)
  • Deepgram API Key (lib.VoiceRecognition.fetchTranscriptDeepgram)
  • EdenAI API Key (lib.SummarizeText)

Store these API keys securely and never commit them to version control. Consider using environment variables or a secure key management system.
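
One way to follow this advice is to read every key from the environment at startup and fail early if any is missing. The helper below is a minimal sketch: `requireKeys` is a hypothetical name, not part of @stawa/gtts, and apart from `GEMINI_API_KEY` the variable names you pass to it are your own choice.

```typescript
// Hypothetical helper for illustration; NOT part of @stawa/gtts.
type Env = Record<string, string | undefined>;

// Returns the requested variables, or throws one error that lists every missing name.
function requireKeys<K extends string>(env: Env, names: readonly K[]): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const result = {} as Record<K, string>;
  for (const name of names) {
    result[name] = env[name] as string;
  }
  return result;
}

// Typical call site (after dotenv.config()):
//   const keys = requireKeys(process.env, ["GEMINI_API_KEY"] as const);
```

Taking the environment as a parameter, rather than reading `process.env` inside the helper, keeps the function easy to test with a plain object.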

Here's a concise example demonstrating how to generate a response using the Google Gemini API:

import { GoogleGemini } from "@stawa/gtts";
import dotenv from "dotenv";

// Load variables from a local .env file into process.env
dotenv.config();

const gemini = new GoogleGemini({
  apiKey: process.env.GEMINI_API_KEY,
  model: "gemini-1.5-flash",
  enableLogging: true,
});

async function main() {
  try {
    const question = "When was Facebook launched?";
    console.log(`Question: ${question}`);

    // Send the question to Gemini and wait for the reply
    const response = await gemini.chat(question);
    console.log(`Gemini's response: ${response}`);
  } catch (error) {
    console.error("An error occurred:", error);
  }
}

main();

👥 Contributors

Thank you to all the contributors who have helped shape this project. Every contribution makes it better.
