On Day 7, we’ll add voice detection and lip-syncing, allowing the avatar’s mouth to move in real-time as the user speaks. We’ll also explore voice modulation to alter pitch and tone.
1. Overview of Voice Modulation & Lip Sync
✅ Lip Syncing: The avatar’s mouth moves based on detected speech patterns.
✅ Voice Modulation: Alters voice pitch to sound robotic, deep, or high-pitched.
We’ll use:
🔹 Expo Speech API (for text-to-speech feedback)
🔹 Expo Audio API (for microphone access and permissions)
🔹 React Native Voice (for real-time speech-to-text recognition)
2. Installing Dependencies
Step 1: Install Expo Speech API for TTS (Optional)
expo install expo-speech
Step 2: Install Expo Audio API for Microphone Access
expo install expo-av
Step 3: Install React Native Voice for Speech-to-Text
npm install @react-native-voice/voice
Note: the original react-native-voice package has moved to @react-native-voice/voice. Because it contains native code, it won't run inside Expo Go — you'll need a development build or a bare workflow project.
3. Detecting Speech and Moving the Avatar’s Mouth
Step 1: Request Microphone Permissions
Modify App.js:
import { Audio } from 'expo-av';

// Ask the user for microphone access before any voice tracking starts.
async function requestAudioPermissions() {
  const { status } = await Audio.requestPermissionsAsync();
  if (status !== 'granted') {
    alert('Microphone access is needed for voice tracking.');
  }
}
Call requestAudioPermissions() inside a useEffect so it runs once on mount, as shown below.
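For reference, a minimal sketch of that wiring (assuming App is a standard function component):

import { useEffect } from 'react';

export default function App() {
  useEffect(() => {
    // Ask for mic access once on mount, before any listening starts.
    requestAudioPermissions();
  }, []);
  // ...rest of the app (avatar renderer, etc.)
}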
Step 2: Detect Speech Using React Native Voice
Modify VoiceProcessor.js:
import Voice from '@react-native-voice/voice';
import { useState, useEffect } from 'react';

// Hook that reports whether the user is currently speaking.
export default function useVoiceRecognition() {
  const [isSpeaking, setIsSpeaking] = useState(false);

  useEffect(() => {
    // Voice fires these callbacks when speech starts and ends.
    Voice.onSpeechStart = () => setIsSpeaking(true);
    Voice.onSpeechEnd = () => setIsSpeaking(false);

    return () => {
      // Tear down the native listener when the component unmounts.
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);

  const startListening = () => Voice.start('en-US');
  const stopListening = () => Voice.stop();

  return { isSpeaking, startListening, stopListening };
}
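A quick usage sketch (AvatarScreen is an illustrative name; AvatarRenderer is defined in Step 3 of the next section): start listening while the screen is mounted and feed isSpeaking to the avatar.

import { useEffect } from 'react';
import useVoiceRecognition from './VoiceProcessor';
import AvatarRenderer from './AvatarRenderer';

export default function AvatarScreen() {
  const { isSpeaking, startListening, stopListening } = useVoiceRecognition();

  useEffect(() => {
    startListening();
    return () => {
      stopListening();
    };
  }, []);

  return <AvatarRenderer isSpeaking={isSpeaking} />;
}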
Step 3: Sync Speech Detection with Avatar Lip Movement
Modify AvatarModel.js:
function AvatarModel({ facialExpressions, isSpeaking }) {
  return (
    <group>
      {/* Head */}
      <mesh>
        <sphereGeometry args={[1, 32, 32]} />
        <meshStandardMaterial color="orange" />
      </mesh>
      {/* Mouth - stretches vertically while the user is speaking */}
      <mesh position={[0, -0.3, 1]} scale={[1, isSpeaking ? 1.2 : 1, 1]}>
        <boxGeometry args={[0.4, 0.2, 0.1]} />
        <meshStandardMaterial color="red" />
      </mesh>
    </group>
  );
}
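The on/off scale flip can look choppy. If you want smoother motion, one option (a sketch, not part of the original setup) is to ease the mouth's Y scale toward its target every frame with @react-three/fiber's useFrame — the Mouth component and mouthRef are illustrative names:

import { useRef } from 'react';
import { useFrame } from '@react-three/fiber';
import * as THREE from 'three';

function Mouth({ isSpeaking }) {
  const mouthRef = useRef();

  useFrame(() => {
    if (!mouthRef.current) return;
    const target = isSpeaking ? 1.2 : 1;
    // Move 20% of the remaining distance each frame for a soft open/close.
    mouthRef.current.scale.y = THREE.MathUtils.lerp(mouthRef.current.scale.y, target, 0.2);
  });

  return (
    <mesh ref={mouthRef} position={[0, -0.3, 1]}>
      <boxGeometry args={[0.4, 0.2, 0.1]} />
      <meshStandardMaterial color="red" />
    </mesh>
  );
}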
4. Adding Voice Modulation
We’ll modify the voice pitch to create robotic, deep, or chipmunk-like voices.
Step 1: Modify Voice with Expo Speech
Modify VoiceProcessor.js:
import * as Speech from 'expo-speech';

// Speak text with an adjustable pitch (1.0 = normal).
export const speakText = (text, pitch = 1.0) => {
  Speech.speak(text, { pitch, rate: 1.0 });
};
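For example, using the pitch values we'll expose in the next step:

speakText('Hello, I am your avatar', 0.5); // deep voice
speakText('Hello, I am your avatar', 1.2); // higher, "robot" voice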
Step 2: Add Voice Pitch Selection UI
The picker ships as a separate package, so install it first:
expo install @react-native-picker/picker
Then modify VoiceSettings.js:
import { useState } from 'react';
import { Picker } from '@react-native-picker/picker';

export default function VoiceSettings({ onChangePitch }) {
  // Track the selected pitch locally so the Picker stays in sync with the UI.
  const [pitch, setPitch] = useState(1.0);

  return (
    <Picker
      selectedValue={pitch}
      onValueChange={(value) => {
        setPitch(value);
        onChangePitch(value);
      }}
    >
      <Picker.Item label="Normal" value={1.0} />
      <Picker.Item label="Deep" value={0.5} />
      <Picker.Item label="Robot" value={1.2} />
    </Picker>
  );
}
Step 3: Apply Modulation in Avatar
Modify AvatarRenderer.js to track the selected pitch and speak a sample line with it (the "Test Voice" button below is an illustrative addition so you can hear the modulation):
import { useState } from 'react';
import { View, Button } from 'react-native';
import { Canvas } from '@react-three/fiber/native'; // native entry point for React Native
import AvatarModel from './AvatarModel';
import VoiceSettings from './VoiceSettings';
import { speakText } from './VoiceProcessor';

export default function AvatarRenderer({ facialExpressions, isSpeaking }) {
  const [voicePitch, setVoicePitch] = useState(1.0);

  return (
    <View style={{ flex: 1 }}>
      <Canvas>
        <AvatarModel facialExpressions={facialExpressions} isSpeaking={isSpeaking} />
      </Canvas>
      <VoiceSettings onChangePitch={setVoicePitch} />
      {/* Speak a sample line at the currently selected pitch */}
      <Button title="Test Voice" onPress={() => speakText('Hello!', voicePitch)} />
    </View>
  );
}
5. Testing Speech-to-Avatar Sync
Step 1: Run the App
expo start
Step 2: Test Lip Syncing
- Speak into the mic → Avatar mouth should open/close.
- Stop speaking → Mouth should return to normal.
Step 3: Test Voice Modulation
- Select Deep Voice → Speech should sound lower.
- Select Robot Voice → Speech should sound high-pitched.
6. Optimizing Voice Sync Performance
- Throttle voice processing: if you poll audio in the render loop, handle every 2nd frame to reduce lag, e.g.:
if (frameCount % 2 === 0) detectVoice();
- Record at a lower sample rate or quality preset so less audio data has to be processed in real time (see the sketch below).
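If you drive the mouth from raw microphone levels instead of speech events, expo-av lets you throttle how often status callbacks fire. A sketch, assuming a metering-enabled recording (startMeteredRecording and onLevel are illustrative names):

import { Audio } from 'expo-av';

async function startMeteredRecording(onLevel) {
  const recording = new Audio.Recording();
  await recording.prepareToRecordAsync({
    ...Audio.RecordingOptionsPresets.LOW_QUALITY, // lower sample rate = less data to process
    isMeteringEnabled: true,
  });
  // Fire status updates ~15x per second instead of the default rate.
  recording.setProgressUpdateInterval(66);
  recording.setOnRecordingStatusUpdate((status) => {
    if (status.isRecording && typeof status.metering === 'number') {
      onLevel(status.metering); // dB level — drive the mouth scale from this
    }
  });
  await recording.startAsync();
  return recording;
}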
7. Key Concepts Covered
✅ Integrated real-time voice detection.
✅ Synchronized lip movement with speech.
✅ Implemented voice modulation (deep, robotic, normal).
8. Next Steps: Implementing Multilingual Support for Voice Commands
Tomorrow, we'll:
🔹 Add multilingual support (English, Spanish, Chinese, etc.).
🔹 Implement voice-based avatar control (e.g., "smile," "wave").