On Day 7, we’ll add voice detection and lip-syncing, allowing the avatar’s mouth to move in real-time as the user speaks. We’ll also explore voice modulation to alter pitch and tone.
1. Overview of Voice Modulation & Lip Sync
✅ Lip Syncing: The avatar’s mouth moves based on detected speech patterns.
✅ Voice Modulation: Alters voice pitch to sound robotic, deep, or high-pitched.
We’ll use:
🔹 Expo Speech API (for text-to-speech feedback)
🔹 Expo Audio API (for microphone access and permissions)
🔹 React Native Voice (for real-time speech-to-text recognition)
2. Installing Dependencies
Step 1: Install Expo Speech API for TTS (Optional)
expo install expo-speech
Step 2: Install Expo Audio API for Microphone Access
expo install expo-av
Step 3: Install React Native Voice for Speech-to-Text
npm install @react-native-voice/voice
Note: the original react-native-voice package has moved to @react-native-voice/voice. Because it contains native code, it won't run inside Expo Go — you'll need a development build or a bare workflow project.
3. Detecting Speech and Moving the Avatar’s Mouth
Step 1: Request Microphone Permissions
Modify App.js:
import { Audio } from 'expo-av';

// Ask the user for microphone access before any voice tracking starts.
async function requestAudioPermissions() {
  const { status } = await Audio.requestPermissionsAsync();
  if (status !== 'granted') {
    alert('Microphone access is needed for voice tracking.');
  }
}
Call requestAudioPermissions() inside a useEffect so it runs once on mount, as shown below.
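For reference, a minimal sketch of that wiring (assuming App is a standard function component):

import { useEffect } from 'react';

export default function App() {
  useEffect(() => {
    // Ask for mic access once on mount, before any listening starts.
    requestAudioPermissions();
  }, []);
  // ...rest of the app (avatar renderer, etc.)
}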
Step 2: Detect Speech Using React Native Voice
Modify VoiceProcessor.js:
import Voice from '@react-native-voice/voice';
import { useState, useEffect } from 'react';

// Hook that reports whether the user is currently speaking.
export default function useVoiceRecognition() {
  const [isSpeaking, setIsSpeaking] = useState(false);

  useEffect(() => {
    // Voice fires these callbacks when speech starts and ends.
    Voice.onSpeechStart = () => setIsSpeaking(true);
    Voice.onSpeechEnd = () => setIsSpeaking(false);

    return () => {
      // Tear down the native listener when the component unmounts.
      Voice.destroy().then(Voice.removeAllListeners);
    };
  }, []);

  const startListening = () => Voice.start('en-US');
  const stopListening = () => Voice.stop();

  return { isSpeaking, startListening, stopListening };
}
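A quick usage sketch (AvatarScreen is an illustrative name; AvatarRenderer is defined in Step 3 of the next section): start listening while the screen is mounted and feed isSpeaking to the avatar.

import { useEffect } from 'react';
import useVoiceRecognition from './VoiceProcessor';
import AvatarRenderer from './AvatarRenderer';

export default function AvatarScreen() {
  const { isSpeaking, startListening, stopListening } = useVoiceRecognition();

  useEffect(() => {
    startListening();
    return () => {
      stopListening();
    };
  }, []);

  return <AvatarRenderer isSpeaking={isSpeaking} />;
}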
Step 3: Sync Speech Detection with Avatar Lip Movement
Modify AvatarModel.js:
function AvatarModel({ facialExpressions, isSpeaking }) {
  return (
    <group>
      {/* Head */}
      <mesh>
        <sphereGeometry args={[1, 32, 32]} />
        <meshStandardMaterial color="orange" />
      </mesh>
      {/* Mouth - stretches vertically while the user is speaking */}
      <mesh position={[0, -0.3, 1]} scale={[1, isSpeaking ? 1.2 : 1, 1]}>
        <boxGeometry args={[0.4, 0.2, 0.1]} />
        <meshStandardMaterial color="red" />
      </mesh>
    </group>
  );
}
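The on/off scale flip can look choppy. If you want smoother motion, one option (a sketch, not part of the original setup) is to ease the mouth's Y scale toward its target every frame with @react-three/fiber's useFrame — the Mouth component and mouthRef are illustrative names:

import { useRef } from 'react';
import { useFrame } from '@react-three/fiber';
import * as THREE from 'three';

function Mouth({ isSpeaking }) {
  const mouthRef = useRef();

  useFrame(() => {
    if (!mouthRef.current) return;
    const target = isSpeaking ? 1.2 : 1;
    // Move 20% of the remaining distance each frame for a soft open/close.
    mouthRef.current.scale.y = THREE.MathUtils.lerp(mouthRef.current.scale.y, target, 0.2);
  });

  return (
    <mesh ref={mouthRef} position={[0, -0.3, 1]}>
      <boxGeometry args={[0.4, 0.2, 0.1]} />
      <meshStandardMaterial color="red" />
    </mesh>
  );
}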
4. Adding Voice Modulation
We’ll modify the voice pitch to create robotic, deep, or chipmunk-like voices.
Step 1: Modify Voice with Expo Speech
Modify VoiceProcessor.js:
import * as Speech from 'expo-speech';

// Speak text with an adjustable pitch (1.0 = normal).
export const speakText = (text, pitch = 1.0) => {
  Speech.speak(text, { pitch, rate: 1.0 });
};
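For example, using the pitch values we'll expose in the next step:

speakText('Hello, I am your avatar', 0.5); // deep voice
speakText('Hello, I am your avatar', 1.2); // higher, "robot" voice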
Step 2: Add Voice Pitch Selection UI
The picker ships as a separate package, so install it first:
expo install @react-native-picker/picker
Then modify VoiceSettings.js:
import { useState } from 'react';
import { Picker } from '@react-native-picker/picker';

export default function VoiceSettings({ onChangePitch }) {
  // Track the selected pitch locally so the Picker stays in sync with the UI.
  const [pitch, setPitch] = useState(1.0);

  return (
    <Picker
      selectedValue={pitch}
      onValueChange={(value) => {
        setPitch(value);
        onChangePitch(value);
      }}
    >
      <Picker.Item label="Normal" value={1.0} />
      <Picker.Item label="Deep" value={0.5} />
      <Picker.Item label="Robot" value={1.2} />
    </Picker>
  );
}
Step 3: Apply Modulation in Avatar
Modify AvatarRenderer.js to track the selected pitch and speak a sample line with it (the "Test Voice" button below is an illustrative addition so you can hear the modulation):
import { useState } from 'react';
import { View, Button } from 'react-native';
import { Canvas } from '@react-three/fiber/native'; // native entry point for React Native
import AvatarModel from './AvatarModel';
import VoiceSettings from './VoiceSettings';
import { speakText } from './VoiceProcessor';

export default function AvatarRenderer({ facialExpressions, isSpeaking }) {
  const [voicePitch, setVoicePitch] = useState(1.0);

  return (
    <View style={{ flex: 1 }}>
      <Canvas>
        <AvatarModel facialExpressions={facialExpressions} isSpeaking={isSpeaking} />
      </Canvas>
      <VoiceSettings onChangePitch={setVoicePitch} />
      {/* Speak a sample line at the currently selected pitch */}
      <Button title="Test Voice" onPress={() => speakText('Hello!', voicePitch)} />
    </View>
  );
}
5. Testing Speech-to-Avatar Sync
Step 1: Run the App
expo start
Step 2: Test Lip Syncing
- Speak into the mic → Avatar mouth should open/close.
- Stop speaking → Mouth should return to normal.
Step 3: Test Voice Modulation
- Select Deep Voice → Speech should sound lower.
- Select Robot Voice → Speech should sound high-pitched.
6. Optimizing Voice Sync Performance
- Throttle voice processing: if you poll audio in the render loop, handle every 2nd frame to reduce lag, e.g.:
if (frameCount % 2 === 0) detectVoice();
- Record at a lower sample rate or quality preset so less audio data has to be processed in real time (see the sketch below).
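If you drive the mouth from raw microphone levels instead of speech events, expo-av lets you throttle how often status callbacks fire. A sketch, assuming a metering-enabled recording (startMeteredRecording and onLevel are illustrative names):

import { Audio } from 'expo-av';

async function startMeteredRecording(onLevel) {
  const recording = new Audio.Recording();
  await recording.prepareToRecordAsync({
    ...Audio.RecordingOptionsPresets.LOW_QUALITY, // lower sample rate = less data to process
    isMeteringEnabled: true,
  });
  // Fire status updates ~15x per second instead of the default rate.
  recording.setProgressUpdateInterval(66);
  recording.setOnRecordingStatusUpdate((status) => {
    if (status.isRecording && typeof status.metering === 'number') {
      onLevel(status.metering); // dB level — drive the mouth scale from this
    }
  });
  await recording.startAsync();
  return recording;
}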
7. Key Concepts Covered
✅ Integrated real-time voice detection.
✅ Synchronized lip movement with speech.
✅ Implemented voice modulation (deep, robotic, normal).
8. Next Steps: Implementing Multilingual Support for Voice Commands
Tomorrow, we'll:
🔹 Add multilingual support (English, Spanish, Chinese, etc.).
🔹 Implement voice-based avatar control (e.g., "smile," "wave").