Creating a Voice-Controlled App with React Native and the Speech Recognition API

Voice-controlled applications are transforming how users interact with their mobile devices, offering hands-free navigation, accessibility enhancements, and a futuristic user experience. By leveraging React Native combined with the Speech Recognition API, developers can build intuitive apps that respond to natural language commands with ease. This comprehensive guide walks through building a full-featured voice-controlled app from scratch, covering setup, implementation, command handling, and advanced techniques.

Why Build Voice-Controlled Apps?

Voice user interfaces (VUIs) reduce friction in multitasking scenarios, enable accessibility for users with motor impairments, and create immersive experiences in areas like home automation, driving modes, and healthcare. React Native allows cross-platform deployment (iOS and Android) with a single codebase, making it an ideal choice for voice apps. The react-native-voice library wraps native speech recognition APIs, providing a consistent JavaScript interface for capturing and processing speech.

Prerequisites

Before diving in, ensure you have the following:

Basic understanding of React Native components and state management (React Hooks).
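Node.js and npm/yarn installed. Use a Node.js version supported by your React Native release; recent releases require v18 or later.
React Native development environment configured by following the official setup guide.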
For iOS: Xcode (macOS only) with a real device (simulator has limited speech recognition).
For Android: Android Studio with an emulator that includes Google Play services or a real device.
Speech Recognition API access via the react-native-voice package.

Additionally, familiarity with platform-specific permissions (microphone access) will smooth the process.

Setting Up the Project

1. Initialize React Native Project

Use the React Native CLI rather than Expo Go, because the library ships native modules that Expo Go cannot load:

npx react-native init VoiceApp

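If you prefer TypeScript, add --template react-native-template-typescript. On React Native 0.71 and later, new projects use TypeScript by default, so the flag is unnecessary. Recent releases also move project creation to npx @react-native-community/cli@latest init VoiceApp.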

2. Install react-native-voice

Navigate into the project folder and install the library:

cd VoiceApp
npm install @react-native-voice/voice

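For React Native 0.60+, auto-linking takes care of the native module. On iOS, install the CocoaPods dependencies afterwards with cd ios && pod install. For older versions, run npx react-native link @react-native-voice/voice.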

3. Configure Platform Permissions

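iOS: In ios/VoiceApp/Info.plist, add usage descriptions for both the microphone and speech recognition; iOS requires both keys before it will prompt the user:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to listen for voice commands.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to understand voice commands.</string>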

Android: Add permissions to android/app/src/main/AndroidManifest.xml inside the <manifest> tag:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />

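RECORD_AUDIO is a runtime permission on Android 6.0 (API 23) and later, so the app must also request it at runtime; we'll handle that in code later. On Android 11 (API 30) and later, package visibility rules may additionally require a <queries> entry for the android.speech.RecognitionService intent in the manifest; check the library's README for the exact snippet.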

Implementing Voice Recognition

Core Component Structure

Create a VoiceScreen.js component. We'll use React Hooks (useState, useEffect, useCallback) to manage state and lifecycle.

1. Initialize the Voice Module

Import Voice and set up event listeners inside a useEffect hook:

import React, { useState, useEffect, useCallback } from 'react';
import Voice from '@react-native-voice/voice';

useEffect(() => {
  // Bind event handlers
  Voice.onSpeechStart = onSpeechStartHandler;
  Voice.onSpeechEnd = onSpeechEndHandler;
  Voice.onSpeechResults = onSpeechResultsHandler;
  Voice.onSpeechError = onSpeechErrorHandler;
  Voice.onSpeechPartialResults = onSpeechPartialResultsHandler;

  return () => {
    // Release native resources and detach listeners on unmount
    Voice.destroy().then(Voice.removeAllListeners);
  };
}, []);

2. Event Handler Functions

Implement handlers to update UI state:

// State updated by the handlers (declare inside the component)
const [isListening, setIsListening] = useState(false);
const [recognizedText, setRecognizedText] = useState('');
const [partialText, setPartialText] = useState('');
const [error, setError] = useState('');

const onSpeechStartHandler = useCallback((e) => {
  console.log('Speech started', e);
  setIsListening(true);
}, []);

const onSpeechEndHandler = useCallback((e) => {
  console.log('Speech ended', e);
  setIsListening(false);
}, []);

const onSpeechResultsHandler = useCallback((e) => {
  if (e.value && e.value.length > 0) {
    // The first entry is the most likely transcription
    setRecognizedText(e.value[0]);
    // Optionally trigger command parsing here
  }
}, []);

const onSpeechErrorHandler = useCallback((e) => {
  console.error('Speech error', e);
  setIsListening(false);
  setError(e.error?.message || 'Speech recognition failed');
}, []);

const onSpeechPartialResultsHandler = useCallback((e) => {
  if (e.value && e.value.length > 0) {
    // Show live transcription
    setPartialText(e.value[0]);
  }
}, []);

3. Start and Stop Listening

Create functions to control the microphone:

const startListening = async () => {
  try {
    // Clear results from the previous session
    setError('');
    setRecognizedText('');
    setPartialText('');
    await Voice.start('en-US');
  } catch (err) {
    console.error('Start error', err);
    setError('Failed to start listening');
  }
};

const stopListening = async () => {
  try {
    await Voice.stop();
  } catch (err) {
    console.error('Stop error', err);
  }
};

4. Requesting Runtime Permissions (Android)

Use PermissionsAndroid from React Native to request microphone access before starting:

import { PermissionsAndroid, Platform } from 'react-native';
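
The request itself could look like the following minimal sketch. The helper name ensureMicrophonePermission and the rationale wording are illustrative assumptions, not part of the library. The platform and permissions objects are passed in as arguments (in the app they would be Platform and PermissionsAndroid from react-native) so the logic can also be exercised outside a device.

```javascript
// Hypothetical helper: resolves to true when the microphone may be used.
// `platform` is expected to look like React Native's Platform and
// `permissions` like PermissionsAndroid; both are injected for testability.
async function ensureMicrophonePermission(platform, permissions) {
  // iOS prompts on first use, based on the Info.plist usage descriptions.
  if (platform.OS !== 'android') {
    return true;
  }
  const result = await permissions.request(
    permissions.PERMISSIONS.RECORD_AUDIO,
    {
      title: 'Microphone Permission',
      message: 'This app needs microphone access to listen for voice commands.',
      buttonPositive: 'OK',
    }
  );
  // Anything other than GRANTED (denied, never ask again) counts as refusal.
  return result === permissions.RESULTS.GRANTED;
}
```

In the component, startListening could call await ensureMicrophonePermission(Platform, PermissionsAndroid) first and show an error message instead of calling Voice.start when it resolves to false.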

5. Full Component UI

Wire everything into a polished interface:

import { View, Text, Button } from 'react-native';

return (
  <View style={{ flex: 1, justifyContent: 'center', alignItems: 'center', padding: 20 }}>
    <Text style={{ fontSize: 24, fontWeight: 'bold', marginBottom: 10 }}>Voice Assistant</Text>
    <View style={{ flexDirection: 'row', marginBottom: 20 }}>
      {/* A single button toggles between starting and stopping */}
      <Button
        title={isListening ? 'Stop' : 'Start'}
        onPress={isListening ? stopListening : startListening}
        color={isListening ? 'red' : 'blue'}
      />
    </View>
    {partialText ? (
      <Text style={{ fontStyle: 'italic', color: 'gray' }}>Listening: {partialText}</Text>
    ) : null}
    {recognizedText ? (
      <Text style={{ fontSize: 18, marginVertical: 10 }}>
        You said: <Text style={{ fontWeight: 'bold' }}>{recognizedText}</Text>
      </Text>
    ) : null}
    {error ? <Text style={{ color: 'red' }}>{error}</Text> : null}
  </View>
);

Handling Voice Commands

With recognized text available, we can parse it and execute actions. This section builds a command router that maps spoken phrases to app actions, like navigation to different screens or toggling features.

Command Mapping Strategy

Define an object mapping keywords to handler functions:

import { Alert } from 'react-native';

const commands = {
  'open settings': () => Alert.alert('Command', 'Opening Settings'),
  'go back': () => console.log('Back navigation'),
  'help': () => Alert.alert('Commands', 'Say "open settings", "go back", or "help"'),
  'start recording': () => console.log('Recording started'),
  'stop recording': () => console.log('Recording stopped'),
};

Then a parser function, handleCommand, normalizes the recognized text, checks it against each command phrase, and triggers the first command whose phrase it contains.
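
A minimal sketch of such a parser follows. The factory name createCommandHandler is an assumption made for illustration; it takes the commands map as an argument so that the returned handleCommand keeps the single-argument form used in the rest of this guide.

```javascript
// Hypothetical factory: builds a handleCommand(text) function for a given
// map of lower-case phrases to handler functions.
function createCommandHandler(commands) {
  return function handleCommand(text) {
    // Normalize so "Open Settings " matches the 'open settings' key.
    const normalized = text.toLowerCase().trim();
    for (const [phrase, action] of Object.entries(commands)) {
      if (normalized.includes(phrase)) {
        action();
        return phrase; // report which command ran
      }
    }
    return null; // no command matched
  };
}
```

Inside the component this would be used as const handleCommand = createCommandHandler(commands). Because matching is by substring and stops at the first hit, list longer or more specific phrases before shorter ones, and expect a short key like 'help' to also match words that contain it.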

If your app uses react-navigation, inject the navigation prop into the command handlers. Example:

// navigation comes from react-navigation (screen prop or the useNavigation hook)
const commands = {
  'open settings': () => navigation.navigate('Settings'),
  'go back': () => navigation.goBack(),
  'home': () => navigation.navigate('Home'),
};

Call handleCommand(recognizedText) inside the useEffect that watches recognizedText:

useEffect(() => {
  // Run the parser whenever a new final transcription arrives
  if (recognizedText) {
    handleCommand(recognizedText);
  }
}, [recognizedText]);

Advanced Command Parsing

For more sophisticated voice UIs, consider a JavaScript natural language processing (NLP) library such as compromise, or a lightweight intent parser that extracts parameters from the utterance. For many mobile apps, simple keyword matching is sufficient and fast. You can also support multiple languages by using locale-specific command maps (e.g., commands_es.json).
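
As one illustration of a lightweight intent parser, the sketch below matches the utterance against a list of regular expressions and returns the captured parameters. The intent names, patterns, and the parseIntent function are all hypothetical; a real app would define its own.

```javascript
// Hypothetical intent table: `slots` names the capture groups of `pattern`
// in order, so each captured value becomes a named parameter.
const intents = [
  { name: 'navigate', pattern: /^(?:open|go to|show) (.+)$/, slots: ['screen'] },
  { name: 'setTimer', pattern: /^set (?:a )?timer for (\d+) minutes?$/, slots: ['minutes'] },
];

function parseIntent(text) {
  // Lower-case, trim, and drop trailing punctuation added by the recognizer.
  const normalized = text.toLowerCase().trim().replace(/[.!?]+$/, '');
  for (const { name, pattern, slots } of intents) {
    const match = pattern.exec(normalized);
    if (match) {
      const params = {};
      slots.forEach((slot, index) => {
        params[slot] = match[index + 1];
      });
      return { name, params };
    }
  }
  return null; // no intent recognized
}
```

Compared with plain keyword matching, this lets a single rule cover a family of phrases ("open settings", "show profile") while still running entirely on-device.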

Testing and Debugging

Speech recognition works best on real devices. The iOS Simulator offers limited and unreliable support for speech recognition, so results there are not representative. Follow these testing tips:

Use a real iOS device with a physical microphone.
On Android, ensure Speech Services by Google is installed and enabled and the device has an internet connection (offline recognition requires downloaded language packs).
Check console logs for Android SpeechRecognizer error codes such as 7 (no match), 6 (no speech input), or 2 (network error).
Simulate partial results to verify the live transcription UI.
Test in noisy environments to see how well the API filters background noise.
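
Simulating partial results does not need a microphone. The hypothetical helper below, a sketch rather than anything the library provides, feeds a phrase to a handler one word at a time using the same { value: [...] } event shape the handlers in this guide read.

```javascript
// Hypothetical test helper: calls `handler` once per word, each time with
// the transcript so far, mimicking a stream of partial results.
function simulatePartialResults(handler, phrase) {
  const words = phrase.split(' ');
  for (let i = 1; i <= words.length; i++) {
    handler({ value: [words.slice(0, i).join(' ')] });
  }
}
```

Passing onSpeechPartialResultsHandler as the handler, for example from a temporary debug button, exercises the live transcription UI without invoking the recognizer.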

For debugging on Android, use adb logcat | grep -i speech to see native logs.

Enhancing User Experience

Visual Audio Feedback

Show a pulsing microphone icon or voice wave animation while listening. You can use react-native-animatable or the built-in Animated API to create a responsive indicator.

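import React, { useRef, useEffect } from 'react';
import { Animated } from 'react-native';

const pulseAnim = useRef(new Animated.Value(1)).current;

useEffect(() => {
  if (!isListening) {
    // Reset the icon to its resting size when not listening
    pulseAnim.setValue(1);
    return undefined;
  }
  // Scale up and back down in a loop while listening
  const pulse = Animated.loop(
    Animated.sequence([
      Animated.timing(pulseAnim, { toValue: 1.2, duration: 500, useNativeDriver: true }),
      Animated.timing(pulseAnim, { toValue: 1, duration: 500, useNativeDriver: true }),
    ])
  );
  pulse.start();
  // Stop the running loop when isListening changes or the component unmounts
  return () => pulse.stop();
}, [isListening, pulseAnim]);

...

<Animated.Image source={micIcon} style={{ transform: [{ scale: pulseAnim }] }} />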

Error Handling and Recovery

Provide clear feedback when recognition fails. Show a retry button and suggest users check their internet connection or speak louder. Use onSpeechError to reset state and display a user-friendly message.
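
One way to produce such messages is a small lookup from error code to hint, sketched below. The numeric keys are the Android SpeechRecognizer error constants; the message wording and the describeSpeechError name are illustrative assumptions. How the code arrives in the error event (for example as e.error.code) should be checked against the library version in use, and iOS reports errors differently.

```javascript
// Android SpeechRecognizer error constants mapped to user-facing hints.
// The wording of each message is illustrative.
const ANDROID_SPEECH_ERRORS = {
  1: 'The network timed out. Check your connection and try again.',
  2: 'Network error. Check your connection and try again.',
  3: 'Audio recording failed. Check that the microphone is available.',
  6: 'No speech was detected. Try speaking again.',
  7: 'Sorry, that was not understood. Try rephrasing.',
  8: 'The recognizer is busy. Wait a moment and retry.',
  9: 'Microphone permission is required.',
};

const GENERIC_SPEECH_ERROR = 'Speech recognition failed. Please try again.';

// Accepts the code as a number or numeric string; unknown codes fall back
// to a generic message.
function describeSpeechError(code) {
  return ANDROID_SPEECH_ERRORS[Number(code)] || GENERIC_SPEECH_ERROR;
}
```

The result can be passed straight to setError inside onSpeechErrorHandler, next to a retry button that calls startListening again.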

Language Support

Allow the user to choose a locale (e.g., 'en-US', 'es-ES', 'fr-FR'). Store the selected language in state and pass it to Voice.start(locale). Ensure the voice commands map also handles multiple languages.
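
A command map that handles several languages can be sketched as a table of phrases per locale, where every phrase resolves to a language-independent action id. The table contents, the action ids, and the resolveAction function below are hypothetical examples.

```javascript
// Hypothetical per-locale phrase tables: each maps a spoken phrase to a
// language-independent action id.
const phrasesByLocale = {
  'en-US': { 'open settings': 'OPEN_SETTINGS', 'go back': 'GO_BACK' },
  'es-ES': { 'abrir ajustes': 'OPEN_SETTINGS', 'volver': 'GO_BACK' },
};

// Looks up the first phrase of the given locale contained in the text and
// returns its action id, falling back to another locale's table if the
// requested one is missing.
function resolveAction(text, locale, fallbackLocale = 'en-US') {
  const phrases = phrasesByLocale[locale] || phrasesByLocale[fallbackLocale];
  const normalized = text.toLowerCase().trim();
  const phrase = Object.keys(phrases).find((p) => normalized.includes(p));
  return phrase ? phrases[phrase] : null;
}
```

The same locale string kept in state can then be passed both to Voice.start(locale) and to resolveAction, so the recognizer and the command map always agree on the language.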

Advanced Topics

Offline Speech Recognition

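Both platforms can recognize speech offline in some configurations, but support varies by device and language. On Android, offline recognition depends on a language pack being downloaded in the device's speech settings. The options accepted by Voice.start on Android, such as Voice.start('en-US', { EXTRA_SPEECH_INPUT_COMPLETE_SILENCE_LENGTH_MILLIS: 2000, EXTRA_SPEECH_INPUT_MINIMUM_LENGTH_MILLIS: 5000 }), tune how long the recognizer waits for silence and the minimum length of an utterance; they do not turn on offline mode by themselves. On iOS, recognition goes through Apple's Speech framework (available since iOS 10) rather than SiriKit, and on-device recognition is available from iOS 13 for supported languages and devices. For fully offline operation, consider an on-device engine such as Vosk, which is integrated separately rather than through react-native-voice. The browser-oriented Web Speech API is not available in React Native.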

Streaming to a Backend for Custom NLP

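Instead of processing commands on-device, you can send the recognized text to your own backend for intent parsing, or send audio (recorded files, or raw audio over a channel such as WebRTC) to a cloud service like Google Cloud Speech-to-Text or Amazon Transcribe for custom vocabularies and potentially more accurate results. Note that react-native-fs is a file system library: it can upload a recorded file, but the recording itself needs a separate audio recording library.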
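
For the text-only case, a minimal sketch of the client side might look like this. The endpoint URL, the request and response shapes, and the sendTranscript name are placeholders invented for illustration; the fetch implementation is injectable so the function can be tested without a network.

```javascript
// Hypothetical client: POSTs recognized text to a backend that answers with
// JSON. The URL and payload shape are placeholders, not a real service.
async function sendTranscript(
  text,
  locale,
  fetchImpl = fetch,
  url = 'https://example.com/api/intent'
) {
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, locale }),
  });
  if (!response.ok) {
    // Surface HTTP failures so the UI can offer a retry.
    throw new Error(`Backend responded with status ${response.status}`);
  }
  return response.json();
}
```

Calling it from the effect that watches recognizedText keeps the on-device code small, at the cost of an extra network round trip before the command runs.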

Background Listening

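For always-on voice control, you need a background service. On Android, a foreground service or React Native's Headless JS can keep code running in the background, but react-native-voice is not designed for background recognition, so treat this combination as experimental. On iOS, background audio is restricted; use push-to-talk or lock screen widgets instead.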

Performance Considerations

Memory: Destroy the Voice instance when the component unmounts to release native resources.
Battery: Avoid continuous listening. Use a wake word detection (e.g., via react-native-vosk) or require a button press.
Network: Speech recognition on both platforms uses an internet connection by default. Inform users if offline mode is not available.
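
To avoid leaving the microphone open, a listening session can be capped with a timer. The sketch below is one hypothetical way to do that; createListeningTimeout is not part of any library, and the timeout length is up to the app.

```javascript
// Hypothetical helper: runs `onTimeout` if `start()` is not followed by
// `cancel()` within `timeoutMs` milliseconds.
function createListeningTimeout(onTimeout, timeoutMs) {
  let timer = null;

  const cancel = () => {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
  };

  const start = () => {
    cancel(); // restarting replaces any pending timeout
    timer = setTimeout(() => {
      timer = null;
      onTimeout();
    }, timeoutMs);
  };

  return { start, cancel };
}
```

For example, create it with stopListening as the callback, call start() right after Voice.start resolves, and call cancel() in the speech end and error handlers and on unmount.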

Conclusion

Integrating voice control into a React Native app is both accessible and powerful using the @react-native-voice/voice library. From basic transcription to custom command parsing and rich visual feedback, developers can create engaging hands-free experiences that delight users. Key takeaways include proper permission handling, robust error management, and building a scalable command system. As voice technology continues to evolve, combining on-device recognition with cloud NLP will unlock even more possibilities. Start building your voice-controlled app today and give your users a truly modern interaction method.

For further reading, check the React Native documentation and the react-native-voice GitHub repository.