【ReactNative】How to Check Pronunciation with Speech Recognition (Already Implemented in an App)

This post presents a real example of using a Speech-to-text library for React Native to implement one feature of an app: the user pronounces a predefined word, and the app gives feedback on the result.

Development environment (Expo Bare)
Setup
1. @react-native-voice/voice
Release

Development environment (Expo Bare)

"typescript": "^5.2.2",
"expo": "~49.0.13",
"react": "18.2.0",
"react-native": "0.72.6",
"expo-router": "^2.0.0",
"react-native-paper": "^5.10.6",
"@react-native-voice/voice": "^3.2.4"

Setup

@react-native-voice/voice

Install the package

$ yarn add @react-native-voice/voice

iOS settings (for Expo Bare)

// ios/<project-name>/Info.plist

<dict>
  ...
  <key>NSMicrophoneUsageDescription</key>
  <string>Description of why you require the use of the microphone</string>
  <key>NSSpeechRecognitionUsageDescription</key>
  <string>Description of why you require the use of the speech recognition</string>
  ...
</dict>

Android settings (for Expo Bare)

// android/app/src/main/AndroidManifest.xml
...
<!-- Added permissions -->
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
...

Rebuild the project

# For iOS
npx pod-install
npx react-native run-ios

# For Android
npx react-native run-android

Create the component

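Starting from the source code (link), first confirm that the basic behavior works; if there are no problems, customize it as needed for your implementation.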

// JSpeechToText.tsx (basic behavior check)

import React, {memo, useCallback, useEffect, useState} from 'react';
import {Text, View, StyleSheet, TouchableHighlight} from 'react-native';
import {IconButton} from 'react-native-paper';
import Voice, {
  SpeechErrorEvent,
  SpeechResultsEvent,
  SpeechRecognizedEvent,
} from '@react-native-voice/voice';

function JSpeechToText() {
  const [recognized, setRecognized] = useState('');
  const [error, setError] = useState('');
  const [end, setEnd] = useState('');
  const [started, setStarted] = useState('');
  const [results, setResults] = useState<string[] | undefined>([]);
  const [partialResults, setPartialResults] = useState<string[] | undefined>(
    [],
  );

  const _startRecognizing = useCallback(async () => {
    _clearState();
    try {
      await Voice.start('ko-KR'); // recognition locale; use e.g. 'ja-JP' for Japanese
    } catch (e) {
      console.error(e);
    }
  }, []);

  const _stopRecognizing = async () => {
    if (end) return;
    try {
      await Voice.stop();
    } catch (e) {
      console.error(e);
    }
  };

  const _cancelRecognizing = async () => {
    if (end) return;
    try {
      await Voice.cancel();
    } catch (e) {
      console.error(e);
    }
  };

  const _destroyRecognizer = async () => {
    try {
      await Voice.destroy();
    } catch (e) {
      console.error(e);
    }
    _clearState();
  };

  const onSpeechStart = (_e: any) => {
    setStarted('√');
  };

  const onSpeechRecognized = (_e: SpeechRecognizedEvent) => {
    setRecognized('√');
  };

  const onSpeechEnd = (_e: any) => {
    setEnd('√');
  };

  const onSpeechError = (e: SpeechErrorEvent) => {
    console.log('onSpeechError: ', e);
    setError(JSON.stringify(e.error));
  };

  const onSpeechResults = (e: SpeechResultsEvent) => {
    setResults(e.value);
  };

  const onSpeechPartialResults = (e: SpeechResultsEvent) => {
    setPartialResults(e.value);
  };

  const _clearState = () => {
    setRecognized('');
    setError('');
    setEnd('');
    setStarted('');
    setResults([]);
    setPartialResults([]);
  };

  useEffect(() => {
    Voice.onSpeechStart = onSpeechStart;
    Voice.onSpeechRecognized = onSpeechRecognized;
    Voice.onSpeechEnd = onSpeechEnd;
    Voice.onSpeechError = onSpeechError;
    Voice.onSpeechResults = onSpeechResults;
    Voice.onSpeechPartialResults = onSpeechPartialResults;

    return () => {
      // Call through Voice so the method keeps its `this` binding
      Voice.destroy().then(() => Voice.removeAllListeners());
    };
  }, []);

  return (
    <View style={styles.container}>
      <Text style={styles.welcome}>Welcome to React Native Voice!</Text>
      <Text style={styles.instructions}>
        Press the button and start speaking.
      </Text>
      <Text style={styles.stat}>{`Started: ${started}`}</Text>
      <Text style={styles.stat}>{`Recognized: ${recognized}`}</Text>
      <Text style={styles.stat}>{`Error: ${error}`}</Text>
      <Text style={styles.stat}>Results:</Text>
      {results?.map((result, index) => {
        return (
          <Text key={`result-${index}`} style={styles.stat}>
            {result}
          </Text>
        );
      })}
      <Text style={styles.stat}>Partial Results:</Text>
      {partialResults?.map((result, index) => {
        return (
          <Text key={`partial-result-${index}`} style={styles.stat}>
            {result}
          </Text>
        );
      })}
      <Text style={styles.stat}>{`End: ${end}`}</Text>
      <TouchableHighlight onPress={_startRecognizing}>
        <IconButton mode="outlined" icon={'microphone'} size={100} />
      </TouchableHighlight>
      <TouchableHighlight onPress={_stopRecognizing}>
        <Text style={styles.action}>Stop Recognizing</Text>
      </TouchableHighlight>
      <TouchableHighlight onPress={_cancelRecognizing}>
        <Text style={styles.action}>Cancel</Text>
      </TouchableHighlight>
      <TouchableHighlight onPress={_destroyRecognizer}>
        <Text style={styles.action}>Destroy</Text>
      </TouchableHighlight>
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    justifyContent: 'center',
    alignItems: 'center',
    backgroundColor: '#F5FCFF',
  },
  welcome: {fontSize: 20, textAlign: 'center', margin: 10},
  action: {
    textAlign: 'center',
    color: '#0000FF',
    marginVertical: 5,
    fontWeight: 'bold',
  },
  instructions: {textAlign: 'center', color: '#333333', marginBottom: 5},
  stat: {textAlign: 'center', color: '#B0171F', marginBottom: 1},
});

export default memo(JSpeechToText);
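The basic-check component tracks recognition progress in six separate useState values. If you want to test that state logic without a device, one option is to fold it into a single reducer for React's useReducer. The following is a minimal sketch under that assumption: the action names and the RecognitionState shape are hypothetical, and each action corresponds to one of the Voice callbacks registered in the component above.

```typescript
// Hypothetical sketch: the six useState values of JSpeechToText as one reducer.
// Action names are illustrative; each maps to one @react-native-voice/voice callback.

type RecognitionState = {
  started: boolean;
  recognized: boolean;
  ended: boolean;
  error: string;
  results: string[];
  partialResults: string[];
};

type RecognitionAction =
  | {type: 'start'} // Voice.onSpeechStart
  | {type: 'recognized'} // Voice.onSpeechRecognized
  | {type: 'end'} // Voice.onSpeechEnd
  | {type: 'error'; error: string} // Voice.onSpeechError
  | {type: 'results'; value?: string[]} // Voice.onSpeechResults
  | {type: 'partialResults'; value?: string[]} // Voice.onSpeechPartialResults
  | {type: 'clear'}; // same role as _clearState

export const initialRecognitionState: RecognitionState = {
  started: false,
  recognized: false,
  ended: false,
  error: '',
  results: [],
  partialResults: [],
};

export function recognitionReducer(
  state: RecognitionState,
  action: RecognitionAction,
): RecognitionState {
  switch (action.type) {
    case 'start':
      return {...state, started: true};
    case 'recognized':
      return {...state, recognized: true};
    case 'end':
      return {...state, ended: true};
    case 'error':
      return {...state, error: action.error};
    case 'results':
      // The event's value can be undefined, so fall back to an empty list
      return {...state, results: action.value ?? []};
    case 'partialResults':
      return {...state, partialResults: action.value ?? []};
    case 'clear':
      // Never mutated by the reducer, so the shared initial object is safe to reuse
      return initialRecognitionState;
  }
}
```

In the component you would then write `const [state, dispatch] = useReducer(recognitionReducer, initialRecognitionState);` and register, for example, `Voice.onSpeechResults = e => dispatch({type: 'results', value: e.value});`. Because `dispatch` is stable across renders, listeners registered once on mount keep working.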

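As an example of customization, the following shows how to make the thickness of the border (borderWidth) of the circle around the microphone icon change dynamically with the input volume.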

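// Add the following to JSpeechToText.tsx.

// Add (also import SpeechVolumeChangeEvent from '@react-native-voice/voice')
const [volume, setVolume] = useState<number>(0);
const [themedVolume, setThemedVolume] = useState(0);

// Add: e.value may be undefined, so fall back to 0
const onSpeechVolumeChanged = (e: SpeechVolumeChangeEvent) => {
  setVolume(e.value ?? 0);
};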

...
  useEffect(() => {
    Voice.onSpeechStart = onSpeechStart;
    Voice.onSpeechRecognized = onSpeechRecognized;
    Voice.onSpeechEnd = onSpeechEnd;
    Voice.onSpeechError = onSpeechError;
    Voice.onSpeechResults = onSpeechResults;
    Voice.onSpeechPartialResults = onSpeechPartialResults;
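    Voice.onSpeechVolumeChanged = onSpeechVolumeChanged;  // <-- Add

    return () => {
      Voice.destroy().then(() => Voice.removeAllListeners());
    };
  }, []);
  // Empty deps on purpose: the handlers are recreated on every render, so
  // listing them here would re-run the effect and call Voice.destroy() each time.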
...

Use the themedVolume value above to set the icon's borderWidth.

// JSpeechToText.tsx

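// Add (also import Platform from 'react-native' and useTheme from 'react-native-paper')
...
  const {colors} = useTheme();

  useEffect(() => {
    // Scale the raw volume into a border width. Android can report negative
    // values, so take the absolute value there; 1.5 is a hand-tuned factor for iOS.
    const width = Platform.OS === 'ios' ? volume * 1.5 : Math.abs(volume);
    setThemedVolume(width);
  }, [volume]);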
...

// Change (Before)
<IconButton mode="outlined" icon={'microphone'} size={100} />

// Change (After); colors comes from react-native-paper's useTheme()
<IconButton
  icon="microphone"
  size={100}
  style={{
    borderColor: colors.primary,
    borderWidth: themedVolume,
  }}
/>
...
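The scaling above can also be pulled out into a pure function, which makes it easy to clamp the width and to test it off-device. The sketch below assumes the same factors as the code above (x1.5 on iOS, absolute value on Android); the function name volumeToBorderWidth and the default maximum width of 15 are hypothetical choices, not values from the library.

```typescript
// Sketch: map the raw value from Voice.onSpeechVolumeChanged to a border width.
// Assumptions: the x1.5 factor and the maxWidth clamp are display choices tuned by eye.

type TargetOS = 'ios' | 'android';

export function volumeToBorderWidth(
  volume: number | undefined,
  os: TargetOS,
  maxWidth = 15,
): number {
  // Treat a missing or non-numeric value as silence
  if (volume === undefined || !Number.isFinite(volume)) return 0;
  // Same scaling as in the article: x1.5 on iOS, absolute value on Android
  const width = os === 'ios' ? volume * 1.5 : Math.abs(volume);
  // Keep the border within [0, maxWidth] so the icon does not balloon
  return Math.min(Math.max(width, 0), maxWidth);
}
```

With this helper, borderWidth can be computed during render, e.g. `volumeToBorderWidth(volume, Platform.OS === 'ios' ? 'ios' : 'android')`, which removes the need for the separate themedVolume state and its useEffect.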

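Beyond that, add whatever else you need, for example logic that compares the results value with the target word to judge the pronunciation and shows feedback (e.g. "Great! 👍") when the answer is correct.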
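For the pronunciation check itself, one approach is to normalize each candidate in results and compare it with the target word, giving partial credit through Levenshtein edit distance so that a near miss can be told apart from a wrong word. This is a sketch, not the method used in the app: checkPronunciation, the normalization rules, and the 0.8 threshold are all assumptions to tune for your own word list.

```typescript
// Sketch of the "compare results with the target word" step.
// Hypothetical helpers; the punctuation set and the 0.8 threshold are arbitrary examples.

// Lowercase and drop whitespace/common punctuation so "Hello!" matches "hello"
function normalize(text: string): string {
  return text
    .normalize('NFC')
    .toLowerCase()
    .replace(/[\s.,!?、。！？]/g, '');
}

// Classic Levenshtein edit distance over code points (two-row DP)
function editDistance(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({length: t.length + 1}, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// 1.0 = identical after normalization, 0.0 = nothing in common
function similarity(a: string, b: string): number {
  const x = normalize(a);
  const y = normalize(b);
  const longest = Math.max(Array.from(x).length, Array.from(y).length);
  if (longest === 0) return 1;
  return 1 - editDistance(x, y) / longest;
}

export type PronunciationFeedback = {correct: boolean; best: string; score: number};

// Scores every candidate, since the recognizer may return several
export function checkPronunciation(
  target: string,
  results: string[] | undefined,
  threshold = 0.8,
): PronunciationFeedback {
  let best = '';
  let score = 0;
  for (const candidate of results ?? []) {
    const s = similarity(target, candidate);
    if (s > score) {
      best = candidate;
      score = s;
    }
  }
  return {correct: score >= threshold, best, score};
}
```

For example, you could call `checkPronunciation(targetWord, e.value)` inside onSpeechResults (targetWord being a hypothetical prop holding the word to practice) and show "Great! 👍" when `correct` is true. Checking every candidate rather than only the first is more forgiving of recognizer ranking quirks.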