How To Speed Up Processing Time Of AWS Transcribe?
Solution 1:
For me, AWS Transcribe took 20 minutes to transcribe a 17-minute file. One possible idea is to split the audio file into chunks and transcribe them in parallel with multiprocessing, for example using 16 cores on an EC2 instance such as a g3.4xlarge.
Split the audio file into 16 parts at points of silence (threshold -20 dB), then convert each part to .wav:
$ sudo apt-get install mp3splt
$ sudo apt-get install ffmpeg
$ mp3splt -s -p th=-20,nt=16 splitted.mp3
$ for f in splitted_*.mp3; do ffmpeg -i "$f" "${f%.mp3}.wav"; done
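The transcription jobs read their input from S3, so the converted chunks also have to be uploaded. Below is a minimal sketch that converts each chunk and uploads it under the key names the job code expects (`splitted0.wav`, `splitted1.wav`, ...). It assumes the chunk files sit in a directory of their own, and it resamples to 8 kHz mono with ffmpeg's `-ar`/`-ac` options so the files match `MediaSampleRateHertz=8000` in the job request. `natural_key`, `wav_key`, `ffmpeg_to_wav_cmd` and `convert_and_upload` are hypothetical helper names, not part of any library.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name):
    # re.split with a capturing group puts the digit runs at odd indices;
    # comparing them as ints sorts "x_2.mp3" before "x_10.mp3"
    return [int(tok) if i % 2 else tok for i, tok in enumerate(re.split(r"(\d+)", name))]


def wav_key(index):
    # Hypothetical S3 key scheme matching the job_uri pattern: splitted0.wav, splitted1.wav, ...
    return f"splitted{index}.wav"


def ffmpeg_to_wav_cmd(src, dst, sample_rate=8000):
    # -ar resamples, -ac 1 downmixes to mono, -y overwrites an existing output file
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sample_rate), "-ac", "1", str(dst)]


def convert_and_upload(chunk_dir, bucket, pattern="*.mp3"):
    # Assumes the mp3splt chunks are the only files in chunk_dir matching pattern
    import boto3  # imported here so the helpers above work without boto3 installed

    s3 = boto3.client("s3")
    chunks = sorted(Path(chunk_dir).glob(pattern), key=lambda p: natural_key(p.name))
    for i, src in enumerate(chunks):
        dst = src.with_suffix(".wav")
        subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True)
        s3.upload_file(str(dst), bucket, wav_key(i))
    return len(chunks)
```

For example, `convert_and_upload("chunks", "bucket")` would upload every chunk in `chunks/` to the placeholder bucket used in the job code. If your recording is not telephone-quality audio, drop the resampling and set `MediaSampleRateHertz` to the real rate instead.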
Then upload the .wav chunks to S3 and use multiprocessing to run 16 transcription jobs simultaneously, mapping a function that calls transcribe.start_transcription_job over each TranscriptionJobName / job_uri pair:
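import multiprocessing
import time

import boto3

output = []
data = range(0, 16)

def f(x):
    # Create the client inside the worker: boto3 clients should not be shared across processes.
    # Assumes AWS credentials and a default region are configured.
    transcribe = boto3.client('transcribe')
    job_name = "Name" + str(x)
    job_uri = "https://s3.amazonaws.com/bucket/splitted" + str(x) + ".wav"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='pt-BR',
        OutputBucketName="bucket",
        MediaSampleRateHertz=8000,
        Settings={"MaxSpeakerLabels": 2,
                  "ShowSpeakerLabels": True})
    # Poll until the job finishes, pausing between requests instead of busy-looping
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)
    return status

def mp_handler():
    # One worker process per chunk, so all 16 jobs are in flight at once
    with multiprocessing.Pool(16) as p:
        return p.map(f, data)

if __name__ == '__main__':
    output.append(mp_handler())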
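Two follow-ups to the approach above, sketched under assumptions. First, each worker spends nearly all its time waiting on AWS, so a thread pool (`concurrent.futures.ThreadPoolExecutor`, swapped in here for `multiprocessing`) should keep the same 16 jobs in flight without needing 16 local cores. Second, the chunk transcripts have to be stitched back together in order. With `OutputBucketName` set and no `OutputKey`, Amazon Transcribe writes each result as `<job name>.json`, and the full text is under `results.transcripts[0].transcript`. `run_chunks`, `transcript_text`, `merge_transcripts` and `fetch_result` are hypothetical names; `"bucket"` and `"Name"` are the placeholders from the code above.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_chunks(job_fn, n_chunks, max_workers=16):
    # Threads suffice because each call mostly waits on AWS, not on local CPU.
    # pool.map yields results in input order, i.e. in chunk order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(job_fn, range(n_chunks)))


def transcript_text(result_json):
    # Transcribe output JSON keeps the full text at results.transcripts[0].transcript
    return json.loads(result_json)["results"]["transcripts"][0]["transcript"]


def merge_transcripts(result_jsons):
    # Join the chunk transcripts in chunk order, skipping empty chunks
    texts = (transcript_text(r) for r in result_jsons)
    return " ".join(t for t in texts if t)


def fetch_result(index, bucket="bucket"):
    # Hypothetical helper: reads s3://<bucket>/Name<index>.json written by job Name<index>
    import boto3  # imported lazily so the pure helpers above work without boto3

    s3 = boto3.client("s3")
    return s3.get_object(Bucket=bucket, Key=f"Name{index}.json")["Body"].read()
```

Usage would be `run_chunks(f, 16)` to start and wait on the jobs, then `merge_transcripts(run_chunks(fetch_result, 16))` for the full text. Because each chunk is transcribed independently, speaker labels such as `spk_0` are only consistent within a chunk, not across chunks.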
Solution 2:
I searched for a guaranteed transcription speed, with no luck.
In this forum post (requires an AWS account), a poster ran a benchmark with the following results:
- A 10-minute clip took about 5 minutes
- 40-minute clips took around 17 minutes
- A 2-hour file took 36 minutes
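Dividing processing time by audio duration makes the trend in these figures explicit. This is only arithmetic on the three numbers quoted above, not new measurements: the ratio falls from 0.5 to 0.3 as the audio gets longer.

```python
# (audio minutes, processing minutes) as reported in the forum benchmark
benchmarks = [(10, 5), (40, 17), (120, 36)]


def ratio(audio_min, processing_min):
    # Minutes of processing per minute of audio; lower means faster
    return processing_min / audio_min


ratios = [round(ratio(a, p), 3) for a, p in benchmarks]
print(ratios)  # [0.5, 0.425, 0.3]
```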
What seems to be an official Amazon source states: "At this time, transcription speeds are better optimized for audio longer than 30 seconds. You'll start to see a better processing time to audio duration time ratio when the audio file length is about 2 minutes or longer. Having said this, we are working hard to enhance transcription speeds overall."
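This statement bears on the splitting approach: cutting a 17-minute file into 16 parts gives chunks of roughly a minute, below the ~2-minute length where the ratio is said to improve. A possible heuristic, which is an assumption and not an AWS recommendation, is to cap the number of chunks so each stays at about 2 minutes or longer. `chunk_count` is a hypothetical helper; its result could be passed as `nt=` to mp3splt.

```python
import math


def chunk_count(duration_s, min_chunk_s=120, max_chunks=16):
    # Heuristic (assumption): keep each chunk >= min_chunk_s seconds,
    # never exceed max_chunks parallel jobs, and always return at least 1
    return max(1, min(max_chunks, math.floor(duration_s / min_chunk_s)))


print(chunk_count(17 * 60))  # 8 chunks of about 2 minutes for the 17-minute file
```

Whether fewer, longer chunks actually finish sooner than 16 short ones would need to be measured; the heuristic only encodes the 2-minute guidance.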
I hope this helps others researching the same question.