Azure OpenAI Whisper Parser
Azure OpenAI Whisper Parser is a wrapper around the Azure OpenAI Whisper API which utilizes machine learning to transcribe audio files to english text.
The Parser supports
.mp3
,.mp4
,.mpeg
,.mpga
,.m4a
,.wav
, and.webm
.
The current implementation follows LangChain core principles and can be used with other loaders to handle both audio downloading and parsing. As a result of this the parser will yield
an Iterator[Document]
.
Prerequisites
The service requires Azure credentials, Azure endpoint and Whisper Model deployment, which can be set up by following the guide here. Furthermore, the required dependencies must be installed.
%pip install -Uq langchain langchain-community openai
Example 1
The AzureOpenAIWhisperParser
's method, .lazy_parse
, accepts a Blob
object as a parameter containing the file path of the file to be transcribed.
from langchain_core.documents.base import Blob
audio_path = "path/to/your/audio/file"
audio_blob = Blob(path=audio_path)
from langchain_community.document_loaders.parsers.audio import AzureOpenAIWhisperParser
endpoint = "<your_endpoint>"
key = "<your_api_key"
version = "<your_api_version>"
name = "<your_deployment_name>"
parser = AzureOpenAIWhisperParser(
api_key=key, azure_endpoint=endpoint, api_version=version, deployment_name=name
)
documents = parser.lazy_parse(blob=audio_blob)
for doc in documents:
print(doc.page_content)
Example 2
The AzureOpenAIWhisperParser
can also be used in conjuction with audio loaders, like the YoutubeAudioLoader
with a GenericLoader
.
from langchain_community.document_loaders.blob_loaders.youtube_audio import (
YoutubeAudioLoader,
)
from langchain_community.document_loaders.generic import GenericLoader
# Must be a list
url = ["www.youtube.url.com"]
save_dir = "save/directory/"
name = "<your_deployment_name>"
loader = GenericLoader(
YoutubeAudioLoader(url, save_dir), AzureOpenAIWhisperParser(deployment_name=name)
)
docs = loader.load()
for doc in documents:
print(doc.page_content)