Speech to Text in an Azure Function

Sometimes it is easier to record information than to type it. Recordings are great because you can sit back and listen to them, but sometimes you just want to scan the text, or your environment is not suited to listening to audio. I recently had such a case, so I built an Azure Function that converts spoken audio into written text and stores the result in an Azure Table.

The global picture


  1. Upload an audio file in a supported format (audio/WAV using one of the codecs PCM single channel, Siren or SirenSR) as a blob to a container in an Azure Storage account.
  2. Create an Azure Function with a blob trigger that executes on every upload to that container.
  3. In the function, connect to the Bing Speech API through a WebSocket and wait for the results to come in.
  4. Store the results in an Azure Table (of course, you can store them wherever you want).
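Step 1 only works if the uploaded file is in one of the supported formats. The Python sketch below (not the author's code) checks the PCM single-channel case with the standard-library `wave` module before any call to the speech service is made. The function name `is_supported_pcm_wav` is hypothetical, and Siren/SirenSR files are not handled: `wave` cannot parse them, so the sketch reports them as unsupported.

```python
import io
import wave


def is_supported_pcm_wav(data: bytes) -> bool:
    """Return True if `data` is a single-channel, uncompressed PCM WAV file.

    Hypothetical pre-check: only the "PCM single channel" case is covered.
    Siren/SirenSR files cannot be read by the wave module and return False.
    """
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            # The wave module reports uncompressed PCM as compression type "NONE".
            return wav.getnchannels() == 1 and wav.getcomptype() == "NONE"
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, truncated data, or a format wave cannot parse.
        return False
```

Rejecting unsupported files early avoids opening a WebSocket connection for audio the service cannot transcribe anyway.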

Azure Components

For this example you need to set up three components in Azure.

  1. Create an Azure Storage account
  2. Create an Azure Function App
  3. Create a Bing Speech API endpoint

The Azure Function

Azure Functions are great when you just want to run a small piece of code on demand: on the Consumption plan you only pay for the compute used while the function is actually executing.

Azure Functions can be triggered by many different events, such as a message on a Service Bus or Storage queue, a new blob in a storage container, an HTTP request or a timer.

For this example I use an Azure Function that is triggered when a blob is uploaded to an Azure Storage container.
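The blob-triggered flow can be sketched as a plain Python function that receives the blob name and its bytes, transcribes the audio and builds the row for the Azure Table. This is an illustration, not the code from the repository: `handle_uploaded_blob` and the `transcribe` callable are hypothetical, `transcribe` stands in for the WebSocket conversation with the Bing Speech API, and using the container name as PartitionKey is an assumption. The key sanitising follows the Azure Table rule that `/`, `\`, `#`, `?` and control characters are not allowed in PartitionKey and RowKey values.

```python
import re
from typing import Callable, Dict

# Characters Azure Table Storage does not allow in PartitionKey/RowKey values.
_INVALID_KEY_CHARS = re.compile(r"[/\\#?\x00-\x1f\x7f-\x9f]")


def to_table_key(value: str) -> str:
    """Replace characters that are not allowed in Azure Table keys with '_'."""
    return _INVALID_KEY_CHARS.sub("_", value)


def handle_uploaded_blob(
    blob_name: str,
    audio: bytes,
    transcribe: Callable[[bytes], str],
) -> Dict[str, str]:
    """Turn one uploaded audio blob into a table entity (hypothetical helper)."""
    # Hypothetical: `transcribe` wraps the Bing Speech WebSocket call.
    text = transcribe(audio)
    return {
        "PartitionKey": "audiofiles",       # assumption: one partition per container
        "RowKey": to_table_key(blob_name),  # blob names may contain '/' (virtual folders)
        "Text": text,
    }
```

Passing `transcribe` in as a parameter keeps the speech call out of the entity-building logic, so that part can be tested without a network connection or subscription key.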

The code is available on GitHub.

Getting it all working on your local machine

  1. Download the code from GitHub
  2. Create a container named audiofiles in your Azure Blob Storage account
  3. Open the solution in Visual Studio
  4. Edit local.settings.json and set the following values:
    • AzureWebJobsStorage -> connection string of the storage account
    • AzureWebJobsDashboard -> connection string of the storage account
    • AudioStorage -> connection string of the storage account
    • BingSpeechSubscriptionKey -> subscription key of your Bing Speech API endpoint
    • ResultAzureTableName -> name of the table where the results are stored (lowercase)
  5. Start the function in debug mode in Visual Studio (the Azure Functions SDK might need to be installed)
  6. Use Azure Storage Explorer to connect to your storage account and upload an audio file to the audiofiles container (download a sample file here).
  7. Use Storage Explorer to view the results in the Azure Table.
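The settings from step 4 end up in a local.settings.json file with an `IsEncrypted` flag and a `Values` object. The Python sketch below generates such a file and checks the table name against the Azure Table naming rules (3 to 63 alphanumeric characters, starting with a letter). `build_local_settings` is a hypothetical helper, the repository's own local.settings.json may contain additional keys, and lowercasing the table name follows the tip in step 4.

```python
import json
import re

# Azure Table names: 3-63 alphanumeric characters, starting with a letter.
_TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")


def build_local_settings(storage_conn: str, speech_key: str, table_name: str) -> str:
    """Return the JSON text for a local.settings.json file (hypothetical helper)."""
    if not _TABLE_NAME.fullmatch(table_name):
        raise ValueError(f"invalid Azure Table name: {table_name!r}")
    settings = {
        "IsEncrypted": False,
        "Values": {
            # The same storage account is used for all three connection strings.
            "AzureWebJobsStorage": storage_conn,
            "AzureWebJobsDashboard": storage_conn,
            "AudioStorage": storage_conn,
            "BingSpeechSubscriptionKey": speech_key,
            "ResultAzureTableName": table_name.lower(),
        },
    }
    return json.dumps(settings, indent=2)
```

Calling `build_local_settings("<storage connection string>", "<speech key>", "speechresults")` and writing the result next to the project file gives a starting point; the real connection string and key should never be committed to version control.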

Deploy it to Azure

If the above works, it is time to deploy your function to Azure. Some quick tips:

  • Use a version control system such as Git, hosted in VSTS, GitHub or Bitbucket.
  • Deploy through a pipeline. It is easy to build a VSTS pipeline that deploys an Azure Function, and it can even be set up from your Function App in the Azure portal. Use a publishing profile only to quickly test a function.

If you have any questions, please drop me a line or send me a tweet.

Resources