The Amplify Series, Part 6: Using the power of AI and Machine Learning with Amplify Predictions


In the last part of this series, we added the Amplify Storage category, which allowed us to upload files to S3 and retrieve them directly from our application. In this part of the blog series, we will add even more functionality, such as identifying text in an image, converting text to speech, and interpreting the sentiment of text. We will do this by adding a new category called Amplify Predictions, which lets us use the power of AI and machine learning out of the box.

As usual, we will start by adding the category to the project. We will then create a new page for the Amplify Predictions category and add a section for each piece of functionality. This page will serve as a showcase for the possibilities within Amplify Predictions. By the end, you will have a better understanding of this category and appreciate how easy it is to get this functionality working in your own applications.

Amplify app repository update

We will continue in the repository where we left off in the last blog post.

There have been 2 extra commits to this repository since the last blog post. These are:

Now we are ready to get started and add more functionality to our application.

Identify text from uploaded image

In this section, we will add functionality that will allow us to upload an image and retrieve the identified text inside the image.

Generating the backend resources

We will run amplify add predictions with the following options:

  • Category: identify
  • What would you like to identify: Identify Text
  • Friendly name: <<use default>>
  • Identify documents: yes
  • Access: Auth users only

The Amplify CLI output will look similar to this:


Evertsons-MacBook-Pro:theamplifyapp evertsoncroes$ amplify add predictions
✔ Please select from one of the categories below · Identify
✔ What would you like to identify? · Identify Text
✔ Provide a friendly name for your resource · identifyTextd230b04a
✔ Would you also like to identify documents? (y/N) · yes
✔ Who should have access? · Auth users only
Successfully added resource identifyTextd230b04a locally

As with the previous category, we only want to allow access to authenticated users. This command will add a new directory called predictions to our amplify/backend directory, which will contain the information needed to create the resources in AWS to support the new functionality.

Bug: Before continuing, we need to make some manual changes to the generated output, since there is a bug in version 10.8.1 of the Amplify CLI. To fix the issue, open the amplify/backend/predictions/identifyText<<id>>/parameters.json file and add the following three key-value pairs (see the sketch after this list):

  • "format": "PLAIN"
  • "access": "auth"
  • "identifyDoc": "document"
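
After this change, the added keys in parameters.json should look roughly like the snippet below; any other keys the CLI generated in that file stay untouched:

{
  "format": "PLAIN",
  "access": "auth",
  "identifyDoc": "document"
}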

We will run amplify push to create these resources in AWS. The AWS services used here are Amazon Rekognition for image recognition and Amazon Textract for document analysis. 

Note that using these services will cost money. Amazon Rekognition will cost around $0.001 per image processed, while Amazon Textract costs around $0.0015 per document processed. As a general rule, be sure to set up a budget with billing alerts for your AWS account so you don’t get surprised by large costs. 

These commands will add the following changes to our repository.

Adding the predictions page

The first thing we need to do is add a new component to our frontend application that will contain all the predictions functionality:

ng generate component components/categories/predictions

This will generate the expected files for our component. We will then link up routing to be able to render this component. Refer to this commit for the details so far. 
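
For a rough idea of what that routing change looks like, here is a sketch of an app-routing.module.ts entry; the path name and the surrounding module layout are assumptions, and the commit contains the real details:

import { NgModule } from '@angular/core';
import { RouterModule, Routes } from '@angular/router';
import { PredictionsComponent } from './components/categories/predictions/predictions.component';

const routes: Routes = [
  // ...existing routes
  { path: 'predictions', component: PredictionsComponent }
];

@NgModule({
  imports: [RouterModule.forRoot(routes)],
  exports: [RouterModule]
})
export class AppRoutingModule {}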

Adding Text identification functionality to the Predictions page

We will add a new frontend component that will contain the functionality to upload an image and identify the text in that image:

ng generate component components/categories/predictions/identify-text

Inside our predictions.component.html we must make sure to add the newly generated identify-text component:

<app-identify-text></app-identify-text>

Inside the identify-text.component.html, we will add:

<input
  type="file"
  id="imageUpload"
  name="imageUpload"
  accept="image/png, image/jpeg"
  (change)="imageSelected($event)"
/>

Similar to the last blog, this will give us an input that we can use to select images from our device. We have to add logic to react to the image that is selected in our identify-text.component.ts:


import { Component, OnInit } from '@angular/core';

@Component({
  selector: 'app-identify-text',
  templateUrl: './identify-text.component.html',
  styleUrls: ['./identify-text.component.css']
})
export class IdentifyTextComponent implements OnInit {
  selectedFile: File | undefined = undefined;

  constructor() {}

  ngOnInit(): void {}

  imageSelected = (e: Event) => {
    const input = e.target as HTMLInputElement;

    if (!input.files?.length) {
      return;
    }

    this.selectedFile = input.files[0];
  };
}

We will now be able to select a file, and it will be stored in the selectedFile variable. Normally, best practice would be to move this logic into a separate component so that we can reuse it. However, to keep the blog shorter, we will allow some duplicate code.
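
If you did want to extract it, a reusable picker could look roughly like the following sketch; the component name, selector, and output event are hypothetical and not part of the repository, and the component would still need to be declared in the module:

import { Component, EventEmitter, Output } from '@angular/core';

@Component({
  selector: 'app-image-picker',
  template: `
    <input
      type="file"
      accept="image/png, image/jpeg"
      (change)="onChange($event)"
    />
  `
})
export class ImagePickerComponent {
  // Emits the selected File so parent components can reuse this logic.
  @Output() fileSelected = new EventEmitter<File>();

  onChange(e: Event): void {
    const input = e.target as HTMLInputElement;
    if (input.files?.length) {
      this.fileSelected.emit(input.files[0]);
    }
  }
}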

Now that we have the functionality to upload an image, we need to add a button that does something with the selected image. We will also show the identified text and add some CSS. The updated identify-text.component.html:

<div class="container-fluid card-background">
  <h2>Identify Text</h2>
  <input
    type="file"
    id="imageUpload"
    name="imageUpload"
    accept="image/png, image/jpeg"
    (change)="imageSelected($event)"
  />

  <button class="aws-button" (click)="identifyText()">
    Identify Text
  </button>
  <div class="identified-text" *ngIf="identifiedText">
    Identified words:
    <div *ngFor="let word of identifiedText.text.words">
      {{ word.text }}
    </div>
  </div>
</div>

Once this button is clicked, the identifyText function is called. We will define this function in the following way in our identify-text.component.ts:


import { Component, OnInit } from '@angular/core';
import { Predictions } from 'aws-amplify'; // <---NEW
import { IdentifyTextOutput } from '@aws-amplify/predictions'; // <---NEW

@Component({
  selector: 'app-identify-text',
  templateUrl: './identify-text.component.html',
  styleUrls: ['./identify-text.component.css']
})
export class IdentifyTextComponent implements OnInit {
  selectedFile: File | undefined = undefined;
  identifiedText: IdentifyTextOutput | undefined = undefined; // <---NEW

  constructor() {}

  ngOnInit(): void {}

  identifyText = async () => {
    if (!this.selectedFile) {
      return;
    }

    //ADD THIS FUNCTION
    Predictions.identify(
      {
        text: {
          source: {
            file: this.selectedFile
          }
        }
      },
      {}
    )
      .then(response => (this.identifiedText = response))
      .catch(err => console.log({ err }));
  };

  //OTHER CODE
}

In these changes, we import what we need from Amplify Predictions. We add two properties to our component: selectedFile, which will hold the latest uploaded image, and identifiedText, which will hold the latest text-identification results received from AWS. We then call the identify function with the selected image, which is sent to AWS. The response is stored in the identifiedText field, and the identified words will show up on the screen.
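
Besides the individual words, the response also contains the full recognized text and line information. A small sketch of reading those fields, based on the IdentifyTextOutput typings (verify the exact field names against your Amplify version):

// Same call as above, assuming selectedFile has already been set by the input handler.
Predictions.identify({ text: { source: { file: this.selectedFile } } })
  .then(response => {
    console.log(response.text.fullText); // the complete recognized text as one string
    console.log(response.text.lines);    // the recognized text split into lines
    response.text.words.forEach(word => console.log(word.text)); // individual words
  })
  .catch(err => console.log({ err }));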

There is one more step we need to take. The Predictions component needs to be supplied with a Provider. This can be done in the main.ts file:


// other imports

import { Predictions } from 'aws-amplify';
import { AmazonAIPredictionsProvider } from '@aws-amplify/predictions';

Amplify.configure(aws_exports);
Predictions.addPluggable(new AmazonAIPredictionsProvider());

// other code
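
For completeness, a full main.ts could then look roughly like this; the Angular bootstrap code and the aws-exports import are assumptions based on a standard Amplify + Angular setup:

import { platformBrowserDynamic } from '@angular/platform-browser-dynamic';
import { AppModule } from './app/app.module';

import { Amplify } from 'aws-amplify';
import { Predictions } from 'aws-amplify';
import { AmazonAIPredictionsProvider } from '@aws-amplify/predictions';
import aws_exports from './aws-exports';

// Configure Amplify with the generated backend config and register the predictions provider.
Amplify.configure(aws_exports);
Predictions.addPluggable(new AmazonAIPredictionsProvider());

platformBrowserDynamic()
  .bootstrapModule(AppModule)
  .catch(err => console.error(err));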

Once this is all done, we can run our application, upload an image, identify the text and see the results on the screen. When I used this image:

I got the following result:

There are more options to play around with, including identifying entities and labels in images, and many ways to fine-tune the results. For more information, check out the Amplify Predictions documentation.
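
For example, identifying labels in an image follows the same pattern. A minimal sketch, assuming you have also added the corresponding Identify Labels backend resource with amplify add predictions (option names follow the Amplify documentation, so double-check against your version):

Predictions.identify({
  labels: {
    source: {
      file: this.selectedFile
    },
    type: 'ALL' // 'LABELS' for objects, 'UNSAFE' for content moderation, or 'ALL'
  }
})
  .then(response => console.log(response.labels))
  .catch(err => console.log({ err }));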

The changes for this section can be found in this commit, including the changes needed for the CSS.

Convert text to speech

In this section we are going to add functionality to convert text to speech using Amplify Predictions.

Generating the backend resources

We will first generate the backend resources needed. We will run amplify add predictions with the following options:

  • Category: Convert
  • What to convert: Generate speech audio from text
  • Friendly name: <<use default>>
  • Source language: US English
  • Speaker: Kevin – Male
  • Access: Auth users only

The Amplify CLI output will look similar to this:


Evertsons-MacBook-Pro:theamplifyapp evertsoncroes$ amplify add predictions
✔ Please select from one of the categories below · Convert
✔ What would you like to convert? · Generate speech audio from text
✔ Provide a friendly name for your resource · speechGenerator11c4cfca
✔ What is the source language? · US English
✔ Select a speaker · Kevin - Male
✔ Who should have access? · Auth users only
Successfully added resource speechGenerator11c4cfca locally

This is very similar to what we previously did for the text identification. We can run amplify push again to create the resources in AWS. The AWS service that will be used for this functionality is Amazon Polly. The cost of using Amazon Polly is around $4.00 per 1 million characters.

These commands will add the following changes to our repository.

Adding text-to-speech functionality to the Predictions page

As we did for text identification, we will add a component that handles all of the text-to-speech functionality:

ng generate component components/categories/predictions/text-to-speech

Inside our predictions.component.html we will make sure to add the newly generated text-to-speech component:

<app-text-to-speech></app-text-to-speech>

Inside the text-to-speech.component.html, we will add a text input and a button to play the text:

<div class="container-fluid card-background">
  <h2>Text to speech</h2>
  <input
    type="text"
    id="textInput"
    name="textInput"
    (change)="textInputUpdated($event)"
  />

  <button class="aws-button" (click)="convertToAudio()">
    Play
  </button>
</div>

Now we need to hook up these elements to our text-to-speech.component.ts and call the Predictions component to do the conversion from text to an audio buffer for us. Finally, we play the audio:

import { Component, OnInit } from '@angular/core';
import { Predictions } from 'aws-amplify';
import { TextToSpeechOutput } from '@aws-amplify/predictions';

@Component({
  selector: 'app-text-to-speech',
  templateUrl: './text-to-speech.component.html',
  styleUrls: ['./text-to-speech.component.css']
})
export class TextToSpeechComponent implements OnInit {
  textInput: string | undefined = undefined;

  constructor() {}

  ngOnInit(): void {}

  convertToAudio = async () => {
    if (!this.textInput) {
      return;
    }

    Predictions.convert({
      textToSpeech: {
        source: {
          text: this.textInput
        },
        voiceId: 'Amy'
      }
    })
      .then(async result => {
        this.playAudio(result);
      })
      .catch(err => console.log({ err }));
  };

  playAudio = async (audio: TextToSpeechOutput) => {
    const context = new AudioContext();
    const buffer = await context.decodeAudioData(audio.audioStream);
    const source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);
    source.start();
  };

  textInputUpdated = (e: Event) => {
    const input = e.target as HTMLInputElement;
    this.textInput = input.value;
  };
}

Bug: At the time of writing, there is a bug in Amplify that does not allow us to use the voiceId “Kevin”, which we selected when creating the backend resources. The voiceId “Amy” works, so we will use that instead.

In the code snippet above, we create a textInput field that holds the entered text. The convertToAudio method converts the text to audio using the Predictions category: the convert function sends an HTTP request to Amazon Polly, which returns an audio buffer. This buffer can then be decoded by an AudioContext to play the audio in the browser.
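
As an aside, if you prefer not to work with the Web Audio API directly, the same buffer could be played through a plain audio element. A small sketch, assuming audio.audioStream contains MP3 data as returned by Amazon Polly (the same assumption the decodeAudioData call above relies on):

playViaAudioElement = (audio: TextToSpeechOutput) => {
  // Wrap the raw audio bytes in a Blob and play them through an HTML audio element.
  const blob = new Blob([audio.audioStream], { type: 'audio/mpeg' });
  new Audio(URL.createObjectURL(blob)).play();
};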

The changes made in these steps can be found in this commit.

Interpret the sentiment of text

The final set of functionality we are going to add is the ability to interpret the sentiment of text. 

Generating the backend resources

We will add the backend resources by again running amplify add predictions with the following options:

  • Category: Interpret
  • Friendly name: <<use default>>
  • Kind of interpretation: ALL
  • Access: Auth users only

The Amplify CLI output will look similar to this:

Evertsons-MacBook-Pro:theamplifyapp evertsoncroes$ amplify add predictions
✔ Please select from one of the categories below · Interpret
Only one option for [What would you like to interpret?]. Selecting [Interpret Text].
✔ Provide a friendly name for your resource · interpretText9001208a
✔ What kind of interpretation would you like? · All
✔ Who should have access? · Auth users only
Successfully added resource interpretText9001208a locally

Now we can run amplify push to create the resources in AWS. The AWS service that will be used for this functionality is Amazon Comprehend. The pricing for this service can be found here.

These commands will add the following changes to our repository.

Adding text-interpret functionality to the Predictions page

We will first create a component that will handle the text-interpret functionality:

ng generate component components/categories/predictions/text-interpret

Inside our predictions.component.html we will make sure to add the newly generated text-interpret component:

<app-text-interpret></app-text-interpret>

Inside the text-interpret.component.html, we will add a textarea input, a button to trigger the interpretation of the text, and an element to show the detected sentiment:

<div class="container-fluid card-background">
    <h2>Interpret text</h2>
    <textarea
        id="textAreaInput"
        name="textAreaInput"
        rows="5"
        cols="66"
        (change)="textInputUpdated($event)"
    ></textarea>

    <button class="aws-button interpret-button" (click)="interpretText()">
        Interpret
    </button>
    <div *ngIf="interpretation">
        Interpretation = {{ interpretation.textInterpretation.sentiment?.predominant }}
    </div>
</div>

Now we will update the text-interpret.component.ts to hook up the functions defined here and call the Predictions component to interpret the text:


import { Component, OnInit } from '@angular/core';
import { Predictions } from 'aws-amplify';
import { InterpretTextCategories, InterpretTextOutput } from '@aws-amplify/predictions';

@Component({
  selector: 'app-text-interpret',
  templateUrl: './text-interpret.component.html',
  styleUrls: ['./text-interpret.component.css']
})
export class TextInterpretComponent implements OnInit {
  textInput: string | undefined = undefined;
  interpretation: InterpretTextOutput | undefined = undefined;
  constructor() {}

  ngOnInit(): void {}

  interpretText = async () => {
    if (!this.textInput) {
      return;
    }

    Predictions.interpret({
      text: {
        source: {
          text: this.textInput
        },
        type: InterpretTextCategories.ALL
      }
    })
      .then(result => (this.interpretation = result))
      .catch(err => console.log({ err }));
  };


  textInputUpdated = (e: Event) => {
    const input = e.target as HTMLInputElement;
    this.textInput = input.value;
  };
}

We can now try entering text into the text area and doing a sentiment check. I Googled “happy poems” and entered the first one I found:

Try adding different types of text to check the interpretation. Furthermore, the response to the interpret function also contains more information related to the interpretation of the text. Check the documentation for more information and possibilities. 
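
As a rough sketch of what else is in there, based on the InterpretTextOutput typings (treat the exact field names as an assumption and verify against your Amplify version):

Predictions.interpret({
  text: {
    source: { text: this.textInput },
    type: InterpretTextCategories.ALL
  }
}).then(result => {
  const interpretation = result.textInterpretation;
  console.log(interpretation.sentiment);    // predominant sentiment plus the individual scores
  console.log(interpretation.keyPhrases);   // detected key phrases
  console.log(interpretation.language);     // detected language
  console.log(interpretation.textEntities); // detected entities such as names and places
  console.log(interpretation.syntax);       // part-of-speech information per token
});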

The changes made in these steps can be found in this commit.

Up next: Tracking app usage with Amplify Analytics

In this blog, we have used Amplify Predictions to identify text in images, convert text to speech, and interpret the sentiment of text. There are more possibilities in this category, but these examples should give you an idea of what it can do. In the next article, we will look at using Amplify Analytics to collect analytics data for your application.