JASON MAYES: Now, let's dive deeper into pre-trained models. These models have already been trained by someone else, so you don't need to gather your own data or spend time and resources training them yourself. Instead, you can load the model and use it directly for the task it was trained for within your own production application.
Now, pre-trained models within the TensorFlow.js ecosystem come in a couple of forms. Some, like the ones the TensorFlow.js team here at Google have produced, are wrapped in easy-to-use JavaScript classes that you can use in just a few lines of code, and they are available for many common use cases. These are great for people new to machine learning and can be used in minutes, and you'll learn more about them shortly. Others require more knowledge of machine learning to use, as they come in their raw form with no easy-to-use helper functions wrapped around them, and you'll be learning how to use these, too.
So here, you've got an example of a pre-trained model known as BERT Q&A that can perform advanced text search in the web browser. Using this model, you can find an answer to a question within any piece of text you present to it. Notice here how, in the demo, the question uses words that are not in the answer. If you ask it, "What are the best stargazing days?", it finds the answer referring to the nights during certain moon cycles, even though days were never mentioned directly. This model can be used with any text and any question.
And here, it's shown running in a Chrome extension, so you can also use it on any web page. Now, this pre-trained model is actually one of many that the TensorFlow.js team have created and made available for you to use.
You may be wondering how hard it is to use something like this. Well, using the set of official TensorFlow.js models is easier than you might think. In fact, the core code for this one fits on a single slide, so let's walk through it.
So first, you import the TensorFlow.js library and then the pre-made model that you want to use. Next, you can define the text you wish to search. This could be just some text on a website, but here, I just use a simple string. You can then define the question the user wants to ask, which, of course, could come from some form of input box on the page. Now you load the question-and-answer model itself. As this takes time to load, it's performed asynchronously, so you use the then keyword to wait for it to be ready. And once the model is available, a function will be called, which is passed the loaded model.
Finally, you can then call model.findAnswers. You pass to this function the question you want to answer, along with the text you want to search. Again, this is an asynchronous operation, as it might take a few milliseconds to execute. But once ready, this promise will resolve to return an answers object, which you can then iterate through to find the most likely answer from the given text. In this case, it would predict "cats" as the answer to the question posed, which is correct given the text you had to search on this slide. It's no different to writing regular web apps.
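To make that walkthrough concrete, here's a minimal sketch of what that slide's code might look like, using the @tensorflow-models/qna package. The passage and question strings here are just illustrative placeholders.

```js
// Minimal sketch of the Q&A walkthrough above, using @tensorflow-models/qna.
import '@tensorflow/tfjs';
import * as qna from '@tensorflow-models/qna';

// The text to search - this could be scraped from the page instead.
const passage = 'Cats are small carnivorous mammals often kept as pets.';
// The question - this could come from an input box.
const question = 'What animals are kept as pets?';

// Loading takes time, so load() returns a promise.
qna.load().then(function (model) {
  // findAnswers is also asynchronous; it resolves to an array of
  // candidate answers, each with text, score, and position data.
  model.findAnswers(question, passage).then(function (answers) {
    answers.forEach(function (answer) {
      console.log(answer.text, answer.score);
    });
  });
});
```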
Now, since launch, the TensorFlow.js team have released many easy-to-use pre-made models, and we're continually expanding our selection, which you'll hear more about shortly. Models exist across many categories, such as vision, body, text, and sound, that you can use in just a few lines of code. You can check out tensorflow.org/js/models to see them all and to find the code snippets that show you how to use them.
Even better, you do not need a background in machine learning to use these. Just a working knowledge of JavaScript is required, but they are still very powerful. So let's take a look at some of these in action. And as I show you each one, try to think about how you could use it to solve problems that you or someone else might actually have.
First up, you have object recognition. Here, you're able to run the popular COCO-SSD model live in the browser to provide bounding boxes for 80 common objects the model has been trained on. What this means is that a rectangle can be drawn showing exactly where in the image each detected object is located.
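To give a sense of how little code this takes, here's a hedged sketch using the @tensorflow-models/coco-ssd package; the image element ID is just an assumption for illustration.

```js
import '@tensorflow/tfjs';
import * as cocoSsd from '@tensorflow-models/coco-ssd';

// Grab an image already on the page - 'demo-image' is a placeholder.
const img = document.getElementById('demo-image');

cocoSsd.load().then(async function (model) {
  const predictions = await model.detect(img);
  // Each prediction has a class name, a confidence score, and a
  // bounding box given as [x, y, width, height] in pixels.
  predictions.forEach(function (p) {
    console.log(p.class, p.score, p.bbox);
  });
});
```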
Now, before I continue, you may have noticed that some of these model names are not particularly friendly sounding. This is something you'll get used to, and it should be noted that, in many cases, the name originates from some combination of the data the model was trained on, the machine learning architecture it uses behind the scenes, or the utility that it provides. As you get more familiar with these things, these names become less mysterious.
COCO-SSD, for example, was trained on Microsoft's COCO dataset, which stands for Common Objects in Context. This is a famous dataset that contains hundreds of thousands of images that were annotated by humans for typical things you might see in your daily life. Furthermore, this model uses an SSD architecture, which stands for Single Shot Detector, the details of which are beyond this introductory course. But know that this is just describing some of the inner workings of the model itself.
And as you can see from the image on the right, this COCO-SSD model allows us to understand not only where in the image each object is located, but also how many exist, which is much more powerful than image recognition, which would only tell us that something exists somewhere in a given image. And that's the key difference between object recognition and image recognition. So here, you can see COCO-SSD running live in a web browser on a real web page.
If I click on any one of these images at the top, you can see the classification coming back in real time. Now, here are just a few examples of the objects it can recognize, and you can see how you might use it for something useful, even right now. In the image on the left, you can see that this dog is very close to this bowl of treats, and you can imagine that you could detect this quite easily and send yourself an alert when it occurs. But of course, we can do better than that.
We can enable our webcam, and now, live as I'm talking to you here today, if I scroll down, you can see it classifying me in real time, as well. And as I move my hands around here, you can see the bounding box expand and contract, all in real time at a high frame rate. You can see here it's recognizing me as a person with about 86% confidence. Now, what's really cool is that all of this is running live in my web browser, on the client side in JavaScript, meaning none of these images are being sent to a server for classification. And that protects my privacy as an end user, which is really important.
OK, let's head on to the next model. Now, you're not just limited to using images. Here, you can use our sound recognition model to recognize common spoken commands, and you can even retrain the model to recognize custom sounds of your own choosing.
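Assuming this sound model is the speech commands one in the @tensorflow-models/speech-commands package, a minimal sketch could look like this; the probability threshold is just an illustrative value.

```js
import '@tensorflow/tfjs';
import * as speechCommands from '@tensorflow-models/speech-commands';

// Create a recognizer that listens via the microphone and feeds
// the browser-computed FFT spectrogram to the model.
const recognizer = speechCommands.create('BROWSER_FFT');

recognizer.ensureModelLoaded().then(function () {
  const labels = recognizer.wordLabels();
  // Listen continuously; the callback fires on each recognition.
  recognizer.listen(async function (result) {
    // result.scores holds one probability per known label.
    const scores = Array.from(result.scores);
    const best = scores.indexOf(Math.max(...scores));
    console.log('Heard:', labels[best]);
  }, {probabilityThreshold: 0.75}); // illustrative threshold
});
```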
We've even got models for understanding language. Here, you can use our text toxicity model to automatically discover if some text is potentially insulting, threatening, or toxic. Maybe you could hide potentially offensive content as a page is rendered for a more pleasant user experience.
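Here's a minimal sketch of that model via the @tensorflow-models/toxicity package; the input sentence and the confidence threshold are placeholders.

```js
import '@tensorflow/tfjs';
import * as toxicity from '@tensorflow-models/toxicity';

// Only report a label when the model is at least 90% confident.
toxicity.load(0.9).then(function (model) {
  const sentences = ['You are amazing!']; // placeholder input
  model.classify(sentences).then(function (predictions) {
    // One prediction per toxicity label (insult, threat, and so
    // on), each with a match flag for every input sentence.
    predictions.forEach(function (p) {
      console.log(p.label, p.results[0].match);
    });
  });
});
```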
Next is our Face Mesh model, which provides high-resolution face tracking, is just three megabytes in size, and can recognize 468 points on the human face, across multiple faces at once.
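As a rough sketch, assuming the @tensorflow-models/facemesh package and a webcam video element you've already set up:

```js
import '@tensorflow/tfjs';
import * as facemesh from '@tensorflow-models/facemesh';

// Assumes a <video> element already streaming from the webcam.
const video = document.getElementById('webcam');

facemesh.load().then(async function (model) {
  // Returns one prediction per detected face.
  const faces = await model.estimateFaces(video);
  faces.forEach(function (face) {
    // scaledMesh is an array of 468 [x, y, z] points.
    console.log(face.scaledMesh.length, 'points found');
  });
});
```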
A number of companies are using Face Mesh with existing web technologies, and a great example of this is by ModiFace, part of the L'Oréal group, which combines Face Mesh with WebGL shaders for augmented reality makeup try-on. In the image on the right, it should be noted that the lady is not wearing any lipstick. This is being augmented in real time in the browser, and the user can select different shades at will to see what's best for them, without needing to install an app or even walk into a store.
OK, so here you can see Face Mesh running live in the browser. On the left-hand side, you can see the machine learning in action, rendering this nice mesh-like overlay over my face, and you can even see where it thinks my irises are. If I just scrunch my face a little bit, you can see how well it updates, and if I squeeze my eyes shut, you can see that updating all in real time very nicely. Now then, not only am I able to do the machine learning on the left-hand side here, I can also render this 3D point cloud on the right using Three.js. And this is one of the beautiful things about JavaScript: not only am I able to do the machine learning, but there are also plenty of other very powerful libraries out there for data visualization or 3D graphics, as you see here, that you can pick up in a matter of hours and use to make something very, very quickly.
Now, the keen-eyed among you will have noticed that my performance right now is around 20 to 25 frames per second. That's because I'm running on my graphics card via WebGL here, and my graphics card is actually pretty old. If I change this to WebAssembly, you can see it's now going to execute on my CPU, and the frame rate shoots up to 30 frames per second instead. So you can change at will what hardware you want to execute on, and that's very powerful.
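Switching backends in your own code is just as simple. Here's a small sketch, assuming you've also installed the separate WASM backend package (depending on your bundler, you may additionally need to tell it where to find the .wasm binaries).

```js
import * as tf from '@tensorflow/tfjs';
// The WebAssembly backend ships as a separate package.
import '@tensorflow/tfjs-backend-wasm';

async function runOnCpu() {
  // Ask TensorFlow.js to execute on the CPU via WebAssembly
  // instead of on the graphics card via WebGL.
  await tf.setBackend('wasm');
  await tf.ready();
  console.log('Active backend:', tf.getBackend());
}

runOnCpu();
```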
So with that, let's head on to the next demo.
We also recently released two new pose estimation models in collaboration with research teams at Google. The first, MoveNet, is an ultra-fast and accurate model that tracks 17 key points, is optimized for diverse poses and actions, and can run at over 120 frames per second on an NVIDIA 1070 GPU, client side in the browser. The second, MediaPipe BlazePose, gives us 33 key points and is also tailored for a diverse set of poses. This extra granularity, such as tracking both hands, could enable gesture-based applications that might be useful for certain projects. There is also now a 3D version of this model available, too. Both models have higher accuracy and performance than our original PoseNet implementation, which some of you may have used before. So we recommend you upgrade and try them both out to see what works best for your intended use case if you're looking to use pose estimation in a future project.
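Both models are available through the unified @tensorflow-models/pose-detection API. A minimal MoveNet sketch, assuming a webcam video element you've already set up, might look like this:

```js
import '@tensorflow/tfjs';
import * as poseDetection from '@tensorflow-models/pose-detection';

// Assumes a <video> element already streaming from the webcam.
const video = document.getElementById('webcam');

async function detectPoses() {
  // Create a MoveNet detector; swap in BlazePose for 33 key points.
  const detector = await poseDetection.createDetector(
      poseDetection.SupportedModels.MoveNet);
  // Resolves to an array of poses, each with 17 named key points.
  const poses = await detector.estimatePoses(video);
  poses.forEach(function (pose) {
    pose.keypoints.forEach(function (kp) {
      console.log(kp.name, kp.x, kp.y, kp.score);
    });
  });
}

detectPoses();
```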
If you'd like to instead focus on the hands, you can do that using our hand pose tracking model. As you can see, it can track up to 21 key points in three dimensions, and with some extra logic, you can use this data to detect gestures or sign language, or even control user interfaces in a touchless way, opening up a whole new world of human-computer interaction use cases.
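A rough sketch with the @tensorflow-models/handpose package, again assuming an existing webcam video element:

```js
import '@tensorflow/tfjs';
import * as handpose from '@tensorflow-models/handpose';

// Assumes a <video> element already streaming from the webcam.
const video = document.getElementById('webcam');

handpose.load().then(async function (model) {
  const hands = await model.estimateHands(video);
  hands.forEach(function (hand) {
    // landmarks holds 21 [x, y, z] points for the detected hand.
    console.log(hand.landmarks);
  });
});
```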
Next, you've got body segmentation. This model enables segmentation of multiple human bodies, as you can see in the image on the right. Even better, some segmentation models also bring back the pose, which you can see by the light blue lines inside the bodies shown. This particular model, named BodyPix, can distinguish between 24 different body parts, represented here by the different colored regions.
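A hedged sketch of BodyPix usage with the @tensorflow-models/body-pix package; the image element ID is just a placeholder:

```js
import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

// A placeholder image element containing one or more people.
const img = document.getElementById('person-photo');

bodyPix.load().then(async function (net) {
  // segmentPersonParts labels every pixel with one of the 24
  // body part IDs, or -1 for background pixels.
  const segmentation = await net.segmentPersonParts(img);
  console.log(segmentation.data); // Int32Array of per-pixel part IDs
});
```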
Now, the pre-made models you just saw allow you to create pretty much anything you might dream up, so let's take a look at some real examples. Here, inSpace uses real-time toxicity filters in its chat. You can see that when a user types something bad, it's flagged before it's even sent, and it alerts the user in case they might want to reconsider what they're about to send, creating a more pleasant conversational experience. This is powered by our text toxicity model, which was pre-trained on a dataset of over 2 million comments.
Or how about this IncludeHealth system, which uses pose estimation models to enable physiotherapy at scale? With many folk unable to leave their homes or travel easily these days, this technology allows for a remote diagnosis from the comfort of their own home, using off-the-shelf technology, such as a standard webcam, that many people will already have access to.
Or how about enhancing the capabilities of e-commerce? Here, I use a body segmentation model with some custom logic to estimate my body measurements, allowing the website to automatically select the correct sized T-shirt at checkout. Even better, this was made in just two days using the pre-made body segmentation model that you just saw on the previous slides.
And with a bit of creativity, you can take a model, add some custom code, and quite literally give yourself superpowers, like invisibility. This is more advanced than simply replacing the background; for that, you wouldn't even need machine learning, of course. But notice here how, when I get in the bed, the bed still deforms in the image on the right as I move around, giving you this ghostly effect, and how the laptop screen still plays. This prototype uses the BodyPix model you saw earlier to calculate where the body is not, so it can gradually learn the whole background and then keep updating parts of it over time. And even better, this was made in under one day and runs entirely in the browser, meaning many people could try it out globally without having to install anything. You simply click a link and it just works, and no images are even sent to a server for classification.
Another member of the community combined his love of WebGL shaders with a TensorFlow.js model to enable him to shoot lasers from his eyes and mouth. This actually uses the Face Mesh model you previously saw, running in real time in the browser. Now, whilst this is a fun demo, you can imagine using something like it for a movie launch to amplify its reach with a creative experience for fans.
By combining TensorFlow.js models with other emerging web technologies, like WebRTC for real-time communication, A-Frame for mixed reality in the browser, or even Three.js for 3D, you can now create a digital teleportation of yourself anywhere in the world in real time. Here, I can segment myself in the bedroom, transmit just my segmentation to save bandwidth, and then recreate myself in the real world somewhere else entirely. Remember, all of this is running in a web browser. No app install is required, leading to a frictionless experience for the end user. Having tried this myself, it really feels more personal than a regular video call, as you can walk right up to the person and hear their audio. Maybe next time I'm presenting to you, I'll be able to do so in your own room like this, as if I was standing right in front of you. And you saw it here first, of course.
Now, everything you just saw was created using a pre-made, off-the-shelf model that typically can be used in just a few lines of code. My point in showing you all of these examples is that, with a little bit of creativity and by leveraging your existing web engineering skills, you can use many of the pre-trained models, like the ones you just saw, in pretty much any industry out there, providing your customers with new features that were previously impossible to achieve within the same time frame. So keep this in mind as you learn more in this course, and think about how what you learn can be combined with your existing web engineering skills to produce something new. And with that, it's time to try some of these out for yourself.
Choose three of the pre-trained TensorFlow.js models from the ones currently shown on this slide, read the documentation, and try the live demo of each to get a feel for the inputs the model expects, such as image, text, or sound, along with the outputs it produces. Now, some parts of the documentation might seem overwhelming at this stage, but fear not. You will learn how to integrate a model into a real web application later on in the chapter, step by step, so no coding is required right now. I just want you to familiarize yourself with the models that are available.
And then, of course, you can answer the questions that follow. What inputs does the model need, and what outputs does it produce? What problems in your or someone else's life could it solve if you were to use it in a real application? And finally, did the model demo perform well for you? Share some examples of when it did or did not work well, along with how you might be able to overcome those limitations.
For example, maybe you find that the estimated pose points move around slightly between webcam frames. You might choose to average the found coordinates over time to reduce that jitter, as in the sketch below.
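One simple way to do that averaging is an exponential moving average per key point. This is purely an illustrative approach, not something shipped with any TensorFlow.js model:

```js
// Smooth noisy key point coordinates with an exponential moving
// average. An alpha nearer 1 reacts faster; nearer 0 is smoother.
const ALPHA = 0.3; // illustrative smoothing factor
let smoothed = null;

function smoothKeypoints(keypoints) {
  if (smoothed === null) {
    // First frame: nothing to blend with yet, so copy as-is.
    smoothed = keypoints.map(function (kp) {
      return {x: kp.x, y: kp.y};
    });
  } else {
    keypoints.forEach(function (kp, i) {
      smoothed[i].x = ALPHA * kp.x + (1 - ALPHA) * smoothed[i].x;
      smoothed[i].y = ALPHA * kp.y + (1 - ALPHA) * smoothed[i].y;
    });
  }
  return smoothed;
}
```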
Or maybe you're using an older device and the model runs slower than expected. Remember, as everything runs on your own machine, everyone will have a slightly different experience based on the hardware they've got available to them. Maybe you can change the user experience to account for this.
Or, if a demo supports it, try a different backend to execute the model on different hardware, such as the CPU or graphics card. So head on to the next section and share your findings.