
3.1: What are pre-trained models?

JASON MAYES: Now let's dive deeper into pre-trained models. These models have already been trained by someone else, so you don't need to gather your own data or spend time and resources training them yourself. Instead, you can load the model and use it directly for the task it was trained for within your own production application.

Now, pre-trained models within the TensorFlow.js ecosystem come in a couple of forms. Some, like the ones the TensorFlow.js team here at Google has produced, are wrapped in easy-to-use JavaScript classes that you can use in just a few lines of code, and are available for many common use cases.

These are great for people new to machine learning and can be used in minutes, and you'll learn more about these first. Others require more knowledge of machine learning to use, as they come in their raw form with no easy-to-use helper functions wrapped around them, and you'll be learning how to use these, too.

So here, you've got an example of a pre-trained model known as BERT Q&A that can perform advanced text search in the web browser. Using this model, you can find an answer to a question within any piece of text you present to it.

Notice here how, in the demo, the question uses words that are not in the answer. If you ask it, "What are the best stargazing days?", it finds the answer referring to the nights during certain moon cycles, even though the word "days" never appears in the text.

This model can be used with any text and any question. And here, it's shown running in a Chrome extension, so you can also use it on any web page.

Now this pre-trained model is actually one of many that the TensorFlow.js team have created and made available. You may be wondering how hard it is to use something like this. Well, using that set of official TensorFlow.js models is straightforward. In fact, the core code for this one fits on a single slide, so let's walk through it.

So first, you import the TensorFlow.js library and then the pre-made model that you want to use. Next, you can define the text you wish to search. This could be just some text on a website, but here, I just use a simple string. You can then define the question the user wants to ask, which, of course, could come from some form of input box instead.

Now you load the question and answer model itself. As this takes time to load, it's performed asynchronously, so you use the then keyword to wait for it to be ready. And once the model is available, a function will be called, which is passed the loaded model as a parameter.

Finally, you can then call model.findAnswers. You pass to this function the question you want to answer, along with the text you want to search. Again, this is an asynchronous operation, as it might take a few milliseconds to execute. But once ready, this promise will resolve to return an answers object, which you can then iterate through to find the most likely answer from the given text.

In this case, it would predict cats as the answer to the question proposed, which is correct given the text you had to search on this slide.
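
To make that concrete, here is a rough sketch of what the code on that slide could look like. The passage, question, and script URLs are illustrative placeholders, not the exact ones from the slide.

// Include the libraries first, for example via script tags:
// <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
// <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/qna"></script>

// The text to search. This could be text grabbed from the page, but a plain string works too.
const passage = 'Stargazing is best on clear nights around a new moon. Cats are very popular pets.';

// The question the user wants to ask, which could come from an input box instead.
const question = 'What are popular pets?';

// Loading the model takes time, so it returns a promise; use then() to wait for it.
qna.load().then(function (model) {
  // findAnswers is also asynchronous and resolves to an array of candidate answers.
  model.findAnswers(question, passage).then(function (answers) {
    // Each answer has text, a score, and start/end indices into the passage,
    // so you can iterate through them to pick the most likely one.
    answers.forEach(function (answer) {
      console.log(answer.text, answer.score);
    });
  });
});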

It's no different to writing regular web apps.

Now since launch, the TensorFlow.js team have released many easy-to-use pre-made models, and we're continually expanding our selection, which you'll hear more about shortly. Models exist across many categories, such as vision, body, text, and sound, that you can use in just a few lines of code.

You can check out tensorflow.org/js/models to see them all and to find the code snippets that show you how to use them.

Even better, you do not need a background in machine learning to use these. Just a working knowledge of JavaScript is required, but they are still very powerful. So let's take a look at some of these in action. And as I show you each one, try to think about how you could use it to solve problems that you or someone else might actually have.

First up, you have object recognition. Here you're able to run the popular COCO-SSD model live in the browser to provide bounding boxes for 80 common objects the model has been trained on. What this means is that a rectangle or square can be drawn that shows exactly where in the image each detected object is located.

Now before I continue, you may have noticed that some of the names of models are not particularly friendly sounding. This is something you'll get used to, and it should be noted that in many cases, the name often originates from some combination of the data it's trained on, the machine learning architecture it uses behind the scenes, or the utility that it provides. As you get more familiar with these things, these names become less mysterious.

COCO-SSD, for example, was trained on Microsoft's COCO dataset, which stands for Common Objects in Context. This is a famous dataset that contains hundreds of thousands of images that were annotated by humans for typical things you might see in your daily life.

Furthermore, this model uses an SSD architecture, which stands for Single Shot Detector, the scope of which is beyond this introductory course. But know that this is just describing some of the inner workings of the model itself.

And as you can see from the image on the right, this COCO-SSD model allows us to not only understand where in the image the object is located, but also how many exist, which is much more powerful than image recognition, which would only tell us that something exists somewhere in a given image. And that's the key difference between object recognition and image recognition.

So here, you can see COCO-SSD running live in a web browser on a real web page. If I click on any one of these images at the top, you can see the classification is coming back in real time. Now here's just a few examples of the objects it can recognize, and you can see how you might use it for something useful right away.

On the image on the left, you can see that this dog is very close to this bowl of treats. And you can imagine that you could detect this quite easily and send yourself an alert when this occurs.

But of course, we can do better than that. We can enable our webcam and now, live as I'm talking to you here today, if I scroll down, you can see it classifying me in real time as well. And as I move my hands around here, you can see the bounding box expand and contract, all in real time at a high frames per second. You can see here it's recognizing me as a person with about 86% confidence.

Now what's really cool about this is that all of this is running live in my web browser, on the client side in JavaScript, meaning none of these images are being sent to the server for classification. And that protects my privacy as an end user, which is really important.
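
As a rough sketch of what such a client-side setup could look like, here is one way to run COCO-SSD against a webcam feed. The 'webcam' element id is just an assumption for illustration.

// Assumes @tensorflow/tfjs and @tensorflow-models/coco-ssd are already loaded,
// and that a <video id="webcam"> element is showing the camera stream.
const video = document.getElementById('webcam');

cocoSsd.load().then(function (model) {
  function detectFrame() {
    // detect() resolves to an array of predictions for the current video frame.
    model.detect(video).then(function (predictions) {
      predictions.forEach(function (p) {
        // p.class is the object name, p.score is the confidence,
        // and p.bbox is [x, y, width, height] you could draw as a rectangle.
        console.log(p.class, p.score, p.bbox);
      });
      // Keep classifying frame after frame for a live result.
      window.requestAnimationFrame(detectFrame);
    });
  }
  detectFrame();
});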

OK, let's head on to the next model.

Now you're not just limited to using images. Here you can use our sound recognition model. You can even retrain the model to recognize custom sounds.

We've even got models for understanding language. Here, you can use our text toxicity model to automatically discover if some text is potentially insulting, threatening, or toxic. Maybe you could hide potentially offensive things as a page is rendered, for a more pleasant user experience.
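
Here is a minimal sketch of how that could look with the toxicity model. The threshold value and example sentence are illustrative.

// Assumes @tensorflow/tfjs and @tensorflow-models/toxicity are already loaded.
const threshold = 0.9; // minimum confidence before a label counts as a match

toxicity.load(threshold).then(function (model) {
  model.classify(['You are a terrible person.']).then(function (predictions) {
    predictions.forEach(function (prediction) {
      // prediction.label is a category such as 'insult' or 'threat';
      // results[0].match is true when the model is confident it applies.
      if (prediction.results[0].match) {
        console.log('Flagged as potentially', prediction.label);
      }
    });
  });
});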

Next is our Face Mesh model, which provides high resolution face tracking that's just three megabytes in size and can recognize 468 points on the human face, across multiple faces. A number of companies are using this with existing web technologies, and a great example of this is by ModiFace, part of the L'Oréal group, which combines Face Mesh with WebGL shaders for augmented reality makeup try-on.

On the image on the right, it should be noted that the lady is not wearing any lipstick. This is being augmented in real time in the browser. And then the user can select different shades at will to see what's best for them, without needing to install an app or even walk into a store.
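
As a hedged sketch, loading and running a face mesh style model could look roughly like this. It is shown with the original facemesh package API; the newer face-landmarks-detection package wraps the same model with a slightly different interface, and 'video' is an assumed webcam element.

// Assumes @tensorflow/tfjs and @tensorflow-models/facemesh are already loaded.
facemesh.load().then(function (model) {
  model.estimateFaces(video).then(function (faces) {
    faces.forEach(function (face) {
      // face.scaledMesh is an array of 468 [x, y, z] points you could render,
      // for example as a point cloud, or use to anchor AR makeup effects.
      console.log(face.scaledMesh.length, 'landmarks found');
    });
  });
});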

OK, so here you can see Face Mesh running live in the browser. On the left hand side, you can see the machine learning in action, rendering this nice mesh-like object over my face. And you can even see where it thinks my irises are. And if I just scrunch my face a little bit, you can see how well it updates. And if I squeeze my eyes, you can see that updating, all in real time, very nicely.

Now then, not only am I able to do the machine learning on the left hand side here, I can also render this 3D point cloud on the right using Three.js. And this is one of the beautiful things about JavaScript: not only am I able to do the machine learning, but there's also plenty of other very powerful libraries out there for data visualization or 3D graphics, as you see here, that you can use in a matter of hours and make something very, very quickly.

Now, the keen-eyed among you will have noticed that my performance right now is around 20 to 25 frames per second. That's because I'm running on my graphics card via WebGL here, and my graphics card is actually pretty old. If I change this to WebAssembly, you can see it's now going to execute on my CPU, and that shoots up to 30 frames per second instead. So you can change at will what hardware you want to execute on, and that's very powerful.
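
Switching backends like that is a one-liner in TensorFlow.js. A minimal sketch follows; note that the WebAssembly backend additionally requires the @tensorflow/tfjs-backend-wasm package to be loaded.

// Run on the CPU via WebAssembly instead of the GPU via WebGL.
tf.setBackend('wasm').then(function () {
  console.log('Now executing on:', tf.getBackend()); // logs 'wasm'
});

// And to switch back to the graphics card:
// tf.setBackend('webgl');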

So with that, let's head on to the next demo.

We also recently released two new pose estimation models in collaboration with research teams at Google. The first, MoveNet, is an ultra-fast and accurate model that tracks 17 key points, is optimized for diverse poses and actions, and can run at over 120 frames per second on an NVIDIA 1070 GPU, client side in the browser. The second, MediaPipe BlazePose, gives us 33 key points and is also tailored for a diverse set of poses.

This extra granularity, such as tracking both hands, could enable gesture-based applications that might be useful for certain projects. There is also now a 3D version of this model available, too.

Now both models have higher accuracy and performance over our original PoseNet implementation that some of you may have used before. So we recommend you upgrade and try them both out to see what works best for your intended use case, if you're looking to use pose estimation in a future project.
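
Both new models are exposed through the same pose-detection package, so trying one out could look roughly like this sketch. MoveNet is shown here; BlazePose is selected the same way, and 'video' is an assumed webcam element.

// Assumes @tensorflow/tfjs and @tensorflow-models/pose-detection are already loaded.
poseDetection
  .createDetector(poseDetection.SupportedModels.MoveNet)
  .then(function (detector) {
    detector.estimatePoses(video).then(function (poses) {
      // For MoveNet, each pose holds 17 keypoints, each with x, y,
      // a confidence score, and a name such as 'left_wrist'.
      console.log(poses[0] ? poses[0].keypoints : 'no person found');
    });
  });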

If you'd like to instead focus on the hands, you can do that using our hand pose tracking model. As you can see, it can track up to 21 points in three dimensions. And with some extra logic, you can use this data to detect gestures, sign language, or even control user interfaces in a touchless way, opening up a whole new world of human-computer interaction use cases.
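
A rough sketch with the handpose model follows; again, 'video' is an assumed webcam element.

// Assumes @tensorflow/tfjs and @tensorflow-models/handpose are already loaded.
handpose.load().then(function (model) {
  model.estimateHands(video).then(function (hands) {
    hands.forEach(function (hand) {
      // hand.landmarks is an array of 21 [x, y, z] points; with some extra logic
      // on top of these you could recognize gestures or drive a touchless UI.
      console.log(hand.landmarks);
    });
  });
});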

Next, you've got body segmentation. This model enables segmentation of multiple human bodies, as you can see in the image on the right. Even better, some segmentation models also bring back the pose, which you can see by the light blue lines inside the bodies. This particular model, named BodyPix, can distinguish between 24 different body parts, represented by the different colored regions in the image.
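
Here is a minimal sketch of part-level segmentation with BodyPix; the image and canvas element ids are illustrative assumptions.

// Assumes @tensorflow/tfjs and @tensorflow-models/body-pix are already loaded.
const image = document.getElementById('person'); // an <img> or <video> element
const canvas = document.getElementById('output'); // a <canvas> to draw onto

bodyPix.load().then(function (net) {
  // segmentPersonParts labels every pixel with one of 24 body parts (or background).
  net.segmentPersonParts(image).then(function (partSegmentation) {
    // Turn the per-pixel part ids into a colored mask and draw it over the image.
    const coloredMask = bodyPix.toColoredPartMask(partSegmentation);
    bodyPix.drawMask(canvas, image, coloredMask, 0.7);
  });
});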

Now the pre-made models you just saw allow you to create pretty much anything you might dream up. So let's take a look at some real examples.

Here, inSpace use real-time toxicity filters. You can see that when a user types something bad, it's flagged before it's even sent, and it alerts the user that they might want to reconsider what they're about to send, creating a more pleasant conversational experience. This is powered by our text toxicity model that was pre-trained on a dataset of over 2 million comments.

Well, how about this IncludeHealth system that uses pose estimation models to enable physiotherapy at scale? With many folk unable to leave their homes or travel these days, this technology allows for a remote diagnosis from the comfort of their own home, using off-the-shelf technology, such as a standard webcam, that many people will have access to.

Well, how about enhancing the capabilities of an e-commerce website? Here, I use a body segmentation model with some custom logic to estimate my body measurements, allowing the website to automatically select the correct sized T-shirt at checkout. Even better, this was made in just two days using our pre-made body segmentation model that you just saw on the previous slides.

And with a bit of creativity, you can take a model, add some custom code, and quite literally give yourself superpowers, like invisibility here. This is more advanced than simply replacing the background with a static image. For that, you wouldn't even need machine learning, of course.

But notice here how, when I go in the bed, the bed still deforms in the image on the right as I move around, to give you this ghostly effect, or how the laptop screen still plays.

This prototype uses the BodyPix model that you saw to calculate where the body is not, so it can eventually learn all of the background and then keep updating parts of it over time. And even better, this was made in under one day and runs entirely in the browser, meaning many people could try it out globally, even without having to install anything. You simply click a link and it just works. No images are even sent to the server for classification.

Another member of the community combined his love for WebGL shaders with a TensorFlow.js model to enable him to shoot lasers from his eyes and mouth. This actually uses the Face Mesh model you previously saw, running in real time in the browser. Now whilst this is a fun demo, you can imagine using this for a movie launch to amplify the reach with a creative experience for fans.

By combining TensorFlow.js models with other emerging web technologies, like WebRTC for real-time communication, or A-Frame for mixed reality in the browser, or even Three.js for 3D, you can now create a digital teleportation of yourself anywhere in the world in real time. Here, I can segment myself in the bedroom, transmit my segmentation to save bandwidth, and then recreate myself in the real world at the other end.

Remember, all of this is running in a web browser. No app install is required, leading to a frictionless experience for the end user. Having tried this myself, it really feels more personal than a regular video call, as you can walk up to the person and hear the audio.

Maybe next time I'm presenting to you, I'll be able to do so in your own room like this, as if I was standing right in front of you. And you saw it here first, of course. Now everything you just saw was created using a pre-made, off-the-shelf model that typically can be used in just a few lines of code.

My point for showing you all of these examples is that with a little bit of creativity, and by leveraging your existing web engineering skills, you can use many of the pre-trained models, like the ones you just saw, for pretty much any industry out there, providing your customers with new features that were previously impossible to achieve within the same time frame. So keep this in mind as you learn more in this course. Think about how you can relate what you learn so that it can be combined with your existing web engineering skills to produce something new.

And with that, it's time to try some of these out for yourself. Choose three of the pre-trained TensorFlow.js models from the ones currently shown on this slide, read the documentation, and try the live demo of each to get yourself a feel for the inputs the model expects, such as image, text, or sound, along with the outputs it provides.

Now, some parts of the documentation might seem overwhelming at this stage, but fear not. You will learn how to integrate a model into a real web application later on in the chapter, step by step. So no coding is required right now. I just want you to familiarize yourself with the models that are available.

And then, of course, you can answer the questions that follow. What inputs does the model need, and what outputs does it produce? What problems in your or someone else's life can it solve if you were to use it in a real application? And finally, did the model demo perform well for you?

Share some examples of when it did or when it did not work well, along with how you might be able to overcome those limitations. For example, maybe you find that the estimated pose points move around slightly between webcam frames. You might choose to average the found coordinates over time to reduce that jitter.

Or maybe you're using an older device and the model runs slower than expected. Remember, as you're running on your own machine, everyone will have a slightly different experience based on the hardware that you've got available to you. Maybe you can change the user experience to account for this. Or, if a demo supports it, try a different backend to execute the model on different hardware, such as the CPU or graphics card.

So head on to the next section and share your findings.

   
