Łukasz Piłatowski

IT passionate with a head full of ideas


Category : janitor

Oh my, these last two evenings after work were totally crazy.

Whenever I had time at work, and after work once my kids were asleep, I sat down and worked on the authorization PoC I mentioned in the previous post.

Discovering WebRTC made me feel like a newbie again, and it was GREAT. So many new concepts, and then the realization that this technology has been available since 2013 and I had never heard of it!

Eventually, I managed to understand the whole WebRTC example, and I even watched a video explaining the core concepts behind it. If you find it interesting, please check out the video below.

Afterwards, it was time to integrate WebRTC with the Janitor mechanism…

And, as you might expect, not everything went smoothly…

The symptoms were simple: the live stream from the web was so delayed that it was far from real-time, and the lag kept growing as the stream continued. I tried decreasing the video resolution, but it didn't help. In fact, it was even worse: although the lag got a bit smaller, face detection stopped working because it simply didn't have enough data!

And these problems are the best!

After hours of wondering what to do, and numerous rounds of trial and error, here's the final solution:

  • Introduce frame skipping - now only every 4th frame is fed to face recognition. Since the face doesn't move much between consecutive frames, this trick saves a lot of computing power. A nice improvement would be to tie this to the video framerate, so that when the framerate goes up, more frames are skipped, and when it goes down, more frames are analyzed.

  • Change the face encoding learning process - this change is a major one. Until now, every new encoding, even one that matched, was added to the encodings table and used for comparison in the following frames. This worked pretty well for video analysis, where there were many people in one frame and we could easily tell which person was which.

    The downside, however, is the computing power required to compare each new face against a larger and larger array of encodings, and this is what caused the lag.

    Now I assume that in learning mode only one person is visible in the frame, and it's the user's responsibility to make sure of that. This way we don't need to store every encoding, only the ones that don't match each other. For my face, it took only 3 face encodings to learn my face pattern correctly.
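The frame-skipping idea from the first bullet, including the suggested framerate-based improvement, could look roughly like this in Python. This is a minimal sketch, not Janitor's actual code: `FrameSkipper`, `skip_interval` and the `target_analysis_fps` parameter are hypothetical names, and the default of 7.5 analyzed frames per second is an assumption chosen only so that a 30 fps stream reproduces the "every 4th frame" ratio.

```python
import math


def skip_interval(fps: float, target_analysis_fps: float = 7.5) -> int:
    """Return N such that analyzing every N-th frame gives roughly
    `target_analysis_fps` analyzed frames per second (hypothetical helper)."""
    if fps <= 0:
        return 1  # unknown framerate: fall back to analyzing every frame
    return max(1, math.ceil(fps / target_analysis_fps))


class FrameSkipper:
    """Decides which frames of a live stream are passed to face recognition."""

    def __init__(self, interval: int = 4) -> None:
        self.interval = max(1, interval)
        self._frame_count = 0

    def should_analyze(self) -> bool:
        # Analyze frames 0, N, 2N, ...; skip everything in between.
        analyze = self._frame_count % self.interval == 0
        self._frame_count += 1
        return analyze

    def update_fps(self, fps: float, target_analysis_fps: float = 7.5) -> None:
        # Higher framerate -> larger interval (more frames skipped);
        # lower framerate -> smaller interval (more frames analyzed).
        self.interval = skip_interval(fps, target_analysis_fps)


if __name__ == "__main__":
    skipper = FrameSkipper(interval=4)
    print([skipper.should_analyze() for _ in range(9)])
    for fps in (15, 30, 60):
        print(f"{fps} fps -> analyze 1 of every {skip_interval(fps)} frames")
```

Calling `should_analyze()` once per incoming frame and forwarding only the frames where it returns `True` gives the fixed every-4th-frame behavior. Calling `update_fps()` whenever the measured framerate changes keeps the number of analyzed frames per second roughly constant, which is one way to get "skip more when the framerate goes up, analyze more when it goes down".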
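The revised learning rule from the second bullet can be sketched as follows. This is an illustration under stated assumptions: the post doesn't show Janitor's code, so `is_known`, `learn_encoding` and the plain-list encodings table are hypothetical. Encodings are treated as float vectors compared by Euclidean distance, and the 0.6 tolerance is borrowed from the default of `face_recognition.compare_faces`, which may not match what Janitor actually uses.

```python
import math
from typing import List, Sequence

Encoding = Sequence[float]

# Assumed match threshold; 0.6 is the face_recognition library's default tolerance.
TOLERANCE = 0.6


def is_known(known: List[Encoding], candidate: Encoding, tolerance: float = TOLERANCE) -> bool:
    """True if the candidate is within `tolerance` of any stored encoding.

    The cost is linear in len(known), so a table that grows on every frame
    makes every subsequent comparison slower.
    """
    return any(math.dist(k, candidate) <= tolerance for k in known)


def learn_encoding(known: List[Encoding], candidate: Encoding, tolerance: float = TOLERANCE) -> bool:
    """Learning mode: store the candidate only if it matches no stored encoding.

    Assumes exactly one face (the user's) is visible, so every detected
    encoding belongs to the person being enrolled. Returns True if stored.
    """
    if is_known(known, candidate, tolerance):
        return False  # already covered by an existing encoding; keep the table small
    known.append(list(candidate))
    return True


if __name__ == "__main__":
    # Toy 2-D vectors; real encodings are longer (128 values in face_recognition).
    table: List[Encoding] = []
    print(learn_encoding(table, [0.0, 0.0]))  # first encoding is always stored
    print(learn_encoding(table, [0.3, 0.4]))  # distance 0.5 <= 0.6, not stored
    print(learn_encoding(table, [1.0, 0.0]))  # distance 1.0 > 0.6, stored
    print(len(table))
```

Because an encoding is only added when it differs from everything already stored, the table stops growing once the user's face is covered, so the per-frame comparison cost stays small instead of increasing for as long as the stream runs.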

With these changes applied, Janitor face detection now runs pretty much in real time, with little to no lag, at a video resolution of 640x480.

Now that everything is set up and running, let's move on to liveness detection, so that we can prevent photo attacks on our app.