Paper Review. SlowFast Networks for Video Recognition @ ICCV 2019

JooChan Park on Jul 14, 2020

Paper link

Abstract

Ranked 1st in action recognition at the AVA Challenge 2019
A network architecture for video recognition
The network has two pathways
- Slow pathway: captures spatial semantic information
- Fast pathway: captures motion at fine temporal resolution
The model is built to mimic the human visual system
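
To make the two-pathway idea concrete, here is a minimal sketch of how the same clip could be sampled at two frame rates. The function name `sample_pathway_frames` is hypothetical, and the stride `tau=16` and speed ratio `alpha=8` are assumptions not stated in this review; treat them as illustrative defaults rather than the authors' exact configuration.

```python
def sample_pathway_frames(num_frames, tau=16, alpha=8):
    """Pick frame indices for a Slow and a Fast pathway from one clip.

    tau   : temporal stride of the Slow pathway (assumed value)
    alpha : how many times denser the Fast pathway samples (assumed value)
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha in this sketch")

    # Slow pathway: few frames, large stride (spatial semantics).
    slow_idx = list(range(0, num_frames, tau))
    # Fast pathway: alpha times more frames, small stride (fine motion).
    fast_idx = list(range(0, num_frames, tau // alpha))
    return slow_idx, fast_idx


# A 64-frame clip as an example input.
slow_idx, fast_idx = sample_pathway_frames(64)
print(len(slow_idx), len(fast_idx))
```

Under these assumptions the Slow pathway sees only a handful of frames while the Fast pathway sees `alpha` times as many, which is the split between "what is in the scene" and "how it moves" that the summary above describes.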

Introduction

Spatial Structure

Temporal Events

SlowFast Networks

Biological Derivation

Proposed Method

Slow Pathway

Fast Pathway

Overall Process

Experiments

Dataset
- Kinetics-400
  - 306,245 video clips
  - 400 human action classes
- Kinetics-600
  - 495,547 video clips
  - 600 human action classes
- Charades
  - 9,848 video clips
  - 157 human action classes
- AVA dataset
  - 430 video clips
  - 80 human action classes

Action Classification

AVA Action Detection

Conclusions & Reviews

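Because the architecture is designed to mimic the human perceptual system, the reasoning behind the model design is easy to follow.
The person detector (Detectron) already performs above 90%, so the authors appear to have used it as-is and focused on building a model (SlowFast) that predicts the person's action.
The first task of the AI Grand Challenge only requires finding people who have fainted, so this model may not be a good fit. It could still be worth trying if a more complex video recognition task comes later (for example, more classes or longer videos).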

Reference

Paper Reviews, Activity Recognition

Activity Recognition Object Detection CV
