Human Activity Knowledge Engine (HAKE) aims to promote human activity/action understanding. As a large-scale knowledge base, HAKE is built upon existing activity datasets and provides finer-grained, body part-level atomic action labels (Part States). We first use Activity2Vec, pre-trained on HAKE, to recognize human part states, then compose them and reason out the instance-level human activities.

Activity2Vec works like an ImageNet pre-trained backbone: given a human box as input, it converts the box into a fixed-size vector combining visual and linguistic features for diverse downstream tasks, e.g., image/video action recognition/detection, image captioning, VQA, visual reasoning, image retrieval, and so on. With the power of HAKE, conventional instance-based methods enhanced with part-based Activity2Vec outperform state-of-the-art approaches on several large-scale activity benchmarks (HICO, HICO-DET, V-COCO, AVA, etc.).
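To make the recognize-then-compose idea above concrete, here is a minimal toy sketch in plain Python. It is illustrative only and does not reflect the real Activity2Vec API: `extract_part_states`, `compose_activity`, and the rule table are all hypothetical stand-ins (the real system uses a neural model over visual and linguistic features, not a lookup table).

```python
# Toy sketch of HAKE's pipeline: part states -> compose -> instance-level activity.
# All names here are hypothetical; the real Activity2Vec is a learned model.

def extract_part_states(human_box):
    """Stand-in for part-state recognition on a cropped human box.

    Returns body part-level atomic action labels (Part States),
    here hard-coded purely for illustration.
    """
    return {"hand": "hold", "head": "look_at", "hip": "sit_on"}

# Toy rule base: a set of part states composes into one instance-level activity.
RULES = {
    frozenset({("hand", "hold"), ("head", "look_at"), ("hip", "sit_on")}): "read_book",
}

def compose_activity(part_states):
    """Compose recognized part states into an instance-level activity label."""
    key = frozenset(part_states.items())
    return RULES.get(key, "unknown")

states = extract_part_states(human_box=None)
print(compose_activity(states))  # -> read_book
```

In the actual system this composition step is learned rather than rule-based, but the data flow is the same: per-part atomic actions are recognized first, then aggregated into a whole-person activity prediction.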
HAKE is still under construction, and we will keep enriching and enlarging it. Come and join us!