HOOT Dataset Structure
Folder Structure
The HOOT dataset has the following folder structure:
- The dataset is organized by object class and video name.
- The video frames have been extracted in the .png format and are named using 0-indexing.
- Basic video-level metadata (e.g. target and motion attributes) can be found in the meta.info file under each video folder.
- Video annotations are given in the anno.json file in the video folder. More information on the annotation format can be found below.
- Videos in the training split are listed in the train.txt file in the root HOOT folder.
- Videos in the test split are listed in the test.txt file in the root HOOT folder.
- License information for the dataset can be found in the license.txt file in the root HOOT folder.
HOOT/
├── apple/
│   ├── 001/
│   │   ├── 000000.png
│   │   ├── 000001.png
│   │   ├── ...
│   │   ├── 000949.png
│   │   ├── meta.info
│   │   ├── anno.json
│   ├── 002/
│   ├── ...
│   └── 020/
├── .../
├── zebra/
├── test.txt
├── train.txt
└── license.txt
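As a rough illustration of the layout above, the following Python sketch walks the class and video folders and lists the frames of each video. The helper names (read_split, iter_videos, frame_paths) are hypothetical and are not part of the HOOT toolkit. The sketch also assumes that train.txt and test.txt hold one video entry per line, which this document does not specify.

```python
from pathlib import Path


def read_split(root, split_file):
    """Return the non-empty lines of a split file such as train.txt.

    Assumption: one video entry per line. The entry format is not
    documented here, so entries are returned unparsed.
    """
    lines = (Path(root) / split_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def iter_videos(root):
    """Yield (class_name, video_name, video_dir) for every video folder."""
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        video_dirs = sorted(p for p in class_dir.iterdir() if p.is_dir())
        for video_dir in video_dirs:
            yield class_dir.name, video_dir.name, video_dir


def frame_paths(video_dir):
    """Return the .png frames of one video in frame order.

    Frame names are zero-padded (000000.png, 000001.png, ...), so a
    plain lexicographic sort matches the numeric frame order.
    """
    return sorted(Path(video_dir).glob("*.png"))
```

With the dataset extracted to a folder named HOOT, iterating over iter_videos("HOOT") would yield entries such as ("apple", "001", HOOT/apple/001), and frame_paths on that folder would return its frames starting from 000000.png.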
Annotation Format
HOOT annotations are in the JSON format and include the following:
- Video key in the class-video_name format (e.g. apple-001).
- Percentage of frames that have occlusion, frame_occlusion_level (given as a fraction, e.g. 0.6715).
- Median and mean occlusion level of the target across frames (median_target_occlusion_level and mean_target_occlusion_level), computed using mask IoU with the ground truth target box.
- A list of frame annotations, frames, in the form of Python dictionaries. Each dict includes:
  - A frame_id key with the frame index as its value.
  - rot_bb and aa_bb keys for the rotated and axis-aligned bounding boxes of the object. These bounding boxes are in the form of [(x1,y1),(x2,y2),(x3,y3),(x4,y4)] and contain floats for each of the points.
  - The mask dictionary, which contains the occlusion masks for each frame. If a specific occlusion mask does not appear in the frame, its value will be []. Otherwise, the mask is given in the RLE format popularized by the COCO annotations. The toolkit requires pycocotools and provides examples for reading the annotations. If there is an occlusion in the frame, any of the following mask types might be annotated:
    - all: the mask for all occluders, computed by taking the union of all masks.
    - s: the mask for all solid occluders combined.
    - sp: the mask for all sparse occluders combined.
    - st: the mask for all semi-transparent occluders combined.
    - t: the mask for all transparent occluders combined.
  - An attributes dictionary with the frame-level occlusion tags for absent, full_occlusion, cut_by_frame, partial_obj_occlusion and similar_occluder.
{
  "video_key": "apple-001",
  "frame_occlusion_level": 0.6715,
  "median_target_occlusion_level": 0.6080,
  "mean_target_occlusion_level": 0.6487,
  "frames": [
    ...
    {
      "frame_id": 20,
      "aa_bb": [(x1,y1),(x2,y2),(x3,y3),(x4,y4)],
      "rot_bb": [(x1,y1),(x2,y2),(x3,y3),(x4,y4)],
      "masks": {
        "all": "RLE encoded mask",
        "s": "RLE encoded mask",
        "sp": [],
        "st": [],
        "t": "RLE encoded mask"
      },
      "attributes": {
        "absent": false,
        "full_occlusion": false,
        "cut-by-frame": true,
        "partial_obj_occlusion": true,
        "similar_occluder": false
      }
    },
    ...
  ]
}
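The annotation fields described above could be read along the following lines. This is a minimal sketch, not the toolkit's own reader: all function names are hypothetical, and it assumes that anno.json holds a single object shaped like the example, that each bounding-box point is stored as a two-element [x, y] sequence, and that each non-empty mask is a COCO-style RLE dict with "size" and "counts" entries. Because the field list calls the mask dictionary mask while the example uses masks, the sketch accepts either key.

```python
import json
from pathlib import Path


def load_annotation(video_dir):
    """Parse the anno.json file of one video folder into a dict."""
    with open(Path(video_dir) / "anno.json") as f:
        return json.load(f)


def box_extent(points):
    """Return (x_min, y_min, x_max, y_max) enclosing a list of (x, y) points.

    Works for both aa_bb and rot_bb, since each is a list of four corners.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def frame_masks(frame):
    """Return the non-empty occluder masks of a frame, keyed by mask type.

    Tries both "masks" and "mask" because the document uses both spellings.
    Mask types whose value is [] (not present in the frame) are dropped.
    """
    masks = frame.get("masks") or frame.get("mask") or {}
    return {kind: rle for kind, rle in masks.items() if rle}


def frames_with_tag(annotation, tag):
    """Return the frame_id of every frame whose attribute `tag` is true."""
    return [
        frame["frame_id"]
        for frame in annotation["frames"]
        if frame.get("attributes", {}).get(tag)
    ]


def decode_mask(rle):
    """Decode one RLE mask into a binary (height, width) array.

    Requires pycocotools. Assumption: `rle` is a COCO-style dict with
    "size" and "counts" entries; adjust if the stored form differs.
    """
    from pycocotools import mask as mask_utils  # imported only when needed
    return mask_utils.decode(rle)
```

For example, frames_with_tag(annotation, "full_occlusion") would list the frames tagged as fully occluded, and frame_masks(frame).get("all") would give the combined occluder mask of a frame when one is annotated.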
License Information
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
For inquiries regarding a commercial license to this work, please contact the USC Stevens Center for Innovation at licensing@stevens.usc.edu and reference case number 2022-179.