SGGpoint

Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis (CVPR2021)

[Project page] [Repo. page] [Arxiv] [Supp.] [Video]

Dataset Description

For our real-world SGGpoint studies, we manually checked and carefully revised the raw 3DSSG datasets (extended from 3RScan) to a cleaned version (3DSSG-O27R16), with mappings to more popular and meaningful sets of instance semantics & structural relationships. 3DSSG-O27R16 stands for the preprocessed 3DSSG dataset annotated with 27 Object classes and 16 types of structural Relationships - please refer to our [Supp.] for more info.

To download our preprocessed 3DSSG-O27R16 dataset, please follow the instructions in our project page - or you could also derive these preprocessed data yourselves by following the step-by-step guidance below.

Structure of our 3DSSG-O27R16. There are mainly two kinds of files in our dataset, namely the dense 10-dim point cloud representitaons ("10dimPoints") and our updated scene graph annotations ("SceneGraphAnnotation.json"). Two preprocessing scripts (one for point_cloud_sampling and one for scene_graph_remapping) are available in this repository. The dict structure of SceneGraphAnnotation.json is summarized below:

SceneGraphAnnotation.json

Structure -> {
    ...,
    scene_id: {
    
        'nodes': {
            ..., 
            obj_id: {
                'instance_id': obj_id,
                'instance_color': instance_color_encoding,
                'rio27_enc': rio27_class_id, 
                'rio27_name': rio27_class_name, 
                'raw528_enc': raw528_class_id, 
                'raw528_name': raw528_class_name, 
            },
            ...
        },
        
        'edges': [
            ...,
            [src_obj_id, dst_obj_id, rel_class_id, rel_class_name],
            ...
        ]
    },
    ...
}


Example -> {
    ...,
    'f62fd5fd-9a3f-2f44-883a-1e5cf819608e': {
    
        'nodes': {
            ...,
            '11': {
                'instance_id': 11,
                'instance_color': '#c49c94',
                'rio27_enc': 3, 
                'rio27_name': 'cabinet', 
                'raw528_enc': 68, 
                'raw528_name': 'cabinet', 
            }, 
            '40': {
                'instance_id': 40,
                'instance_color': '#cd7864',
                'rio27_enc': 12, 
                'rio27_name': 'curtain', 
                'raw528_enc': 129, 
                'raw528_name': 'curtain', 
            },
            ...
        },
        
        'edges': [
            ...,
            ['54', '8', '3', 'lying on'],
            ['35', '8', '15', 'close by'],
            ['20', '2', '14', 'spatial proximity'],
            ['15', '4', '1', 'attached to'],
            ...
        ]
    },
    ...
}

Dataset Preprocessing

Below we would go through and explain each preprocessing step in generating our 3DSSG-O27R16 dataset (for your information and interests only):

A. Point Cloud Sampling

sampling 10-dim point cloud from obj files with mesh info. discarded.

  1. CloudCompare software is adopted for sampling dense point cloud representations (PC9-dim) for scenes encoded within the raw mesh.refined.obj files. The surface density was set to 10k points per square unit, and the generated point clouds are of 9-dim (3-dim coordinate, 3-dim color, and 3-dim normal vector). Our corresponding batch sampling script is available here.
  2. Fetch a relatively sparse point cloud instance mask (IM4-dim) (with per-point instance_idx annotation) from the raw labels.instances.annotated.ply files.
  3. Overlap IM4-dim to PC9-dim to generate the desired 10-dim dense point cloud representations, including the information of 3-dim coordinate (x, y, z), 3-dim color (R, G, B), 3-dim normal vector (Nx, Ny, Nz), and 1-dim instance_idx. Specifically, the instance annotation for each point from PC9-dim is decidied as the corresponding annotation of its nearest point from IM4-dim. These outputs are saved as "10dimPoints/[Scene_IDx].txt" in our 3DSSG-O27R16 dataset.
  4. Some quick demo. img

B. Updates on Scene Graph Annotations

full lists of remapped & relabelled annotations can be found in our [Supp.].

  1. Node (object) Remapping: Remap raw object classes into their another popular 27-class set (RIO27). Note: there exist 28 unique ID-Label pairs under the [“RIO27 Mapping” coloum], including an unexpected "0 : -" annotation - which contains a few rare objects such as juicer, hand dryer, baby bed, and cosmetics kit. As the handling of data imbalance is out of our focused research scope, we would treat them as invalid objects under our task context in Recalibration Step below.
  2. Edge (Relationship) Relabelling: The raw semi-automatically generated 3DSSG dataset is a mix beween semi-automatic computation and manual annotation. To improve edge-level annotation quality, we manually checked and corrected through the following sub-steps to derive a multi-class label set (rather than a multi-label set) of meaningful structure relationships:
    1. We firstly removed all comparative annotations (e.g., big1ger-than, same-material, and more-comfortable-than) to focus more on the structural relationships (e.g., standing-on, leaning-against), resulting in a 19-class label set.
    2. We aggregated four directional proximity relationships (namely, left, right, behind, and front - which could be easily decided by the bbox centers in post-processing we suppose) into one NEW type named as Spatial Proximity. This aggregation addresses ill-posed orientation-variant observations on inter-object relationships (and therefore, releases the power of rotation-involved data augmentation techniques), resulting in a 16-class label set now.
      * Recall from 3DSSG: these directional proximity relationships are only considered and annotated between the nodes that share a support parent => The same condition applies to our 'Spatial Proximity' now.
    3. We then noticed the majority (nearly 80%) of edge annotation was under multi-class setting already, while the resting multi-label edges mainly conflicted to the ambiguous close-by annotation. After a careful study, we managed to produce a multi-class label set through strategies below. TL;NR: we re-defined close-by1 and Spatial Proximity2 relations, such that a priority list of interests can be retained as “? > Spatial Proximity > close-by”.
      Let '?' denotes any other labels except for 'Spatial Proximity' and 'close-by', such as 'connected-to', 'standing-on', and 'attached-to'.
      All Cases (Ratio) Descriptions Solution w/ Explanation
      Normal (79.4%) [Spatial Proximity] or [close-by] or [?] N/A. Within expected multi-class setting already, i.e., 1 label per edge.
      Case 1 (20.5%) [close-by, Spatial Proximity] (20.4%) or [close-by, ?] (0.1%) Relabel Case 1 by the removal of close-by. Ignoring close-by as long as there exists a more meaningful and specific structural relationship.1
      Case 2 (0.001%) [Spatial Proximity, ?] Relabel Case 2 as ?. For these very rare cases where two objects share one support parent (Spatial Proximity) AND they also have another mearningful relation (?), we’d like to focus more on the higher-level one.2
      Case 3 (0.04%) [Spatial Proximity, close-by, ?] Relabel Case 3 as ?. Merging from Case 1 & 2 above.

      1In other words, close-by (after relabelling) is now describing the relationship between two objects who are spatially close to each other AND they have no extra high-level structural relationships including Spatial Proximity.
      2Spatial Proximity(after relabelling) - two objects have a spatial relationship (among left, right, behind, and front) with each other AND they share a support parent AND they have no higher-level relations (?) in between.

    4. Now we derive a multi-class structural relationship set for edge predictions.
  3. Recallibraition between node and edge annotations simultaneously. 1) Remove all edges connected to invalid nodes. 2) Iterate among edges, find the isolated nodes (with no edges connected to) for each scene, and remove them as well. 3) Apart from the small / paritial scenes (list), there is one more special scene to be removed (i.e., 87e6cf79-9d1a-289f-845c-abe4deb8642f) who contains no elements after our above cleanings (see more details in our scene_graph_remapping script). Our scene graph annotations derived after this step are saved as "SceneGraphAnnotation.json" in our 3DSSG-O27R16 dataset.

Last Few Steps

Within dataloader.py