Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis (CVPR2021)
For our real-world SGGpoint studies, we manually checked and carefully revised the raw 3DSSG dataset (extended from 3RScan) into a cleaned version (3DSSG-O27R16), with mappings to more popular and meaningful sets of instance semantics & structural relationships. 3DSSG-O27R16 stands for the preprocessed 3DSSG dataset annotated with 27 Object classes and 16 types of structural Relationships - please refer to our [Supp.] for more info.
To download our preprocessed 3DSSG-O27R16 dataset, please follow the instructions on our project page - or you could derive the preprocessed data yourself by following the step-by-step guidance below.
Structure of our 3DSSG-O27R16. There are mainly two kinds of files in our dataset: the dense 10-dim point cloud representations ("10dimPoints") and our updated scene graph annotations ("SceneGraphAnnotation.json"). Two preprocessing scripts (one for point_cloud_sampling and one for scene_graph_remapping) are available in this repository. The dict structure of SceneGraphAnnotation.json is summarized below:
SceneGraphAnnotation.json

```
Structure -> {
    ...,
    scene_id: {
        'nodes': {
            ...,
            obj_id: {
                'instance_id': obj_id,
                'instance_color': instance_color_encoding,
                'rio27_enc': rio27_class_id,
                'rio27_name': rio27_class_name,
                'raw528_enc': raw528_class_id,
                'raw528_name': raw528_class_name,
            },
            ...
        },
        'edges': [
            ...,
            [src_obj_id, dst_obj_id, rel_class_id, rel_class_name],
            ...
        ]
    },
    ...
}
```
```
Example -> {
    ...,
    'f62fd5fd-9a3f-2f44-883a-1e5cf819608e': {
        'nodes': {
            ...,
            '11': {
                'instance_id': 11,
                'instance_color': '#c49c94',
                'rio27_enc': 3,
                'rio27_name': 'cabinet',
                'raw528_enc': 68,
                'raw528_name': 'cabinet',
            },
            '40': {
                'instance_id': 40,
                'instance_color': '#cd7864',
                'rio27_enc': 12,
                'rio27_name': 'curtain',
                'raw528_enc': 129,
                'raw528_name': 'curtain',
            },
            ...
        },
        'edges': [
            ...,
            ['54', '8', '3', 'lying on'],
            ['35', '8', '15', 'close by'],
            ['20', '2', '14', 'spatial proximity'],
            ['15', '4', '1', 'attached to'],
            ...
        ]
    },
    ...
}
```
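For reference, a minimal sketch of loading & traversing the annotation file in Python (the file path is an assumption - point it to your local copy):

```python
import json

with open('SceneGraphAnnotation.json', 'r') as f:  # path is an assumption
    scene_graphs = json.load(f)

for scene_id, scene in scene_graphs.items():
    for obj_id, node in scene['nodes'].items():
        # per-instance semantics under both label sets (RIO27 & raw 528)
        print(scene_id, obj_id, node['rio27_name'], node['raw528_name'])
    for src_obj_id, dst_obj_id, rel_class_id, rel_class_name in scene['edges']:
        # directed structural relationship: src --rel--> dst
        print(scene_id, src_obj_id, '--{}->'.format(rel_class_name), dst_obj_id)
```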
Below we go through and explain each preprocessing step used to generate our 3DSSG-O27R16 dataset (for your information and interest only):
Point cloud sampling - sampling dense 10-dim point clouds (xyz + rgb + normal + instance_idx) from the *.obj files, with the mesh info discarded; a minimal sketch is given below.
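For illustration only, a sketch of such a sampling step, assuming trimesh is available and the mesh carries per-vertex colors; the file name and the per_vertex_instance_id placeholder are hypothetical, and the point_cloud_sampling script in this repository remains the reference implementation:

```python
import numpy as np
import trimesh

mesh = trimesh.load('mesh.refined.obj', force='mesh', process=False)  # path is an assumption

# sample points on the surface; keep the attributes, discard the mesh itself
n_points = 100000
xyz, face_idx = trimesh.sample.sample_surface(mesh, n_points)
normal = mesh.face_normals[face_idx]  # per-face normals of the sampled points

# approximate colors / instance ids by each sampled point's nearest vertex
# (barycentric interpolation would be more faithful; omitted for brevity)
_, vert_idx = trimesh.proximity.ProximityQuery(mesh).vertex(xyz)
rgb = np.asarray(mesh.visual.vertex_colors[vert_idx, :3], dtype=np.float32) / 255.0

# placeholder: the real ids come from 3RScan's per-vertex instance annotations
per_vertex_instance_id = np.zeros(len(mesh.vertices), dtype=np.int64)
instance_idx = per_vertex_instance_id[vert_idx]

# 10-dim representation: xyz (3) + rgb (3) + normal (3) + instance_idx (1)
points10d = np.concatenate(
    [xyz, rgb, normal, instance_idx[:, None].astype(np.float32)], axis=1)
```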
Scene graph remapping - remapping the raw annotations onto more meaningful label sets; full lists of the remapped & relabelled annotations can be found in our [Supp.]. Concretely:

- Object classes: the RIO27 mapping contains a "0 : -" annotation, which covers a few rare objects such as juicer, hand dryer, baby bed, and cosmetics kit. As the handling of data imbalance is out of our focused research scope, we treat them as invalid objects under our task context in the Recalibration step below.
- Relationship classes: we first dropped the comparative relationships (e.g., bigger-than, same-material, and more-comfortable-than) to focus more on the structural relationships (e.g., standing-on, leaning-against), resulting in a 19-class label set. We then aggregated the four orientation-dependent spatial relationships (left, right, behind, and front - which, we suppose, could easily be re-derived from the bbox centers in post-processing) into one NEW type named Spatial Proximity. This aggregation addresses ill-posed orientation-variant observations of inter-object relationships (and therefore releases the power of rotation-involved data augmentation techniques), resulting in a 16-class label set.
- Multi-label edges: the remaining multi-label cases involve the close-by annotation. After a careful study, we managed to produce a multi-class label set (one label per edge) through the strategies below. TL;DR: we re-defined the close-by¹ and Spatial Proximity² relations, such that a priority list of interests is retained as "? > Spatial Proximity > close-by", where ? denotes any other structural relationship; a sketch of the resulting rule follows the footnotes.

| All Cases (Ratio) | Descriptions | Solution w/ Explanation |
|---|---|---|
| Normal (79.4%) | [Spatial Proximity] or [close-by] or [?] | N/A - already within the expected multi-class setting, i.e., 1 label per edge. |
| Case 1 (20.5%) | [close-by, Spatial Proximity] (20.4%) or [close-by, ?] (0.1%) | Relabel Case 1 by removing close-by, i.e., ignore close-by whenever a more meaningful and specific structural relationship exists.¹ |
| Case 2 (0.001%) | [Spatial Proximity, ?] | Relabel Case 2 as ?. For these very rare cases where two objects share one support parent (Spatial Proximity) AND also have another meaningful relation (?), we focus on the higher-level one.² |
| Case 3 (0.04%) | [Spatial Proximity, close-by, ?] | Relabel Case 3 as ?. Merges Cases 1 & 2 above. |
¹ In other words, close-by (after relabelling) now describes the relationship between two objects that are spatially close to each other AND have no extra higher-level structural relationships, including Spatial Proximity.
² Spatial Proximity (after relabelling) - two objects have a spatial relationship (among left, right, behind, and front) with each other AND share a support parent AND have no higher-level relation (?) in between.
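To make the priority rule above concrete, a minimal sketch of the relabelling logic (the function is our own illustration - the scene_graph_remapping script in this repository is the reference implementation; label strings follow the JSON example above):

```python
def relabel_edge(labels):
    """Collapse a multi-label edge into one label: '? > spatial proximity > close by'.

    `labels` is the set of relationship names on one edge; any label other than
    'spatial proximity' / 'close by' counts as a higher-level structural one ('?').
    """
    structural = [l for l in labels if l not in ('spatial proximity', 'close by')]
    if structural:                      # Cases 1-3 involving a '?': keep the '?' label
        return structural[0]            # (at most one '?' per edge in practice)
    if 'spatial proximity' in labels:   # Case 1: drop the co-occurring 'close by'
        return 'spatial proximity'
    return 'close by'                   # Normal case: 'close by' stands alone

# e.g., relabel_edge({'close by', 'spatial proximity'}) -> 'spatial proximity'
```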
Recalibration - within dataloader.py, we ignore the invalid nodes and the instances mismatched between the 10dimPoints and SceneGraphAnnotation files:
```python
for scene_name, scene in scene_data.items():
    # split pointcloud (9-dim) and instance_idx (1-dim) from the 10dimPoints file
    unique_instance_idx_in_pointcloud = set(np.unique(instance_idx))
    # suppose unique_instance_idx_in_scenegraph denotes the unique ids per scene
    # from the SceneGraphAnnotation file
    diff = list(unique_instance_idx_in_pointcloud.difference(unique_instance_idx_in_scenegraph))  # mismatched instances
    num_invalid_node = np.sum(instance_idx <= 0)  # invalid nodes
    if len(diff) > 0 or num_invalid_node > 0:
        valid_index = (instance_idx > 0)  # drop the invalid nodes
        for diff_value in diff:  # drop the mismatched instances
            valid_index &= (instance_idx != diff_value)
        pointcloud = pointcloud[valid_index]
        instance_idx = instance_idx[valid_index]
        # print('after:', instance_idx.shape, pointcloud.shape)  # uncomment to check the point_num changes
    # DESIRED OUTPUTS now: pointcloud & instance_idx
    # sanity check: the two id sets must match exactly after recalibration
    unique_instance_idx_in_pointcloud = set(np.unique(instance_idx))
    diff1 = unique_instance_idx_in_scenegraph.difference(unique_instance_idx_in_pointcloud)
    diff2 = unique_instance_idx_in_pointcloud.difference(unique_instance_idx_in_scenegraph)
    assert len(diff1) == len(diff2) == 0, "Something wrong in {}:\n{}\n{}".format(
        scene_name, unique_instance_idx_in_pointcloud, unique_instance_idx_in_scenegraph)
```
Data augmentation - randomly sampling a constant number of points (4096) for each scene that exceeds this size. To ensure uniform sampling that can be applied to each individual object, we compute a dynamic sampling_ratio for each scene (of different sizes) and use it within cropping_by_instance_idx() to guarantee that all objects are fairly sampled at each loading time (without the influence of object scale differences). See [Supp.] for more relevant details.
```python
def cropping_by_instance_idx(self, instance_labels):
    # instance_labels is the 1-dim instance_idx above
    if instance_labels.shape[0] > self.max_npoint:  # self.max_npoint is set to 4096
        sampling_ratio = self.max_npoint / instance_labels.shape[0]
        all_idxs = []  # scene-level indices of the points selected this time
        for iid in np.unique(instance_labels):  # sample points on object-level
            indices = (instance_labels == iid).nonzero()[0]  # points of a specific instance_idx
            end = int(sampling_ratio * len(indices)) + 1  # points to sample (+1 keeps at least one per instance)
            np.random.shuffle(indices)  # uniform sampling within each object instance
            selected_indices = indices[:end]  # the lucky points selected this round
            all_idxs.extend(selected_indices)  # append them to the scene-level list
        valid_idxs = np.array(all_idxs)
    else:
        valid_idxs = np.ones(instance_labels.shape, dtype=bool)  # no sampling required
    return valid_idxs
```
```python
# somewhere in the dataloader
_valid_idxs_ = self.cropping_by_instance_idx(instance_labels)
# 10dimPoints = xyz + rgb + normal + instance_idx
# (semantic_labels are fetched by looking up instance_idx within SceneGraphAnnotation)
xyz = xyz[_valid_idxs_]
rgb = rgb[_valid_idxs_]
normal = normal[_valid_idxs_]
semantic_labels = semantic_labels[_valid_idxs_]
instance_labels = instance_labels[_valid_idxs_]
```
Batching - the sampled point clouds of the B scenes in a mini-batch are concatenated, together with a range list of length (N1 + N2 + … + NB) that records which scene each point belongs to.
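For instance, a minimal PyTorch-style collate sketch under this batching scheme (the function and variable names are our own illustration, not the repo's API):

```python
import torch

def collate_scenes(batch):
    """Concatenate per-scene point clouds and build a per-point batch-id list.

    `batch` is a list of B (points10d, instance_idx) pairs with N_b points each;
    the returned `batch_ids` has length N1 + N2 + ... + NB.
    """
    points, instance_idx, batch_ids = [], [], []
    for b, (pts, iidx) in enumerate(batch):
        points.append(torch.as_tensor(pts, dtype=torch.float32))
        instance_idx.append(torch.as_tensor(iidx, dtype=torch.long))
        batch_ids.append(torch.full((len(pts),), b, dtype=torch.long))
    return (torch.cat(points, dim=0),        # (N1 + ... + NB, 10)
            torch.cat(instance_idx, dim=0),  # (N1 + ... + NB,)
            torch.cat(batch_ids, dim=0))     # scene membership of each point
```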