Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis (CVPR2021)
For our real-world SGGpoint studies, we manually checked and carefully revised the raw 3DSSG dataset (extended from 3RScan) into a cleaned version (3DSSG-O27R16), with mappings to more popular and meaningful sets of instance semantics & structural relationships. 3DSSG-O27R16 stands for the preprocessed 3DSSG dataset annotated with 27 Object classes and 16 types of structural Relationships - please refer to our [Supp.] for more info.
To download our preprocessed 3DSSG-O27R16 dataset, please follow the instructions on our project page - or you could also derive the preprocessed data yourself by following the step-by-step guidance below.
Structure of our 3DSSG-O27R16. There are mainly two kinds of files in our dataset, namely the dense 10-dim point cloud representations ("10dimPoints") and our updated scene graph annotations ("SceneGraphAnnotation.json"). Two preprocessing scripts (one for point_cloud_sampling and one for scene_graph_remapping) are available in this repository. The dict structure of SceneGraphAnnotation.json is summarized below:
SceneGraphAnnotation.json
Structure -> {
    ...,
    scene_id: {
        'nodes': {
            ...,
            obj_id: {
                'instance_id': obj_id,
                'instance_color': instance_color_encoding,
                'rio27_enc': rio27_class_id,
                'rio27_name': rio27_class_name,
                'raw528_enc': raw528_class_id,
                'raw528_name': raw528_class_name,
            },
            ...
        },
        'edges': [
            ...,
            [src_obj_id, dst_obj_id, rel_class_id, rel_class_name],
            ...
        ]
    },
    ...
}
Example -> {
    ...,
    'f62fd5fd-9a3f-2f44-883a-1e5cf819608e': {
        'nodes': {
            ...,
            '11': {
                'instance_id': 11,
                'instance_color': '#c49c94',
                'rio27_enc': 3,
                'rio27_name': 'cabinet',
                'raw528_enc': 68,
                'raw528_name': 'cabinet',
            },
            '40': {
                'instance_id': 40,
                'instance_color': '#cd7864',
                'rio27_enc': 12,
                'rio27_name': 'curtain',
                'raw528_enc': 129,
                'raw528_name': 'curtain',
            },
            ...
        },
        'edges': [
            ...,
            ['54', '8', '3', 'lying on'],
            ['35', '8', '15', 'close by'],
            ['20', '2', '14', 'spatial proximity'],
            ['15', '4', '1', 'attached to'],
            ...
        ]
    },
    ...
}
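For reference, here is a minimal sketch of parsing this file with the standard json module (the file path is a placeholder; the scene id and field names follow the example above):

```python
import json

# Placeholder path - point this to your local copy of the annotation file.
with open('3DSSG-O27R16/SceneGraphAnnotation.json') as f:
    scene_graphs = json.load(f)

scene = scene_graphs['f62fd5fd-9a3f-2f44-883a-1e5cf819608e']
for obj_id, node in scene['nodes'].items():
    print(obj_id, node['rio27_enc'], node['rio27_name'])    # e.g., 11 3 cabinet
for src_id, dst_id, rel_id, rel_name in scene['edges']:
    print(src_id, '--[{}]-->'.format(rel_name), dst_id)     # e.g., 54 --[lying on]--> 8
```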
Below we go through and explain each preprocessing step in generating our 3DSSG-O27R16 dataset (for your information and interest only):
- Point cloud sampling: we sampled the dense 10-dim point cloud representations (xyz + rgb + normal + instance_idx) from the original .obj files, with the mesh info. discarded.
- Object class remapping: we remapped the raw 528 object classes onto the 27-class RIO27 label set (full lists of the remapped & relabelled annotations can be found in our [Supp.]). Note the `0 : -` annotation, which contains a few rare objects such as `juicer`, `hand dryer`, `baby bed`, and `cosmetics kit`. As the handling of data imbalance is out of our focused research scope, we treat them as invalid objects under our task context in the Recalibration step below.
- Relationship remapping: we discarded the comparative relationships (e.g., `bigger-than`, `same-material`, and `more-comfortable-than`) to focus more on the structural relationships (e.g., `standing-on`, `leaning-against`), resulting in a 19-class label set. We further aggregated the four orientation-dependent spatial relationships (`left`, `right`, `behind`, and `front` - which could be easily decided by the bbox centers in post-processing, we suppose) into one NEW type named `Spatial Proximity`. This aggregation addresses ill-posed orientation-variant observations on inter-object relationships (and therefore releases the power of rotation-involved data augmentation techniques), resulting in a 16-class label set now.
- Multi-label recalibration: some edges carry more than one label, typically an extra `close-by` annotation. After a careful study, we managed to produce a multi-class label set (i.e., one label per edge) through the strategies below. TL;DR: we re-defined the `close-by`¹ and `Spatial Proximity`² relations, such that a priority list of interests can be retained as "`?` > `Spatial Proximity` > `close-by`", where `?` denotes any other structural relationship (a relabelling sketch follows the footnotes below).
| All Cases (Ratio) | Descriptions | Solution w/ Explanation |
| --- | --- | --- |
| Normal (79.4%) | [`Spatial Proximity`] or [`close-by`] or [`?`] | N/A. Already within the expected multi-class setting, i.e., 1 label per edge. |
| Case 1 (20.5%) | [`close-by`, `Spatial Proximity`] (20.4%) or [`close-by`, `?`] (0.1%) | Relabel Case 1 by removing `close-by`. We ignore `close-by` as long as there exists a more meaningful and specific structural relationship.¹ |
| Case 2 (0.001%) | [`Spatial Proximity`, `?`] | Relabel Case 2 as `?`. For these very rare cases where two objects share one support parent (`Spatial Proximity`) AND also hold another meaningful relation (`?`), we focus on the higher-level one.² |
| Case 3 (0.04%) | [`Spatial Proximity`, `close-by`, `?`] | Relabel Case 3 as `?`. Merged from Cases 1 & 2 above. |
¹ In other words, `close-by` (after relabelling) now describes the relationship between two objects that are spatially close to each other AND have no extra high-level structural relationships, including `Spatial Proximity`.
² `Spatial Proximity` (after relabelling) - two objects hold a spatial relationship (among `left`, `right`, `behind`, and `front`) with each other AND share a support parent AND have no higher-level relations (`?`) in between.
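The priority rule above fits in a few lines of Python - a minimal sketch, assuming each edge's raw annotation arrives as a list of relation names (the function name and the lower-case label spellings, taken from the example scene above, are ours for illustration, not the exact preprocessing script):

```python
def relabel_edge(raw_labels):
    """Collapse a multi-label edge annotation via "? > spatial proximity > close by"."""
    others = [l for l in raw_labels if l not in ('spatial proximity', 'close by')]
    if others:
        return others[0]                # a higher-level structural relation ("?") wins
    if 'spatial proximity' in raw_labels:
        return 'spatial proximity'      # beats a co-occurring close-by (Case 1)
    return 'close by'                   # only reached when close-by is the sole label

assert relabel_edge(['close by', 'spatial proximity']) == 'spatial proximity'     # Case 1
assert relabel_edge(['spatial proximity', 'lying on']) == 'lying on'              # Case 2
assert relabel_edge(['spatial proximity', 'close by', 'lying on']) == 'lying on'  # Case 3
```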
- Recalibration step - within dataloader.py, the invalid nodes and the instances mismatched between the two files are filtered out on the fly:
for scene in scene_data:
    # split the point cloud (9-dim) and instance_idx (1-dim) from the 10dimPoints file
    unique_instance_idx_in_pointcloud = set(np.unique(instance_idx))
    # unique_instance_idx_in_scenegraph denotes the unique ids per scene from the SceneGraphAnnotation file
    diff = list(unique_instance_idx_in_pointcloud.difference(unique_instance_idx_in_scenegraph))  # the mismatched instances
    num_invalid_node = np.sum(instance_idx <= 0)  # the invalid nodes
    if len(diff) > 0 or num_invalid_node > 0:
        valid_index = (instance_idx > 0)  # drop the invalid nodes
        for diff_value in diff:  # drop the mismatched instances
            valid_index &= (instance_idx != diff_value)
        pointcloud = pointcloud[valid_index]
        instance_idx = instance_idx[valid_index]
        # print('after:', instance_idx.shape, pointcloud.shape)  # uncomment to check the point_num changes
    # DESIRED OUTPUTS now: pointcloud & instance_idx
    # re-derive the unique ids AFTER filtering, so that the two sources must match exactly
    unique_instance_idx_in_pointcloud = set(np.unique(instance_idx))
    diff1 = unique_instance_idx_in_scenegraph.difference(unique_instance_idx_in_pointcloud)
    diff2 = unique_instance_idx_in_pointcloud.difference(unique_instance_idx_in_scenegraph)
    assert len(diff1) == len(diff2) == 0, "Mismatch in {}:\n{}\n{}".format(
        scene_name, unique_instance_idx_in_pointcloud, unique_instance_idx_in_scenegraph)
- Data augmentation - randomly sampling a constant number of points (4096) for each scene that exceeds this size. To ensure a uniform sampling that can be applied to each individual object, we compute a dynamic sampling_ratio for each scene of different size - and use it in cropping_by_instance_idx() to guarantee that all objects are fairly sampled at each loading time (without the influence of object scale differences). See [Supp.] for more relevant details.
def cropping_by_instance_idx(self, instance_labels):
    # instance_labels is the 1-dim instance_idx above
    if instance_labels.shape[0] > self.max_npoint:  # self.max_npoint is set to 4096
        sampling_ratio = self.max_npoint / instance_labels.shape[0]
        all_idxs = []  # scene-level indices of the points selected this time
        for iid in np.unique(instance_labels):  # sample points at the object level
            indices = (instance_labels == iid).nonzero()[0]  # locate the points of a specific instance_idx
            end = int(sampling_ratio * len(indices)) + 1  # num_of_points_to_be_sampled (+1 keeps at least one point per instance)
            np.random.shuffle(indices)  # uniform sampling within each object instance
            selected_indices = indices[:end]  # the lucky points selected this round
            all_idxs.extend(selected_indices)  # append them to the scene-level list
        valid_idxs = np.array(all_idxs)  # integer index array
    else:
        valid_idxs = np.ones(instance_labels.shape, dtype=bool)  # boolean mask - no sampling required
    return valid_idxs  # either form works for the NumPy fancy indexing below
# somewhere in dataloader
_valid_idxs_ = self.cropping_by_instance_idx(instance_labels)
# 10dimPoints = xyz + rgb + normal + instance_idx (semantic_labels are fetched by looking up instance_idx within SceneGraphAnnotation)
xyz = xyz[_valid_idxs_]
rgb = rgb[_valid_idxs_]
normal = normal[_valid_idxs_]
semantic_labels = semantic_labels[_valid_idxs_]
instance_labels = instance_labels[_valid_idxs_]
Note on batching: the sampled scenes within a mini-batch are concatenated along the point dimension, so the per-point arrays range over one flat list of length (N1 + N2 + … + NB), where Nb denotes the number of points sampled from the b-th scene.
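For illustration, a minimal sketch of such a collate step in NumPy - the function name and the instance-id offsetting are our assumptions for keeping ids unique across scenes, not necessarily the exact dataloader implementation:

```python
import numpy as np

# Hypothetical collate step: concatenate B sampled scenes into flat arrays of
# length N1 + N2 + ... + NB; offsets keep instance ids unique across scenes.
def collate_scenes(scenes):
    xyz_list, iid_list, offset = [], [], 0
    for xyz, instance_idx in scenes:        # per-scene (N_b, 3) points and (N_b,) ids
        xyz_list.append(xyz)
        iid_list.append(instance_idx + offset)
        offset += int(instance_idx.max())   # shift the next scene's ids past this one
    return np.concatenate(xyz_list), np.concatenate(iid_list)
```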