
Schematic of our setup. We have seven lights (\(L_1\) to \(L_7\)) and two cameras (\(C_1\) and \(C_2\)).

We capture directionally illuminated images

Normals of the bas relief

Surface reconstruction of the bas relief

Normals of the star

Surface reconstruction of the star

Abstract

Controlling illumination can generate high-quality information about object surface normals and depth discontinuities at a low computational cost. In this work we demonstrate a robot-workspace-scale controlled illumination setup that generates high-quality information about tabletop-scale objects for robotic manipulation. With our low-angle-of-incidence directional illumination setup we can precisely capture the surface normals and depth discontinuities of Lambertian objects. We demonstrate three use cases of our setup for robotic manipulation. We show that 1) using the captured information we can perform general-purpose grasping with a single-point vacuum gripper, 2) we can visually measure the deformation of known objects, and 3) we can estimate the pose of known objects and track unknown objects in the robot's workspace.
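To make the Lambertian assumption above concrete, the sketch below shows classic least-squares photometric stereo: given a stack of directionally illuminated images and calibrated light directions, per-pixel normals and albedo follow from a linear solve. It is a minimal illustration, not our exact pipeline; the array names `images` and `light_dirs` are illustrative placeholders.

```python
import numpy as np

def lambertian_normals(images, light_dirs):
    """Estimate per-pixel surface normals and albedo for a Lambertian surface.

    images:     (k, h, w) stack of directionally illuminated grayscale images
    light_dirs: (k, 3) unit vectors pointing from the surface toward each light
    """
    k, h, w = images.shape
    intensities = images.reshape(k, -1)                # (k, h*w)
    # Lambertian model: I = L @ (albedo * n); solve for g = albedo * n per pixel.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, h*w)
    albedo = np.linalg.norm(g, axis=0)                 # (h*w,)
    normals = g / np.maximum(albedo, 1e-8)             # unit normals
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

A practical implementation would additionally mask out shadowed and saturated pixels before solving.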

Narrated summary :: 9:11 mins

Firefox users: Please use the pop-out button if inline videos are not working.

Slides with our results

Demonstration of the data captured and processed using our setup

Directionally illuminated images

Normal map

Surface reconstruction

Depth edges overlaid on images


Some more single-view 3D reconstructions from our pipeline.
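One common way to turn a normal map into such a single-view surface reconstruction is Frankot-Chellappa integration in the Fourier domain. The sketch below is a generic, self-contained implementation under the assumption of a dense normal map with the z-axis pointing toward the camera; it is not necessarily the integration method used in our pipeline, and the recovered heights are in pixel units up to an unknown offset.

```python
import numpy as np

def integrate_normals(normals):
    """Frankot-Chellappa integration of a unit normal map into a height map.

    normals: (h, w, 3) unit normals with n_z > 0; returns a relative height map (h, w).
    """
    nz = np.clip(normals[..., 2], 1e-4, None)
    p = -normals[..., 0] / nz                 # dz/dx
    q = -normals[..., 1] / nz                 # dz/dy
    h, w = p.shape
    wx = np.fft.fftfreq(w) * 2 * np.pi
    wy = np.fft.fftfreq(h) * 2 * np.pi
    u, v = np.meshgrid(wx, wy)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0                         # avoid division by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                             # global offset is unobservable
    return np.real(np.fft.ifft2(Z))
```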

Comparison with ground truth.

Demonstration of pick-up tasks using our setup

We could pick up objects with arbitrarily oriented pickable faces.

Firefox users: Please use the pop-out button if inline videos are not working.

We could also pick up camouflaged objects.

The robot demonstrations have not been sped up.

Comparison between ground truth and measured normals

The ground truth and the recovered normals are overlaid below. We register the objects to the ground-truth meshes with maximum errors of up to 5 pixels (at ~8 pixels/mm) using the pipeline described in the paper. Our multi-model pose estimation pipeline was robust to the recovered data even with 50% of the pixels having an error of \(20^\circ\) or more in the estimated local normals and up to 3 mm of local warp due to 3D printing.
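For reference, a per-pixel angular-error statistic like the \(20^\circ\) figure above can be computed from the registered normal maps as in the hedged sketch below; the array names are illustrative and not taken from our code.

```python
import numpy as np

def normal_angle_error_deg(recovered, ground_truth, mask):
    """Per-pixel angle (degrees) between two registered unit normal maps.

    recovered, ground_truth: (h, w, 3) unit normal maps
    mask:                    (h, w) boolean array of valid object pixels
    """
    dot = np.clip(np.sum(recovered * ground_truth, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(dot))[mask]

# Example: fraction of valid pixels with an error of 20 degrees or more
# frac_bad = np.mean(normal_angle_error_deg(rec, gt, mask) >= 20.0)
```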

Firefox users: Please hit refresh if the comparators are frozen.

Pose estimation

With the object mask, depth edges, and surface normals measured with our setup, we could align objects to within 5 pixels at a resolution of 7.5 pixels/mm. The resulting pose estimation errors are sub-millimeter and below \(0.5^\circ\). We present some results below where the object normals captured with our setup have been overlaid on the ground-truth (mesh) normals after registering the two.
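For completeness, rotation and translation errors of the kind quoted above can be evaluated against ground truth with a small helper like the one below (a minimal sketch; the 4x4 pose matrices and the millimetre convention are assumptions for illustration, not our API).

```python
import numpy as np

def pose_errors(T_est, T_gt):
    """Rotation error (degrees) and translation error (mm) between two
    4x4 rigid-body transforms expressed in the same millimetre-scaled frame."""
    R_rel = T_est[:3, :3].T @ T_gt[:3, :3]
    # Angle of the residual rotation, clipped for numerical safety.
    cos_theta = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_theta))
    trans_err_mm = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    return rot_err_deg, trans_err_mm
```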


Demonstration of our deformation estimation pipeline

We estimated the shapes of known objects under deformation

Five 1 mm vertical deformation steps

Mesh curvature overlaid on view from \(C_1\)

Mesh curvature overlaid on view from \(C_2\)

Reconstructed meshes at 3 mm, 5 mm, and 9 mm vertical deformations.

Measuring dough and clay

Dough and clay lumps are Lambertian, so we can measure them without any modifications.

Source images

Depth edges on dough

Surface normals

3D measurements of the surface of the dough. Ours vs. Intel RealSense L515.

Ours (camera at 500 mm)

Ours (camera at 250 mm)

Ours (camera at 300 mm)

RealSense L515 (camera at 300 mm, 50-frame spatio-temporally filtered)


The source code and design of this webpage are adapted from the Ref-NeRF project page. We would like to thank the authors for the inspiration.