Visual comparisons

We compare the generated novel-view videos on the ADE20K, ADE20K-outdoor, and NYU datasets.

(a) Direct (U-Net) synthesizes the multi-plane image directly from the semantic layout using a fully convolutional encoder-decoder architecture [Zhou et al. SIGGRAPH 2018].

(b) Direct (SPADE) also synthesizes the multi-plane image directly from the semantic layout, but uses a generator with spatially-adaptive normalization [Park et al. CVPR 2019].

(c) Cascade (MPI) first synthesizes a color image from the semantic layout using SPADE [Park et al. CVPR 2019], then applies an MPI predictor to the synthesized image. Here, we modify the original MPI generation model in StereoMag [Zhou et al. SIGGRAPH 2018] so that it takes a single image as input.

(d) Ours
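All of the compared methods output a multi-plane image (MPI): a stack of fronto-parallel RGBA planes that is rendered into a novel view by warping each plane and compositing back-to-front with the standard "over" operator. As a rough illustration of that compositing step only (not any method's full pipeline; the plane count and shapes below are toy values), a minimal numpy sketch:

```python
import numpy as np

def composite_mpi(rgba_planes):
    """Composite an MPI back-to-front with the 'over' operator.

    rgba_planes: array of shape (D, H, W, 4), ordered from the
    back plane (index 0) to the front plane, with RGB and alpha
    values in [0, 1]. Returns an (H, W, 3) image.
    """
    out = np.zeros(rgba_planes.shape[1:3] + (3,))
    for plane in rgba_planes:  # back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out

# Toy example: an opaque red back plane behind a half-transparent
# blue front plane blends to purple (0.5, 0.0, 0.5) everywhere.
D, H, W = 2, 4, 4
mpi = np.zeros((D, H, W, 4))
mpi[0, ..., 0] = 1.0; mpi[0, ..., 3] = 1.0  # red, alpha = 1 (back)
mpi[1, ..., 2] = 1.0; mpi[1, ..., 3] = 0.5  # blue, alpha = 0.5 (front)
img = composite_mpi(mpi)
```

In the full pipeline each plane is first warped into the target camera via a per-plane homography; the compositing afterwards is exactly the loop above.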

Method: