In Vision and Language Navigation (VLN), an embodied agent is instructed in natural language to follow a route in a given environment. In our project, the environment is built from Street View panoramas of Manhattan.
The agent is a machine learning model that receives navigation instructions and tries to follow them, choosing its next action based on the panorama it currently observes. The agent can thus navigate the environment freely until it decides to stop, that is, when it believes it has reached the goal location.
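The observe-act-stop loop described above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes a Touchdown-style action set (`FORWARD`, `LEFT`, `RIGHT`, `STOP`), replaces the Street View panoramas with a hypothetical toy graph of panorama IDs, and uses a hypothetical scripted policy in place of the learned model. `ToyEnvironment`, `navigate`, and `scripted_policy` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed Touchdown-style action set; not taken from the map2seq code.
ACTIONS = ("FORWARD", "LEFT", "RIGHT", "STOP")


@dataclass
class ToyEnvironment:
    """Hypothetical stand-in for the Street View graph: nodes are panorama IDs."""
    # neighbors[pano_id][heading] -> pano_id reached by moving forward in that heading
    neighbors: Dict[str, Dict[int, str]]
    position: str
    heading: int = 0  # degrees; only multiples of 90 in this toy

    def observe(self) -> Tuple[str, int]:
        # A real agent would receive the panorama image; the toy returns IDs.
        return self.position, self.heading

    def step(self, action: str) -> None:
        if action == "LEFT":
            self.heading = (self.heading - 90) % 360
        elif action == "RIGHT":
            self.heading = (self.heading + 90) % 360
        elif action == "FORWARD":
            # Stay in place if there is no edge in the current heading.
            self.position = self.neighbors[self.position].get(self.heading, self.position)


# A policy maps (instruction, observation, action history) to the next action.
Policy = Callable[[str, Tuple[str, int], List[str]], str]


def navigate(env: ToyEnvironment, instruction: str, policy: Policy,
             max_steps: int = 50) -> List[str]:
    """Run the observe-act loop until the policy outputs STOP or max_steps is hit."""
    trajectory = [env.position]
    history: List[str] = []
    for _ in range(max_steps):
        action = policy(instruction, env.observe(), history)
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        history.append(action)
        if action == "STOP":
            break
        env.step(action)
        if env.position != trajectory[-1]:  # record only actual moves, not turns
            trajectory.append(env.position)
    return trajectory


def scripted_policy(actions: List[str]) -> Policy:
    """Hypothetical policy that replays fixed actions, then stops."""
    it = iter(actions)
    return lambda instruction, observation, history: next(it, "STOP")


graph = {"a": {0: "b"}, "b": {0: "c", 90: "d"}, "c": {}, "d": {}}
env = ToyEnvironment(neighbors=graph, position="a")
path = navigate(env, "Go straight, turn right at the intersection and stop.",
                scripted_policy(["FORWARD", "RIGHT", "FORWARD", "STOP"]))
print(path)  # ['a', 'b', 'd']
```

In the real task the policy is the trained model conditioned on the instruction and the panorama image. The `max_steps` cap is an assumed safeguard so that an agent that never outputs `STOP` still terminates.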
Paper
Please cite the following paper when using the map2seq data for Vision and Language Navigation:
title = "Analyzing Generalization of Vision and Language Navigation to Unseen Outdoor Areas",
author = "Raphael Schumann and Stefan Riezler",
year = "2022",
publisher = "Association for Computational Linguistics"