Vision and Language Navigation (VLN)

In VLN, an embodied agent receives natural language instructions and must follow the described route through a given environment. In our project, the environment is built from Street View panoramas of Manhattan.
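One way such an environment can be organized is as a navigation graph whose nodes are Street View panoramas and whose edges connect adjacent panoramas by compass heading. The sketch below is illustrative only; all class and method names are assumptions, not the project's actual data structures.

```python
# Hypothetical sketch of a Street View navigation graph.
# Nodes are panoramas; edges map a compass heading to a neighbor.
from dataclasses import dataclass, field


@dataclass
class Panorama:
    pano_id: str                                   # Street View panorama id
    lat: float                                     # capture-point latitude
    lng: float                                     # capture-point longitude
    neighbors: dict = field(default_factory=dict)  # heading (deg) -> pano_id


class StreetViewEnvironment:
    """Navigation graph built from Street View panoramas."""

    def __init__(self):
        self.panos: dict[str, Panorama] = {}

    def add_panorama(self, pano: Panorama):
        self.panos[pano.pano_id] = pano

    def connect(self, src_id: str, dst_id: str, heading: float):
        # An edge means the agent can move from src to dst by
        # heading in the given compass direction.
        self.panos[src_id].neighbors[heading] = dst_id

    def step(self, pano_id: str, heading: float) -> str:
        # Move along the outgoing edge whose heading is closest
        # (in circular distance) to the requested heading.
        neighbors = self.panos[pano_id].neighbors

        def angular_dist(h: float) -> float:
            d = abs(h - heading) % 360
            return min(d, 360 - d)

        best = min(neighbors, key=angular_dist)
        return neighbors[best]
```

A move is then a graph transition: the agent asks to head in some direction, and the environment returns the adjacent panorama closest to that heading.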

The agent is a machine learning model that is given navigation instructions and tries to follow them, deciding its next action by observing the current panorama image. The agent can thus navigate the environment freely until it decides to stop, namely when it believes it has reached the goal location.
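The observe-decide-act loop described above can be sketched as follows. This is a minimal, hedged outline: `observe`, `policy`, and the panorama identifiers are placeholders for the project's actual trained model and environment interface.

```python
# Illustrative agent loop: observe the current panorama, condition on
# the instruction, and act until the policy decides to stop.
STOP = "stop"


def navigate(observe, policy, instruction, start_pano, max_steps=100):
    """Follow an instruction until the policy emits STOP or steps run out.

    observe(pano_id)                 -> observation of the current panorama
    policy(instruction, observation) -> next panorama id, or STOP
    """
    pano_id = start_pano
    trajectory = [pano_id]
    for _ in range(max_steps):
        action = policy(instruction, observe(pano_id))
        if action == STOP:
            break                     # agent believes the goal is reached
        pano_id = action              # the action names the next panorama
        trajectory.append(pano_id)
    return trajectory
```

The `max_steps` cap is a common safeguard in VLN evaluation so that an agent which never emits a stop action still terminates.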