Real-world evaluation is often time-consuming and expensive, so we propose a targeted, contrast-set-based strategy that efficiently probes the linguistic and visual capabilities of an end-to-end VLN policy.