The World Through Vision Pro: How Apple Is Redefining Object Tracking

Apple’s Vision Pro is redefining how we interact with the digital and physical worlds, and a key component of this shift is its sophisticated object detection and tracking capability. While visionOS approaches the problem differently from the broad-category object detection familiar on other platforms, the device gives developers powerful tools to recognize and track specific real-world objects, paving the way for a new generation of immersive, context-aware applications.

Precise Object Tracking

Unlike the general object detection APIs on iOS that can identify a wide array of generic objects in an image or video feed, visionOS, in its current iteration (visionOS 2), focuses on a more precise and powerful capability: object tracking. This approach lets developers train the Vision Pro to recognize specific, pre-defined 3D objects in the user’s environment. Once an object is recognized, the device tracks its position and orientation in real time with remarkable accuracy, enabling applications to seamlessly anchor digital content to it.

This shift is crucial for creating robust spatial computing experiences. Instead of just knowing that a “cup” is on the table, the Vision Pro can be trained to recognize a specific user’s favorite mug, allowing an app to overlay it with personalized information, animations, or interactive controls.

How Object Tracking Works on Vision Pro

The workflow for implementing object tracking on the Apple Vision Pro involves a few key steps, primarily leveraging Create ML and ARKit:

  • Creating a 3D Model: The first step is to obtain a high-quality 3D model of the object you want to track. This can be created in 3D modeling software or by scanning the real-world object with the Reality Composer app on an iPhone. The supported format for these models is USDZ.
  • Training a Reference Object: Using Apple’s Create ML app, developers train a machine learning model on the 3D model. This process generates a .referenceobject file, which contains the information the Vision Pro needs to recognize the physical object.
  • Integration with ARKit: Within a visionOS application, use an ObjectTrackingProvider in an ARKit session. This provider is configured with the previously created .referenceobject files.
  • Real-Time Tracking and Interaction: Once the ARKit session is running, the ObjectTrackingProvider continuously scans the user’s environment. When it detects an object that matches one of the reference objects, it provides the application with an ObjectAnchor. This anchor carries the tracked object’s position and orientation (along with its bounding box), allowing developers to attach virtual content that stays locked to the physical object as it moves. A minimal code sketch of this flow follows the list.
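Here is a minimal sketch of that flow in Swift, assuming a hypothetical reference object file named Mug.referenceobject is bundled with the app; the file name and the print statements are illustrative placeholders, not part of Apple’s API:

```swift
import ARKit

// A minimal sketch of object tracking on visionOS.
// "Mug.referenceobject" is a hypothetical bundled file; swap in your own.
func runObjectTracking() async {
    guard let url = Bundle.main.url(forResource: "Mug",
                                    withExtension: "referenceobject") else { return }
    do {
        // Load the reference object produced by Create ML.
        let referenceObject = try await ReferenceObject(from: url)

        // Configure the provider with the objects to look for and start the session.
        let provider = ObjectTrackingProvider(referenceObjects: [referenceObject])
        let session = ARKitSession()
        try await session.run([provider])

        // React as the object is found, moved, or lost.
        for await update in provider.anchorUpdates {
            let anchor = update.anchor
            switch update.event {
            case .removed:
                print("Lost \(anchor.referenceObject.name)")
            default:
                // .added or .updated: originFromAnchorTransform is the object's
                // pose in world space; use it to position virtual content.
                print("Tracking \(anchor.referenceObject.name) at \(anchor.originFromAnchorTransform)")
            }
        }
    } catch {
        print("Object tracking failed: \(error)")
    }
}
```

In a full app you would hand originFromAnchorTransform to a RealityKit entity rather than printing it; the anchorUpdates stream delivers added, updated, and removed events as the tracking state changes.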

The Role of the Vision Framework

While ARKit handles the spatial tracking, the Vision framework complements it with a suite of image-analysis tools. On visionOS, Vision supports tasks such as barcode detection and text recognition, which can be combined with object tracking to build even more powerful applications. For instance, an application could first use the Vision framework to scan a QR code on a device and then initiate object tracking for that specific device.
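As a rough illustration, the barcode half of that idea might look like the sketch below. The pixel buffer is assumed to come from an image source elsewhere in your pipeline (camera-frame access on visionOS is restricted, so treat this purely as a sketch of the Vision call itself):

```swift
import Vision
import CoreVideo

// A sketch of barcode/QR detection with the Vision framework.
// `pixelBuffer` is assumed to be supplied by an image source you already have.
func detectBarcodes(in pixelBuffer: CVPixelBuffer) {
    let request = VNDetectBarcodesRequest { request, error in
        guard let results = request.results as? [VNBarcodeObservation] else { return }
        for barcode in results {
            // payloadStringValue holds the decoded content, e.g. a QR code's URL.
            print("Found \(barcode.symbology.rawValue): \(barcode.payloadStringValue ?? "<binary payload>")")
        }
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
    do {
        try handler.perform([request])
    } catch {
        print("Barcode detection failed: \(error)")
    }
}
```

A decoded payload (say, a device serial number) could then decide which .referenceobject to hand to the ObjectTrackingProvider shown earlier.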

Use Cases 

  1. Virtual Try-On: A customer can use the Vision Pro to scan a physical piece of furniture or a clothing item, and the app will overlay it with different colors, textures, or even show how it would look in their home. This helps consumers visualize products before buying.
  2. Interactive Learning: A student can place a trained 3D model of a molecule or a historical artifact on their desk. The Vision Pro can then overlay information, animations, or a voice-over, bringing the subject to life and making learning more engaging.
  3. Augmented Reality Board Games: Game developers can create AR games that track a physical game board and its pieces. The Vision Pro could then overlay animations, effects, or even 3D characters, making the game more immersive and dynamic.

Warnings on Usage

Apple has provided explicit safety and usage guidelines for the Vision Pro, which are crucial for users to follow.

  • Awareness of Surroundings: While the device offers passthrough video that lets users see their environment, it does not guarantee complete situational awareness. Users are warned to be mindful of physical obstacles, stairs, windows, and other hazards, and not to use the device while walking in uncontrolled environments, operating a vehicle, or in any situation that requires full attention.

Author Details

Renuka Bhramanna

I’m a Technology Lead at Infosys on the iCETS team. I design and build apps for iPhone and Apple Vision Pro using Swift, Objective-C and SwiftUI to create modern, intuitive experiences, and I have hands-on experience in Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and Artificial Intelligence/Machine Learning (AI/ML).
