
Apple’s Ferret Model Is Way Better Than GPT-4 Vision


Apple has introduced the Ferret model, a multimodal AI system that has sent ripples through the tech world and challenges GPT-4 in certain respects.

The Ferret model, developed by Apple researchers, is an advanced vision-language model. Using CLIP-ViT-L/14 as its image encoder, it interprets images alongside textual input and can reason about complex scenes.

How Ferret Works

Ferret uses CLIP-ViT-L/14 to encode image content and converts textual input into a representation the model can process. It goes a step further by referring to specific areas of an image through coordinates, which lets it locate regions precisely. Unlike models that accept only rectangular boxes, Ferret can also handle regions of various shapes, such as points and free-form outlines.
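To make the idea of "special coordinates" more concrete, here is a minimal sketch of how a box in pixel space could be quantized to a fixed grid and written into a text prompt. The bin count, the function names, and the bracketed prompt format are all assumptions made for illustration, not Ferret's actual implementation.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

NUM_BINS = 1000  # assumed grid resolution, chosen only for illustration


def quantize_box(box: Box, image_size: Tuple[int, int],
                 num_bins: int = NUM_BINS) -> Tuple[int, int, int, int]:
    """Map pixel coordinates to integer bins that do not depend on image size."""
    x1, y1, x2, y2 = box
    width, height = image_size

    def to_bin(value: float, extent: int) -> int:
        b = int(value / extent * num_bins)
        # Clamp so a coordinate on the far edge still lands in the last bin.
        return min(max(b, 0), num_bins - 1)

    return (to_bin(x1, width), to_bin(y1, height),
            to_bin(x2, width), to_bin(y2, height))


def region_prompt(question: str, box: Box, image_size: Tuple[int, int]) -> str:
    """Append the quantized region to the question (hypothetical format)."""
    c = quantize_box(box, image_size)
    return f"{question} [{c[0]}, {c[1]}, {c[2]}, {c[3]}]"


if __name__ == "__main__":
    print(region_prompt("What is this?", (320, 240, 480, 360), (640, 480)))
```

Quantizing to a fixed grid means the same region gets the same coordinates whether the image is 640 pixels wide or 4,000, which is one plausible reason a model would prefer bins over raw pixels.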


Benchmarking Ferret

Comparing Ferret against GPT-4, particularly GPT-4 with a region of interest (GPT-4 ROI), reveals Ferret’s advantage in fine-grained multimodal understanding. While GPT-4 ROI is designed for special tasks, Ferret is more accurate at pinpointing smaller regions during image analysis.


The Ferret Advantage

Benchmarks for the Ferret model show that it handles several input types (point, box, free-form) as well as output grounding. The comparison also credits it on data construction, GPT-generated data, robustness, and quantitative evaluation of referring and grounding with chat.
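One way to picture the three input types is to reduce each of them to the same thing: a binary mask over the image. The sketch below does that with plain Python lists. The helper names, the disk radius used for a point, and the polygon representation of a free-form region are assumptions for illustration and are not taken from Apple's code.

```python
from typing import List, Sequence, Tuple

Mask = List[List[bool]]  # mask[y][x] is True where the region covers a pixel


def point_mask(center: Tuple[float, float], radius: float,
               width: int, height: int) -> Mask:
    """A point becomes a small disk around the clicked location."""
    cx, cy = center
    return [[(x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
             for x in range(width)] for y in range(height)]


def box_mask(box: Tuple[float, float, float, float],
             width: int, height: int) -> Mask:
    """A box covers every pixel between its two corners, inclusive."""
    x1, y1, x2, y2 = box
    return [[x1 <= x <= x2 and y1 <= y <= y2
             for x in range(width)] for y in range(height)]


def polygon_mask(vertices: Sequence[Tuple[float, float]],
                 width: int, height: int) -> Mask:
    """A free-form outline, approximated by a polygon (ray-casting test)."""
    def inside(px: float, py: float) -> bool:
        hit = False
        n = len(vertices)
        for i in range(n):
            xa, ya = vertices[i]
            xb, yb = vertices[(i + 1) % n]
            # Count edges that a horizontal ray from (px, py) crosses.
            if (ya > py) != (yb > py):
                x_cross = xa + (py - ya) * (xb - xa) / (yb - ya)
                if px < x_cross:
                    hit = not hit
        return hit

    return [[inside(x, y) for x in range(width)] for y in range(height)]


def area(mask: Mask) -> int:
    """Number of pixels the region covers."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    print(area(point_mask((2, 2), 1, 5, 5)))
    print(area(box_mask((1, 1, 2, 2), 4, 4)))
    outline = [(0.5, 0.5), (3.5, 0.5), (3.5, 3.5), (0.5, 3.5)]
    print(area(polygon_mask(outline, 5, 5)))
```

Once every region is a mask, downstream code no longer needs to care whether the user clicked, dragged a box, or scribbled an outline.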

Ferret Vs. GPT-4 Vision

Ferret’s standout performance is evident in comparisons with GPT-4 Vision. While GPT-4 Vision shows strong commonsense knowledge, Ferret excels at precise understanding of smaller regions, making it a valuable tool for detailed image analysis.

Apple’s Vision For The Future

Looking ahead, Apple’s foray into generative AI continues with the rumored Apple GPT, a language model expected to enhance Siri and other AI features. Anticipated capabilities include improved natural language understanding, better text generation, and more personalized conversation.

Journal Feature

Apple’s on-device machine learning powers the Journal feature, offering personalized writing suggestions based on a user’s photos, location, music, workouts, and more. This tool marks a step towards more effective and personalized interactions.