Gemini 2.5 Computer Use Model Preview: Web & Android Automation (2025)

Imagine artificial intelligence that doesn't just answer questions or generate images but actually operates a browser, navigating websites and apps much as a human would. That's the promise of Google's latest release. Whether it marks the start of seamless automation or a step toward AI reaching too far into our digital lives is an open question, and this preview of the Gemini 2.5 Computer Use model is a good place to start looking for answers.

Google has opened the Gemini 2.5 Computer Use model to developers for testing. Versions of this model power Project Mariner and the agentic features in AI Mode. It is a specialized model designed to operate graphical user interfaces, the visual screens where you click buttons, type text, and scroll through content. It is built primarily for web browsers and websites, which makes it well suited to tasks that require real-time interaction.

To understand how it works, let's break it down step by step. The process runs as a loop that repeats until the task is finished. First, you send the model a request containing your task, a screenshot of the current screen, and a history of recent actions. The model analyzes these inputs and returns a response, usually a function call that represents a single action on the interface, such as clicking a button or entering text.

Next, client-side code on your device receives the response and carries out the action. Once that's done, a fresh screenshot of the interface and the current URL are sent back to the model as feedback, and the loop starts again. Each pass gives the model an up-to-date view of the screen, so it can adjust its next move to what actually happened. For beginners, picture it as AI playing a video game: it sees the screen, decides on the next move, executes it, and looks at the updated screen to plan ahead, all without you lifting a finger.
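The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's client code: `Action`, `Observation`, `FakeBrowser`, `scripted_model`, and `run_agent_loop` are all hypothetical names, the "model" is a hard-coded script standing in for a real API call, and the "browser" only tracks a URL instead of driving a real page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """Feedback sent to the model after each step: a screenshot and the URL."""
    screenshot: bytes
    url: str


class FakeBrowser:
    """Stand-in for the client-side code that would drive a real browser."""

    def __init__(self) -> None:
        self.url = "about:blank"

    def observe(self) -> Observation:
        # A real client would capture an actual screenshot here.
        return Observation(screenshot=b"", url=self.url)

    def execute(self, action: Action) -> Observation:
        if action.name == "navigate":
            self.url = action.args["url"]
        # Other actions are accepted but have no effect in this stub.
        return self.observe()


def scripted_model(goal: str, observation: Observation,
                   history: List[Action]) -> Optional[Action]:
    """Placeholder for the model call: replays a fixed script, then stops."""
    script = [
        Action("navigate", {"url": "https://example.com/form"}),
        Action("type_text", {"text": "hello"}),
        Action("click", {"x": 120, "y": 340}),
    ]
    return script[len(history)] if len(history) < len(script) else None


def run_agent_loop(goal: str,
                   propose: Callable[[str, Observation, List[Action]], Optional[Action]],
                   browser: FakeBrowser,
                   max_steps: int = 20) -> List[Action]:
    """Request an action, execute it, feed back the new state, and repeat."""
    history: List[Action] = []
    observation = browser.observe()
    for _ in range(max_steps):
        action = propose(goal, observation, history)
        if action is None:  # the model signals that the task is finished
            break
        observation = browser.execute(action)
        history.append(action)
    return history


browser = FakeBrowser()
history = run_agent_loop("Fill in the form", scripted_model, browser)
print([a.name for a in history], browser.url)
```

The `max_steps` cap is a design choice of this sketch rather than something the article describes; it simply keeps a misbehaving loop from running forever.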

The model supports a wide range of user interface actions. It can go back or forward in the browsing history, perform web searches, jump to a particular URL, hover the cursor over elements, use keyboard shortcuts, scroll through pages, and even drag and drop items. This flexibility lets it handle everything from simple navigation to intricate workflows, like organizing digital sticky notes or booking appointments.
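On the client side, each of these actions has to be mapped to something the browser actually does. One plausible way to do that is a small dispatch table, sketched here in Python. The action names (`navigate`, `go_back`, and so on), the argument keys, and the `RecordingBrowser` class are illustrative assumptions, not the names the Gemini API uses; the stand-in browser only tracks history and logs calls.

```python
from typing import Callable, Dict, List


class RecordingBrowser:
    """Hypothetical browser stub: tracks history and logs every call."""

    def __init__(self) -> None:
        self.url = "about:blank"
        self.back_stack: List[str] = []
        self.forward_stack: List[str] = []
        self.log: List[str] = []

    def navigate(self, url: str) -> None:
        self.back_stack.append(self.url)
        self.forward_stack.clear()  # a new navigation drops forward history
        self.url = url
        self.log.append(f"navigate {url}")

    def go_back(self) -> None:
        if self.back_stack:
            self.forward_stack.append(self.url)
            self.url = self.back_stack.pop()
        self.log.append("go_back")

    def go_forward(self) -> None:
        if self.forward_stack:
            self.back_stack.append(self.url)
            self.url = self.forward_stack.pop()
        self.log.append("go_forward")

    def scroll(self, direction: str) -> None:
        self.log.append(f"scroll {direction}")

    def drag_and_drop(self, start: tuple, end: tuple) -> None:
        self.log.append(f"drag {start} -> {end}")


def dispatch(browser: RecordingBrowser, name: str, args: dict) -> None:
    """Route one model-proposed action to the matching browser method."""
    handlers: Dict[str, Callable[[dict], None]] = {
        "navigate": lambda a: browser.navigate(a["url"]),
        "go_back": lambda a: browser.go_back(),
        "go_forward": lambda a: browser.go_forward(),
        "scroll": lambda a: browser.scroll(a["direction"]),
        "drag_and_drop": lambda a: browser.drag_and_drop(a["start"], a["end"]),
    }
    handler = handlers.get(name)
    if handler is None:
        # Refuse unknown actions instead of guessing what the model meant.
        raise ValueError(f"unsupported action: {name}")
    handler(args)


demo = RecordingBrowser()
dispatch(demo, "navigate", {"url": "https://a.example"})
dispatch(demo, "navigate", {"url": "https://b.example"})
dispatch(demo, "go_back", {})
dispatch(demo, "drag_and_drop", {"start": (10, 10), "end": (200, 50)})
print(demo.url, demo.log)
```

Rejecting unknown action names with an error, rather than ignoring them, makes it obvious when the model proposes something the client was never written to handle.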

Google has shared some eye-opening examples to show this in action, sped up three times for clarity. In one scenario, the prompt is: 'Starting from https://tinyurl.com/pet-care-signup, gather all information for any pet with a California address and add them as guests in my spa CRM at https://pet-luxe-spa.web.app/. Afterward, schedule a follow-up appointment with specialist Anima Lavar for October 10th, any time after 8 AM, using the same reason as their requested treatment.' Imagine the AI browsing the signup page, extracting details, switching to the CRM site, inputting data, and setting up the appointment—all autonomously. It's like having a personal assistant who never gets tired.

Another demo involves organizing tasks for an art club fair. The prompt reads: 'My art club came up with tasks for our upcoming fair, but the board is a mess. Help me sort them into categories I've set up. Visit sticky-note-jam.web.app and make sure the notes are in the correct sections, dragging them if needed.' Here, the AI navigates to the site, assesses the chaotic sticky notes, and rearranges them into neat categories, turning disorder into order with precision.

While Gemini 2.5 Computer Use is optimized primarily for web browsers, Google has also tested it on the 'AndroidWorld' benchmark, where it shows strong potential for controlling mobile user interfaces on Android devices. It is not yet optimized for desktop operating system control, so think of it as a tool that's still evolving, with web and mobile as its current strengths.

Compared with competing models from Anthropic (Claude) and OpenAI, Google reports that this model comes out ahead in web and mobile control tests, with the highest quality in browser control and the fastest response times. It builds on the visual understanding and reasoning of Gemini 2.5 Pro, and Google notes that versions of it power the agentic features in Project Mariner and AI Mode. Internally, Google already uses it for UI testing to speed up software development, and there's an early access program for external developers building AI assistants and automating workflows.

The best part? Gemini 2.5 Computer Use is now in public preview, accessible via the Gemini API in Google AI Studio and Vertex AI. Want to see it in action? Head over to a demo environment powered by Browserbase at http://gemini.browserbase.com/ and give it a whirl.

But here's where it gets controversial: as AI gains the ability to control our devices and automate personal tasks, questions arise about privacy and security. Could this lead to unintended data sharing or even misuse by bad actors? And what about job displacement for those in roles that involve repetitive computer work? On the flip side, some might argue it's a liberating force, freeing us from mundane chores to focus on creativity. What do you think—does this excite you as a step toward a smarter future, or does it raise red flags about AI overreach? Share your thoughts in the comments below; I'd love to hear if you agree, disagree, or have your own take on this groundbreaking tech!

Article information

Author: Rev. Leonie Wyman