Alibaba’s MAI-UI Emerges as a New Leader in AI GUI Agents, Outperforming Gemini 2.5 Pro

Alibaba’s Tongyi Lab has unveiled MAI-UI, a family of foundation GUI agents for mobile interaction. The agents lead in general GUI grounding and mobile navigation, surpassing established models such as Gemini 2.5 Pro, Seed1.8, and UI-Tars-2 on the AndroidWorld benchmark. MAI-UI also addresses key gaps in current GUI agents by natively integrating user interaction, MCP tool use, and a device-cloud collaboration architecture that prioritizes privacy.

What is MAI-UI?

MAI-UI is a collection of multimodal GUI agents built on the Qwen3 VL architecture, with model sizes ranging from 2B up to 235B-A22B (a mixture-of-experts model that activates 22B parameters per token). These agents process natural language instructions and UI screenshots to generate structured actions for Android environments. Their capabilities extend beyond basic operations like clicking and typing to include answering user queries, seeking clarification, and invoking external tools via MCP calls, enabling a seamless blend of GUI actions, direct language responses, and API operations.
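
To make that interface concrete, here is a minimal sketch of one agent step: instruction and screenshot in, one structured action out. The `Action` fields and the JSON action format are illustrative assumptions, not MAI-UI’s published schema.

```python
# Minimal sketch of a structured-action step: instruction + screenshot in,
# one action out. The Action fields and JSON format are assumptions for
# illustration, not MAI-UI's actual schema.
import json
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Action:
    kind: str                                 # "click" | "type" | "answer" | "ask_user" | "mcp_call"
    target: Optional[Tuple[int, int]] = None  # screen coordinate for GUI actions
    text: Optional[str] = None                # typed text, direct answer, or clarifying question
    tool: Optional[str] = None                # MCP tool name for "mcp_call" actions

def parse_action(raw: str) -> Action:
    """Parse the model's JSON-formatted output into a structured Action."""
    obj = json.loads(raw)
    target = tuple(obj["target"]) if obj.get("target") else None
    return Action(kind=obj["kind"], target=target,
                  text=obj.get("text"), tool=obj.get("tool"))

# e.g. parse_action('{"kind": "click", "target": [540, 1210]}')
```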

The system’s architecture unifies three key components: a self-evolving data pipeline for navigation that incorporates user interactions and MCP cases, an online Reinforcement Learning (RL) framework capable of scaling to hundreds of parallel Android instances, and a native device-cloud collaboration system that intelligently routes execution based on task status and privacy considerations.
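
As a rough illustration of the routing idea, the sketch below keeps privacy-sensitive tasks on-device and escalates to the cloud model when the on-device agent struggles. The signals and thresholds here are our assumptions, not Tongyi Lab’s actual policy.

```python
# Illustrative device-cloud router. The signals (sensitive-data flag,
# failed-step count, confidence threshold) are assumptions, not the
# actual MAI-UI routing policy.
from dataclasses import dataclass

@dataclass
class TaskState:
    contains_sensitive_data: bool  # e.g. password, payment, or personal screens
    failed_steps: int              # consecutive steps that made no progress
    confidence: float              # on-device model's self-reported confidence

def route(state: TaskState) -> str:
    # Privacy first: sensitive screens never leave the device.
    if state.contains_sensitive_data:
        return "device"
    # Escalate hard cases to the larger cloud model.
    if state.failed_steps >= 2 or state.confidence < 0.5:
        return "cloud"
    # Default: the small on-device model handles routine steps.
    return "device"
```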

GUI Grounding with Instruction Reasoning

A fundamental aspect of GUI agents is grounding – mapping natural language commands to specific on-screen controls. MAI-UI employs a grounding strategy built on multi-perspective instruction descriptions. For each UI element, the training pipeline uses multiple views, including appearance, function, spatial location, and user intent, as reasoning evidence. This approach mitigates issues arising from flawed or ambiguous instructions. The models achieve strong accuracy on public GUI grounding benchmarks, outperforming Gemini 3 Pro and Seed1.8 on ScreenSpot Pro and significantly exceeding earlier open models on UI Vision.
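
A hypothetical training sample shows how one element can yield several grounding instructions; the field names and descriptions below are invented for illustration.

```python
# Hypothetical multi-perspective grounding sample: the same UI element is
# described by appearance, function, spatial location, and user intent,
# so ambiguity in any single view is compensated by the others.
element = {
    "bbox": [912, 84, 1004, 148],  # ground-truth region in pixels
    "views": {
        "appearance": "the grey gear-shaped icon in the top-right corner",
        "function": "opens the app's settings page",
        "spatial": "the rightmost icon in the toolbar, next to search",
        "intent": "tap this when the user wants to change notification settings",
    },
}

# Each view becomes its own (instruction -> bbox) grounding example.
samples = [(desc, element["bbox"]) for desc in element["views"].values()]
```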

Self-Evolving Navigation Data and MobileWorld

Navigating complex mobile interfaces requires maintaining context across multiple steps and applications. To foster robust navigation, Tongyi Lab developed a self-evolving data pipeline. This pipeline starts with seed tasks from app manuals and designed scenarios, which are then expanded through parameter perturbation and object-level substitutions. Multiple agents and human annotators execute these tasks, and a judge model refines the resulting trajectories. This continuous feedback loop ensures the training data distribution aligns with the current agent policy.
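
In outline, the loop might look like the following sketch, where `perturb`, `substitute`, `rollout`, and `judge` are placeholders for the paper’s components, not a real API.

```python
# Schematic of the self-evolving navigation-data loop. All callables are
# placeholders for the components described above, not a real API.
def evolve_navigation_data(seed_tasks, perturb, substitute, rollout, judge,
                           rounds=3):
    """seed_tasks: tasks from app manuals and designed scenarios.
    perturb/substitute: create variants (changed parameters, swapped objects).
    rollout: executes a task with current agents/annotators -> trajectory.
    judge: accepts or rejects a trajectory; accepted tasks feed the next round."""
    pool = list(seed_tasks)
    for _ in range(rounds):
        # Expand the task pool with perturbed and substituted variants.
        pool = pool + [perturb(t) for t in pool] + [substitute(t) for t in pool]
        # Execute tasks, then keep only trajectories the judge accepts, so the
        # data distribution tracks the current agent policy.
        trajectories = [rollout(t) for t in pool]
        pool = [tr.task for tr in trajectories if judge(tr)]
    return pool
```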

MAI-UI is evaluated on MobileWorld, a benchmark featuring 201 tasks across 20 applications, encompassing pure GUI tasks, agent-user interaction tasks, and MCP-augmented tasks. MAI-UI demonstrates strong performance on this benchmark, significantly improving over existing end-to-end GUI baselines and showing competitiveness with proprietary agentic frameworks.

Online RL in Containerized Android Environments

To ensure robustness in dynamic mobile applications, MAI-UI leverages an online RL framework operating within containerized Android Virtual Devices. This setup allows the agent to learn directly from interactions. The RL framework utilizes an asynchronous on-policy method (GRPO) that supports extensive parallelism and long context sequences, enabling learning from trajectories with up to 50 steps. Rewards are generated by verifiers or judge models, with penalties for looping behaviors. The research highlights that scaling the number of parallel GUI environments and increasing the allowed environment steps significantly boosts navigation success rates.
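
The core of a GRPO-style update is a group-relative advantage: several rollouts of the same task are scored, and each trajectory is credited relative to its own group rather than by a learned critic. The sketch below folds in the loop penalty mentioned above; the penalty value and the loop-detection rule are our assumptions.

```python
# GRPO-style group-relative advantages with a loop penalty. The penalty
# value and the loop-detection rule are illustrative assumptions.
import numpy as np

LOOP_PENALTY = 0.5  # assumed deduction for repeating a (screen, action) pair

def has_loop(trajectory) -> bool:
    """Flag trajectories that revisit the same screen with the same action."""
    seen = set()
    for screen_hash, action in trajectory:
        if (screen_hash, action) in seen:
            return True
        seen.add((screen_hash, action))
    return False

def trajectory_reward(trajectory, succeeded: bool) -> float:
    # Verifiers or judge models supply the binary success signal.
    reward = 1.0 if succeeded else 0.0
    if has_loop(trajectory):
        reward -= LOOP_PENALTY
    return reward

def group_advantages(rewards: np.ndarray) -> np.ndarray:
    """Normalize each rollout's reward against its own group of rollouts
    for the same task (mean/std), replacing a learned value function."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)
```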

On the AndroidWorld benchmark, the largest MAI-UI variant achieved a 76.7% success rate, setting a new state of the art and surpassing previous leading models.
