Bing Visual Search: Microsoft's AI Masters Image Recognition



Microsoft continues to expand Bing Chat's capabilities, this time with multimodal search. The Visual Search feature lets the AI analyze images included in queries. Can it truly compete with Bard and Google Lens?


Bing Visual Search

Bing Chat breathed new life into Microsoft's search engine, whose popularity soared once the chatbot was integrated, to the point of nearly overshadowing the seemingly unshakeable Google. The hype has since faded somewhat, with usage declining considerably for both OpenAI's chatbot and Microsoft's search engine. Figures shared by CNBC journalist Carl Quintanilla showed that Bing matched Google's visitor count during Q1 of 2023, even surpassing it in March. Since then, visits to Bing have dropped, while Google's traffic remains unaffected.


Nevertheless, Microsoft persists, adding new features and improvements to its AI. The latest is Visual Search, a rival to Google Lens and its multimodal search. As the name implies, it allows users to include images in their queries so that the AI can analyze them and respond. After successful tests, the tool is now available to everyone, in both the desktop version and the Bing app.


Bing Visual Search: Empowering AI with Multimodal Exploration


Microsoft's chatbot now integrates multimodal visual search, powered by GPT-4. Visual Search lets anyone upload images and search the web for related content. Take a photo or use one found elsewhere, and ask Bing about it. Bing understands the image's context, interprets it, and answers questions about it, Microsoft explains in its blog.


In a demo video, a rough, hand-drawn sketch of a user interface is submitted to Bing Chat. The chatbot is then asked to generate HTML and CSS code for the design. Within seconds, it produces dozens of lines of code, yielding a working HTML page that resembles the sketch.


This is just one of Visual Search's many potential applications. In another example, a photo of several power adapters is sent to the chatbot along with the question, "Which one should I take to the UK?" One could also imagine asking Bing Chat for a recipe based on a photo of available ingredients, or for guidance on repairing a damaged device.


Microsoft is thus matching Google's move. Google recently brought multimodal input to Bard, allowing it to understand queries that combine text and an image thanks to Google Lens, a feature currently available only in English. Redmond is wasting no time, however, and has just unveiled Bing Chat Enterprise, along with pricing for Copilot in Microsoft 365.
