In a recent demonstration video titled “Hands-on with Gemini: Interacting with multimodal AI,” Google showcased Gemini, its GPT-4 competitor, to high expectations. However, a Bloomberg opinion piece highlights concerns about the video’s accuracy and transparency.
According to Bloomberg, Google admitted that parts of the video were staged, with edits made to speed up the outputs, as disclosed in the video description. The voice interaction between a human user and the AI implied in the demonstration turned out not to exist. Instead, the interactions were created by “using still image frames from the footage and prompting via text,” rather than by the model responding in real time to drawing or to objects changing on the table.
The lack of a disclaimer about the actual input method raises questions about Gemini’s readiness, suggesting a less impressive capability than the video implies. Google denies any wrongdoing, pointing to a post by Gemini co-lead Oriol Vinyals stating that “all the user prompts and outputs in the video are real.” Critics, however, argue that the tech giant should exercise more care in its presentations, especially given the increased scrutiny of AI practices from both the industry and regulators.
Google has announced a substantial update for Bard, its generative AI chatbot and competitor to ChatGPT. The company claims that this update will significantly boost Bard’s capabilities by integrating Gemini, Google’s latest and most advanced AI model. The incorporation of Gemini is expected to enhance Bard’s reasoning, planning, understanding, and other functionalities.
Gemini is available in three sizes – Ultra, Pro, and Nano – making it adaptable for deployment on a range of devices, from mobile phones to data centers.
The rollout of Gemini to Bard will occur in two phases. Initially, Bard will receive an upgrade with a specially tuned version of Gemini Pro. In the following year, Google plans to introduce Bard Advanced, offering users access to the top AI model, starting with Gemini Ultra.
The version of Bard featuring Gemini Pro will initially be available in English across more than 170 countries and territories globally, with additional languages and countries, including the EU and U.K., to follow soon.
Before its public launch, Gemini Pro underwent industry-standard benchmark testing. Google reports that Gemini outperformed GPT-3.5 in six out of eight benchmarks, including MMLU, which measures multitask language understanding, and GSM8K, which tests grade-school math reasoning. However, it’s worth noting that GPT-3.5 is over a year old, leading some to view this as more of a catch-up move than an outright leap, as highlighted by TechCrunch’s Kyle Wiggers.
The enhancements brought by Gemini are expected to make Bard more proficient in tasks such as content understanding and summarization, reasoning, brainstorming, writing, and planning.
Sissie Hsiao, VP and GM of Assistant and Bard at Google, described this upgrade as the “biggest single quality improvement of Bard since we’ve launched.”
Gemini Pro will initially power text-based prompts in Bard, with plans to expand to multimodal support (texts, images, or other modalities) in the coming months.
In 2024, Bard Advanced will debut, offering a new experience powered by Gemini’s most capable model, known as Gemini Ultra. This model can comprehend and act on various types of information, including text, images, audio, video, and code, with multimodal reasoning capabilities. Gemini Ultra can also understand, explain, and generate high-quality code in popular programming languages, according to Google.
Google will launch a trusted tester program for Bard Advanced before a broader release early next year. Additionally, the company will subject Bard Advanced to additional safety checks before its official launch.
This update follows several improvements to Bard over the past eight months, including features like answering questions about YouTube videos and integrating with various Google apps and services. With Gemini, Google aims to bring users “the best AI collaborator in the world,” acknowledging that Bard is not quite there yet.
Mastercard has announced the launch of its innovative generative AI shopping tool, “Shopping Muse,” aiming to revolutionize the way users explore and discover products in digital catalogs. Powered by Dynamic Yield, a personalization company acquired by Mastercard in April 2022, Shopping Muse utilizes advanced algorithms to provide personalized product recommendations based on users’ colloquial language and modern trends.
The tool enables users to ask questions in natural language, such as “What should I wear for a summer wedding?” or “Can you recommend pieces for a minimalist capsule wardrobe?” Shopping Muse analyzes the context of the user’s shopping experience, direct questions, and conversation content to deliver tailored suggestions. It leverages data from the retailer’s product catalog, on-site behavior, real-time preferences, and known consumer preferences.
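Shopping Muse’s internals are proprietary, but the general idea of matching a colloquial query to catalog items can be illustrated with a small, hedged sketch: the example below embeds both the query and product descriptions with the open-source sentence-transformers library and ranks items by cosine similarity. The model name and sample catalog are assumptions chosen purely for illustration, not a description of Mastercard’s or Dynamic Yield’s actual system.

```python
# Rough illustration only: Shopping Muse's internals are proprietary.
# This sketch matches a colloquial query against catalog descriptions by
# embedding both with the open-source sentence-transformers library and
# ranking on cosine similarity; the model name and catalog are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed general-purpose model

catalog = [
    "Floral midi dress, lightweight chiffon, pastel print",
    "Linen blazer, relaxed fit, sand",
    "Black leather ankle boots, block heel",
    "Silk slip dress, champagne, bias cut",
]

query = "What should I wear for a summer wedding?"

query_emb = model.encode(query, convert_to_tensor=True)
catalog_emb = model.encode(catalog, convert_to_tensor=True)

# Rank catalog items by similarity to the query and print the top matches.
scores = util.cos_sim(query_emb, catalog_emb)[0]
for score, item in sorted(zip(scores.tolist(), catalog), reverse=True)[:3]:
    print(f"{score:.2f}  {item}")
```

A production system would, as described above, also fold in on-site behavior and real-time consumer preferences rather than relying on text similarity alone.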
Shopping Muse goes beyond text-based queries, employing integrated advanced image recognition tools. Retailers can recommend relevant products based on visual similarities, even in cases where users struggle to describe what they are looking for. Although initially focused on fashion, Mastercard anticipates extending the technology to other categories like furniture and grocery.
Ori Bauer, CEO of Dynamic Yield by Mastercard, emphasized the significance of AI-driven innovation in providing immersive and tailored online shopping experiences. He stated, “By harnessing the power of generative AI in Shopping Muse, we’re meeting the consumer’s standards and making shopping smarter and more seamless than ever.”
Highlighting the need for retailers to embrace technology, Mastercard emphasized that over one in four retailers already use generative AI solutions, with an additional 13% planning to adopt them in the next year. The company urged adaptation to changing demands and noted that personalization through AI-driven technology is crucial for meeting consumer expectations.
As part of a broader trend in AI-powered shopping tools, Google now enables users to receive AI-generated gift recommendations on Search, while Microsoft’s Bing automatically generates buying guides for queries like “college supplies.” Industry experts, as reflected in a recent Gartner report, predict that 80% of customer service and support organizations will integrate generative AI technology in some form by 2025. The release of Shopping Muse signifies a significant addition to the evolving landscape of AI-driven shopping solutions.
In a recent development, Meta’s efforts to boost the install base of Instagram Threads have proven successful, with the app now surpassing X (formerly Twitter) in terms of new downloads. According to insights from app intelligence firm Apptopia, Threads’ daily downloads, which had been declining since September, reversed the trend, growing from approximately 350,000 per day in early November to around 620,000 per day since Thursday, November 23.
The surge in Threads’ downloads is attributed to Meta’s advertising campaign for the new app, positioning it as a Twitter alternative. Apptopia estimates that Threads has outpaced X in terms of new downloads, with 41 million compared to X’s 27 million since September.
Interestingly, the traction for Threads is not limited to the U.S., as the majority of new downloads are coming from outside the country. India leads the way, contributing 11.2% (9.2 million) of the new downloads, followed by the U.S. at 7.4% (6.1 million). This trend aligns with India’s role as a significant growth driver for Instagram.
By contrast, X’s new downloads are led by Indonesia, followed by India. However, the combined new downloads from the U.S., Indonesia, and India for X are fewer than India alone contributes to Threads.
X’s slowdown in new downloads is attributed to its rebranding, which cost it growth momentum. Adding “formerly Twitter” to its App Store description was meant to capture users searching for the keyword “Twitter,” but Threads continued to outpace X in new installs.
Despite Threads’ recent success, X remains a larger platform with over 500 million monthly active users, compared to the just under 100 million monthly users Meta reported for Threads in its October earnings. Still, Threads has grown substantially in the months since its launch, reaching an estimated 141 million users by November 10, according to Quiver Quantitative.
X’s future is uncertain, as it faces an advertiser exodus that includes major brands such as Apple, IBM, Disney, Paramount, Comcast/NBCU, Warner Bros., Lionsgate, Sony, Paris Hilton’s 11:11 Media, and Walmart. Elon Musk’s controversial statements in a November 29 interview have contributed to this exodus.
Despite these challenges, X remains a platform where news breaks regularly, as seen with the recent OpenAI boardroom drama. While Threads has gained attention, X’s status as a main platform for news announcements, coupled with its user stickiness, keeps it a significant player in the social media landscape.
Just a year ago, the world of machine learning was focused on building models for specific tasks such as loan approvals and fraud protection. Fast forward to today, and the landscape has shifted with the emergence of generalized large language models (LLMs). However, the era of task-based models, described by Amazon CTO Werner Vogels as “good old-fashioned AI,” is far from over and continues to thrive in the enterprise.
Task-based models, the foundation of AI in the corporate world before LLMs, remain a crucial component. Atul Deo, general manager of Amazon Bedrock, a service introduced to give applications API access to large language models, emphasizes that task models haven’t vanished; instead, they’ve become one more tool in the AI toolkit.
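Bedrock’s role here is essentially a single API in front of many hosted models. As a rough, hedged sketch of what that looks like in practice, the example below calls a Bedrock-hosted text model with boto3; the region, model ID, and request fields follow AWS’s public Bedrock documentation rather than anything stated in this article, so treat them as assumptions.

```python
# Minimal sketch: calling a hosted text model through Amazon Bedrock's API.
# Assumes AWS credentials are configured and the account has been granted
# access to the model; the model ID and request schema are assumptions
# based on public Bedrock documentation, not details from this article.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "inputText": "Summarize the benefits of reusing one foundation model across tasks.",
    "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.5},
})

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed model ID
    contentType="application/json",
    accept="application/json",
    body=body,
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])
```

The same invoke_model call can be pointed at other models in the Bedrock catalog by swapping the model ID and request schema, which is part of the reusability case for large, general-purpose models.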
Task models are tailored for specific functions, whereas LLMs exhibit versatility beyond predefined boundaries. Jon Turow, a partner at investment firm Madrona and a former AWS executive, notes the ongoing debate about the capabilities of LLMs, such as reasoning and out-of-domain robustness. While acknowledging their potential, Turow highlights the enduring relevance of task-specific models due to their efficiency, speed, cost-effectiveness, and performance on specialized tasks.
Despite the allure of all-encompassing models, the practicality of task models remains undeniable. Deo argues that having numerous separately trained machine learning models within a company is inefficient, making a compelling case for the reusability benefits offered by large language models.
For Amazon, SageMaker remains a pivotal product within its machine learning operations platform, catering specifically to data scientists. SageMaker, with tens of thousands of customers building millions of models, continues to be indispensable. Even with the current dominance of LLMs, the established technology preceding them remains relevant, as evidenced by recent upgrades to SageMaker geared toward managing large language models.
In the pre-LLM era, task models were the sole option, prompting companies to assemble teams of data scientists for model development. Despite the shift towards tools aimed at developers, the role of data scientists remains crucial. Turow emphasizes that data scientists will continue to critically evaluate data, providing insights into the relationship between AI and data within large enterprises.
The coexistence of task models and large language models is expected to persist, acknowledging that sometimes bigger is better, while at other times, it’s not. The key lies in understanding the unique strengths and applications of each approach in the evolving landscape of artificial intelligence.
In a disturbing trend, hackers are intensifying their assaults on Booking.com users, turning to dark web forums to seek help locating potential victims. Cyber-criminals are offering rewards of up to $2,000 (£1,600) for hotel login credentials and have been exploiting unsuspecting customers since at least March.
Despite Booking.com being one of the leading platforms for holidaymakers, reports of fraud have surfaced from customers in the UK, Indonesia, Singapore, Greece, Italy, Portugal, the US, and the Netherlands. While Booking.com itself has not been compromised, cyber-security experts reveal that hackers are infiltrating individual hotels’ administration portals linked to the service.
A spokesperson from Booking.com acknowledged the targeted attacks, stating that some accommodation partners are falling victim to hackers utilizing various well-known cyber-fraud techniques.
Research conducted by cyber-security firm Secureworks sheds light on the clandestine methods employed by these hackers. The attackers begin by tricking hotel staff into downloading malware known as Vidar Infostealer. Posing as former guests, they email hotels claiming to have left a passport in their room. The email contains a Google Drive link that supposedly holds an image of the passport but instead downloads malware onto staff computers, where it scans for Booking.com access credentials.
Once inside the Booking.com portal, hackers gain visibility into all customers with current room or holiday reservations. Using the official app, they then contact customers and manipulate them into making payments directly to the hackers rather than the hotel.
The financial success of these attacks is evident as hackers are now offering substantial sums to criminals who share access to hotel portals. Rafe Pilling, director of threat intelligence for Secureworks Counter Threat Unit, states, “The scam is working and it’s paying serious dividends,” attributing its success to the high rate of social engineering effectiveness.
Victims like Lucy Buckley have fallen prey to this scheme. Hackers contacted her through the Booking.com app and, claiming to be staff at the Paris hotel where she had booked a room, convinced her to send £200. Fortunately, she managed to secure a refund after realizing the deception.
In response to the growing threat, cyber-security expert Graham Cluley suggests that Booking.com hotels implement multi-factor authentication to bolster security. He emphasizes that the platform could do more, such as restricting links in chat that lead to websites less than a few days old, preventing the use of freshly-created fake sites to deceive customers into making payments.
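Cluley’s suggestion about very new websites amounts to a domain-age check on links before they reach guests. The sketch below is one hedged way to approximate it, using the third-party python-whois package and an assumed seven-day threshold; it is an illustration of the idea, not a description of anything Booking.com actually runs.

```python
# Illustrative sketch of the kind of check Cluley describes: flag chat links
# whose domains were registered only days ago. Uses the third-party
# "python-whois" package and a 7-day threshold, both assumptions for
# illustration; this is not how Booking.com actually filters links.
from datetime import datetime, timedelta
from urllib.parse import urlparse

import whois  # pip install python-whois

MIN_DOMAIN_AGE = timedelta(days=7)  # assumed threshold

def is_suspiciously_new(url: str) -> bool:
    """Return True if the link's domain was registered less than a week ago."""
    domain = urlparse(url).hostname
    record = whois.whois(domain)
    created = record.creation_date
    # Some registrars return a list of dates; take the earliest.
    if isinstance(created, list):
        created = min(created)
    if created is None:
        return True  # unknown age: treat as suspicious
    return datetime.now() - created < MIN_DOMAIN_AGE

# A freshly registered lookalike domain would be flagged here before the
# link is allowed into a guest-facing chat message.
```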
In a surprising turn of events, ChatGPT, OpenAI’s groundbreaking chatbot, celebrates its first birthday amid a year of upheavals and triumphs. The user-friendly generative AI technology managed to captivate the world, breathing new life into Silicon Valley and sparking an AI arms race.
Unveiling the AI Revolution
A year since its public release, ChatGPT continues to be a driving force behind the ongoing AI revolution. Tech giants invest billions, nations hoard essential chips, and the promises and challenges of generative AI echo through boardrooms and homes worldwide.
Pandora’s Box: Navigating the Impact
ChatGPT’s impact on society unfolds with a mix of excitement and challenges. Schools grapple with integrating the technology into education, artists face disruption, and the labor market experiences shifts, both positive and negative. As AI’s long-prophesied impacts emerge, society collectively navigates the implications of an open Pandora’s box.
Imagining a New Equitable Future
The future of generative AI sparks debates about its impact on society’s equity. As jobs shift from human to AI hands, questions arise about societal adaptation. The transition poses challenges, urging the public to pressure policymakers for creative solutions, such as taxing AI companies or experimenting with universal basic income.
OpenAI’s Rollercoaster Ride
OpenAI, the creator of ChatGPT, faces internal strife, with CEO Sam Altman ousted and reinstated within a span of five days. The turmoil casts a shadow on the organization’s mission of ensuring AI benefits humanity, raising questions about its ability to keep itself together while navigating the transformative potential of AI.
In a significant move into the realm of AI image generation, Amazon has introduced its latest innovation, the Titan Image Generator, a text-to-image AI model. Unveiled at the AWS re:Invent conference, this advanced tool is designed to produce “realistic, studio-quality images” with built-in safeguards against toxicity and bias. Unlike standalone applications or websites, Titan serves as a versatile tool for developers, enabling them to construct their image generators using the underlying model, contingent on access to Amazon Bedrock.
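Since Titan Image Generator is exposed as a Bedrock model rather than an end-user app, a developer would invoke it much like any other Bedrock model. The sketch below shows one hedged way to do that with boto3; the model ID and the request and response fields are assumptions drawn from AWS’s public documentation, not from this article.

```python
# Illustrative sketch of generating an image with the Titan Image Generator
# through Amazon Bedrock. The model ID and request/response fields are
# assumptions based on public Bedrock documentation, not this article.
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

request = json.dumps({
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "a studio-quality photo of a ceramic mug on a wooden desk"},
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0,
    },
})

response = bedrock.invoke_model(
    modelId="amazon.titan-image-generator-v1",  # assumed model ID
    contentType="application/json",
    accept="application/json",
    body=request,
)

payload = json.loads(response["body"].read())
# Images come back base64-encoded; decode and save the first one.
with open("titan_output.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))
```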
During his keynote address, Swami Sivasubramanian, AWS Vice President of Database, Analytics, and Machine Learning, showcased the Titan Image Generator’s capabilities, emphasizing its proficiency not only in generating images from natural language prompts but also in seamlessly altering backgrounds. This marks a departure from the consumer-oriented focus of existing image generators like OpenAI’s DALL-E, targeting a more enterprise-centric audience.
All images generated by Titan Image Generator will automatically carry invisible watermarks, as part of voluntary commitments made by Amazon to the White House in July. Vasi Philomin, AWS Vice President for Generative AI, explained that this watermarking strategy was devised to distinctly label AI-created images without affecting their visual integrity, latency, or susceptibility to cropping or compression.
Because the watermark is invisible, detecting it poses a challenge, which Amazon addresses with an API: users can connect to it to verify an image’s provenance, consistent with Titan’s intentional design as a model rather than a finished product. Developers using Titan Image Generator have the flexibility to decide how they convey this information to end users.
The incorporation of invisible watermarks aligns with the Biden administration’s executive order on AI, emphasizing the identification of AI-generated content. Notably, companies like Microsoft and Adobe have adopted systems such as the Content Credentials system developed by the Coalition for Content Provenance and Authenticity (C2PA). Adobe goes a step further by introducing an icon to signify content credentials in both image and video content.
In addition to Titan Image Generator, Amazon has announced the general availability of other Titan models, including Titan Text Lite, a smaller model suited to lightweight text generation tasks such as copywriting, and Titan Text Express, designed for more extensive applications like conversational chat apps.
Amazon further extends copyright indemnity to customers utilizing its Titan foundation models, encompassing text-to-image functionalities. Legal coverage is also offered to users of any Amazon-created AI application, even if the application employs a different foundation model sourced from Amazon’s Bedrock AI model repository, which includes models like Meta’s Llama 2 or Anthropic’s Claude 2. Prominent applications under this umbrella include AWS HealthScribe, CodeWhisperer, Amazon Personalize, Amazon Lex, and Amazon Connect Contact Lens.
In a groundbreaking initiative, Google has launched a pioneering geothermal project in Nevada in collaboration with the startup Fervo. The venture uses an advanced approach to harnessing geothermal power that differs from conventional methods and has a capacity of 3.5 MW. The electricity generated will support two of Google’s data centers outside Las Vegas and Reno, contributing to Google’s commitment to running on carbon pollution-free electricity 24/7 by 2030.
This geothermal endeavor, conceived in 2021, deviates from typical geothermal plants by drilling on the outskirts of an existing geothermal field. Fervo’s innovative approach involves drilling two horizontal wells and pumping water through hot rock, producing steam at the surface. The closed-loop system not only reuses water but also incorporates fiber-optic cables for real-time data monitoring, drawing inspiration from practices in the oil and gas industry.
Google’s investment in geothermal energy aligns with its strategy to diversify clean energy sources, viewing geothermal power as a crucial element in maintaining a consistent energy supply alongside intermittent sources like wind and solar. Besides the Nevada project, Google has partnered with Project InnerSpace to address geothermal development challenges globally, signaling a broader commitment to sustainable energy solutions.
While the specifics of future geothermal deployments for data centers remain undisclosed, Google’s move reflects a strategic shift to reduce the environmental impact of its energy-intensive data operations. This innovative geothermal project represents a significant leap forward, supported not only by Google but also by climate-focused entities such as Breakthrough Energy Ventures and the US Department of Energy.