According to Google CEO Sundar Pichai, the Gemini era of AI is about to begin. Pichai first hinted at Google’s newest large language model, Gemini, at the I/O developer conference in June, and it is now being made available to the public. It is a significant advance in AI models, according to Pichai and Google DeepMind CEO Demis Hassabis, and it will eventually touch almost all of Google’s products. Pichai states, “Working on one underlying technology and making it better so that it immediately flows across our products is one of the most powerful things about this moment.”
Gemini isn’t just one AI model. Gemini Nano is a lightweight version designed to run natively and offline on Android devices. A more capable version, Gemini Pro, is the new foundation of Bard and will soon power several other Google AI products. And Gemini Ultra, which Google calls its most capable LLM yet, is primarily intended for data centers and enterprise applications.
Currently, Google is deploying the model in the following ways: Bard is now powered by Gemini Pro, and Gemini Nano gives Pixel 8 Pro owners access to some new features. (Gemini Ultra is slated for release next year.) Beginning December 13th, developers and enterprise customers will have access to Gemini Pro via Google AI Studio or Google Cloud’s Vertex AI. For now, Gemini is available only in English, with more languages to follow. Pichai says the model will eventually be built into the Chrome browser, Google’s search engine, and its ad business worldwide. It is Google’s future, and it is arriving not a moment too soon.
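For developers, that launch-day access comes through an API rather than the consumer apps. As a rough illustration only, not Google’s official quickstart, a text-only call through the Google AI Studio Python SDK might look like the sketch below; the package name, the “gemini-pro” model identifier, and the GOOGLE_API_KEY environment variable are assumptions based on Google’s developer tooling at launch.

```python
# Minimal sketch: calling Gemini Pro via the google-generativeai Python SDK.
# Assumes an API key from Google AI Studio is set in the GOOGLE_API_KEY
# environment variable and that the model is exposed under the name "gemini-pro".
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Explain in two sentences what a multimodal model is.")
print(response.text)
```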
A year and a week ago, OpenAI introduced ChatGPT, and the company and its product quickly rose to prominence in the AI space. Now, Google is finally prepared to fight back. After calling itself an “AI-first” company for nearly a decade and producing much of the foundational technology behind the current AI boom, it was obviously and embarrassingly caught off guard by how good ChatGPT was and by how quickly OpenAI’s technology took over the industry.
Currently, the most basic Gemini models take text in and produce text out, while the more capable ones, such as Gemini Ultra, can also handle images, video, and audio. Hassabis predicts that “it will get even more general than that. Some elements still resemble robotics, such as touch and action.” Those qualities will come with time, he says, as Gemini gains additional senses, awareness, accuracy, and grounding: “These models simply have a better understanding of their environment.” Naturally, these models still hallucinate and exhibit biases and other issues. But according to Hassabis, the more they know, the better they will get.
Benchmarks are only tests, though, and in the end Gemini’s real test will come from regular users who want it for far more than writing code and looking things up. Google sees coding as a killer app for Gemini: it uses a new code-generating system called AlphaCode 2, which Google says performs better than 85 percent of participants in coding competitions. But Pichai says users will notice an improvement in almost everything the model touches.
It also matters to Google that Gemini is reportedly a significantly more efficient model: it is faster and cheaper to run than Google’s earlier models, such as PaLM, and it was trained on Google’s own tensor processing units. Alongside the new model, Google is releasing TPU v5p, a processor designed for data centers to train and run large-scale models.
So, on to the crucial question: OpenAI’s GPT-4 versus Google’s Gemini. Google has been thinking about this matchup for a while. “We’ve done an in-depth analysis of the systems side by side and the benchmarking,” Hassabis says. Google ran 32 well-established benchmark comparisons between the two models, from broad assessments such as the Multi-task Language Understanding benchmark to tests of each model’s ability to generate Python code. “I think we’re substantially ahead on 30 out of 32” of those benchmarks, Hassabis says with a slight smirk. Some of the margins are very narrow; others are larger.
According to Google, Gemini outperforms GPT-4 in 30 of the 32 benchmarks.
Gemini’s edge in many of those benchmarks is slim; its clearest advantage lies in its ability to understand and engage with video and audio. That is very much by design: multimodality has been part of the Gemini plan from the start. Rather than training separate models for images and speech, the way OpenAI built DALL-E and Whisper, Google built a single multimodal model. “We’ve always been interested in very general systems,” Hassabis says. He is particularly interested in mixing all those modalities together, gathering as much information as possible from as many inputs and senses as possible, and then giving responses with just as much variety.
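To make the multimodality concrete, here is a hedged sketch of what a mixed text-and-image request looks like when one model handles both inputs in a single call. It assumes the same google-generativeai SDK as above, a “gemini-pro-vision” model name, and a placeholder local image file.

```python
# Illustrative sketch of a multimodal prompt: the model receives text and an
# image together and answers about both. Assumes the google-generativeai SDK,
# the "gemini-pro-vision" model name, and a GOOGLE_API_KEY environment variable.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("chart.png")  # placeholder image file for illustration
response = model.generate_content(
    ["Describe what this chart shows and any trend you notice.", image]
)
print(response.text)
```

The design point the sketch illustrates is that the text and the image go through one model in one request, rather than being routed to separate speech, vision, and language systems.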
Speaking with Pichai and Hassabis, it’s clear they see the Gemini launch both as a significant milestone in its own right and as the start of something much larger. Gemini is arguably the model Google should have had ready before OpenAI and ChatGPT took the world by storm, but it is also the model the company has been waiting for, and working toward, for years.
Despite being seen as lagging ever since ChatGPT’s introduction, when it reportedly declared a “code red,” Google appears to be sticking to its motto of being “bold and responsible.” Both Hassabis and Pichai insist they are unwilling to move too fast simply to keep up with the current pace of advancement, particularly as we approach artificial general intelligence (AGI), the term for an AI that can improve itself, surpass human intelligence, and reshape society. Things “are going to be different as we approach AGI,” Hassabis says. Because it is such an active technology, he argues, it has to be approached with caution: cautiously, but optimistically.
Google says it has put a great deal of work into making Gemini responsible and safe, through red-teaming and both internal and external testing. Pichai notes that most generative AI revenue so far comes from enterprise-first products, where guaranteeing data security and dependability is crucial. Still, Hassabis admits that deploying a cutting-edge AI system will inevitably surface bugs and vulnerabilities that no one could have foreseen; to see and learn, he explains, you have to release things. Google is taking its time with Gemini Ultra in particular: Hassabis compares its rollout to a constrained beta, a “safer experimentation zone” for the company’s most capable and least constrained model. In other words, if the model has an alternate personality that might try to destroy your marriage, Google wants to find it before you do.
For years, Pichai and other Google executives have been gushing about artificial intelligence’s promise; Pichai has repeatedly said that AI will be more transformative for humankind than fire or electricity. This generation of the Gemini model won’t deliver that kind of transformation by itself. In the best case, it enables Google to overtake OpenAI in the race to build great generative AI. (In the worst case, ChatGPT keeps winning while Bard stays dull and subpar.) But Hassabis, Pichai, and seemingly every other Google executive believe this is only the start of something big. The web made Google a tech powerhouse; Gemini might matter even more.