Unveiling Google Gemini: An In-Depth Look at the New Generative AI Platform

Google’s endeavour to dominate the AI sector is manifested in Gemini, a comprehensive suite of generative AI models, applications, and services.

In this piece, we explore Gemini, its applications, and how well it fares against its counterparts.

This guide aims to make it easier for you to stay updated on the recent advancements in Gemini. We will continuously update it as Google releases new models, features and plans for Gemini.

Gemini is Google’s long-awaited, ambitious project in the GenAI model family. A combined effort of Google’s AI research labs, DeepMind and Google Research, it comes in three tiers: Gemini Ultra, the flagship model; Gemini Pro, a lighter-weight model; and Gemini Nano, a compact model that runs on mobile devices.

Gemini models were designed to be “natively multimodal,” meaning they can work with more than just words. They were pretrained and fine-tuned on a wide range of audio, images and videos, large codebases, and text in many languages.

Unlike models such as Google’s LaMDA, which was trained exclusively on text data, Gemini has a broader range. LaMDA can only understand and generate text (essays, email drafts and so on); the Gemini models are not so limited.


Google once again demonstrated its weakness in branding by failing to make clear from the outset that Gemini is separate from the Gemini apps on the web and mobile (formerly known as Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed — think of them as a client for Google’s GenAI.

The Gemini apps and models are completely independent from Imagen 2, Google’s text-to-image model currently available in various dev tools.

Because the Gemini models are multimodal, they can in theory perform a range of tasks, like transcribing speech, captioning images and videos, and generating artwork. Some of these capabilities have yet to fully materialize (we’ll touch on that in a moment), but Google promises all of them — and more — in the near future.

However, skepticism surrounds Google’s promises.

The company badly under-delivered with the initial launch of Bard. More recently, it stirred controversy with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored — more or less an aspiration.

Google’s best Gemini demo was faked

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:

Google says that Gemini Ultra — thanks to its multimodality — can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says — extracting information from those papers and “updating” a chart from one by generating the formulas necessary to re-create the chart with more recent data.

Gemini Ultra can technically generate images, as mentioned earlier. But that capability hasn’t yet made its way into the productized version of the model — perhaps because the mechanism is more complex than how apps like ChatGPT generate images. Rather than feeding prompts to an intermediary image generator (the way ChatGPT does with DALL-E 3), Gemini outputs images “natively,” without an intermediary step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps — but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month.

The AI Premium Plan also connects Gemini to your broader Google Workspace account — your Gmail emails, Docs documents, Sheets spreadsheets and Google Meet recordings. That’s useful for tasks like summarizing emails or having Gemini take notes during a video call.

Google claims that, in terms of comprehension, planning and understanding capabilities, Gemini Pro surpasses LaMDA.

An independent study by Carnegie Mellon and BerriAI researchers found that the initial version of Gemini Pro was indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, this version of Gemini Pro particularly struggled with mathematics problems involving several digits, and users found examples of bad reasoning and obvious mistakes.

Early impressions of Google’s Gemini aren’t great

Google promised remedies, though — and the first arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code — 35x the amount Gemini 1.0 Pro can handle. And — the model being multimodal — it’s not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of different languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).
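The scale of that jump is easy to sanity-check with a quick back-of-the-envelope calculation, using only the figures cited above:

```python
# Back-of-the-envelope context math, using the figures cited in the article.
pro_15_words = 700_000      # ~words Gemini 1.5 Pro can take in at once
scale_factor = 35           # stated multiple over Gemini 1.0 Pro
pro_10_words = pro_15_words // scale_factor

print(f"Gemini 1.0 Pro: ~{pro_10_words:,} words per prompt")  # ~20,000 words
```

That implied ~20,000-word ceiling for Gemini 1.0 Pro is why long documents, codebases and media files were out of reach before 1.5.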

Gemini 1.5 Pro entered public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery — including photos and video — and output text along the lines of OpenAI’s GPT-4 with Vision model.

Using Gemini Pro in Vertex AI.

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.

Google brings Gemini Pro to Vertex AI

AI Studio offers workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output’s creative range, provide examples to give tone and style instructions, and fine-tune the safety settings.
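Those knobs map onto a generation config and safety settings in the underlying API. Here is a minimal sketch of the request fields, assuming the field names exposed by the public Generative Language REST API; the specific values are illustrative, not recommendations:

```python
# Illustrative generation settings (field names assumed from the public
# Generative Language REST API; values are placeholders for this sketch).
generation_config = {
    "temperature": 0.4,       # lower = more deterministic output
    "topP": 0.95,
    "topK": 40,
    "maxOutputTokens": 256,
}

# Safety settings tune per-category blocking thresholds.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

# A request body wraps the prompt in a "contents" list of "parts".
request_body = {
    "contents": [{"parts": [{"text": "Write a friendly product update."}]}],
    "generationConfig": generation_config,
    "safetySettings": safety_settings,
}
```

Few-shot examples for tone and style, as set in AI Studio, are folded into the prompt text itself rather than being a separate field.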

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, efficient enough to run directly on (some) phones instead of sending the task to a server. So far, it powers several features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

In the Recorder app, which lets users push a button to record and transcribe audio, Gemini provides a summary of recorded conversations, interviews, presentations and other clips. Users can get these summaries even without a signal or Wi-Fi connection — and in a nod to privacy, no data leaves their phone in the process.

Gemini Nano is also in Gboard, Google’s keyboard app, where it powers Smart Reply, a feature that suggests the next thing you might want to say in a messaging conversation. The feature initially worked only with WhatsApp, but Google plans to bring it to more apps over time.

On devices that support it, Nano also powers Magic Compose in the Google Messages app. This can create messages in styles such as “excited”, “formal”, and “lyrical”.

Google has frequently touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says that Gemini 1.5 Pro, meanwhile, is better than Gemini Ultra at tasks like summarizing content, brainstorming and writing in certain scenarios; presumably this will change with the release of the next Ultra model.

Setting aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than those of OpenAI’s corresponding models. And, as mentioned, early impressions weren’t great: users and academics pointed out that the older version of Gemini Pro tends to get basic facts wrong, struggles with translations and gives poor coding suggestions.

Gemini 1.5 Pro is free to utilize in the Gemini apps and, presently, AI Studio and Vertex AI.

However, once Gemini 1.5 Pro exits preview in Vertex, the model will cost $0.0025 per input character, while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (roughly 140 to 250 words) and, for models like Gemini Pro Vision, per image at $0.0025.

Assuming a 500-word article contains about 2,000 characters, summarizing that article with Gemini 1.5 Pro would cost $5, while generating an article of similar length would cost $0.10.
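The arithmetic behind those two figures, using the per-character rates quoted above:

```python
# Pricing arithmetic from the per-character rates quoted above
# (Gemini 1.5 Pro on Vertex AI, post-preview).
INPUT_RATE = 0.0025    # $ per input character
OUTPUT_RATE = 0.00005  # $ per output character

article_chars = 2_000  # ~500-word article, per the estimate above

summarize_cost = article_chars * INPUT_RATE    # article fed in as input
generate_cost = article_chars * OUTPUT_RATE    # similar length generated as output

print(f"Summarize: ${summarize_cost:.2f}, Generate: ${generate_cost:.2f}")
# Summarize: $5.00, Generate: $0.10
```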

The pricing for Ultra has not yet been revealed.

The easiest place to experience Gemini Pro is in the Gemini apps. Pro and Ultra are answering queries in a range of languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is free to use “within limits” for the time being and supports certain regions, including Europe, as well as features like chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate prompts and Gemini-based chatbots and then get API keys to use them in their apps — or export the code to a more fully featured IDE.
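Once a developer has an API key, a Gemini call reduces to a single HTTP POST. Here is a minimal sketch, assuming the public generativelanguage.googleapis.com REST endpoint; the API key and prompt are placeholders, and the request is only constructed here, not sent:

```python
import json

# Placeholder credentials and model name — substitute your own key from
# AI Studio. The endpoint path follows the public Generative Language API.
API_KEY = "YOUR_API_KEY"
MODEL = "gemini-pro"

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent?key={API_KEY}"
)

# The request body wraps the prompt in a "contents" list of "parts".
payload = {
    "contents": [
        {"parts": [{"text": "Summarize the Gemini model family in one sentence."}]}
    ]
}

print(url)
print(json.dumps(payload, indent=2))
```

Sending this payload (for example with the `requests` library) returns a JSON response containing the model’s candidate completions.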

Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, uses Gemini models. Developers can perform “large-scale” changes across codebases, for example updating cross-file dependencies and reviewing large chunks of code.

Google has also brought Gemini models to its dev tools for Chrome, its Firebase mobile dev platform and its database creation and management tools. And it has launched new security products underpinned by Gemini, like Gemini in Threat Intelligence, a component of Google’s Mandiant cybersecurity platform that can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.

Gemini Nano is on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24 — and will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.

It might! Apple and Google are reportedly in talks to put Gemini to use for a number of features to be included in an upcoming iOS update later this year. Nothing’s definitive, as Apple is also reportedly in talks with OpenAI, and has been working on developing its own GenAI capabilities.

This post was originally published Feb. 16, 2024 and has since been updated to include new information about Gemini and Google’s plans for it.

