By Nagesh Singh Chauhan

Google Gemma- Open Source LLM: Everything You Need to Know



What exactly is Gemma?


Gemma stands as Google's newest lineup of four Large Language Models (LLMs), developed as part of the Gemini initiative. The models come in two sizes, 2B and 7B parameters, with each size offering both a base (pretrained) and an instruction-tuned variant. Engineered to run on a variety of consumer hardware without requiring quantization, Gemma models support a context length of 8,192 tokens.


Language understanding and generation performance of Gemma 7B across different capabilities, compared to similarly sized open models. Standard academic benchmark evaluations are grouped by capability and the respective scores averaged.


Gemma's benchmark metrics underscore its performance. Available in two configurations, one with 7 billion parameters and the other with 2 billion, Gemma exhibits higher accuracy than Meta's LLM, Llama 2, across diverse benchmarks. Notably, Gemma's 7 billion parameter model achieves a general accuracy of 64.3%, surpassing Llama 2 in reasoning, mathematical tasks, and various other categories.



Exploring Gemma Variants


Gemma, Google's open-source family of Large Language Models (LLMs), provides a versatile array of models tailored to various requirements. Let's venture into the different sizes and editions, uncovering their strengths, applications, and technical intricacies for developers:


Size Considerations: Selecting Your Gemma Variant


2B: This nimble contender excels in scenarios with limited resources, such as CPUs and mobile devices. With a memory footprint of approximately 1.5GB and rapid inference capabilities, it's well-suited for tasks like text classification and straightforward question answering.

7B: Balancing power and efficiency, the 7B version thrives on consumer-grade GPUs and TPUs. Its 5GB memory requirement enables tackling more intricate tasks such as summarization and code generation.
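As a rough sanity check on these footprints, weight memory scales with parameter count times bytes per weight; the figures quoted above are consistent with quantized weights plus some runtime overhead. A back-of-envelope sketch (the helper below is illustrative, not an official sizing tool):

```python
# Rough estimate of model weight memory: parameters x bits-per-weight.
# Illustrative only; real usage adds activations, KV cache, and runtime overhead.

def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Full-precision fp16 weights:
print(weight_memory_gb(2e9, 16))  # 4.0 GB for the 2B model
print(weight_memory_gb(7e9, 16))  # 14.0 GB for the 7B model

# 4-bit quantized weights get close to the footprints quoted above:
print(weight_memory_gb(2e9, 4))   # 1.0 GB
print(weight_memory_gb(7e9, 4))   # 3.5 GB
```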


Model Architecture


The Gemma model architecture is built upon the transformer decoder framework introduced by Vaswani et al. Key parameters of the architecture are outlined in the table below, and during training the models process a context length of 8,192 tokens.

Key model parameters.


Additionally, Gemma incorporates several enhancements proposed after the original transformer paper. Here's a breakdown of these improvements:


  1. Multi-Query Attention: Leveraging findings from Shazeer (2019), the 7B model employs multi-head attention, while the 2B variants utilize multi-query attention (with num_kv_heads = 1). These variations have been observed to enhance performance across different scales.

  2. RoPE Embeddings: Instead of absolute positional embeddings, they implemented rotary positional embeddings in each layer, sharing embeddings across inputs and outputs to reduce model size.

  3. GeGLU Activations: In place of the standard ReLU non-linearity, they adopt the GeGLU activation function as proposed by Shazeer (2020).

  4. Normalizer Location: Departing from conventional practice, Gemma normalizes both the input and the output of each transformer sub-layer, using RMSNorm as the normalization layer.
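Of these, rotary positional embeddings are the easiest to illustrate: each (even, odd) pair of dimensions is rotated by an angle that depends on the token position, so position 0 leaves the vector unchanged. A minimal pure-Python sketch of the idea (not the production implementation):

```python
import math

def rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Apply rotary position embedding: rotate each (even, odd) dimension
    pair by an angle position * base^(-2i/d), where i indexes the pair."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

# At position 0 every rotation angle is 0, so the vector is unchanged.
print(rope([1.0, 0.0, 0.5, 0.5], 0))  # [1.0, 0.0, 0.5, 0.5]
```

Because each step is a pure rotation, the embedding also preserves vector norms, which is one reason it composes well with attention.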


Training Dataset


These models underwent training on a massive text dataset of up to 6 trillion tokens (2 trillion for the 2B model and 6 trillion for the 7B), comprising the following key components:


  1. Web Documents: An extensive array of web text sources was included to expose the model to a rich spectrum of linguistic styles, topics, and vocabulary. The majority of this content is in English.

  2. Code: Integration of code samples enables the model to grasp programming language syntax and patterns, enhancing its capability to generate code or comprehend code-related inquiries.

  3. Mathematics: Incorporating mathematical text aids the model in acquiring skills related to logical reasoning, symbolic representation, and handling mathematical queries.

Measuring personal and sensitive data memorization rates. No sensitive data was memorized, so it is omitted from the figure.


This amalgamation of diverse data sources plays a pivotal role in training a robust language model capable of effectively addressing a wide array of tasks and text formats.


How Was Gemma Trained?


The Gemma 2B and 7B models underwent training on vast datasets comprising 2 trillion and 6 trillion tokens, respectively, primarily consisting of English content sourced from web documents, mathematical texts, and code. Unlike Gemini models, which incorporate multimodal elements and are optimized for multilingual tasks, Gemma models are tailored specifically for processing English text. Prior to training, the dataset underwent meticulous filtering to eliminate unwanted or unsafe content, including personal information and sensitive data. This filtration process used a combination of heuristic methods and model-based classifiers to ensure the quality and safety of the dataset.


To further enhance their performance, both the Gemma 2B and 7B models underwent supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The supervised fine-tuning phase involved a diverse mix of text-only, English-only synthetic, and human-generated prompt-response pairs. The selection of data mixtures for fine-tuning was carefully curated based on LM-based side-by-side evaluations, with different prompt sets designed to emphasize specific capabilities such as instruction following, factuality, creativity, and safety.


Even synthetic data underwent several layers of filtering to exclude examples containing personal information or producing toxic outputs, adhering to the established approach by Gemini to enhance model performance while upholding safety standards. Finally, reinforcement learning from human feedback entailed gathering pairs of preferences from human raters and training a reward function using the Bradley-Terry model. This function was subsequently optimized using a form of REINFORCE to further refine the models' performance and mitigate potential issues such as reward manipulation.
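The Bradley-Terry reward model mentioned above scores each response, and the probability that raters prefer response a over response b is a logistic function of the score difference. A tiny illustration of that assumed functional form (not DeepMind's actual training code):

```python
import math

def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry: P(a preferred over b) = sigmoid(r_a - r_b)."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

print(preference_probability(1.0, 1.0))            # 0.5: equal rewards, no preference
print(round(preference_probability(2.0, 0.0), 3))  # 0.881: higher reward is preferred
```

Training the reward model amounts to fitting the scores so that these probabilities match the human raters' observed preferences.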


How to Use Gemma with Transformers?


!pip install -U "transformers==4.38.1"
!pip install accelerate
!pip install bitsandbytes
!huggingface-cli login --token <Huggingface_token>
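With the libraries installed and the Hugging Face login done, the instruction-tuned checkpoints expect Gemma's chat turn markers around each message. A minimal prompt-building sketch (the turn tokens below follow the google/gemma-7b-it model card; the generation call itself is only indicated in comments and assumes the standard transformers API):

```python
# Building a chat-style prompt for the instruction-tuned Gemma models.
# The <start_of_turn>/<end_of_turn> markers are Gemma's documented chat format.

def format_gemma_chat(user_message: str) -> str:
    """Wrap a user message in Gemma's chat turn format, ending with the
    model turn so generation continues as the assistant."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_chat("Why is the sky blue?")
# This prompt would then be tokenized and passed to model.generate(), where
# model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it") and the
# tokenizer comes from AutoTokenizer.from_pretrained on the same checkpoint.
```

In practice, tokenizer.apply_chat_template on the Gemma tokenizer produces this same layout automatically.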






Fine-tuning Gemma 7B with QLoRA and Unsloth


Before delving into the fine-tuning process, the notebook begins by installing the Unsloth library and importing the necessary modules.


The code shows how to instantiate a FastLanguageModel, a component of Unsloth, with configurations such as maximum sequence length, data type, and 4-bit loading.


!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git" -q
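The loading step looks roughly like the following configuration sketch, based on Unsloth's published Gemma notebooks; the model name and argument values are assumptions to adapt, and running it requires a GPU:

```python
# Loading Gemma for QLoRA fine-tuning with Unsloth (configuration sketch).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-7b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length=2048,   # maximum context used during fine-tuning
    dtype=None,            # auto-detect (float16 / bfloat16)
    load_in_4bit=True,     # QLoRA: keep base weights in 4-bit
)
```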


PEFT (Parameter-Efficient Fine-Tuning)


PEFT (Parameter-Efficient Fine-Tuning) strategies selectively fine-tune a limited number of additional model parameters while keeping the majority of pretrained LLM parameters frozen. This not only significantly reduces computational and storage costs but also addresses the challenge of catastrophic forgetting observed in full fine-tuning of LLMs.

The FastLanguageModel object provides a get_peft_model method where we can configure various fine-tuning parameters, such as the LoRA rank, target modules, dropout rate, LoRA alpha, and more. The use of gradient checkpointing and other techniques showcases Unsloth's ability to optimize memory use during training.
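The LoRA idea behind these settings can be shown in a few lines: the frozen weight matrix W is left untouched, and a low-rank trainable update B·A, scaled by alpha/r, is added to it. A toy pure-Python sketch (illustrative numbers, not Gemma's actual weights):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_update(W, A, B, alpha, r):
    """Effective weight under LoRA: W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 frozen weight, rank-1 adapter (r = 1), alpha = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]          # 2x1, trainable
A = [[0.5, 0.5]]            # 1x2, trainable
print(lora_update(W, A, B, alpha=2, r=1))  # [[2.0, 1.0], [0.0, 1.0]]
```

Only A and B receive gradients, which is why the trainable parameter count stays a tiny fraction of the full model.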



Data Preparation and Formatting


In order to fine-tune, we need to give the LLM access to an unseen dataset, formatted with the prompt template it will be trained on. In our case, we use a text-generation dataset: databricks/databricks-dolly-15k


To make the data usable by the model, we map every datapoint into a standardized format: a consistent prompt structure with labeled fields, categorized here as Instruction, Context, and Response.


This standardized format ensures effective utilization of the dataset during the fine-tuning phase.
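A concrete way to do this mapping, assuming the databricks-dolly-15k field names (instruction, context, response), is a small formatting function; the exact template wording below is our own choice, not a fixed standard:

```python
# Template for mapping dolly-15k records into a single training string.
PROMPT_TEMPLATE = """### Instruction:
{instruction}

### Context:
{context}

### Response:
{response}"""

def format_example(example: dict) -> str:
    """Map one databricks-dolly-15k record into the prompt template."""
    return PROMPT_TEMPLATE.format(
        instruction=example.get("instruction", ""),
        context=example.get("context", ""),
        response=example.get("response", ""),
    )

sample = {
    "instruction": "Summarize the text.",
    "context": "Gemma is a family of open models from Google.",
    "response": "Gemma is Google's open LLM family.",
}
text = format_example(sample)
# With the datasets library, this would be applied to every record
# via dataset.map(...) before training.
```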



Finetuning of LLM


The penultimate step is initialising a supervised fine-tuning trainer (SFTTrainer) that drives the fine-tuning process. It is initialised with the model, the dataset to fine-tune on, a tokenizer, and the required training arguments (learning rate, maximum steps, weight decay, optimiser, etc.).
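That setup looks roughly like the following trl/transformers configuration sketch; the argument values are placeholders to tune, the model, tokenizer, and dataset variables come from the earlier steps of the notebook, and running it requires a GPU:

```python
# Supervised fine-tuning configuration sketch (trl + transformers).
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,                  # the PEFT-wrapped model prepared earlier
    tokenizer=tokenizer,
    train_dataset=dataset,        # the formatted dolly-15k split
    dataset_text_field="text",    # column holding the formatted prompts
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        max_steps=60,
        weight_decay=0.01,
        optim="adamw_8bit",       # memory-efficient optimizer
        output_dir="outputs",
    ),
)
trainer.train()
```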






Generating Responses


We conclude the process by showcasing the model's response generation capabilities. It uses the same prompt format, with an instruction and context filled in and an empty response field for the model to complete.







The Path Forward for Gemma


Gemma's emergence as an open-source project, coupled with its remarkable performance, has generated considerable excitement within the Large Language Model (LLM) community.

So, what lies on the horizon for this burgeoning model family?


Advancements in the LLM Landscape: Gemma's open-source framework cultivates collaboration and innovation, allowing researchers and developers worldwide to contribute to its refinement. This collaborative effort is poised to accelerate progress in various areas, including interpretability, fairness, and efficiency. Gemma could potentially lead the charge in exploring multi-modal LLMs, capable of processing and generating not only textual data but also images, audio, and video.


An Optimistic Outlook: With its inclusive approach and impressive capabilities, Gemma signifies a significant stride toward democratizing AI and making it accessible and beneficial for all. As development continues, we anticipate witnessing further groundbreaking applications and advancements. Gemma's open-source nature nurtures a dynamic community, ensuring its ongoing evolution and profound impact on the future landscape of LLMs.


Conclusion


Gemma’s arrival in the LLM landscape marks a significant turning point. Unlike its larger, more resource-intensive cousins, Gemma offers accessibility and flexibility, making advanced AI capabilities available to a wider audience. Its open-source nature fuels innovation and collaboration, accelerating progress in natural language processing and shaping the future of AI. With its democratizing approach and impressive capabilities, Gemma represents a significant step towards making AI accessible and beneficial for everyone. As development progresses, we can expect even more groundbreaking applications and advancements. Gemma’s open-source nature fosters a vibrant community, ensuring its continued evolution and impact on the future of LLMs.

