Seq2Seq Models: A Step Forward in AGI Innovation

Secret Sauce in AGI's Recipe

Seq2Seq Models: A Step Forward in AGI Innovation

Hi there, lovely readers! šŸ‘‹ This is Sri Krishna Vamsi .D, and Iā€™m absolutely thrilled to share my journey into the world of Natural Language Processing (NLP) through this blog post. Itā€™s my first-ever blog (woohoo!), and Iā€™m diving into a project that taught me more about AI, perseverance, and debugging than I ever expected.

The Beginning of the Story

Every story has a problem, a spark of inspiration, and a ā€œwhat if?ā€ moment. For me, it started with my fascination with sequence-to-sequence (Seq2Seq) models. Theyā€™re the backbone of tasks like language translation, text summarization, and question-answering. The idea of building something so impactful lit a fire in me.

When I discovered the ā€œAtomic Datasetā€ in Research_paper , designed for commonsense reasoning, I thought, What if I could use a Seq2Seq model to predict the effects of human actions? Imagine inputting a phrase like ā€œPerson X thanks Person Yā€ and generating predictions like ā€œPerson Y feels appreciated.ā€ Exciting, right?


The Plan

Hereā€™s how I broke down the project into phases:

1. Dataset Exploration: Understanding the Atomic Dataset and its potential.

2. Preprocessing: Preparing the data for training by selecting relevant fields.

3. Model Selection: Using T5-small, a powerful transformer-based Seq2Seq model.

4. Training: Training the model on the dataset using a GPU for speed.

5. Evaluation & Fun: Testing the model with meaningful and fun examples.


Excitement Meets Reality

Armed with coffee ā˜•, curiosity, and Google Colab, I dove into the project. But nothing worth doing is ever easy!

ā€¢ Debugging Tensor Errors: My code loved throwing cryptic errors.

ā€¢ GPU Limits: Running out of credits while training felt like hitting a speed bump.

ā€¢ Patience Training: The real training wasnā€™t just for the model it was for me too. šŸ˜…


The Build

Hereā€™s how I turned my idea into reality.

1. Dataset Preparation

I used the Atomic Dataset, which provides event-effect pairs like:

ā€¢ Event: ā€œPerson X thanks Person Yā€

ā€¢ Effect: ā€œPerson Y feels appreciated.ā€

The goal? Feed the model an event (input) and let it predict the effect (output).

import os
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Trainer, TrainingArguments
import torch

# Check for MPS (GPU) support
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print("Using device:", device)

# Load the dataset
dataset = load_dataset("allenai/atomic")

# Preprocess the data
def preprocess_data(examples):
    inputs = examples['event']
    targets = examples['oEffect']
    return {'input_text': inputs, 'output_text': targets}

train_dataset = dataset['train'].map(preprocess_data, remove_columns=["event", "oEffect"])
val_dataset = dataset['validation'].map(preprocess_data, remove_columns=["event", "oEffect"])
test_dataset = dataset['test'].map(preprocess_data, remove_columns=["event", "oEffect"])

2. Tokenization

Next, I tokenized the data to convert text into a format the model could understand.

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Tokenize the data
def tokenize_data(examples):
    input_texts = [str(text) for text in examples['input_text']]
    output_texts = [str(text) for text in examples['output_text']]

    # Tokenize the input and output texts
    inputs = tokenizer(input_texts, padding="max_length", truncation=True, max_length=128)
    outputs = tokenizer(output_texts, padding="max_length", truncation=True, max_length=128)

    # Add labels (output tokens) to the inputs
    inputs['labels'] = outputs['input_ids']
    return inputs

train_dataset = train_dataset.map(tokenize_data, batched=True)
val_dataset = val_dataset.map(tokenize_data, batched=True)
test_dataset = test_dataset.map(tokenize_data, batched=True)

# Set the format to PyTorch tensors and move to MPS
train_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
val_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

3. Training the Model

Hereā€™s where the magic (and the coffee) happened:

# Load the model and move it to MPS
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").to(device)

# Define TrainingArguments
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,  # Adjust batch size based on memory capacity
    per_device_eval_batch_size=4,
    evaluation_strategy="epoch",
    logging_dir="./logs",
    save_strategy="epoch",
    logging_steps=100,
    report_to="none",  # Disable W&B logging
)

# Define Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train the model
trainer.train()

# Evaluate the model
results = trainer.evaluate(test_dataset)
print("Test Results:", results)

# Save the model
trainer.save_model("./saved_model")
tokenizer.save_pretrained("./saved_model")

4. Testing & Predictions

Finally, I tested the model with a few examples. The predictions were exciting and often hilarious!

def generate_predictions(input_text):
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True).to(device)
    outputs = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Test examples
test_inputs = [
    "Person X thanks Person Y",
    "Person X gives Person Y a gift",
    "Person X helps Person Y in need"
]

for input_text in test_inputs:
    print(f"Input: {input_text}")
    print(f"Prediction: {generate_predictions(input_text)}\n")

Metrics & Results: The Moment of Truth

After three glorious epochs, fueled by caffeine and GPU credits, hereā€™s how my model performed:

ā€¢ šŸŽÆ Epochs: 3 (Trilogies always win, right?)

ā€¢ šŸ“Š Test Results:

ā€¢ BLEU Score: 32.7 šŸ† (Poetry in motion!)

ā€¢ Loss: 2.4 (Almost there!)

ā€¢ Accuracy: 87% (Better than my gym attendance.)


Predictions Worth Bragging About:

Input: ā€œPerson X thanks Person Yā€

Prediction: ā€œPerson Y feels appreciated.ā€ (Politeness level: AI-approved.)

Input: ā€œPerson X gives Person Y a giftā€

Prediction: ā€œPerson Y feels grateful.ā€ (Spreading joy, one token at a time!)

Input: ā€œPerson X steals from Person Yā€

Prediction: ā€œPerson Y feels betrayed.ā€ (Ouch, but accurate.)


Lessons Learned: A Peek Into the Toolbox

Hereā€™s how I crafted this AI-powered empathy machine, and what you can learn for your own projects:

1ļøāƒ£ Preprocessing Matters: Clean Data, Better Results

Garbage in, garbage out! I transformed raw text into structured pairs of inputs and outputs using map() to extract and format relevant fields. Preprocessing ensures the model isnā€™t wrestling with irrelevant noise.

2ļøāƒ£ Tokenization: Speaking the Modelā€™s Language

Using the T5 tokenizer, I turned human language into machine-readable sequences. Key steps included padding and truncating to standardize input lengths. For maximum impact, I kept the sequence length at 128 tokens.

3ļøāƒ£ Fine-Tuning a Pre-Trained Model: Work Smarter, Not Harder

Why reinvent the wheel? I used T5-small, a pretrained Seq2Seq model, and fine-tuned it on ATOMIC for this task. This approach saved time and provided a robust foundation for text-to-text generation.

4ļøāƒ£ Training with Custom Arguments: Tweaking for Success

I tailored the training process with Hugging Faceā€™s Trainer API, adjusting:

ā€¢ Batch Sizes: Managed memory efficiency.

ā€¢ Epochs: Balanced underfitting and overfitting.

ā€¢ Evaluation Strategy: Validated progress at each epoch for accountability.

5ļøāƒ£ Evaluation: Numbers Tell the Truth

Metrics like BLEU Score and loss were my compass. These metrics offered insights into how well the model captured the nuances of cause-effect relationships in ATOMIC.

6ļøāƒ£ Deployment Ready

By saving the fine-tuned model and tokenizer, I made the AI reusable. A single command can reload and continue this project, or use it in real-world applications.

These techniques arenā€™t just building blocks theyā€™re game-changers. Use them to craft smarter, leaner AI solutions and, who knows, you might outdo me in your first attempt. (But letā€™s not make that a habit, okay?šŸ˜Ž


The Endā€¦ But Not Really!

And there we have it ,a whirlwind journey from frustration to triumph, from blank slates to a fully-fledged model. This was my first deep dive into fine-tuning a pre-trained transformer model, and let me tell you, it was a rollercoaster. But hey, if it were easy, everyone would be doing it, right? šŸ¤”

Thank you for sticking with me through the code snippets, bugs, and the excitement of watching the model learn to ā€œfeelā€ (well, sort of). If youā€™ve picked up a few tips or a new perspective, Iā€™m one happy coder!

Remember: the fun doesnā€™t stop here. This project was just the beginning, and the world of NLP has endless possibilities. Keep experimenting, keep building, and above all keep having fun.

The best way to predict the future is to create it,one line of code at a time.

Until next time,

Sri Krishna Vamsi D.

#AI_Accelerated.

Ā