Unlock AI Potential with OpenAI Fine-Tuning

OpenAI offers fine-tuning capabilities for their pretrained models, allowing developers to unlock even more potential from the API. This powerful feature brings several key benefits:

  • Enhanced performance compared to standard prompting
  • Capacity to learn from extensive datasets beyond prompt limitations
  • Improved efficiency with shorter prompts and reduced token usage
  • Faster response times

Fine-tuning takes the concept of few-shot learning to the next level. By training on a much larger set of examples than what can fit in a prompt, you can achieve superior results across a wide range of tasks. Once your model is fine-tuned, you’ll need fewer examples in your prompts, streamlining your workflow.

The fine-tuning process can be broken down into four main steps:

  1. Curate and upload your training data
  2. Train your custom fine-tuned model
  3. Assess the results and iterate if necessary
  4. Deploy and utilize your fine-tuned model

By following this process, you can create a model that’s tailored to your specific use case, potentially revolutionizing your AI-powered applications.

The Fine-Tuning Journey: A Step-by-Step Guide

Let’s break down the fine-tuning process into manageable steps:

1. Data Collection and Preparation

The foundation of successful fine-tuning lies in gathering high-quality, labeled datasets for training, validation, and testing. This crucial step ensures your model has the right information to learn from.

2. Hyperparameter Optimization

Fine-tuning is an art as much as a science. It involves carefully adjusting key hyperparameters like learning rate, batch size, and number of epochs to optimize the learning process.
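
To make this concrete: when fine-tuning through OpenAI's API, these knobs aren't set in a hand-written training loop but passed directly when the fine-tuning job is created. Here's a minimal sketch, assuming a placeholder file ID and model snapshot (the actual file upload is covered later in this post):

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# A sketch of setting hyperparameters on an OpenAI fine-tuning job;
# "file-abc123" and the model snapshot below are placeholders.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",          # ID of an already-uploaded JSONL file
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "n_epochs": 3,                    # passes over the training set
        "batch_size": 4,                  # examples per gradient update
        "learning_rate_multiplier": 1.8,  # scales OpenAI's preconfigured base rate
    },
)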

3. Model Adaptation

With your data prepared and hyperparameters set, it’s time to adapt the model architecture to your specific task. This step may involve modifying layers or adding task-specific components.

4. Rigorous Evaluation and Iteration

After fine-tuning, it’s crucial to evaluate your model’s performance on a separate validation set. This step ensures your model generalizes well to new, unseen data and helps identify areas for improvement.

5. Seamless Integration and Continuous Learning

The journey doesn’t end with fine-tuning. Integrating your optimized model into existing infrastructure and setting up processes for ongoing training and improvement are key to long-term success.

Prepare the Financial Sentiment Data from Kaggle

The data I introduced in the blog post “Understanding Sentiment Analysis in Finance” will also be used for fine-tuning. Before delving into fine-tuning an open-source large language model, I want to introduce OpenAI ChatGPT fine-tuning, which is very easy to use.

Before we dive into the exciting world of fine-tuning our model, let’s take a moment to set up our development environment. We’ll start by importing the essential libraries that will power our data manipulation, model training, and analysis processes.

import json
import tiktoken     # token counting
import numpy as np
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')
from openai_finetune_tools import OpenAIFineTuneTools
import pandas as pd
import openai
import keyring
from sklearn.model_selection import train_test_split
from tqdm.notebook import tqdm

To get started with this project, you’ll need to obtain the dataset we’ll be using for sentiment analysis. Head over to Kaggle and download the “Financial Sentiment Analysis” dataset. This comprehensive collection of financial texts will serve as the foundation for our fine-tuning process.

Once you’ve downloaded the dataset, make sure to place it in your project’s data directory. We’ll be accessing this file in the next steps to prepare our training and validation sets.

# Kaggle sentiment analysis for finance (no header row; column 0: label, column 1: text)
dataset_path = '../data/FinancialPhraseBank/all-data.csv'
df = pd.read_csv(dataset_path, engine='python', encoding='ISO-8859-1', header=None)

To fine-tune the model effectively, we need to prepare our dataset according to OpenAI’s guidelines. For models like gpt-4o-mini and gpt-3.5-turbo, OpenAI requires the data to be in a specific conversational format. This format ensures that the model can learn from the structure of dialogues, not just isolated text snippets.

The conversational format consists of a series of messages, each with a designated role (system, user, or assistant) and corresponding content. This structure allows the model to understand the flow of a conversation and the context in which information is presented. By adhering to this format, we can leverage the full potential of these advanced language models and achieve better results in our fine-tuning process.

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

Chat models such as gpt-4o-mini, gpt-3.5-turbo, and gpt-4 use this flexible conversational JSON format, which supports multi-turn conversations. However, if you’re working with older completion models such as babbage-002 and davinci-002, you’ll still encounter the traditional prompt-completion format shown below:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}

The code snippet below demonstrates how to structure our data into a series of messages, each with a specific role (system, user, or assistant) and corresponding content. This format allows the model to understand the context and flow of information, leading to more accurate and nuanced sentiment analysis.

prompt_sys = """You are a sentiment analyzer specialized in classifying the sentiment of short financial texts.
Your task is to analyze the sentiment of the provided financial text and convert it into string format. Never include any information other than the output format.

Follow these steps and respond only in the specified output format:

# Step 1: Read the provided financial text carefully.

# Step 2: Assign a sentiment score between 0 and 1 from a financial perspective.

# Step 3: Do a sentiment analysis, classify the text into the positive, negative or neutral category, and determine the reason from a financial perspective.

# Step 4: Convert the classification into the specified output format.

#### output format:
<sentiment label>

### Example
# Text : The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , the daily Postimees reported 
# Output : negative
# Text : Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said .
# Output : neutral
# Text : 'With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .'
# Output : positive
# Text : Rinkuskiai 's beer sales fell by 6.5 per cent to 4.16 million litres , while Kauno Alus ' beer sales jumped by 6.9 per cent to 2.48 million litres.
# Output : neutral
"""
prompt_user = f"What is the sentiment of this sentence? {df.iloc[0][1]}"
prompt_assistant = f"{df.iloc[0][0]}"
# data format : dictionary
# data_format = f"""{{"messages": [{{"role": "system", "content": "{prompt_sys}"}}, {{"role": "user", "content": "{prompt_user}"}}, {{"role": "assistant", "content": "{prompt_assistant}"}}]}}"""
data_format = {
    "messages": [
        {"role": "system", "content": prompt_sys},
        {"role": "user", "content": prompt_user},
        {"role": "assistant", "content": prompt_assistant}
    ]
}

Now, let’s create a powerful function that will transform our entire dataset into the conversation format required for fine-tuning. This crucial step ensures our data is properly structured for optimal model training.

def get_json_from_df(df, prompt=prompt_sys):
    # List to hold individual formatted messages
    formatted_messages = [] 
    
    # Iterate through each row in the DataFrame
    for _, row in df.iterrows():
        prompt_user = f"What is the sentiment of this sentence? {row[1]}"
        prompt_assistant = f"{row[0]}"
        data_format = {
            "messages": [
                {"role": "system", "content": prompt_sys},
                {"role": "user", "content": prompt_user},
                {"role": "assistant", "content": prompt_assistant}
            ]
        }
        formatted_messages.append(data_format)
        
    return formatted_messages

Next, let’s create a crucial function that converts our formatted conversations into a JSONL file. This step is essential for preparing our data in a format that OpenAI’s fine-tuning process can readily consume.

def to_json_file(file_path, formatted_messages):
    with open(file_path, 'w') as f:
        for message in formatted_messages:
            f.write(json.dumps(message) + '\n')

To complete our data preparation process, we’ll create two essential files: a training set and a validation set. These datasets are crucial for fine-tuning our model effectively. Let’s define a function that randomly samples and splits our data, ensuring we have a diverse and representative set for both training and validation purposes.

# function for random n samples / split train and test dataset
def get_random_samples(samples=df, n_random=100, test_size=0.3):
    df_sampled = samples.sample(n=n_random)
    df_sampled_train = df_sampled.sample(round(n_random * (1 - test_size)))
    df_sampled_test = df_sampled.drop(df_sampled_train.index)
    return df_sampled_train, df_sampled_test

With these helper functions in place, we have everything we need to prepare our data for fine-tuning OpenAI’s model. Let’s take a moment to recap what we’ve built:

  • We’ve structured our financial sentiment data into the required conversational format
  • We’ve created functions to efficiently process and format our entire dataset
  • We can split our data into training and validation sets
  • We can generate JSONL files ready for OpenAI’s fine-tuning process

These steps lay the foundation for enhancing our model’s performance in financial sentiment analysis. The code below puts the helper functions to work; in the next section, we’ll dive into the process of actually fine-tuning our model using this prepared data.

training_df, test_df = get_random_samples(samples=df, n_random=20)

data_formatted_training = get_json_from_df(training_df)
data_formatted_test = get_json_from_df(test_df)

# test file
file_path_train = '../data/jsonl/sentimen_analysis_finance_train_20240922.jsonl'
to_json_file(file_path_train, data_formatted_training)

# validation file
file_path_test = '../data/jsonl/sentimen_analysis_finance_test_20240922.jsonl'
to_json_file(file_path_test, data_formatted_test)

With our data meticulously prepared and formatted, we’ve reached an exciting milestone in our journey: we’re now primed and ready to embark on the fine-tuning process for our model. This crucial step will elevate our AI’s ability to accurately analyze financial sentiment, transforming it into a powerful tool for deciphering the nuances of financial texts.

In the next section, we’ll dive deep into the intricacies of fine-tuning, exploring how we can leverage OpenAI’s cutting-edge technology to create a model that’s finely attuned to the subtle language of finance. Get ready to unlock the full potential of AI-driven sentiment analysis!

Essential Fine-Tuning Tools: Validation, Token Counting, and Cost Estimation

Welcome to the exciting world of AI model fine-tuning! In this section, we’ll dive deep into the crucial tools that every data scientist and AI enthusiast should have in their toolkit. We’re going to explore the OpenAIFineTuneTools class – a powerful ally in your journey to create more accurate and efficient AI models.

This class is designed to be your one-stop solution for three critical aspects of the fine-tuning process:

  • Format Validation: Ensuring your data is correctly structured for optimal results
  • Token Counting: Understanding the size and complexity of your dataset
  • Cost Estimation: Planning your budget for the fine-tuning process

Let’s break down how this class works and why it’s so valuable for your fine-tuning projects.

The OpenAIFineTuneTools class is initialized with a simple input: the path to your JSONL file. This file contains the dataset you’ll use for fine-tuning. Once initialized, you’ll have access to six functions that will revolutionize your fine-tuning workflow.

Let’s start with the cornerstone of data preparation: format validation.

import json
import tiktoken     # token counting
import numpy as np 
from collections import defaultdict

class OpenAIFineTuneTools:
    
    def __init__(self, file_path):
        self.file_path = file_path
        self.encoding = tiktoken.get_encoding('cl100k_base')
        
        # dataset
        with open(self.file_path, 'r', encoding='utf-8') as f:
            self.dataset = [json.loads(line) for line in f]
            

    def format_validate(self):
        # check the dataset for format errors
        format_errors = defaultdict(int)
        
        for ex in self.dataset:
            if not isinstance(ex, dict):
                format_errors["data_type"] += 1
                continue
            self.messages = ex.get("messages", None)
            if not self.messages:
                format_errors["missing_messages_list"] += 1
                continue
            
            for message in self.messages:
                if "role" not in message or "content" not in message:
                    format_errors["message_missing_key"] += 1
                    
                if any(k not in ("role", "content", "name", "function_call", "weight") for k in message):
                    format_errors["message_unrecognized_key"] += 1
                
                if message.get("role", None) not in ("system", "user", "assistant", "function"):
                    format_errors["unrecognized_role"] += 1
                    
                content = message.get("content", None)
                function_call = message.get("function_call", None)
                
                if (not content and not function_call) or not isinstance(content, str):
                    format_errors['missing_content'] += 1
                    
            if not any(message.get("role", None) == "assistant" for message in self.messages):
                format_errors["example_missing_assistant_message"] += 1
        if format_errors:
            msg = "Found errors:\n"
            for k, v in format_errors.items():
                msg += f"{k}: {v}\n"
        else:
            msg = "No error found"
        
        return msg

Next, we’ll explore four powerful functions that form the backbone of our token analysis toolkit. These functions are designed to count tokens with precision and provide insightful distributions, giving you a clear picture of your dataset’s complexity.

  1. num_tokens_from_messages: This function calculates the total number of tokens in a conversation, accounting for message structure and content.
  2. num_assistant_tokens_from_messages: Focusing specifically on assistant responses, this function helps you understand the token usage of your model’s outputs.
  3. print_distribution: A versatile function that provides key statistical insights about your token counts, including minimum, maximum, mean, median, and percentile values.
  4. token_counts_warning: This comprehensive function not only counts tokens but also checks for potential issues in your dataset, such as missing system or user messages, and provides detailed distribution analytics.

By leveraging these functions, you’ll gain a deep understanding of your dataset’s token usage patterns, helping you optimize your fine-tuning process and manage resources effectively.

    def num_tokens_from_messages(self, tokens_per_message=3, tokens_per_name=1):
        # not exact!
        # simplified from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
        num_tokens = 0
        for message in self.messages:
            num_tokens += tokens_per_message
            for key, value in message.items():
                num_tokens += len(self.encoding.encode(value))
                if key == "name":
                    num_tokens += tokens_per_name
        num_tokens += 3
        return num_tokens
    
    def num_assistant_tokens_from_messages(self):
        num_tokens = 0
        for message in self.messages:
            if message['role'] == 'assistant':
                num_tokens += len(self.encoding.encode(message['content']))
        return num_tokens

    def print_distribution(self, values, name):
        print(f"\n### Distribution of {name}:")
        print(f"min / max : {min(values)}, {max(values)}")
        print(f"mean / median: {np.mean(values)}, {np.median(values)}")
        print(f"p10 / p90: {np.quantile(values, 0.1)}, {np.quantile(values, 0.9)}")
        
    def token_counts_warning(self):
        # warnings and token counts
        self.n_missing_system = 0
        self.n_missing_user = 0
        self.n_messages = []
        self.convo_lens = []
        self.assistant_message_lens = []

        for ex in self.dataset:
            self.messages = ex["messages"]
            if not any(message["role"] == "system" for message in self.messages):
                self.n_missing_system += 1
            if not any(message["role"] == "user" for message in self.messages):
                self.n_missing_user += 1
            self.n_messages.append(len(self.messages))
            self.convo_lens.append(self.num_tokens_from_messages())
            self.assistant_message_lens.append(self.num_assistant_tokens_from_messages())
            
        print("Num examples missing system message:", self.n_missing_system)
        print("Num examples missing user message:", self.n_missing_user)
        self.print_distribution(self.n_messages, "num_messages_per_example")
        self.print_distribution(self.convo_lens, "num_total_tokens_per_example")
        self.print_distribution(self.assistant_message_lens, "num_assistant_tokens_per_example")
        n_too_long = sum(l > 16385 for l in self.convo_lens)
        print(f"\n{n_too_long} examples may be over the 16,385 token limit; they will be truncated during fine-tuning")

Last but certainly not least, let’s explore a crucial component of our OpenAIFineTuneTools class: the cost estimation function. This invaluable tool helps you plan and budget for your fine-tuning projects with precision.

Understanding the financial implications of fine-tuning is essential for both individual developers and organizations. Our cost estimation function takes into account factors such as the number of tokens, epochs, and examples in your dataset to provide an accurate estimate of your project’s costs. This foresight allows you to make informed decisions about resource allocation and project scope.

    def cost_estimation(self):
        # Pricing and default n_epochs estimate
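        # Note: relies on self.convo_lens, so token_counts_warning() must be called first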
        MAX_TOKENS_PER_EXAMPLE = 16385

        TARGET_EPOCHS = 3
        MIN_TARGET_EXAMPLES = 100
        MAX_TARGET_EXAMPLES = 25000
        MIN_DEFAULT_EPOCHS = 1
        MAX_DEFAULT_EPOCHS = 25

        n_epochs = TARGET_EPOCHS
        n_train_examples = len(self.dataset)
        if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
            n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
        elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
            n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)

        n_billing_tokens_in_dataset = sum(min(MAX_TOKENS_PER_EXAMPLE, length) for length in self.convo_lens)
        print(f"Dataset has ~{n_billing_tokens_in_dataset} tokens that will be charged for during training")
        print(f"By default, you'll train for {n_epochs} epochs on this dataset")
        print(f"By default, you'll be charged for ~{n_epochs * n_billing_tokens_in_dataset} tokens")   

Now that we’ve defined our powerful OpenAIFineTuneTools class, let’s put it to work! In this section, we’ll dive into a hands-on exploration of our dataset, leveraging the class’s robust functionality to gain valuable insights. Get ready to uncover the hidden patterns and characteristics of your data that will drive your fine-tuning process to new heights.

# format validate
openaitools_train = OpenAIFineTuneTools(file_path_train)
validate_message_train = openaitools_train.format_validate()
print(validate_message_train)

# token count warning
openaitools_train.token_counts_warning()

# cost estimation
openaitools_train.cost_estimation()

Great news! Our analysis reveals that the dataset is in pristine condition, with no format errors detected. This clean data structure sets a solid foundation for our fine-tuning process. The token analysis also reports roughly 38,514 billable tokens across the training set, while every individual example stays well within the 16,385-token limit, so nothing will be truncated during fine-tuning. This combination of error-free formatting and comfortable token headroom positions us perfectly for an effective and efficient fine-tuning run.

Uploading Your Data: The First Step to Fine-Tuning Magic

Ready to embark on your fine-tuning journey? Let’s kick things off with a crucial step: uploading your datasets to the OpenAI platform. Don’t worry, it’s easier than you might think! With just a few lines of code, you’ll be setting the stage for some serious AI enhancement.

Picture this: you’re about to give your AI model a personalized crash course in understanding your specific data. But before we can start the learning process, we need to get your carefully prepared datasets into the hands (or servers) of OpenAI. Think of it as packing your AI’s lunchbox with brain food before sending it off to school.

So, how do we do this? It’s as simple as making an API call. Let’s dive in and see how effortlessly we can upload both our training and validation files, setting the foundation for a fine-tuned model that’ll knock your socks off!

from openai import OpenAI

# uploading the training dataset
client = OpenAI(api_key=keyring.get_password('openai', 'key_for_windows'))
train_file = client.files.create(
    file=open('../data/jsonl/sentimen_analysis_finance_train_20240922.jsonl', 'rb'),
    purpose="fine-tune"
)

# uploading the validation dataset
test_file = client.files.create(
    file=open('../data/jsonl/sentimen_analysis_finance_test_20240922.jsonl', 'rb'),
    purpose="fine-tune"
)
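
Uploading is only half the story: each client.files.create call returns a file object whose id we need in order to launch the fine-tuning job itself. Here’s a minimal sketch using the IDs captured above; the base model snapshot is an assumption, so substitute whichever model you intend to fine-tune:

# Launch the fine-tuning job with the uploaded files (a sketch; the base
# model snapshot is an assumption, not something fixed by this post)
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,       # ID returned by the training upload above
    validation_file=test_file.id,      # ID returned by the validation upload above
    model="gpt-4o-mini-2024-07-18",
    suffix="custom_suffix",            # optional tag embedded in the resulting model name
)
print(job.id, job.status)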

Unleashing the Power of Fine-Tuning: Creating Your Custom AI Model

Congratulations! Your fine-tuning job has succeeded, and now it’s time to put your custom AI model to work. This exciting milestone marks the transition from training to real-world application. Let’s explore how you can harness the power of your newly minted model.
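
If you’d like to track progress programmatically rather than through the dashboard, the job can be polled via the API. A quick sketch, where "ftjob-abc123" is a placeholder for your actual job ID:

# Poll the fine-tuning job; "ftjob-abc123" is a placeholder job ID
job = client.fine_tuning.jobs.retrieve("ftjob-abc123")
print(job.status)               # e.g. "running" or "succeeded"
print(job.fine_tuned_model)     # populated once the job has succeeded

# Inspect recent training events (loss updates, completion messages)
events = client.fine_tuning.jobs.list_events(fine_tuning_job_id="ftjob-abc123", limit=10)
for event in events.data:
    print(event.message)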

With your fine-tuned model ready to go, integration is a breeze. Simply specify your custom model as a parameter in the Chat Completions API, and you’re all set to make requests. This seamless process allows you to tap into your model’s specialized knowledge and capabilities with ease.

Want to see your model in action? Head over to the OpenAI Playground, where you can interact with your fine-tuned creation in real-time. This user-friendly interface provides the perfect sandbox to test various prompts, refine your approach, and truly appreciate the nuances of your custom model’s responses.

from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
    model="ft:gpt-4o-mini:my-org:custom_suffix:id",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(completion.choices[0].message)

Now comes the exciting part – we’re going to unleash our fine-tuned model on real-world financial data! Get ready to witness the power of custom AI as we dive into predicting financial sentiment with pinpoint accuracy.

In this section, we’ll walk you through the process of using our newly minted model to analyze and classify the sentiment behind financial texts. From market reports to earnings calls, our AI is primed to decode the emotional undertones that can make or break investment decisions.

So buckle up, because we’re about to embark on a thrilling journey through the world of AI-powered financial sentiment analysis!

# get sample dataset without test set
sample_data, _ = get_random_samples(samples=df, n_random=100, test_size=0)

For consistency and optimal performance, we utilize the same carefully crafted prompt that was employed during the training phase. This approach ensures that our fine-tuned model receives input in a familiar format, allowing it to leverage its specialized knowledge effectively.

By maintaining prompt consistency, we create a seamless bridge between the model’s training environment and its real-world application. This strategy not only maximizes the model’s performance but also provides a clear framework for users to interact with the AI, enhancing the overall user experience and the accuracy of sentiment analysis results.

# prompt for the sentimental analysis
prompt = """You are a sentiment analyzer specialized in classifying the sentiment of short financial texts.
Your task is to analyze the sentiment of the provided financial text and convert it into string format. Never include any information other than the output format.

Follow these steps and respond only in the specified output format:

# Step 1: Read the provided financial text carefully.

# Step 2: Assign a sentiment score between 0 and 1 from a financial perspective.

# Step 3: Do a sentiment analysis, classify the text into the positive, negative or neutral category, and determine the reason from a financial perspective.

# Step 4: Convert the classification into the specified output format.

#### output format:
<sentiment label>

### Example
# Text : The international electronic industry company Elcoteq has laid off tens of employees from its Tallinn facility ; contrary to earlier layoffs the company contracted the ranks of its office workers , the daily Postimees reported 
# Output : negative
# Text : Technopolis plans to develop in stages an area of no less than 100,000 square meters in order to host companies working in computer technologies and telecommunications , the statement said .
# Output : neutral
# Text : 'With the new production plant the company would increase its capacity to meet the expected increase in demand and would improve the use of raw materials and therefore increase the production profitability .'
# Output : positive
# Text : Rinkuskiai 's beer sales fell by 6.5 per cent to 4.16 million litres , while Kauno Alus ' beer sales jumped by 6.9 per cent to 2.48 million litres.
# Output : neutral
"""

Now, we’re ready to put our newly fine-tuned model to the test! This isn’t just any off-the-shelf AI – it’s a custom-tailored powerhouse that we’ve trained specifically for financial sentiment analysis. By leveraging this specialized model, we’re tapping into a level of understanding and accuracy that goes beyond generic language models.

Get ready to see how our AI can dissect financial jargon, interpret market trends, and deliver razor-sharp sentiment analysis with unprecedented accuracy.

## llm model
from openai import OpenAI
import keyring
import pandas as pd
# sentiment analysis

def sentiment_analysis(prompt=prompt, content=None):
    # client
    client = OpenAI(api_key=keyring.get_password('openai', 'key_for_windows'))
    query = prompt + "\n\n#### Text:\n\n" + content
    # getting the model's response (replace with your own fine-tuned model name)
    model = 'ft:gpt-4o-mini:my-org:custom_suffix:id'
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {'role':'system', 'content':'You are a helpful assistant.'},
            {'role':'user', 'content':query}
        ]
    )
    return completion.choices[0].message.content
# evaluate the fine-tuned model on 3 random batches of 10 samples each
from tqdm.notebook import tqdm
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings('ignore')       # suppress warning messages


# fine tuned model
accuracy_list = []

for n in tqdm(range(3)):
    y_true_list = []
    y_pred_list = []
    sample_data, _ = get_random_samples(samples=df, n_random=10, test_size=0)
    for i in range(len(sample_data)):
        y_true = sample_data.iloc[i][0]
        y_pred = sentiment_analysis(content=sample_data.iloc[i][1])
        y_true_list.append(y_true)
        y_pred_list.append(y_pred)
    accuracy = accuracy_score(y_true_list, y_pred_list)
    print(accuracy)
    accuracy_list.append(accuracy)
accuracy_list   
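
To condense the three per-run numbers into a single headline figure, a quick summary over the accuracy_list produced above:

# summarize the per-iteration accuracies into a mean and spread
print(f"mean accuracy: {np.mean(accuracy_list):.3f} (std: {np.std(accuracy_list):.3f})")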

The results are truly impressive – our fine-tuned model achieved an accuracy of nearly 90% in financial sentiment analysis! This level of precision demonstrates the power of custom AI models in tackling specific, complex tasks.

But don’t just take our word for it. We believe in transparency and sharing knowledge, so we’ve made our code available for you to explore and learn from. You can dive deeper into the technical details by checking out our:

  • Jupyter notebook – See the step-by-step process of fine-tuning and testing our model
  • OpenAI fine-tune tools – Explore the custom tools we developed to streamline the fine-tuning process

We encourage you to examine these resources, experiment with the code, and perhaps even adapt it for your own projects. After all, the true power of AI lies not just in its results, but in its ability to inspire and enable further innovation.
