(AI ML 03) Fine-tuning and application of LLMs in finance, FinGPT part 1

The AI4Finance Foundation has several interesting projects on the application of AI in finance. One of them is FinGPT, which is introduced and available here. Part of that project is about sentiment analysis using LLMs. The authors used Llama 2 and ChatGLM as base models and then performed instruction fine-tuning on them. The results were promising.

In this mini project, we attempt to replicate the same sentiment analysis with an LLM, but using Google’s FLAN T5 model instead. This model is well known for its strong instruction-following performance and comes in several sizes. My experiments showed that a model as small as FLAN T5 base, with only 250 million parameters, can significantly improve its performance with PEFT (LoRA) fine-tuning. More importantly, I ran the whole process on a 10th-generation Intel Core i9 CPU!
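
To give a flavor of what that looks like, here is a minimal sketch of attaching LoRA adapters to FLAN T5 base with the peft library. The rank, dropout, and target modules below are illustrative assumptions, not the exact settings used later in this project.

from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# Illustrative LoRA settings (assumed, not this project's final hyperparameters)
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                        # low-rank dimension of the adapter matrices
    lora_alpha=32,              # scaling factor applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query and value projections
)

peft_model = get_peft_model(base, lora_cfg)
peft_model.print_trainable_parameters()  # only a tiny fraction of the 250M weights train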

Let’s start exploring the project. In this part, I introduce the FLAN T5 model, analyze its performance, and look at different ways to use it. We will see how the quantized 8-bit and 4-bit models perform and how their sizes compare with the original models. You can find the code on my GitHub page.

from transformers import pipeline

# Download the four FLAN T5 checkpoints from the Hugging Face Hub.
model_small = pipeline("text2text-generation", model="google/flan-t5-small")
model_base  = pipeline("text2text-generation", model="google/flan-t5-base")
model_large = pipeline("text2text-generation", model="google/flan-t5-large")
model_xl    = pipeline("text2text-generation", model="google/flan-t5-xl")

models = [["Small", model_small], ["Base", model_base],
          ["Large", model_large], ["X-Large", model_xl]]

# Save each pipeline locally so later runs load from disk instead of the Hub.
for name, model in models:
    model.save_pretrained(f"./saved_models/{name}")
from transformers import pipeline

# Reload the pipelines from the local copies saved above.
models = [["Small"], ["Base"], ["Large"], ["X-Large"]]

for i in range(len(models)):
    models[i].append(pipeline("text2text-generation", model=f"./saved_models/{models[i][0]}"))
models
[['Small',
  <transformers.pipelines.text2text_generation.Text2TextGenerationPipeline at 0x275147d6b10>],
 ['Base',
  <transformers.pipelines.text2text_generation.Text2TextGenerationPipeline at 0x27514db3410>],
 ['Large',
  <transformers.pipelines.text2text_generation.Text2TextGenerationPipeline at 0x27514d50490>],
 ['X-Large',
  <transformers.pipelines.text2text_generation.Text2TextGenerationPipeline at 0x27514d505a0>]]
models[3][1]("When a recession begins, the Federal Reserve typically lowers interest rates as part of its monetary easing strategy. Here's how and why:\
What the Fed Does:\n\
- Cuts the Federal Funds Rate: The Fed reduces the benchmark interest rate to make borrowing cheaper for businesses and consumers.\n\
- Stimulates Economic Activity: Lower rates encourage spending and investment, which can help revive economic growth.\n\
- Supports Employment: By boosting demand, the Fed aims to reduce unemployment, which often rises during recessions.\n\
- Manages Inflation: If inflation is low or falling, rate cuts are more aggressive. If inflation remains high, the Fed may be more cautious.\n\n\
Historical Patterns:\n\
- In past recessions (e.g., 2001, 2008, 2020), the Fed slashed rates significantly—sometimes to near zero.\n\
- These moves are often accompanied by other tools like quantitative easing or forward guidance to reinforce the impact.\n\n\
Strategic Considerations:\n\
- The Fed doesn’t always act immediately. It assesses inflation trends, labor market data, and financial stability before deciding.\n\
- If inflation is still elevated—as seen in recent cycles—the Fed may delay or moderate rate cuts.\n\n\
Summarize the above text")[0]["generated_text"]
"The Federal Reserve typically lowers interest rates as part of its monetary easing strategy. Here's how and why."
models[3][1]("I bought 2 apples and three oranges for a total of 10 dollars. Each orange is 2 dollars. How much is the price of each apple?")
[{'generated_text': '3 oranges cost 3 * 2 = 6 dollars. 2 apples cost 10 - 6 = 4 dollars. Each apple is 4 / 2 = 2 dollars.'}]
prompts = ["How old is the planet Earth?",
           "Write Python code to calculate the sum of all elements in a list",
           "Who is the current president of the USA?",
           "What is a common hedging method against currency volatility?"]
models[3][1](prompts)
from threading import Thread

# Run every (prompt, model) pair in its own thread. Each thread writes its
# answer into a dedicated slot of the results grid, so no locking is needed.
threads = []
results = [[None for _ in range(len(models))] for _ in range(len(prompts))]

def run_models(i, j):
    output = models[j][1](prompts[i])
    results[i][j] = output[0]["generated_text"]
    print(f"Model {j} finished prompt {i}")

for i in range(len(prompts)):
    for j in range(len(models)):
        thread = Thread(target=run_models, args=(i, j))
        thread.start()
        threads.append(thread)
Model 0 finished prompt 0
Model 0 finished prompt 3
Model 0 finished prompt 2
Model 1 finished prompt 0
Model 1 finished prompt 3
Model 1 finished prompt 2
Model 2 finished prompt 0
Model 0 finished prompt 1
Model 3 finished prompt 0
Model 3 finished prompt 3
Model 3 finished prompt 2
Model 2 finished prompt 3
Model 2 finished prompt 2
Model 1 finished prompt 1
Model 2 finished prompt 1
Model 3 finished prompt 1
# Wait for every thread to finish, then print the results grid.
for thread in threads:
    thread.join()

for i in range(len(prompts)):
    print(f"\n\n{prompts[i]}")
    for j in range(len(models)):
        print(f"  {models[j][0]}:  {results[i][j]}")
How old is the planet Earth?
  Small:  10 billion years
  Base:  4.5 billion years old
  Large:  billions of years
  X-Large:  4.5 billion years


Write Python code to calculate the sum of all elements in a list
  Small:  List of elements in a list is a list of elements in a list.
  Base:  sum = sum(list(map(int, input().split())))
  Large:  s = sum(list(map(int,input().split())))
  X-Large:  s = 0 for i in list(map(int, input().split())): s += i print(s)


Who is the current president of the USA?
  Small:  george w. bush
  Base:  gerald ford
  Large:  barack obama
  X-Large:  barack obama


What is a common hedging method against currency volatility?
  Small:  hedging
  Base:  hedge fund
  Large:  hedging with a foreign currency
  X-Large:  hedging with forward contracts

Loading a prepared T5 quantized 8-bit model
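
The 8-bit model loaded below was prepared ahead of time with CTranslate2's converter. A one-time conversion along these lines produces the ./saved_models/X-Large-8bit/ directory (the output path is simply my choice):

# One-time conversion of FLAN T5 XL to an int8 CTranslate2 model.
# Shell equivalent:
#   ct2-transformers-converter --model google/flan-t5-xl \
#       --output_dir ./saved_models/X-Large-8bit --quantization int8
import ctranslate2.converters

converter = ctranslate2.converters.TransformersConverter("google/flan-t5-xl")
converter.convert("./saved_models/X-Large-8bit", quantization="int8")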

import ctranslate2
import transformers

# Load the pre-converted int8 CTranslate2 model; the tokenizer still comes
# from the original Hugging Face checkpoint.
translator = ctranslate2.Translator("./saved_models/X-Large-8bit/")
tokenizer = transformers.AutoTokenizer.from_pretrained("google/flan-t5-xl")

prompts = ["How old is the planet Earth?",
           "Write Python code to calculate the sum of all elements in a list",
           "Who is the current president of the USA?",
           "What is a common hedging method against currency volatility?"]

for prompt in prompts:
    # CTranslate2 works on token strings: encode, translate, then decode.
    input_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

    results = translator.translate_batch([input_tokens])

    output_tokens = results[0].hypotheses[0]
    output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))

    print(output_text)
4.5 billion
list = list(map(int, input().rstrip().replace(" "))) list.replace(" ".join(list)) list.replace(list[1]) list.replace(list[2]) list.replace(list[3]) list.replace(list[4]) list.replace(list[5]) list.replace(list[6]) list.replace(list[7]) list.replace(list[8]) list.replace(list[9]) list.replace(list[10]) list.replace(list[11]) list.replace(list[8]) list.replace(list[9]) list.replace(list[6]) list.replace(list[7]) list.replace(list[8]) list.replace(list[9]) list.replace(list[0]) list.replace(list[1]) list.replace(list[2]) list.replace(list[0]) list.replace(list[
barack obama
foreign currency forward transactions
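
To quantify what the int8 conversion saves on disk, a small helper like the one below (a sketch, not part of the original notebook) can total the file sizes of each saved model directory:

from pathlib import Path

def dir_size_gb(path):
    # Sum the sizes of all files under the directory, in gigabytes.
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file()) / 1e9

for name in ["X-Large", "X-Large-8bit"]:
    print(f"{name}: {dir_size_gb(f'./saved_models/{name}'):.2f} GB")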

Quantized 8-bit and 4-bit models

from transformers import T5Tokenizer, T5ForConditionalGeneration, BitsAndBytesConfig

# Quantization configs for 8-bit and 4-bit loading via bitsandbytes.
qcfg = BitsAndBytesConfig(load_in_8bit=True)
qqcfg = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto")
qmodel = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", quantization_config=qcfg)
qqmodel = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl", device_map="auto", quantization_config=qqcfg)

# Save the quantized models under ./saved_models so they can be reloaded
# later without re-quantizing.
qmodel.save_pretrained("./saved_models/qXL")
qqmodel.save_pretrained("./saved_models/qqXL")
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Reload the original and quantized models from their local copies.
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("./saved_models/X-Large", device_map="auto")
qmodel = T5ForConditionalGeneration.from_pretrained("./saved_models/qXL", device_map="auto")
qqmodel = T5ForConditionalGeneration.from_pretrained("./saved_models/qqXL", device_map="auto")
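
Transformers models also report their in-memory size through get_memory_footprint(), which makes the size comparison promised in the introduction easy to reproduce:

# Compare the in-memory footprints of the full-precision and quantized models.
for label, m in [("Original", model), ("8-bit", qmodel), ("4-bit", qqmodel)]:
    print(f"{label} XL: {m.get_memory_footprint() / 1e9:.2f} GB")
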
input_text = "I bought 2 apples and three oranges for a total of 10 dollars. Each orange is 2 dollars. How much is the price of each apple?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
outputs = model.generate(input_ids)
print(f"Original XL: {tokenizer.decode(outputs[0])}")
Original XL: <pad>3 oranges cost 3 * 2 = $6. 2 apples cost 10 - 6 = $4.
outputs = qmodel.generate(input_ids)
print(f"8-bit XL: {tokenizer.decode(outputs[0])}")
8-bit XL: <pad>3 oranges cost 3 * 2 = $6. 2 apples cost 10 - 6 = $4.
outputs = qqmodel.generate(input_ids)
print(f"4-bit XL: {tokenizer.decode(outputs[0])}")
4-bit XL: <pad>3 oranges cost 3 * 2 = $6. So the apples cost 10 - 6 = $4
prompts = ["How old is the planet Earth?",
           "Write Python code to calculate the sum of all elements in a list",
           "Who is the current president of the USA?",
           "What is a common hedging method against currency volatility?"]

for prompt in prompts:
    input_tokens = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(input_tokens)
    output_text = tokenizer.decode(outputs[0])
    print(output_text)
<pad> 4.5 billion years</s>
<pad>s = 0 for i in range(len(list(map(int
<pad> barack obama</s>
<pad> hedging with forward contracts</s>
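
Note that the Python answer above is cut off because generate() stops at its default maximum output length. Passing max_new_tokens lifts that limit, as in this sketch:

# Retry the code prompt with a longer generation budget.
input_tokens = tokenizer(prompts[1], return_tensors="pt").input_ids
outputs = model.generate(input_tokens, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))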
