Unlocking Stock Market Insights with RiskBERT

Predicting stock prices is akin to the attempts of alchemists in the Middle Ages to transmute lead into gold. Just as alchemists sought to unlock the secrets of transformation, statisticians and investors alike strive to decipher the patterns and signals hidden within market data. Traditional models often assume that stock prices follow a Brownian motion, making them appear entirely random and unpredictable. However, the reality is far more nuanced. While stock prices may exhibit elements of randomness, they are also influenced by a myriad of external factors, including news articles detailing significant events such as annual general meetings or company scandals.
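
To make the "Brownian motion" view concrete: the classical textbook model treats a price path as a geometric Brownian motion, driven purely by random shocks. The short simulation below is only an illustration of that baseline model and is not part of the analysis in this post; the drift and volatility values are arbitrary.

import numpy as np

# Illustrative simulation of one year of daily prices under a geometric
# Brownian motion: S_{t+1} = S_t * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z)
rng = np.random.default_rng(0)
mu, sigma, dt, s0, n_days = 0.05, 0.2, 1 / 252, 100.0, 252
shocks = rng.standard_normal(n_days)
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * shocks
prices = s0 * np.exp(np.cumsum(log_increments))
print(prices[-1])  # simulated closing price after one year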

In the fast-paced world of stock trading, staying ahead of the curve often means processing vast amounts of information in real time. Investors are constantly seeking tools that can help them make more informed decisions amidst the chaos of market fluctuations and news updates. In recent years, advancements in natural language processing (NLP) and machine learning have paved the way for innovative approaches to analyzing market sentiment and risk.

In this blog post, we’ll explore how RiskBERT can be leveraged to gain insights into stock market dynamics. In particular, we try to figure out whether, given the open price, a news release during the day is related to that day’s closing price.

Introducing RiskBERT

Built upon the foundation of BERT (Bidirectional Encoder Representations from Transformers), RiskBERT harnesses deep learning to extract signals from textual data and quantify the potential impact of news events on stock prices.
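
The post does not spell out the architecture, but conceptually the idea can be pictured as a regression head on top of a sentence embedding: the news text is encoded by a (frozen) BERT model, the resulting CLS vector is concatenated with numeric covariates, and a small head maps this to the quantity of interest, trained under a likelihood-based loss. The sketch below illustrates that general pattern only; it is not the actual RiskBERT implementation, and all names in it are made up.

import torch
import torch.nn as nn
from transformers import AutoModel

class ToyTextRegressor(nn.Module):
    """Illustrative only: frozen DistilBERT encoder plus a linear head on
    [CLS embedding, numeric covariates]; the output could be fitted to log
    returns with a Gaussian (e.g. mean-squared-error) loss."""

    def __init__(self, backbone="distilbert-base-uncased", n_covariates=1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the language model frozen
        self.head = nn.Linear(self.encoder.config.hidden_size + n_covariates, 1)

    def forward(self, input_ids, attention_mask, covariates):
        cls = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.head(torch.cat([cls, covariates], dim=-1)).squeeze(-1)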

Analyzing Apple Inc. (AAPL) Stock

To demonstrate the capabilities of RiskBERT, let’s delve into a case study focusing on Apple Inc. (AAPL) stock. We’ll walk through a Python script that fetches news articles related to AAPL from a financial API, retrieves historical stock price data using Yahoo Finance, and applies RiskBERT to analyze the relationship between news sentiment and stock price movement.

Fetching News Data

We start by retrieving news articles related to AAPL using the EOD Historical Data API. Since the API limits the number of records per call, we implement a loop to fetch data iteratively until reaching the desired timeframe.

import datetime

import requests
import pandas as pd
import yahooquery as yq
import matplotlib.pyplot as plt
import numpy as np
from RiskBERT import normalLoss
from RiskBERT import RiskBertModel
from RiskBERT import trainer, evaluate_model
from RiskBERT import DataConstructor
import torch
from transformers import AutoTokenizer

# Fetch AAPL news in batches of 1000, moving the 'to' date backwards
# until we reach articles from 2016-02-19 or earlier.
i = 0
start_from = datetime.datetime.today().strftime("%Y-%m-%d")
while True:
    try:
        url = f'https://eodhd.com/api/news?s=AAPL.US&offset=0&limit=1000&to={start_from}&api_token=demo&fmt=json'
        data = requests.get(url).json()
        if i == 0:
            appl_news = pd.DataFrame(data)
        else:
            appl_news = pd.concat([appl_news, pd.DataFrame(data)], ignore_index=True)
        # Continue the next request from the oldest date fetched so far
        start_from = str(min(pd.to_datetime(appl_news["date"]).dt.date))

        if min(pd.to_datetime(appl_news["date"]).dt.date) <= datetime.date.fromisoformat("2016-02-19"):
            break
        i = i + 1
    except Exception as e:
        print(e)
        break

Retrieving Stock Price Data

Next, we obtain historical stock price data for AAPL from Yahoo Finance. This data will serve as the basis for our analysis, allowing us to correlate news events with changes in stock prices.

# Use the time span covered by the downloaded news as the price history window
end = max(appl_news["date"])
start = min(appl_news["date"])

tq = yq.Ticker("AAPL")
stock_data = tq.history(start=start, end=end)

Preparing the Data

Before applying RiskBERT, we preprocess the data and join the news articles with the corresponding stock price data. We also perform feature engineering to enrich the dataset with additional information relevant to our analysis.

appl_news["daydate"]=pd.to_datetime(appl_news["date"]).dt.date
stock_with_news = stock_data.merge(appl_news,left_on="date",right_on="daydate", how="left")

stock_with_news = stock_with_news.dropna()
stock_with_news["label"] = np.log(stock_with_news["close"])-np.log(stock_with_news["open"])
stock_with_news["num_symbols"] = stock_with_news["symbols"].apply(lambda x: len(x))

Analyzing Stock Price Distribution

To determine the correct distribution for RiskBERT, we plot a simple histogram of the daily log returns (log closing price minus log opening price). The distribution appears to be fairly “normal,” validating our choice of normalLoss as the loss function for RiskBERT. This is what is to be expected theoretically and should not surprise frequent readers of this blog (see https://www.thebigdatablog.com/does-my-stock-trading-strategy-work/).

plt.hist(np.log(stock_data["close"])-np.log(stock_data["open"]), bins=50, color="skyblue", edgecolor="black")
[Figure: Histogram of daily log returns]
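
Beyond eyeballing the histogram, the impression can be backed up with a quick formal check, for example D’Agostino’s K² normality test from SciPy. This is a small optional sketch that assumes scipy is installed; it was not part of the original analysis.

from scipy import stats

log_returns = (np.log(stock_data["close"]) - np.log(stock_data["open"])).dropna()
stat, p_value = stats.normaltest(log_returns)
print(f"Normality test statistic: {stat:.3f}, p-value: {p_value:.3f}")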

Applying RiskBERT

With the data prepared, we proceed to apply RiskBERT to analyze the relationship between news sentiment and stock price movement. We use a pre-trained DistilBERT model with frozen weights and train the regression head on top of it, incorporating additional features such as the number of symbols mentioned in each news article.

# Set device to gpu if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

pre_model= "distilbert-base-uncased"
model = RiskBertModel(model=pre_model, input_dim=1, dropout=0.4, freeze_bert=True, mode="CLS", loss_fn=normalLoss).to(device)
tokenizer = AutoTokenizer.from_pretrained(pre_model)

# Covariate matrix: a single column with the number of symbols mentioned per article
covariates = np.array([stock_with_news["num_symbols"]]).T

# Bundle news titles, covariates and labels into a RiskBERT dataset
my_data = DataConstructor(
    sentences=[[x] for x in stock_with_news["title"]],
    covariates=covariates,
    labels=[[x] for x in stock_with_news["label"]],
    tokenizer=tokenizer,
    device=device,
)

# Train for 100 epochs with plain SGD; the trainer returns the fitted model
# together with the training, validation and test loss histories
fitted_model, Total_Loss, Validation_Loss, Test_Loss = trainer(
    model=model,
    model_dataset=my_data,
    epochs=100,
    batch_size=1000,
    evaluate_fkt=evaluate_model,
    tokenizer=tokenizer,
    optimizer=torch.optim.SGD(model.parameters(), lr=0.001),
    device=device,
)

# Predictions of the fitted model on the full dataset
my_prediction = fitted_model(**my_data.prepare_for_model())
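
The validation-loss curve shown below can be reproduced directly from the values returned by the trainer. This is a minimal sketch assuming Validation_Loss contains one loss value per epoch.

# Plot the validation loss per epoch (assumes one entry per epoch)
plt.plot(Validation_Loss)
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("Validation loss over 100 epochs")
plt.show()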

Unveiling Insights

In conclusion, our exploration of stock market analysis with RiskBERT has yielded promising insights. Observing how the validation loss evolves over the epochs provides valuable insight into the training process. Despite fluctuations, there is a clear trend of decreasing validation loss over time, indicating that RiskBERT continuously improves its fit as it learns from the data. RiskBERT is thus able to capture the impact of certain news on the closing price. However, before false hopes arise: the estimated model cannot be used for forecasts, as the observed learning curve mainly reflects the estimation of the intercept. If you want to experiment with RiskBERT yourself, the code is available at https://github.com/heikowagner/generalized-semantic-regression/blob/main/RiskBERT/simulation/stock_market_example.py.

[Figure: Validation loss over 100 epochs]
