Machine Learning and Model Training for Stock Market Prediction AI

Introduction

Building a stock market prediction AI system involves several critical steps, including feature engineering, model selection, and model training and evaluation. This article provides a detailed overview of these steps, with sample code to help you implement them effectively.

Feature Engineering

Feature engineering involves creating meaningful features from raw data to improve the predictive power of machine learning models. For stock market prediction, features can include technical indicators, sentiment scores from news analysis, and other domain-specific attributes.

Technical Indicators

Moving Average (MA)

The moving average smooths out price data to identify trends over a specified period.

import pandas as pd

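# `data` is assumed to be a DataFrame of daily prices with a 'close' column.
def moving_average(data, window_size):
    # Simple moving average of the closing price over `window_size` rows
    return data['close'].rolling(window=window_size).mean()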

data['50_MA'] = moving_average(data, 50)
data['200_MA'] = moving_average(data, 200)
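
One common way to use a short and a long moving average together is to look for the day the short one crosses above the long one. The sketch below is a minimal illustration on synthetic prices. The window sizes (3 and 5), the column names `fast_ma`, `slow_ma`, and `cross_up`, and the price values are all assumptions made for this example, not part of the pipeline above.

```python
import pandas as pd

# Synthetic closing prices (illustrative values only): a dip followed by a rally.
prices = pd.DataFrame({"close": [10, 9, 8, 7, 6, 7, 9, 12, 15, 18]}, dtype=float)

# Short and long simple moving averages; the first window-1 rows are NaN.
prices["fast_ma"] = prices["close"].rolling(window=3).mean()
prices["slow_ma"] = prices["close"].rolling(window=5).mean()

# Bullish crossover: the fast average is above the slow one today but was not yesterday.
# Comparisons against NaN evaluate to False, so warm-up rows are never flagged as "above".
above = prices["fast_ma"] > prices["slow_ma"]
prices["cross_up"] = above & ~above.shift(1, fill_value=False)

print(prices)
```

With real data the same pattern would apply to the `50_MA` and `200_MA` columns, although a crossover on its own is a descriptive signal rather than a forecast.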

Relative Strength Index (RSI)

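The RSI measures the speed and magnitude of recent price changes on a scale from 0 to 100. Readings above 70 are conventionally read as overbought and readings below 30 as oversold.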

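def calculate_rsi(data, window=14):
    # Day-over-day change in the closing price
    delta = data['close'].diff()
    # Separate upward and downward moves, keeping zeros elsewhere
    gain = delta.where(delta > 0, 0)
    loss = -delta.where(delta < 0, 0)
    # Simple rolling averages of gains and losses over the window
    avg_gain = gain.rolling(window=window).mean()
    avg_loss = loss.rolling(window=window).mean()
    # Relative strength and its mapping onto the 0-100 scale
    rs = avg_gain / avg_loss
    rsi = 100 - (100 / (1 + rs))
    return rsi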

data['RSI'] = calculate_rsi(data)
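
The function above averages gains and losses with a simple rolling mean. Many charting tools instead use Wilder-style exponential smoothing, which can be approximated in pandas with `ewm(alpha=1 / window, adjust=False)`. The sketch below shows that variant on a synthetic random walk. The function name `calculate_rsi_wilder` and the generated prices are assumptions for illustration, and the seeding of the average differs slightly from Wilder's original recipe, which starts from a simple average of the first window.

```python
import numpy as np
import pandas as pd

def calculate_rsi_wilder(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI with Wilder-style exponential smoothing (alpha = 1 / window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)    # upward moves, zero elsewhere
    loss = -delta.clip(upper=0)   # size of downward moves, zero elsewhere
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))

# Synthetic random-walk prices, seeded so the run is repeatable.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 200).cumsum())

rsi = calculate_rsi_wilder(close)
print(rsi.dropna().describe())
```

The two variants usually track each other closely, but they are not interchangeable, so thresholds tuned on one should be rechecked on the other.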

Moving Average Convergence Divergence (MACD)

The MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a stock’s price.

def calculate_macd(data, short_window=12, long_window=26, signal_window=9):
    short_ema = data['close'].ewm(span=short_window, adjust=False).mean()
    long_ema = data['close'].ewm(span=long_window, adjust=False).mean()
    macd = short_ema - long_ema
    signal = macd.ewm(span=signal_window, adjust=False).mean()
    return macd, signal

data['MACD'], data['Signal_Line'] = calculate_macd(data)
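
The MACD and signal lines are often read through their difference, usually called the histogram: a move from non-positive to positive is treated as a bullish crossover. The following is a minimal sketch of that reading on a synthetic V-shaped price series. The helper name `macd_lines`, the `histogram` and `bullish` variables, and the prices are assumptions for this example.

```python
import pandas as pd

def macd_lines(close, short_window=12, long_window=26, signal_window=9):
    """Return the MACD line and its signal line for a price Series."""
    short_ema = close.ewm(span=short_window, adjust=False).mean()
    long_ema = close.ewm(span=long_window, adjust=False).mean()
    macd = short_ema - long_ema
    signal = macd.ewm(span=signal_window, adjust=False).mean()
    return macd, signal

# Synthetic prices: 40 days of steady decline, then 40 days of steady recovery.
close = pd.Series(
    [100 - i for i in range(40)] + [61 + i for i in range(40)], dtype=float
)

macd, signal = macd_lines(close)
histogram = macd - signal

# Bullish crossover: the histogram was at or below zero yesterday and is above zero today.
bullish = (histogram > 0) & (histogram.shift(1) <= 0)
print(close.index[bullish].tolist())
```

On this series the crossover can only appear after the turning point at index 40, which illustrates the lag inherent in indicators built from moving averages.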

Sentiment Analysis

Sentiment analysis involves extracting sentiment scores from news articles or social media to gauge market sentiment.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')
sia = SentimentIntensityAnalyzer()

def get_sentiment_score(text):
    return sia.polarity_scores(text)['compound']

# Example: Applying sentiment analysis on news headlines
data['sentiment_score'] = data['news_headlines'].apply(get_sentiment_score)
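
In practice there are often several headlines per trading day, or none, while the price table has one row per day. One possible way to reconcile the two is to average the per-headline scores by date and treat days without news as neutral. The sketch below uses made-up scores in place of real `get_sentiment_score` output. The frames `headlines` and `prices` and the choice of 0.0 for missing days are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-headline compound scores, in the [-1, 1] range that VADER uses.
headlines = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-05"]),
    "sentiment_score": [0.6, -0.2, 0.1, -0.5],
})

# Hypothetical daily prices indexed by trading date.
prices = pd.DataFrame(
    {"close": [100.0, 101.5, 99.8, 98.9]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
)

# Average the scores per day, align them to the price index,
# and fill days without any headline with a neutral 0.0.
daily_sentiment = headlines.groupby("date")["sentiment_score"].mean()
prices["sentiment_score"] = daily_sentiment.reindex(prices.index).fillna(0.0)

print(prices)
```

Whether a missing day should count as neutral, or carry forward the previous day's score, is a modelling choice worth testing rather than a fixed rule.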

Model Selection

Choosing the right machine learning model is crucial for accurate stock market predictions. Common models include LSTMs (Long Short-Term Memory networks), Prophet, and ensemble methods.

Long Short-Term Memory (LSTM)

LSTMs are well-suited for time series forecasting due to their ability to capture long-term dependencies.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Prepare data for LSTM
def create_sequences(data, sequence_length):
    x = []
    y = []
    for i in range(sequence_length, len(data)):
        x.append(data[i-sequence_length:i])
        y.append(data[i, 0])
    return np.array(x), np.array(y)

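# train_data and test_data come from the chronological split in the Splitting Data section;
# column 0 is assumed to be the (scaled) closing price, which is used as the target.
sequence_length = 50
x_train, y_train = create_sequences(train_data.values, sequence_length)
x_test, y_test = create_sequences(test_data.values, sequence_length)

# Build LSTM model
# input_shape is (time steps, features); taking both from x_train keeps it correct
# whether the data has one feature column or several.
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))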
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(x_train, y_train, epochs=25, batch_size=32)
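
To make the windowing step concrete, the sketch below runs the same `create_sequences` logic on a tiny array and prints the resulting shapes. It uses only NumPy, so it can be run without Keras. The toy array and its two feature columns are assumptions for illustration.

```python
import numpy as np

def create_sequences(data, sequence_length):
    x, y = [], []
    for i in range(sequence_length, len(data)):
        x.append(data[i - sequence_length:i])  # the previous `sequence_length` rows
        y.append(data[i, 0])                   # next value of column 0 (the target)
    return np.array(x), np.array(y)

# Toy array: 10 time steps, 2 features; column 0 plays the role of the scaled close.
toy = np.arange(20, dtype=float).reshape(10, 2)
x, y = create_sequences(toy, sequence_length=3)

print(x.shape)  # (7, 3, 2): samples, time steps, features
print(y.shape)  # (7,)
print(y[0])     # 6.0, column 0 of row 3, the row right after the first window
```

The last two dimensions of `x` are what an LSTM layer expects as its input shape, so the feature count has to match the number of columns in the data.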

Prophet

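Prophet is an open-source forecasting library developed by Facebook (now Meta). It fits an additive model with trend and seasonality components and is designed for time series with regular observations, such as daily data.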

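# The package was renamed from `fbprophet` to `prophet` in version 1.0.
from prophet import Prophet

# Prepare data for Prophet, which expects a date column named 'ds' and a target named 'y'.
# This assumes the index of `data` is named 'timestamp'.
prophet_data = data.reset_index().rename(columns={'timestamp': 'ds', 'close': 'y'})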

# Initialize and fit the model
model = Prophet()
model.fit(prophet_data)

# Make future predictions
future = model.make_future_dataframe(periods=365)
forecast = model.predict(future)
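
The data preparation step is where Prophet inputs most often go wrong, so it may help to look at it in isolation. The sketch below builds a small price frame and reshapes it into the two-column layout Prophet reads, without importing Prophet itself. The frame contents and the extra `volume` column are assumptions for illustration.

```python
import pandas as pd

# Hypothetical price frame indexed by a DatetimeIndex named 'timestamp'.
data = pd.DataFrame(
    {"close": [100.0, 101.0, 99.5], "volume": [10, 12, 9]},
    index=pd.DatetimeIndex(["2024-01-02", "2024-01-03", "2024-01-04"], name="timestamp"),
)

# Keep only the two columns Prophet reads by name: 'ds' (dates) and 'y' (target).
prophet_data = (
    data.reset_index()
        .rename(columns={"timestamp": "ds", "close": "y"})[["ds", "y"]]
)

print(prophet_data.dtypes)
```

Note that `make_future_dataframe(periods=365)` extends the series by calendar days, so a forecast built this way will include weekends and holidays on which markets are closed.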

Ensemble Methods

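Ensemble methods combine predictions from multiple models to improve accuracy. A random forest, for example, averages the predictions of many decision trees, each trained on a different random sample of the data.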

from sklearn.ensemble import RandomForestRegressor

# Example: Using Random Forest as an ensemble method
x_train = train_data.drop(columns=['close']).values
y_train = train_data['close'].values
x_test = test_data.drop(columns=['close']).values
y_test = test_data['close'].values

# Initialize and train the model
model = RandomForestRegressor(n_estimators=100)
model.fit(x_train, y_train)

# Make predictions
predictions = model.predict(x_test)
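
A random forest combines many trees of the same kind. Another option is to combine different model families, which scikit-learn supports through `VotingRegressor`. The sketch below is one possible setup on synthetic data, not a recommended configuration: the choice of members, their hyperparameters, and the generated features are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic features and target (illustrative only, not market data).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Keep row order: first 80% for training, last 20% for testing.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

ensemble = VotingRegressor(estimators=[
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("lr", LinearRegression()),
])
ensemble.fit(X_train, y_train)
ensemble_pred = ensemble.predict(X_test)

# Without weights, the ensemble prediction is the plain mean of the members' predictions.
member_preds = np.column_stack([m.predict(X_test) for m in ensemble.estimators_])
print(np.allclose(ensemble_pred, member_preds.mean(axis=1)))
```

Averaging tends to help most when the members make different kinds of errors, which is the motivation for mixing tree-based and linear models here.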

Model Training and Evaluation

Training the chosen model involves splitting the data into training and testing sets, fitting the model on the training data, and evaluating its performance on the testing data.

Splitting Data

from sklearn.model_selection import train_test_split

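# Split data into training and testing sets.
# shuffle=False keeps the rows in time order, so the test set follows the training set.
# `scaled_data` is assumed to be a DataFrame of scaled features with 'close' as its first column.
train_data, test_data = train_test_split(scaled_data, test_size=0.2, shuffle=False)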
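
The split above operates on data that has already been scaled. If the scaler was fitted on the full series, the minimum and maximum of the test period influence the training inputs. One common alternative, sketched below under the assumption that a `MinMaxScaler` is used, is to split first and fit the scaler on the training window only. The synthetic prices and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic upward-drifting closes (illustrative values only), as a single column.
close = np.linspace(100.0, 200.0, 50).reshape(-1, 1)

# Chronological 80/20 split, no shuffling.
split = int(len(close) * 0.8)
train_raw, test_raw = close[:split], close[split:]

# Fit the scaler on the training window only, then reuse it for the test window.
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_raw)
test_scaled = scaler.transform(test_raw)

print(train_scaled.min(), train_scaled.max())  # approximately 0.0 and 1.0
print(test_scaled.max() > 1.0)                 # True: later prices exceed the training range
```

Test values outside the 0 to 1 range are expected with this approach, since the model is seeing prices it was never scaled against.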

Training the Model

Using the LSTM model as an example:

# Prepare data for LSTM
sequence_length = 50
x_train, y_train = create_sequences(train_data.values, sequence_length)
x_test, y_test = create_sequences(test_data.values, sequence_length)

# Build LSTM model
# input_shape is (time steps, features), both taken from x_train
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(units=50))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(x_train, y_train, epochs=25, batch_size=32)

Model Evaluation

Evaluating the model involves measuring its performance using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE).

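from sklearn.metrics import mean_squared_error

# Make predictions (still in scaled units)
predictions = model.predict(x_test)

# Convert predictions and targets back to price units before comparing them.
# Assumes `scaler` was fitted on the 'close' column alone; a scaler fitted on
# several columns needs an array with the same number of columns.
predictions = scaler.inverse_transform(predictions)
actual = scaler.inverse_transform(y_test.reshape(-1, 1))

# Calculate RMSE
rmse = np.sqrt(mean_squared_error(actual, predictions))
print(f'RMSE: {rmse}')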
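
The three metrics named above are closely related, and computing them side by side makes the relationship visible. The sketch below implements them with NumPy and compares a set of predictions against a naive baseline that predicts each day's close as the previous day's close. The helper name `regression_metrics` and the numbers are assumptions for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, and RMSE for two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": np.sqrt(mse)}

# Made-up actual and predicted closing prices.
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 100.0, 101.0, 108.0])
model_scores = regression_metrics(actual, predicted)

# Naive baseline: predict each day's close as the previous day's close.
naive_scores = regression_metrics(actual[1:], actual[:-1])

print(model_scores)
print(naive_scores)
```

RMSE penalizes large misses more heavily than MAE, and comparing either against the naive baseline gives a rough sense of whether a model adds anything beyond "tomorrow looks like today".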

Conclusion

Building a stock market prediction AI system involves several key steps, from feature engineering to model selection and training. By creating meaningful features, selecting appropriate models, and rigorously training and evaluating these models, you can develop a robust system capable of making accurate stock market predictions. The provided sample code offers a foundation to get started with each of these steps, enabling you to customize and expand upon them to suit your specific needs.