Forecast the Values with Machine Learning Models

 

Solving business problems with technology: this article discusses how to forecast (predict) future trends from historical data by leveraging machine learning models. Two approaches are covered:

1. Time Series Machine Learning Models

2. Build “Neural Network” with TensorFlow & Keras

Time Series Model

Below are the classic time series models:

  • Autoregression (AR)
  • Moving Average (MA)
  • Autoregressive Moving Average (ARMA)
  • Autoregressive Integrated Moving Average (ARIMA)
  • Seasonal Autoregressive Integrated Moving-Average (SARIMA)
  • Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
  • Vector Autoregression (VAR)
  • Vector Autoregression Moving-Average (VARMA)
  • Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
  • Simple Exponential Smoothing (SES)
  • Holt-Winters Exponential Smoothing (HWES)
  • Prophet
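To make one of the listed names concrete, here is a minimal sketch of simple exponential smoothing in plain Python. The function name, the `alpha` value and the sample numbers are illustrative assumptions and are not part of the original analysis; libraries such as statsmodels provide full implementations.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed levels; the last level is the one-step-ahead forecast."""
    # Initialise the level with the first observation (one common choice).
    smoothed = [values[0]]
    for value in values[1:]:
        # New level = weighted mix of the newest observation and the previous level.
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


history = [10.0, 12.0, 14.0, 16.0]  # made-up values
levels = simple_exponential_smoothing(history, alpha=0.5)
next_forecast = levels[-1]
print(levels, next_forecast)
```

A larger `alpha` makes the forecast react faster to recent values, while a smaller one smooths more heavily.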

From the models above, I’ll share a simple way to implement the ARIMA and SARIMA models, without going into detailed statistics.

Both models need three order values: p (AR, autoregressive), d (I, integrated, i.e. how many times the series is differenced) and q (MA, moving average), for the non-seasonal part and, for a seasonal time series, for the seasonal part as well. The code below shows how to find the p and d values directly; the q value alone has to be found from the autocorrelation graph or by trial and error.
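As a rough illustration of what the d value means, the sketch below differences a made-up series with NumPy. The series and the seasonal period `s = 4` are assumptions chosen for the example, not the article's data.

```python
import numpy as np

# Hypothetical series with a quadratic trend: y_t = t^2 + 3t + 5
t = np.arange(10)
y = t**2 + 3 * t + 5

# d = 1: the first difference still carries a linear trend.
first_diff = np.diff(y, n=1)
# d = 2: the second difference is constant, so the trend is gone.
second_diff = np.diff(y, n=2)
print(first_diff)
print(second_diff)

# Seasonal differencing (D = 1) with seasonal period s: y_t - y_{t-s}
s = 4
seasonal = np.tile([10, 20, 30, 40], 3)  # a pattern repeating every 4 steps
seasonal_diff = seasonal[s:] - seasonal[:-s]
print(seasonal_diff)
```

In this toy case two rounds of differencing remove the trend, which is the kind of situation in which a test such as `ndiffs` would suggest d = 2.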

The code uses the pandas and NumPy Python libraries, together with statsmodels and pmdarima, to predict values from historical data with the ARIMA and SARIMA time series models.

#Code

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import statsmodels.api as sm

from pandas.plotting import autocorrelation_plot

df=pd.read_csv("Path\\PPA_Liability_INS.csv",index_col='Year',parse_dates=True)

df=df.dropna()

#df['Year']=pd.to_datetime(df['Year'])

print('Shape of data',df.shape)

df.head()

df_np=df[["Net premiums Written"]]

df_window=df.rolling(window=5).mean()

df_window.plot(figsize=(12,5))

plt.title ('Rolling Average')

plt.show()

#Naive/ Base Model

df_base=pd.concat([df,df.shift(1)],axis=1)

#print(df_base)

df_base.columns=['Actual','Forecast']

print(df_base.head())

df_base.dropna(inplace=True)

from sklearn.metrics import mean_squared_error

df_error=mean_squared_error(df_base.Actual,df_base.Forecast)

print(df_error)

print("Naive RMSE:", np.sqrt(df_error))

#ARIMA Model

# Check whether the series is stationary

from statsmodels.tsa.stattools import adfuller

result = adfuller(df['Net premiums Written'].dropna())

print(result)

print(f"ADF Statistic:  {result[0]}")

print(f"p-value:  {result[1]}")

# If the p-value is greater than 0.05, the series is treated as non-stationary

# Identify the d value (2 for this data)

from pmdarima.arima.utils import ndiffs

print("d-value:",ndiffs(df['Net premiums Written'],test="adf"))

# Identify the q value (2 for this data) from the graph

autocorrelation_plot(df['Net premiums Written'])

plt.title("Q Value Auto Correlation")

plt.show()

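#Seasonal decomposition (to check for trend and seasonality before building SARIMA)

from statsmodels.tsa.seasonal import seasonal_decompose

# If the index has no inferable frequency, seasonal_decompose also needs an explicit period= argument

decompose_data = seasonal_decompose(df['Net premiums Written'], model="additive")

decompose_data.plot()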

plt.title("Seasonal_Decompose")

plt.show()

#model = sm.tsa.SARIMAX(history, trend='c', order=arima_order, enforce_stationarity=False, enforce_invertibility=False)

#Build ARIMA Model

from pmdarima import auto_arima

# Ignore harmless warnings

import warnings

warnings.filterwarnings("ignore")

# auto_arima searches for a suitable (p,d,q) order

stepwise_fit = auto_arima(df['Net premiums Written'],suppress_warnings=True)

print(stepwise_fit.summary())

# ARIMA now lives in statsmodels.tsa.arima.model (the old arima_model module was removed)

from statsmodels.tsa.arima.model import ARIMA

print(df.shape)

# Hold out the last 3 rows for testing

train=df.iloc[:-3]

test=df.iloc[-3:]

print(train.shape,test.shape)

print(test.iloc[0],test.iloc[-1])

model=ARIMA(train['Net premiums Written'],order=(1,2,2))

model_fit=model.fit()

print(model_fit.summary())

# start and end are zero-based row positions; adjust them to the length of your data

df['forecast']=model_fit.predict(start=15,end=20,dynamic=True)

df[['Net premiums Written','forecast']].plot(figsize=(12,8))

plt.title("Net Premium Vs ARIMA Forecast")

plt.show()

df_rolling_adjust=pd.concat([df,df_window],axis=1)

df_rolling_adjust.dropna(inplace=True)

df_rolling_adjust.columns=['Actual','forecast','Rolling']

print(df_rolling_adjust.head())

df_rolling_adjust[['Actual','forecast','Rolling']].plot(figsize=(12,8))

plt.title("Actual vs ARIMA Forecast vs Rolling")

plt.show()

# SARIMA model

sm_df=pd.read_csv("Path\\PPA_Liability_INS.csv",index_col='Year',parse_dates=True)

sm_df=sm_df.dropna()

sm_train=sm_df.iloc[:-3]

sm_test=sm_df.iloc[-3:]

myorder=(1,2,1) # Non-seasonal (p,d,q) values (the ARIMA model above used (1,2,2))

my_seasonal_order=(2,1,2,4) # Seasonal (P,D,Q,s) values; s=4 for quarterly data, 12 for monthly

# SARIMAX takes the data, the non-seasonal order (p,d,q) and the seasonal order (P,D,Q,s), where s is the seasonal period

from statsmodels.tsa.statespace.sarimax import SARIMAX

sm_model=SARIMAX(sm_train['Net premiums Written'],order=myorder,seasonal_order=my_seasonal_order)

sm_model_fit=sm_model.fit()

print(sm_model_fit.summary())

#sm_model=sm.tsa.statespace.SARIMAX(df['Actual'],order=(1, 2, 2),seasonal_order=(1,1,1,12))

#results=sm_model.fit()

df['sm_forecast']=sm_model_fit.predict(start=15,end=23,dynamic=True)

df[['Net premiums Written','sm_forecast']].plot(figsize=(12,8))

plt.title("Net Premium Vs sm_Forecast")

plt.show()

#Build a future dataset filled with NaN values

from pandas.tseries.offsets import DateOffset

pred_date=[sm_df.index[-1]+ DateOffset(years=x)for x in range(0,24)] # yearly steps; the first entry (offset 0) is dropped below

pred_date=pd.DataFrame(index=pred_date[1:],columns=sm_df.columns)

data=pd.concat([sm_df,pred_date])

# Forecast with SARIMA Model

data['forecast'] = sm_model_fit.predict(start = 19, end = 23, dynamic= True)

data[['Net premiums Written', 'forecast']].plot(figsize=(12, 8))

plt.title("Values of Net Premium Vs SARIMA_Forecast")

plt.show()
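The future-dated, NaN-filled frame used in the last step can be tried on its own. The sketch below assumes a tiny yearly history with made-up values; only the column name is taken from the article, and the `horizon` of 3 years is an arbitrary choice.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Hypothetical yearly history (values are invented for illustration).
history = pd.DataFrame(
    {"Net premiums Written": [100.0, 110.0, 125.0]},
    index=pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"]),
)

# Build the next `horizon` yearly dates after the last observed date.
horizon = 3
future_index = [history.index[-1] + DateOffset(years=k) for k in range(1, horizon + 1)]

# Empty (NaN) rows for the future dates, with the same columns as the history.
future = pd.DataFrame(index=future_index, columns=history.columns, dtype=float)

# Stack history and future rows; a forecast column can then be assigned by date.
extended = pd.concat([history, future])
print(extended)
```

Starting the offsets at 1 avoids repeating the last observed date, which is what dropping the first element achieves in the listing above.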

 

Build “Neural Network” with TensorFlow & Keras

You can also build your own neural network to predict values from historical data with TensorFlow, Keras and Python.

#Code

import tensorflow as tf

from tensorflow import keras

import pandas as pd

import numpy as np

policies_df=pd.read_csv("path//Filename.csv")

policies_df=policies_df.dropna()

 

#Build Neural network to predict the premium

#Define Data frame to get the values from Neural Network

df3=pd.DataFrame(policies_df,columns=['Months','Avg_Premium'])

# A single Dense unit with one input learns a straight line: y = w*x + b

PremiumPredict_model=tf.keras.Sequential([keras.layers.Dense(units=1,input_shape=[1])])

PremiumPredict_model.compile(optimizer='sgd',loss='mean_squared_error')

 

#Define X and Y along with Data type int/float/String

PremiumPredict_model_XS=np.array(df3['Months'],dtype=int)

PremiumPredict_model_YS=np.array(df3['Avg_Premium'],dtype=float)

PremiumPredict_model.fit(PremiumPredict_model_XS,PremiumPredict_model_YS,epochs=1000)

 

#With 10 months of Avg Premium records, predict the Avg Premium value for the 11th month

Cov_Future_Value=PremiumPredict_model.predict(np.array([11.0]))

print("Coverage Value in 11th Month predicted as :",Cov_Future_Value)
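A network with one Dense unit and one input is a straight-line fit, y = w*x + b. The sketch below reproduces that idea with plain full-batch gradient descent in NumPy. It is in the spirit of what the Keras model does, not an exact copy of it, and the data, learning rate and step count are assumptions made for the example.

```python
import numpy as np

# Made-up data that follows y = 2x + 1 exactly (months 1..10).
xs = np.arange(1, 11, dtype=float)
ys = 2.0 * xs + 1.0

w, b = 0.0, 0.0        # the single weight and bias of the "layer"
learning_rate = 0.01   # assumed step size
for _ in range(5000):
    error = (w * xs + b) - ys
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * xs)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Predict the value for month 11 from the fitted line.
prediction = w * 11.0 + b
print(w, b, prediction)
```

If the input values are much larger than in this toy case, the same learning rate can make the updates diverge, so scaling the inputs first is a common precaution.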

Conclusion

In this analysis of time series models, SARIMA gave better results than ARIMA.

A neural network will be the best solution if you can identify a single, correct column to use as X for predicting the Y value.
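One way to make such a comparison concrete is to score each model's forecast against the held-out values with RMSE, next to the naive baseline. The numbers below are invented purely for illustration and are not results from the article's data; `candidate_a` and `candidate_b` are hypothetical stand-ins for two fitted models.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up hold-out values and forecasts (NOT results from the article).
last_train_value = 95.0
actual = [100.0, 110.0, 120.0]
forecasts = {
    # Naive baseline: each forecast is the previous actual value.
    "naive": [last_train_value] + actual[:-1],
    "candidate_a": [97.0, 114.0, 120.0],
    "candidate_b": [98.0, 111.0, 121.0],
}

scores = {name: rmse(actual, values) for name, values in forecasts.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

A model is only worth keeping if its RMSE on the held-out rows is clearly lower than that of the naive baseline.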

 
