Forecasting Values with Machine Learning Models

Solving business problems with technology: in this article, I discuss how to forecast future trends from historical data by leveraging machine learning models.

1. Time Series Machine Learning Models

2. Build a Neural Network with TensorFlow & Keras

Time Series Models

Below are the classic time series models:

· Autoregression (AR)
· Moving Average (MA)
· Autoregressive Moving Average (ARMA)
· Autoregressive Integrated Moving Average (ARIMA)
· Seasonal Autoregressive Integrated Moving-Average (SARIMA)
· Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
· Vector Autoregression (VAR)
· Vector Autoregression Moving-Average (VARMA)
· Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
· Simple Exponential Smoothing (SES)
· Holt-Winters Exponential Smoothing (HWES)
· Prophet
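To make one of these models concrete, here is a minimal sketch of Simple Exponential Smoothing (SES) written directly in NumPy rather than taken from a library. Each smoothed level is a weighted average of the newest observation and the previous level, s_t = α·x_t + (1 − α)·s_{t−1}, and the last level serves as the flat forecast for future periods. The helper name `simple_exp_smoothing`, the starting level (the first observation) and α = 0.5 are illustrative assumptions; in practice statsmodels provides `SimpleExpSmoothing` and `ExponentialSmoothing` (for Holt-Winters) in `statsmodels.tsa.holtwinters`.

```python
import numpy as np


def simple_exp_smoothing(values, alpha):
    """Hypothetical helper: return the smoothed levels and the next-period forecast.

    level[0] is initialised to the first observation (an assumption), then
    level[t] = alpha * x[t] + (1 - alpha) * level[t - 1].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    x = np.asarray(values, dtype=float)
    levels = np.empty_like(x)
    levels[0] = x[0]
    for t in range(1, len(x)):
        levels[t] = alpha * x[t] + (1.0 - alpha) * levels[t - 1]
    # SES models no trend or seasonality, so every future step gets the last level
    return levels, levels[-1]


levels, forecast = simple_exp_smoothing([10, 12, 14, 13], alpha=0.5)
print(levels.tolist())  # [10.0, 11.0, 12.5, 12.75]
print(forecast)         # 12.75
```

A larger α reacts faster to recent values; α = 1 simply repeats the last observation, which is the naive forecast.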

From the models above, I'll share my analysis and a simple way to implement the ARIMA and SARIMA models, without going into detailed statistics.

For both non-seasonal and seasonal time series, we have to find the p (AR, autoregressive), d (I, integrated, i.e. the number of differences) and q (MA, moving average) values; the seasonal model adds its own P, D, Q values plus the season length s. The code below shows how to find the p and d values directly; the q value has to be found from the autocorrelation graph or by trial and error.

I used Python's pandas and NumPy libraries, together with statsmodels and pmdarima, to predict values from historical data with the ARIMA and SARIMA time series models.

Code#


	import numpy as np
	import pandas as pd
	import matplotlib.pyplot as plt
	import statsmodels.api as sm

	from pandas.plotting import autocorrelation_plot


	df=pd.read_csv("Path\\PPA_Liability_INS.csv",index_col='Year',parse_dates=True)
	df=df.dropna()
	print('Shape of data',df.shape)
	print(df.head())
	df_np=df[["Net premiums Written"]]

	df_window=df.rolling(window=5).mean()
	df_window.plot(figsize=(12,5))
	plt.title ('Rolling Average')
	plt.show()

	#Naive/base model: forecast each period with the previous period's actual value
	df_base=pd.concat([df['Net premiums Written'],df['Net premiums Written'].shift(1)],axis=1)
	df_base.columns=['Actual','Forecast']
	print(df_base.head())
	df_base.dropna(inplace=True)
	from sklearn.metrics import mean_squared_error
	import numpy as np
	df_error=mean_squared_error(df_base.Actual,df_base.Forecast)
	print("Naive MSE:",df_error)
	print("Naive RMSE:",np.sqrt(df_error))

	#ARIMA Model
	# Check whether the series is stationary (Augmented Dickey-Fuller test)
	from statsmodels.tsa.stattools import adfuller
	series=df['Net premiums Written']
	result = adfuller(series)
	print(result)
	print(f"ADF Statistic: {result[0]}")
	print(f"p-value: {result[1]}")

	# If the p-value is greater than 0.05, the series is non-stationary and needs differencing
	# Identify the d value (d = 2 for this data)
	from pmdarima.arima.utils import ndiffs
	print("d value:",ndiffs(series,test="adf"))

	# Identify the q value from the autocorrelation plot (q = 2 for this data)
	autocorrelation_plot(df['Net premiums Written'])
	plt.title("q Value: Autocorrelation")
	plt.show()

	#Build SARIMA Model
	# Decompose the series into trend, seasonal and residual components
	from statsmodels.tsa.seasonal import seasonal_decompose
	# period=4 is an assumption matching the seasonal lag s=4 used in the SARIMA model below;
	# it is required because the index read from the CSV carries no frequency
	decompose_data = seasonal_decompose(df['Net premiums Written'], model="additive", period=4)
	decompose_data.plot()
	plt.title("Seasonal_Decompose")
	plt.show()

	#Build ARIMA Model
	# Let auto_arima search for the (p, d, q) order
	from pmdarima import auto_arima
	# Ignore harmless warnings
	import warnings
	warnings.filterwarnings("ignore")
	stepwise_fit = auto_arima(df['Net premiums Written'],suppress_warnings=True)
	print(stepwise_fit.summary())
	# statsmodels.tsa.arima_model.ARIMA is deprecated and removed in recent statsmodels releases
	from statsmodels.tsa.arima.model import ARIMA
	print(df.shape)
	# Hold out the last 3 observations as the test set
	train=df.iloc[:-3]
	test=df.iloc[-3:]
	print(train.shape,test.shape)
	print(test.iloc[0],test.iloc[-1])

	model=ARIMA(train['Net premiums Written'],order=(1,2,2))
	model_fit=model.fit()
	print(model_fit.summary())

	df['forecast']=model_fit.predict(start=15,end=20,dynamic=True)
	df[['Net premiums Written','forecast']].plot(figsize=(12,8))
	plt.title("Net Premium Vs ARIMA Forecast")
	plt.show()

	# Combine the actual values, the ARIMA forecast and the 5-period rolling average
	df_rolling_adjust=pd.concat([df,df_window],axis=1)
	df_rolling_adjust.columns=['Actual','forecast','Rolling']
	df_rolling_adjust.dropna(inplace=True)
	print(df_rolling_adjust.head())

	df_rolling_adjust[['Actual','forecast','Rolling']].plot(figsize=(12,8))
	plt.title("Actual vs ARIMA Forecast vs Rolling")
	plt.show()

	# SARIMA model
	sm_df=pd.read_csv("Path\\PPA_Liability_INS.csv",index_col='Year',parse_dates=True)
	sm_df=sm_df.dropna()
	# Same 3-observation hold-out as the ARIMA model
	sm_train=sm_df['Net premiums Written'].iloc[:-3]
	sm_test=sm_df['Net premiums Written'].iloc[-3:]
	myorder=(1,2,1) # Non-seasonal (p, d, q) order, based on the ARIMA analysis
	my_seasonal_order=(2,1,2,4) # Seasonal (P, D, Q, s); s = 4 for quarterly, 12 for monthly data
	# SARIMAX takes the data, the non-seasonal order (p, d, q) and the seasonal order (P, D, Q, s)
	from statsmodels.tsa.statespace.sarimax import SARIMAX
	sm_model=SARIMAX(sm_train,order=myorder,seasonal_order=my_seasonal_order)
	sm_model_fit=sm_model.fit()
	print(sm_model_fit.summary())

	df['sm_forecast']=sm_model_fit.predict(start=15,end=23,dynamic=True)
	df[['Net premiums Written','sm_forecast']].plot(figsize=(12,8))
	plt.title("Net Premium Vs sm_Forecast")
	plt.show()


	#Make a future dataset filled with NaN values
	from pandas.tseries.offsets import DateOffset
	pred_date=[sm_df.index[-1]+DateOffset(years=x) for x in range(0,24)] # yearly steps; x=0 is the last known date
	pred_date=pd.DataFrame(index=pred_date[1:],columns=sm_df.columns)
	data=pd.concat([sm_df,pred_date])

	# Forecast with SARIMA Model
	data['forecast'] = sm_model_fit.predict(start = 19, end = 23, dynamic= True)
	data[['Net premiums Written', 'forecast']].plot(figsize=(12, 8))
	plt.title("Values of Net Premium Vs SARIMA_Forecast")
	plt.show()

Build a Neural Network with TensorFlow & Keras

Users can build their own neural network with TensorFlow, Keras and Python to predict values based on historical data.

#Code

import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np
policies_df=pd.read_csv("path//Filename.csv")
policies_df=policies_df.dropna()

#Build Neural network to predict the premium
#Keep only the input (Months) and target (Avg_Premium) columns
df3=policies_df[['Months','Avg_Premium']]
#A single Dense unit with one input learns a straight line: y = w * x + b
PremiumPredict_model=tf.keras.Sequential([keras.Input(shape=(1,)),keras.layers.Dense(units=1)])
PremiumPredict_model.compile(optimizer='sgd',loss='mean_squared_error')

#Define X and Y as float arrays; X is reshaped to (samples, 1) to match the input shape
PremiumPredict_model_XS=np.array(df3['Months'],dtype=float).reshape(-1,1)
PremiumPredict_model_YS=np.array(df3['Avg_Premium'],dtype=float)
#Note: plain SGD on large, unscaled values can diverge (loss becomes nan); scale X and Y if that happens
PremiumPredict_model.fit(PremiumPredict_model_XS,PremiumPredict_model_YS,epochs=1000)

#With 10 months of Avg Premium records, predict the 11th month's Avg Premium value
Cov_Future_Value=PremiumPredict_model.predict(np.array([[11.0]]))

print("Coverage Value in 11th Month predicted as:",Cov_Future_Value)
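The single-unit Dense model above has one weight and one bias, so with a mean squared error loss it is fitting a straight line, y = w·x + b. A quick sanity check on what SGD should converge to is the closed-form least-squares line from NumPy. The monthly values below are made-up placeholders standing in for the `Months` and `Avg_Premium` columns, not the article's data.

```python
import numpy as np

# Placeholder data (assumption): 10 months of average premiums on an exact line
months = np.arange(1, 11, dtype=float)
avg_premium = 200.0 + 5.0 * months

# Closed-form least squares for y = w * x + b,
# the same model that Dense(units=1) with MSE loss approximates by gradient descent
w, b = np.polyfit(months, avg_premium, deg=1)
month_11 = w * 11.0 + b
print(round(w, 6), round(b, 6), round(month_11, 6))  # 5.0 200.0 255.0
```

If the Keras prediction for month 11 is far from the least-squares value, training has not converged; scaling the inputs or adjusting the learning rate usually helps.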

Conclusion

In this analysis of time series models, SARIMA gave better results than the ARIMA model.

A neural network will be the best solution if you can find the correct single column to use as X for predicting the Y value.
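When no separate explanatory column is available, one common way to build an X for a network is to use the series' own previous values as inputs (a sliding window of lags). A minimal sketch, where the helper name `make_lag_features` and the window length of 3 are illustrative assumptions:

```python
import numpy as np


def make_lag_features(values, window):
    """Hypothetical helper: turn a 1-D series into (X, y).

    Each row of X holds `window` consecutive past values, and the matching
    entry of y is the value that immediately follows them.
    """
    x = np.asarray(values, dtype=float)
    if len(x) <= window:
        raise ValueError("series must be longer than the window")
    X = np.array([x[i:i + window] for i in range(len(x) - window)])
    y = x[window:]
    return X, y


X, y = make_lag_features([1, 2, 3, 4, 5, 6], window=3)
print(X.tolist())  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(y.tolist())  # [4.0, 5.0, 6.0]
```

The resulting X has shape (samples, window), so it would feed a Keras model whose input shape is (window,) instead of (1,).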


ArunKumar - Automation & Machine Learning