Forecast the Values with Machine Learning Models
Solving business problems with technology: this article discusses how to forecast future trends from historical data by leveraging machine learning models. It covers two approaches:
1. Time series machine learning models
2. Building a “neural network” with TensorFlow & Keras
Time Series Model
Below are the classic time series models:
- Autoregression (AR)
- Moving Average (MA)
- Autoregressive Moving Average (ARMA)
- Autoregressive Integrated Moving Average (ARIMA)
- Seasonal Autoregressive Integrated Moving-Average (SARIMA)
- Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
- Vector Autoregression (VAR)
- Vector Autoregression Moving-Average (VARMA)
- Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
- Simple Exponential Smoothing (SES)
- Holt-Winters Exponential Smoothing (HWES)
- Prophet
From the models above, I'll share a simple way to implement the ARIMA and SARIMA models, without going into detailed statistics.
Both models need three orders: p (AR, autoregressive), d (I, integrated, the number of differencing steps) and q (MA, moving average), for the non-seasonal part and, in SARIMA, again for the seasonal part. The code below checks stationarity with the Augmented Dickey-Fuller test, finds d directly with ndiffs, and lets auto_arima suggest an order; q can also be read from the autocorrelation graph or found by trial and error.
The code uses the Python libraries pandas and NumPy, together with statsmodels and pmdarima, to predict values from historical data with the ARIMA and SARIMA time series models.
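To make the role of d more concrete, here is a minimal, dependency-free sketch of differencing. The helper name difference and the sample series are assumptions for illustration only; they are not part of the article's dataset or of any library. The idea is that a series with a trend is differenced d times until the trend disappears, and that d is what the "Integrated" part of ARIMA refers to.

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return a new list."""
    values = list(series)
    for _ in range(d):
        # Each pass replaces the series with the change between neighbours.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical series with a quadratic trend (illustrative data only).
series = [1, 4, 9, 16, 25, 36]

print(difference(series, d=1))  # [3, 5, 7, 9, 11] still trends upward
print(difference(series, d=2))  # [2, 2, 2, 2] constant, so d=2 removes the trend
```

In the article's code the same decision is delegated to ndiffs, which picks d from a statistical test rather than by eye.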
Code#
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from pandas.plotting import autocorrelation_plot

# Load the data and use the Year column as a date index
df=pd.read_csv("Path\\PPA_Liability_INS.csv",index_col='Year',parse_dates=True)
df=df.dropna()
#df['Year']=pd.to_datetime(df['Year'])
print('Shape of data',df.shape)
df.head()
df_np=df[["Net premiums Written"]]

# 5-period rolling average to smooth the series
df_window=df.rolling(window=5).mean()
df_window.plot(figsize=(12,5))
plt.title('Rolling Average')
plt.show()

# Naive / base model: forecast each value with the previous observation
df_base=pd.concat([df,df.shift(1)],axis=1)
#print(df_base)
df_base.columns=['Actual','Forecast']
print(df_base.head())
df_base.dropna(inplace=True)
from sklearn.metrics import mean_squared_error
import numpy as np
df_error=mean_squared_error(df_base.Actual,df_base.Forecast)
print(df_error)
print("Naive RMSE:", np.sqrt(df_error))

# ARIMA model
# Find out whether the series is stationary or not
from statsmodels.tsa.stattools import adfuller
result = adfuller(df['Net premiums Written'].dropna())
print(result)
print(f"ADF Statistic: {result[0]}")
print(f"p-value: {result[1]}")
# If the p-value is greater than 0.05, the series is treated as non-stationary

# Identify the d value (2 for this data)
from pmdarima.arima.utils import ndiffs
print("d-value:",ndiffs(df['Net premiums Written'],test="adf"))

# Identify the q value (2 for this data) from the graph
autocorrelation_plot(df['Net premiums Written'])
plt.title("Q Value Auto Correlation")
plt.show()

# Build SARIMA model: decompose the series into trend, seasonal and residual parts
from statsmodels.tsa.seasonal import seasonal_decompose
decompose_data = seasonal_decompose(df, model="additive")
decompose_data.plot()
plt.title("Seasonal_Decompose")
plt.show()
#model = sm.tsa.SARIMAX(history, trend='c', order=arima_order, enforce_stationarity=False, enforce_invertibility=False)

# Build ARIMA model
from pmdarima import auto_arima
# Ignore harmless warnings
import warnings
warnings.filterwarnings("ignore")
# Let auto_arima search for a suitable (p,d,q) order
stepwise_fit = auto_arima(df['Net premiums Written'],suppress_warnings=True)
stepwise_fit.summary()

# statsmodels.tsa.arima_model was removed in newer statsmodels releases; ARIMA now lives in statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA
print(df.shape)
# Hold out the last 3 observations as the test set
train=df.iloc[:-3]
test=df.iloc[-3:]
print(train.shape,test.shape)
print(test.iloc[0],test.iloc[-1])
model=ARIMA(train['Net premiums Written'],order=(1,2,2))
model_fit=model.fit()
print(model_fit.summary())
df['forecast']=model_fit.predict(start=15,end=20,dynamic=True)
df[['Net premiums Written','forecast']].plot(figsize=(12,8))
plt.title("Net Premium Vs ARIMA Forecast")
plt.show()

# Compare the actual values, the ARIMA forecast and the rolling average
df_rolling_adjust=pd.concat([df,df_window],axis=1)
df_rolling_adjust.columns=['Actual','forecast','Rolling']
df_rolling_adjust.dropna(inplace=True)
print(df_rolling_adjust.head())
df_rolling_adjust[['Actual','forecast','Rolling']].plot(figsize=(12,8))
plt.title("Actual vs ARIMA Forecast vs Rolling")
plt.show()

# SARIMA model
sm_df=pd.read_csv("Path\\PPA_Liability_INS.csv",index_col='Year',parse_dates=True)
sm_df=sm_df.dropna()
sm_train=sm_df.iloc[:-3]
sm_test=sm_df.iloc[-3:]
myorder=(1,2,1) # non-seasonal (p,d,q) order, based on the ARIMA analysis above
my_seasonal_order=(2,1,2,4) # seasonal (P,D,Q,s) order; s=4 for quarterly, 12 for monthly data
# SARIMAX takes its arguments in this order: data, ARIMA order (p,d,q), then seasonal order (P,D,Q,s)
from statsmodels.tsa.statespace.sarimax import SARIMAX
sm_model=SARIMAX(sm_train['Net premiums Written'],order=myorder,seasonal_order=my_seasonal_order)
sm_model_fit=sm_model.fit()
print(sm_model_fit.summary())
#sm_model=sm.tsa.statespace.SARIMAX(df['Actual'],order=(1, 2, 2),seasonal_order=(1,1,1,12))
#results=sm_model.fit()
df['sm_forecast']=sm_model_fit.predict(start=15,end=23,dynamic=True)
df[['Net premiums Written','sm_forecast']].plot(figsize=(12,8))
plt.title("Net Premium Vs sm_Forecast")
plt.show()

# Build a future dataset filled with NaN values
from pandas.tseries.offsets import DateOffset
pred_date=[sm_df.index[-1]+ DateOffset(years=x) for x in range(0,24)] # yearly steps (use months=x for monthly data)
pred_date=pd.DataFrame(index=pred_date[1:],columns=sm_df.columns)
data=pd.concat([sm_df,pred_date])
# Forecast with the SARIMA model
data['forecast'] = sm_model_fit.predict(start = 19, end = 23, dynamic= True)
data[['Net premiums Written', 'forecast']].plot(figsize=(12, 8))
plt.title("Values of Net Premium Vs SARIMA_Forecast")
plt.show()
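The code above holds out the last three observations as a test set but scores only the naive model. As a hedged sketch of how a forecast could be scored against that hold-out, the snippet below computes RMSE for a naive baseline and for a candidate forecast. All numbers are hypothetical, the helper names rmse and naive_forecast are my own, and the naive forecast here repeats the last training value for every step, which is a multi-step variant of the shift(1) baseline used earlier.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def naive_forecast(history, steps):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * steps


# Hypothetical yearly values (not taken from the article's CSV file).
values = [100.0, 104.0, 109.0, 115.0, 122.0, 130.0, 139.0]
train, test = values[:-3], values[-3:]

# Hypothetical output of a fitted model for the three held-out years.
candidate = [121.0, 131.0, 138.0]

naive_score = rmse(test, naive_forecast(train, len(test)))
candidate_score = rmse(test, candidate)

print("Naive RMSE:", round(naive_score, 2))
print("Candidate RMSE:", round(candidate_score, 2))
print("Candidate beats naive:", candidate_score < naive_score)
```

A model is only worth keeping if its hold-out RMSE is clearly below the naive score; the same comparison can be applied to the ARIMA and SARIMA forecasts by slicing their predictions to the test dates.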
Build “Neural Network” with TensorFlow & Keras
You can build your own neural network to predict values from historical data with TensorFlow, Keras and Python.
Code#
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np
policies_df=pd.read_csv("path//Filename.csv")
policies_df=policies_df.dropna()
# Build a neural network to predict the premium
# Define the data frame that feeds the neural network
df3=pd.DataFrame(policies_df,columns=['Months','Avg_Premium'])
# A single Dense unit with one input learns a linear relation y = w*x + b
PremiumPredict_model=keras.Sequential([keras.layers.Dense(units=1,input_shape=[1])])
PremiumPredict_model.compile(optimizer='sgd',loss='mean_squared_error')
# Define X and Y along with their data types
PremiumPredict_model_XS=np.array(df3['Months'],dtype=float)
PremiumPredict_model_YS=np.array(df3['Avg_Premium'],dtype=float)
PremiumPredict_model.fit(PremiumPredict_model_XS,PremiumPredict_model_YS,epochs=1000)
# With 10 months of average premium records, predict the 11th month's value
Cov_Future_Value=PremiumPredict_model.predict(np.array([11.0]))
print("Coverage Value in 11th Month predicted as :",Cov_Future_Value)
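A network with one Dense unit and one input is a straight-line fit, y = w*x + b, trained by gradient descent on the mean squared error. The sketch below reproduces that idea in plain Python so the mechanics are visible. It is not Keras's implementation: the function name fit_linear_gd, the learning rate, the epoch count and the noise-free data are all assumptions chosen for illustration, and it uses full-batch gradient steps.

```python
def fit_linear_gd(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = w*x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: 10 months following y = 2*x + 1 exactly.
months = [float(m) for m in range(1, 11)]
avg_premium = [2.0 * m + 1.0 for m in months]

w, b = fit_linear_gd(months, avg_premium)
prediction = w * 11.0 + b  # forecast for the 11th month

print("w:", round(w, 3), "b:", round(b, 3))
print("Predicted value for month 11:", round(prediction, 3))
```

On real premium figures the targets are usually much larger than in this toy data, so scaling X and Y (or lowering the learning rate) may be needed to keep plain gradient descent from diverging.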
Conclusion
In this analysis of time series models, SARIMA gave better results than ARIMA.
A neural network can be the best solution if you can identify the single correct column to serve as X for predicting the Y value.