Python_Statistics_(Final) Summary / mini-project

<aside> 💡

00 Forecasting drug prescription counts

</aside>

Imports

# Import the required libraries
from sklearn.metrics import mean_squared_error, mean_absolute_error  # model evaluation metrics
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf  # ACF and PACF plots
from statsmodels.tsa.seasonal import seasonal_decompose, STL  # seasonal decomposition
from statsmodels.stats.diagnostic import acorr_ljungbox  # residual test (Ljung-Box)
from statsmodels.tsa.statespace.sarimax import SARIMAX  # SARIMA model
from statsmodels.tsa.arima_process import ArmaProcess  # ARMA process
from statsmodels.graphics.gofplots import qqplot  # Q-Q plot
from statsmodels.tsa.stattools import adfuller  # stationarity test (ADF)
from tqdm.auto import tqdm  # progress bar
from itertools import product  # generate parameter combinations
from typing import Union  # type hinting

# Core data analysis and visualization libraries
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
import numpy as np

# Suppress warning messages
import warnings
warnings.filterwarnings('ignore')

# Display plots inline in the Jupyter notebook
%matplotlib inline

Loading the data

df = pd.read_csv('./data/drugs.csv')

Separating the components: STL decomposition

STL: Seasonal-Trend decomposition using Loess

Components
- Trend, seasonality, residual

decomposition = STL(df.y, period = 12).fit()

STL : splits the series into trend (Trend), seasonality (Seasonal), and residual (Residual) components

df.y : the y column only

period : a 12-month cycle

fit() : fits the decomposition to the data

Visualizing the components alongside the original values

fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows = 4, ncols = 1, sharex = True, figsize = (10,8))

ax1.plot(decomposition.observed)  # original data
ax1.set_ylabel('Observed')

ax2.plot(decomposition.trend)     # trend component
ax2.set_ylabel('Trend')

ax3.plot(decomposition.seasonal)  # seasonal component
ax3.set_ylabel('Seasonal')

ax4.plot(decomposition.resid)     # residual component
ax4.set_ylabel('Residuals')

plt.xticks(np.arange(6, 203, 12), np.arange(1992, 2009, 1))

fig.autofmt_xdate()
plt.tight_layout()

plt.show()
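The plt.xticks call above pairs one tick position with one year label, so the two ranges must have the same length. The short sketch below checks that arithmetic. Reading position 6 as the first observation of 1992 is an assumption (it holds if the monthly data starts six rows before January 1992), since the notes do not show the date column.

```python
import numpy as np

# Tick positions: every 12th row index, starting at index 6
positions = np.arange(6, 203, 12)
# Tick labels: one per year
labels = np.arange(1992, 2009, 1)

# plt.xticks(positions, labels) requires both to have the same length
print(len(positions), len(labels))
print(positions[0], labels[0], positions[-1], labels[-1])
```

If the row count or the start month of the data differs, both ranges need to be adjusted together.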