Analysis of 2021-2022 CAISO Power Source Data


This data comes from Kaggle, and presumably ultimately from CAISO. The units are not labeled, so I am guessing a bit here on what is what. I’m going to do some exploratory analysis, and see if we can do any meaningful forecasting.

Read and preprocess data

Make sure that the date/time columns in the DataFrame are coded as such so that seaborn/matplotlib don’t treat them as strings, which causes major problems when plotting.

It’s not clear what the units are for the different energy sources, but based on what is on the CAISO website, I’m guessing they’re MW.
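As a minimal sketch of the date/time step, the snippet below builds a tiny stand-in for the CSV and parses the date and time strings into real timestamps. The raw `Time` format (`"00:00"`) and the way the columns are combined are my assumptions. The column names follow the table shown in this post.

```python
import pandas as pd

# Tiny stand-in for the Kaggle CSV; the "Time" string format is an assumption.
raw = pd.DataFrame({
    "Date": ["2021-09-01", "2021-09-01", "2021-09-01"],
    "Time": ["00:00", "00:05", "00:10"],
    "Solar": [-34.0, -34.0, -34.0],
    "Wind": [4547.0, 4528.0, 4511.0],
})

# Combine the two string columns and parse them into proper timestamps,
# so plotting libraries treat the axis as time rather than as categories.
raw["DateTime"] = pd.to_datetime(raw["Date"] + " " + raw["Time"])
raw["Month"] = raw["DateTime"].dt.month
raw["Year"] = raw["DateTime"].dt.year

print(raw.dtypes)
```

Parsing the combined string, rather than the time column alone, avoids the time column picking up an unrelated date.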

Unnamed: 0 Date Time Solar Wind Geothermal Biomass Biogas Small hydro Coal Nuclear Natural Gas Large Hydro Batteries Imports DateTime Month Year Total power
0 0 2021-09-01 2023-02-05 00:00:00 -34.0 4547.0 928.0 281.0 195.0 168.0 18.0 2263.0 8875.0 1261.0 -186.0 8145.0 2021-09-01 00:00:00 9 2021 26461.0
1 1 2021-09-01 2023-02-05 00:05:00 -34.0 4528.0 929.0 283.0 201.0 169.0 18.0 2262.0 9086.0 1109.0 -13.0 7717.0 2021-09-01 00:05:00 9 2021 26255.0
2 2 2021-09-01 2023-02-05 00:10:00 -34.0 4511.0 929.0 281.0 208.0 146.0 18.0 2263.0 9168.0 985.0 37.0 7553.0 2021-09-01 00:10:00 9 2021 26065.0
3 3 2021-09-01 2023-02-05 00:15:00 -34.0 4514.0 929.0 280.0 214.0 140.0 19.0 2262.0 9167.0 962.0 34.0 7458.0 2021-09-01 00:15:00 9 2021 25945.0
4 4 2021-09-01 2023-02-05 00:20:00 -34.0 4515.0 929.0 281.0 215.0 140.0 18.0 2262.0 9176.0 949.0 35.0 7342.0 2021-09-01 00:20:00 9 2021 25828.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
105099 283 2022-08-31 2023-02-05 23:35:00 -1.0 2576.0 872.0 339.0 203.0 206.0 4.0 2263.0 16228.0 2679.0 -434.0 8044.0 2022-08-31 23:35:00 8 2022 32979.0
105100 284 2022-08-31 2023-02-05 23:40:00 0.0 2589.0 871.0 340.0 204.0 206.0 5.0 2264.0 16098.0 2687.0 -487.0 8078.0 2022-08-31 23:40:00 8 2022 32855.0
105101 285 2022-08-31 2023-02-05 23:45:00 0.0 2558.0 872.0 339.0 204.0 206.0 5.0 2263.0 16034.0 2636.0 -514.0 8120.0 2022-08-31 23:45:00 8 2022 32723.0
105102 286 2022-08-31 2023-02-05 23:50:00 0.0 2521.0 871.0 339.0 204.0 206.0 5.0 2264.0 15976.0 2642.0 -521.0 8163.0 2022-08-31 23:50:00 8 2022 32670.0
105103 287 2022-08-31 2023-02-05 23:55:00 0.0 2513.0 871.0 341.0 205.0 206.0 5.0 2264.0 15581.0 2650.0 -389.0 7993.0 2022-08-31 23:55:00 8 2022 32240.0

105104 rows × 19 columns

From the description below, we can see that Natural Gas is clearly the largest power source, and also the second most variable after Solar (2nd highest standard deviation).

Unnamed: 0 Solar Wind Geothermal Biomass Biogas Small hydro Coal Nuclear Natural Gas Large Hydro Batteries Imports Month Year Total power
count 105104.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105092.000000 105104.000000 105104.000000 105092.000000
mean 143.478231 4193.116555 2454.372731 869.873949 287.631047 200.969284 191.277966 12.498525 2076.552973 8614.644778 1457.924219 77.379268 5612.521467 6.526374 2021.665703 26048.762760
std 83.125935 5046.487510 1456.446929 76.966457 45.488711 14.678127 94.729647 4.994506 406.780691 3905.161266 855.182969 572.085363 2963.652848 3.447995 0.471747 4762.813769
min 0.000000 -180.000000 28.000000 474.000000 -278.000000 132.000000 46.000000 -6.000000 446.000000 1494.000000 -494.000000 -1848.000000 -4459.000000 1.000000 2021.000000 15916.000000
25% 71.000000 -33.000000 1200.000000 823.000000 255.000000 195.000000 153.000000 9.000000 2250.000000 5692.000000 878.000000 -234.000000 3395.000000 4.000000 2021.000000 22661.000000
50% 143.000000 170.000000 2211.000000 878.000000 288.000000 204.000000 190.000000 14.000000 2264.000000 8116.000000 1279.000000 3.000000 6208.000000 7.000000 2022.000000 25121.500000
75% 215.000000 9391.000000 3649.000000 903.000000 320.000000 211.000000 226.000000 17.000000 2268.000000 10828.000000 1908.000000 319.000000 7999.000000 10.000000 2022.000000 27983.250000
max 287.000000 14288.000000 6429.000000 1134.000000 412.000000 242.000000 3316.000000 91.000000 2287.000000 25441.000000 4556.000000 3053.000000 11587.000000 12.000000 2022.000000 46679.000000

Exploratory Data Analysis

This dataset comes in 5-minute intervals spanning a full year. That is >100k samples, which is difficult to visualize meaningfully all at once. Furthermore, there are at least 2 meaningful periods of variation in this dataset: daily and annual. For these reasons, we’ll make 2 different plots: one showing the daily average power generation over the course of the full year, and one showing average power generation by hour of day, averaged over every day in the dataset.
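The two aggregations described above can be sketched roughly as follows. The series here is synthetic placeholder data (a clipped sine wave standing in for solar output), and the variable names are mine, not necessarily the ones used to make the figures.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series over three days (placeholder values, not CAISO data).
idx = pd.date_range("2021-09-01", periods=3 * 288, freq="5min")
hours = idx.hour.to_numpy() + idx.minute.to_numpy() / 60
values = np.clip(np.sin((hours - 6) / 12 * np.pi), 0, None) * 10000
solar = pd.Series(values, index=idx, name="Solar")

# View 1: one average value per calendar day (the "annual" view).
daily_mean = solar.resample("D").mean()

# View 2: one average value per hour of day, pooled over all days (the "daily" view).
hourly_profile = solar.groupby(solar.index.hour).mean()

print(daily_mean)
print(hourly_profile)
```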

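If we do this for only the Solar power data, we see the following figures.

[Figure: daily average solar power generation over the full year]

[Figure: average solar power generation by hour of day]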

The curves are almost exactly what we should expect. The annual curve peaks in late June around the summer solstice, and has a trough in late December around the winter solstice. The light blue shaded region shows the central 95% quantile range.

In order to make more plots easily with Seaborn, we need to convert the DataFrame from wide to long format.
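A minimal sketch of the wide-to-long conversion using `DataFrame.melt` is shown here on a two-row, two-source stand-in. The exact set of identifier columns used in the real conversion is an assumption. With 13 value columns (12 sources plus Total power) and 105104 rows, the long table should have 13 × 105104 = 1366352 rows.

```python
import pandas as pd

# Two rows and two sources from the table above, as a small wide-format example.
wide = pd.DataFrame({
    "DateTime": pd.to_datetime(["2021-09-01 00:00", "2021-09-01 00:05"]),
    "Solar": [-34.0, -34.0],
    "Wind": [4547.0, 4528.0],
})

# Every column not listed in id_vars becomes a (Source, Power) pair of rows.
long = wide.melt(id_vars=["DateTime"], var_name="Source", value_name="Power (MW)")

print(long)
```

In long format, Seaborn can separate the sources with a single `hue="Source"` argument instead of one plotting call per column.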

Date Time DateTime Month Year Source Power (MW)
0 2021-09-01 2023-02-05 00:00:00 2021-09-01 00:00:00 9 2021 Solar -34.0
1 2021-09-01 2023-02-05 00:05:00 2021-09-01 00:05:00 9 2021 Solar -34.0
2 2021-09-01 2023-02-05 00:10:00 2021-09-01 00:10:00 9 2021 Solar -34.0
3 2021-09-01 2023-02-05 00:15:00 2021-09-01 00:15:00 9 2021 Solar -34.0
4 2021-09-01 2023-02-05 00:20:00 2021-09-01 00:20:00 9 2021 Solar -34.0
... ... ... ... ... ... ... ...
1366347 2022-08-31 2023-02-05 23:35:00 2022-08-31 23:35:00 8 2022 Total power 32979.0
1366348 2022-08-31 2023-02-05 23:40:00 2022-08-31 23:40:00 8 2022 Total power 32855.0
1366349 2022-08-31 2023-02-05 23:45:00 2022-08-31 23:45:00 8 2022 Total power 32723.0
1366350 2022-08-31 2023-02-05 23:50:00 2022-08-31 23:50:00 8 2022 Total power 32670.0
1366351 2022-08-31 2023-02-05 23:55:00 2022-08-31 23:55:00 8 2022 Total power 32240.0

1366352 rows × 7 columns

Next we’ll make the same plots as above, but this time for all power sources.

png

png

There are a number of interesting features happening in the two figures above.

  • A weekly cycle is apparent in the Total Power curve. January 1st, 2022 was a Saturday and lines up with one of the troughs, which suggests to me that weekends generally put substantially less load on the power system.
  • In the linear plot, we can see that the Total Power drawn is generally higher in the summer and lower in the winter, although there is a notable increase in power draw around Christmastime. Christmas lights? Or perhaps it was just cold.
  • Unsurprisingly, nuclear power output is generally extremely steady, although we can see that it did drop off a few times.
  • We can again see the seasonal variability of solar power, probably more clearly in the linear plot than the log.
  • On a day-to-day basis, wind power is extremely variable, more so than solar.
  • California imports more power in the winter than summer. I would guess this may be because electricity gets more expensive in the summer, and therefore harder to import.
  • Biomass, biogas, small hydro, and batteries play a pretty small role.
  • There is virtually no coal power in the system. That’s because there is only one 63 MW coal plant operating in the state of California, in Trona.

Now let’s take a look at the hourly data.

png

png

Again we can make a few interesting observations.

  • Although they make up only a small part of CAISO’s power capacity, batteries are playing a nontrivial role at certain times of day, particularly in the evening. We can also see the batteries charging during the day in the linear plot (where the curve goes negative).
  • The solar power curve has two “tails”. I don’t know for certain what is causing them, but my hypothesis has two components:
    • California is a long state N/S, and summer is the part of the year where the sun is up latest into the evening. During the summer, the evening terminator is mostly perpendicular to the length of the state, meaning that there should be a gradual dropoff in solar power as it gets dark from south to north. That explains the fact that there is at least one bump.
    • There are two bumps instead of one because of daylight saving time.
    • This does not occur in the morning because on summer mornings, the terminator faces the opposite direction and so all parts of the state start receiving solar power at about the same time.
  • Wind power does indeed see variation complementary to solar power as advertised, but it is a much smaller source than solar and therefore is not able to offset much of the solar variability.

For the last plots in this section, I’m going to separate sources into dispatchable and variable. Dispatchable resources are those that can be adjusted during the day to compensate for uncontrollable variability in other sources and demand. In practice, I am separating sources into these two categories based on whether they appear to have controllable daily variation in the plot above. The dispatchable resources that seem to have this:

  • Natural Gas
  • Imports
  • Large Hydro
  • Small Hydro
  • Batteries
  • (Theoretically coal would go here as well, but it’s too small for me to see evidence of daily variation.)

In principle, some of the other resources can be ramped up or down, but I would guess that they aren’t because they are smaller and more distributed, and may not be technically set up to be ramped on demand.
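Here is a sketch of the split, using the first row of the table at the top of this post. The dispatchable list is the one given above. Treating every remaining source as "variable" is my assumption about how the grouping was done.

```python
import pandas as pd

# Sources listed above as dispatchable.
DISPATCHABLE = ["Natural Gas", "Imports", "Large Hydro", "Small hydro", "Batteries"]
# Assumption: everything else is grouped as "variable".
VARIABLE = ["Solar", "Wind", "Geothermal", "Biomass", "Biogas", "Coal", "Nuclear"]

# First row of the dataset (2021-09-01 00:00), values in (presumed) MW.
df = pd.DataFrame([{
    "Solar": -34.0, "Wind": 4547.0, "Geothermal": 928.0, "Biomass": 281.0,
    "Biogas": 195.0, "Small hydro": 168.0, "Coal": 18.0, "Nuclear": 2263.0,
    "Natural Gas": 8875.0, "Large Hydro": 1261.0, "Batteries": -186.0,
    "Imports": 8145.0,
}])

# Row-wise sums over each group of columns.
df["Dispatchable"] = df[DISPATCHABLE].sum(axis=1)
df["Variable"] = df[VARIABLE].sum(axis=1)

print(df[["Dispatchable", "Variable"]])
```

The two groups together add up to the Total power value for that row (26461).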

png

png

There’s not quite as much to see here as in the previous plots, except that the duck curve is alive and well in the hourly plot. The daily plot is dominated by random variation and the seasonal variation of solar power.

Time-series Modeling

Let’s see if we can extract any utility (haha) from time-series analysis of this data. First we’ll try an ARIMA analysis.

We have a full year of 5-minute samples available, which translates to 105,104 samples. This (empirically) is too many to fit on my laptop, so I’m going to downsample the data to hourly. In the real world, 5-minute predictions may be useful, but we’re going to make do with 1-hr predictions for now.
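The downsampling step can be sketched as below, on a synthetic 5-minute series. Averaging within each hour is an assumption (summing or taking the last value would also be defensible). Twelve 5-minute samples per hour turns 105,104 samples into roughly 8,759 hourly ones, which matches the observation count in the model summaries further down.

```python
import numpy as np
import pandas as pd

# Two days of synthetic 5-minute data (placeholder values, not CAISO data).
idx = pd.date_range("2021-09-01", periods=2 * 288, freq="5min")
five_min = pd.DataFrame({"Dispatchable": np.arange(len(idx), dtype=float)}, index=idx)

# Average the twelve 5-minute samples in each hour ("60min" = hourly bins).
hourly = five_min.resample("60min").mean()

print(len(five_min), "->", len(hourly))
```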

We’re going to try to predict the total amount of power from dispatchable resources required at any point in time. This is similar to what a real power company or ISO has to do in real time: tune the dispatchable power sources to cover the difference between variable supply and demand at all times.

Since the amount of variable resource supply affects how much dispatchable power will be needed, it is reasonable to look for relationships between previous values of variable power output and the current amount of dispatchable power. Below I create a series of plots showing the relationship between a series of lagged variable power samples and dispatchable power.

First let’s take a look at the autocorrelation and partial autocorrelation functions. These will help us decide what kind of model is most appropriate.

[Figure: autocorrelation function]

[Figure: partial autocorrelation function]

We see pretty clear evidence of a daily pattern in the autocorrelation function. The partial autocorrelation function suggests that there is meaningful information to be extracted up to lags of about 24, which makes sense since the data follow a 24-hour daily cycle. In other words, we see evidence of “seasonality”, where in this case a season corresponds to a 24-hour day. This suggests that a model of the form ARIMA(24, 0, 0)x(1, 0, 0, 24) could be appropriate. However, this is way too many lags to fit in a reasonable amount of time, so we’ll just have to see how many we can add before I run out of computing power or they stop adding meaningful predictive power.
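A quick way to check for a 24-hour pattern numerically, without plotting, is to compute the autocorrelation at a few specific lags. The sketch below uses a synthetic hourly series (a daily sine wave plus noise), not the CAISO data, and uses `Series.autocorr` from pandas rather than whatever plotting helper produced the figures.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series: a 24-hour cycle plus noise (placeholder data).
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
series = pd.Series(np.sin(2 * np.pi * t / 24) + 0.3 * rng.standard_normal(t.size))

# Pearson correlation between the series and a copy of itself shifted by `lag` hours.
acf = {lag: series.autocorr(lag=lag) for lag in (1, 12, 24)}

for lag, value in acf.items():
    print(f"lag {lag:2d}: {value:+.2f}")
```

For a series with a daily cycle, the lag-24 value is strongly positive and the lag-12 value strongly negative, which is the same signature as the oscillation in the autocorrelation plot.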

We should also look at the real-time relationship between variable and dispatchable resources. Clearly there is a strong relationship, but if we are trying to predict future values of dispatchable power, we will only have access to past values of variable power. Therefore, we need to take at least one lag to make this realistic.

png

png

png

png

png

The first two variable lags have a reasonably strong relationship with the dispatchable power output, but by the time we get to the third lag, there isn’t much left. The ninth lag also shows essentially no relationship between the two, so we’ll limit the model to the first two lags of variable power output.
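Lagged copies of the variable power column can be built with `shift`, as sketched here. The numbers are made-up placeholders, and the `Variable.L1` / `Variable.L2` column names are chosen to match the labels in the model summaries below.

```python
import pandas as pd

# Made-up hourly values, only to show the mechanics of lagging.
hourly = pd.DataFrame({
    "Variable": [8000.0, 8200.0, 9000.0, 11000.0, 12500.0],
    "Dispatchable": [18000.0, 17900.0, 17000.0, 15200.0, 13800.0],
})

# shift(k) moves values down k rows, so row t holds the value from hour t - k.
for lag in (1, 2):
    hourly[f"Variable.L{lag}"] = hourly["Variable"].shift(lag)

# The first rows have no history for the lagged columns, so drop them.
model_data = hourly.dropna()

# Correlation of each lagged column with the current dispatchable power.
corr = model_data[["Variable.L1", "Variable.L2"]].corrwith(model_data["Dispatchable"])
print(corr)
```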

Given the results of the autocorrelation plots and the Variable vs. Dispatchable plots above, let’s build a series of models with 1 and 2 Variable source lags, building up the number of autocorrelated lags as far as we reasonably can. We’ll also include one 24-hr seasonal term.

                               SARIMAX Results                                
==============================================================================
Dep. Variable:           Dispatchable   No. Observations:                 8759
Model:                 ARIMA(1, 0, 0)   Log Likelihood              -79321.828
Date:                Sun, 05 Feb 2023   AIC                         158649.656
Time:                        10:16:15   BIC                         158670.890
Sample:                             0   HQIC                        158656.891
                               - 8759                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const       1.597e+04    420.877     37.936      0.000    1.51e+04    1.68e+04
ar.L1          0.9439      0.004    239.155      0.000       0.936       0.952
sigma2        4.3e+06   5.13e+04     83.832      0.000     4.2e+06     4.4e+06
===================================================================================
Ljung-Box (L1) (Q):                4591.85   Jarque-Bera (JB):              1675.00
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               1.03   Skew:                             0.66
Prob(H) (two-sided):                  0.50   Kurtosis:                         4.69
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).




                               SARIMAX Results                                
==============================================================================
Dep. Variable:           Dispatchable   No. Observations:                 8759
Model:             ARIMA(1, 0, 0, 24)   Log Likelihood              -80386.596
Date:                Sun, 05 Feb 2023   AIC                         160779.192
Time:                        10:16:20   BIC                         160800.425
Sample:                             0   HQIC                        160786.427
                               - 8759                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const       1.597e+04    358.556     44.530      0.000    1.53e+04    1.67e+04
ar.S.L24       0.9298      0.004    236.051      0.000       0.922       0.938
sigma2      5.462e+06   6.02e+04     90.769      0.000    5.34e+06    5.58e+06
===================================================================================
Ljung-Box (L1) (Q):                8069.84   Jarque-Bera (JB):              1334.72
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               1.12   Skew:                             0.08
Prob(H) (two-sided):                  0.00   Kurtosis:                         4.91
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).



                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(1, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -69337.252
Date:                          Sun, 05 Feb 2023   AIC                         138682.504
Time:                                  10:05:54   BIC                         138710.815
Sample:                                       0   HQIC                        138692.150
                                         - 8759                                         
Covariance Type:                            opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const       1.597e+04   1.48e-11   1.08e+15      0.000     1.6e+04     1.6e+04
ar.L1          1.0000    1.6e-05   6.23e+04      0.000       1.000       1.000
ar.S.L24       0.9491      0.002    415.030      0.000       0.945       0.954
sigma2      4.363e+05   1.99e-10    2.2e+15      0.000    4.36e+05    4.36e+05
===================================================================================
Ljung-Box (L1) (Q):                1712.24   Jarque-Bera (JB):             12230.48
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.70   Skew:                             0.04
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.79
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.27e+30. Standard errors may be unstable.



                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(1, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -69250.928
Date:                          Sun, 05 Feb 2023   AIC                         138511.856
Time:                                  10:17:19   BIC                         138547.245
Sample:                                       0   HQIC                        138523.915
                                         - 8759                                         
Covariance Type:                            opg                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const        2.301e+04   8.65e-09   2.66e+12      0.000     2.3e+04     2.3e+04
Variable.L1    -0.4229      0.000   -898.260      0.000      -0.424      -0.422
ar.L1           1.0000   1.77e-05   5.64e+04      0.000       1.000       1.000
ar.S.L24        0.9987      0.001   1724.759      0.000       0.998       1.000
sigma2       4.925e+05   7.78e-10   6.33e+14      0.000    4.92e+05    4.92e+05
===================================================================================
Ljung-Box (L1) (Q):                 401.93   Jarque-Bera (JB):             11334.48
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.75   Skew:                            -0.11
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.57
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.1e+28. Standard errors may be unstable.

An ar.L1 coefficient of 1 corresponds to a random walk, where the most likely next value is the current value. On top of that, we have a strong relationship between the next value and the sample 24 hours earlier; with ar.S.L24 so close to 1, this is also close to a random walk with a 24 hr period. Finally, the negative Variable.L1 coefficient shows an inverse relationship between the forecasted dispatchable power output and the previous variable power output, as expected. We’ll use the information criteria (AIC, BIC, HQIC) in the upper right of each summary to choose our model.
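As a reminder of what the information criteria measure, here is a small sketch using the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the log likelihood. Lower is better for both. The inputs are taken from the first summary above, ARIMA(1, 0, 0), which has three parameters: const, ar.L1, and sigma2.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Values from the ARIMA(1, 0, 0) summary: ln L = -79321.828, k = 3, n = 8759.
print(round(aic(-79321.828, 3), 3))
print(round(bic(-79321.828, 3, 8759), 3))
```

Both results agree with the AIC and BIC reported in that summary to within rounding. BIC penalizes each extra parameter more heavily than AIC once n is large, so it is the more conservative of the two when deciding whether another lag is worth adding.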

                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(2, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -68125.930
Date:                          Sun, 05 Feb 2023   AIC                         136263.861
Time:                                  07:26:56   BIC                         136306.328
Sample:                                       0   HQIC                        136278.331
                                         - 8759                                         
Covariance Type:                            opg                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const        2.301e+04   1527.241     15.066      0.000       2e+04     2.6e+04
Variable.L1     0.0347      0.013      2.749      0.006       0.010       0.059
ar.L1           1.4371      0.010    150.823      0.000       1.418       1.456
ar.L2          -0.4938      0.010    -50.632      0.000      -0.513      -0.475
ar.S.L24        0.9263      0.003    328.655      0.000       0.921       0.932
sigma2       3.317e+05   2769.802    119.763      0.000    3.26e+05    3.37e+05
===================================================================================
Ljung-Box (L1) (Q):                  23.19   Jarque-Bera (JB):             11503.60
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.75   Skew:                            -0.10
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.61
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

The parameters have changed a bit now that we’ve added the extra lag. Besides the fit taking quite a bit longer, we now have a negative coefficient for the L2 term and a positive coefficient for the L1 term. This corresponds roughly to making a linear extrapolation from the previous two sample points. The coefficient of the Variable.L1 term has also nearly disappeared, but it is still quite statistically significant.
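To make the extrapolation reading concrete, the sketch below rewrites the two-lag autoregressive part as "previous value plus a fraction of the last change". It deliberately ignores the constant, the Variable.L1 term, and the seasonal term, so it illustrates the AR(2) piece only, not the fitted model's actual forecast. The inputs are deviations from the mean, and the sample numbers are made up.

```python
def ar2_step(prev, prev2, phi1, phi2):
    """Plain AR(2) recursion on deviations from the mean."""
    return phi1 * prev + phi2 * prev2


def level_plus_trend(prev, prev2, phi1, phi2):
    """Same quantity, regrouped as a level term plus a trend term."""
    return (phi1 + phi2) * prev - phi2 * (prev - prev2)


# Coefficients from the ARIMA(2, 0, 0)x(1, 0, 0, 24) summary above.
phi1, phi2 = 1.4371, -0.4938

# Hypothetical deviations from the mean for the last two hours.
prev, prev2 = 100.0, 80.0

print(ar2_step(prev, prev2, phi1, phi2))
print(level_plus_trend(prev, prev2, phi1, phi2))
```

With these coefficients the forecast is about 0.94 times the previous value plus about 0.49 times the most recent change. A pure straight-line extrapolation would correspond to coefficients of 2 and -1, so the fitted model continues the recent trend at roughly half strength.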

                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(3, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -68048.912
Date:                          Sun, 05 Feb 2023   AIC                         136111.824
Time:                                  10:20:22   BIC                         136161.369
Sample:                                       0   HQIC                        136128.706
                                         - 8759                                         
Covariance Type:                            opg                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const        2.301e+04   1802.777     12.763      0.000    1.95e+04    2.65e+04
Variable.L1     0.1533      0.016      9.590      0.000       0.122       0.185
ar.L1           1.5700      0.014    108.283      0.000       1.542       1.598
ar.L2          -0.7789      0.025    -30.808      0.000      -0.828      -0.729
ar.L3           0.1616      0.013     12.646      0.000       0.137       0.187
ar.S.L24        0.9258      0.003    328.230      0.000       0.920       0.931
sigma2        3.26e+05   2735.865    119.140      0.000    3.21e+05    3.31e+05
===================================================================================
Ljung-Box (L1) (Q):                   0.20   Jarque-Bera (JB):             11390.09
Prob(Q):                              0.65   Prob(JB):                         0.00
Heteroskedasticity (H):               0.77   Skew:                            -0.08
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.58
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).




                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(4, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -68061.618
Date:                          Sun, 05 Feb 2023   AIC                         136139.236
Time:                                  07:51:19   BIC                         136195.859
Sample:                                       0   HQIC                        136158.529
                                         - 8759                                         
Covariance Type:                            opg                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const        2.301e+04   2668.673      8.622      0.000    1.78e+04    2.82e+04
Variable.L1     0.2007      0.017     12.149      0.000       0.168       0.233
ar.L1           1.6240      0.015    106.777      0.000       1.594       1.654
ar.L2          -0.8668      0.029    -29.703      0.000      -0.924      -0.810
ar.L3           0.2462      0.023     10.583      0.000       0.201       0.292
ar.L4          -0.0364      0.010     -3.502      0.000      -0.057      -0.016
ar.S.L24        0.9268      0.003    329.352      0.000       0.921       0.932
sigma2       3.321e+05   2853.443    116.401      0.000    3.27e+05    3.38e+05
===================================================================================
Ljung-Box (L1) (Q):                   1.26   Jarque-Bera (JB):             12334.06
Prob(Q):                              0.26   Prob(JB):                         0.00
Heteroskedasticity (H):               0.77   Skew:                            -0.06
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.81
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

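The four-lag fit did not converge, which is consistent with its log likelihood (-68061.618) being lower than that of the three-lag model nested inside it (-68048.912). Non-convergence means that we are done adding terms, and model_1_3 is the best we are going to do with only Variable.L1.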

                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(1, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -68898.499
Date:                          Sun, 05 Feb 2023   AIC                         137808.998
Time:                                  10:28:14   BIC                         137851.465
Sample:                                       0   HQIC                        137823.468
                                         - 8759                                         
Covariance Type:                            opg                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const         2.21e+04   2.71e-08   8.16e+11      0.000    2.21e+04    2.21e+04
Variable.L1    -0.3860      0.009    -42.275      0.000      -0.404      -0.368
Variable.L2     0.1625      0.008     20.866      0.000       0.147       0.178
ar.L1           1.0000   5.91e-07   1.69e+06      0.000       1.000       1.000
ar.S.L24        0.9338      0.003    353.493      0.000       0.929       0.939
sigma2       3.995e+05    8.4e-09   4.76e+13      0.000       4e+05       4e+05
===================================================================================
Ljung-Box (L1) (Q):                 507.22   Jarque-Bera (JB):              9530.79
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.76   Skew:                             0.10
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.11
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 2.78e+28. Standard errors may be unstable.




                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(2, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -68275.361
Date:                          Sun, 05 Feb 2023   AIC                         136564.722
Time:                                  08:09:44   BIC                         136614.267
Sample:                                       0   HQIC                        136581.604
                                         - 8759                                         
Covariance Type:                            opg                                         
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const         2.21e+04   3.35e-08   6.59e+11      0.000    2.21e+04    2.21e+04
Variable.L1     0.0411      0.010      4.307      0.000       0.022       0.060
Variable.L2     0.1649      0.009     18.478      0.000       0.147       0.182
ar.L1           1.4987      0.009    169.438      0.000       1.481       1.516
ar.L2          -0.4987      0.009    -56.384      0.000      -0.516      -0.481
ar.S.L24        0.9265      0.003    353.314      0.000       0.921       0.932
sigma2       3.426e+05   7.36e-09   4.66e+13      0.000    3.43e+05    3.43e+05
===================================================================================
Ljung-Box (L1) (Q):                  12.19   Jarque-Bera (JB):             12506.26
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.76   Skew:                            -0.02
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.85
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.51e+29. Standard errors may be unstable.




                                    SARIMAX Results                                     
========================================================================================
Dep. Variable:                     Dispatchable   No. Observations:                 8759
Model:             ARIMA(4, 0, 0)x(1, 0, 0, 24)   Log Likelihood              -68073.396
Date:                          Sun, 05 Feb 2023   AIC                         136160.792
Time:                                  08:14:48   BIC                         136210.337
Sample:                                       0   HQIC                        136177.673
                                         - 8759                                         
Covariance Type:                            opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const       1.597e+04   1329.281     12.011      0.000    1.34e+04    1.86e+04
ar.L1          1.4725      0.008    185.271      0.000       1.457       1.488
ar.L2         -0.6368      0.015    -42.424      0.000      -0.666      -0.607
ar.L3          0.1400      0.016      8.704      0.000       0.108       0.171
ar.L4         -0.0319      0.010     -3.300      0.001      -0.051      -0.013
ar.S.L24       0.9174      0.003    316.633      0.000       0.912       0.923
sigma2      3.279e+05   2754.964    119.008      0.000    3.22e+05    3.33e+05
===================================================================================
Ljung-Box (L1) (Q):                   0.00   Jarque-Bera (JB):             10359.17
Prob(Q):                              0.96   Prob(JB):                         0.00
Heteroskedasticity (H):               0.78   Skew:                            -0.06
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.33
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

We can plot a bar chart of the AIC values for each model to help decide which one is best.

png

The relative AIC is just the difference between a given model’s AIC and the lowest AIC among all models. It is a bit hard to see, but the AR(3)xSAR(1)xVar.L1 model has the lowest AIC, followed closely by the AR(4)xSAR(1) model. Because these two are so close, and because including Var.L1 does seem to improve things (compare AR(1)xSAR(1) with AR(1)xSAR(1)xVar.L1), an AR(4)xSAR(1)xVar.L1 model would likely have performed even better had it converged. Interestingly, adding the second variable resource lag (Var.L2) seems to make things worse at AR(2).
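As a small illustration of the relative AIC calculation, here is a sketch that uses only the two AIC values visible in the summaries printed above. The model labels are my own shorthand, and the minimum is taken over just these two models rather than the full set in the plot, so the numbers will not match the chart.

```python
# AIC values copied from the two SARIMAX summaries printed above.
# The dictionary keys are shorthand labels, not names produced by statsmodels.
aic = {
    "AR(2)xSAR(1)xVar.L1xVar.L2": 136564.722,
    "AR(4)xSAR(1)": 136160.792,
}

# Relative AIC: distance of each model from the best (lowest) AIC in the set.
best = min(aic.values())
relative_aic = {name: value - best for name, value in aic.items()}

for name, delta in sorted(relative_aic.items(), key=lambda kv: kv[1]):
    print(f"{name:<28s} relative AIC = {delta:8.2f}")
```

Subtracting the minimum changes nothing about the ranking; it only removes the large common offset so that the differences between models are visible on the plot.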

Next we’ll look directly at some predictions and residuals. First, we’ll make a plot showing the predicted power vs the actual dispatchable power over the course of the year.

png

png

The residuals are small enough that they don’t noticeably show up on the first plot. By eye, the residuals look like they may be larger in the summer than in the winter, but it is hard to say. That wouldn’t surprise me very much, since solar power is much stronger in the summer and makes up the largest share of the variable resources. Weather effects, which should be a large contributor to solar variability, would have a bigger effect on the overall market during periods with more sunshine.
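One way to check the summer-versus-winter impression without relying on the eye would be to compare the spread of the residuals month by month. The sketch below uses synthetic residuals, generated with a deliberately larger spread in June through August, purely to show the grouping step. The variable names and the noise levels are assumptions, not values from the fitted models.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the model residuals: one year of hourly values whose
# spread is deliberately larger in June-August. These are NOT the real residuals.
rng = np.random.default_rng(0)
index = pd.date_range("2021-09-01", periods=8760, freq=pd.Timedelta(hours=1))
is_summer = index.month.isin([6, 7, 8])
scale = np.where(is_summer, 600.0, 400.0)
residuals = pd.Series(rng.normal(0.0, scale), index=index)

# Standard deviation of the residuals within each calendar month.
monthly_std = residuals.groupby(residuals.index.month).std()
print(monthly_std.round(1))
```

With the real residuals in place of the synthetic series, a clearly higher standard deviation in the summer months would support the visual impression.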

Next we’ll look at some plots of 3 arbitrary days of data with different models, comparing the actual and predicted dispatchable power output. First we’ll look at the final chosen model.

png

We see excellent agreement in general, as expected.

Next we’ll look at the AR(1) model, i.e. one autoregressive lag, no seasonal component, and no dependence on prior variable resource output.

png

This model tends to predict a slight reversion toward the mean from the previous sample. That makes sense on average, but the prediction lags the actual curve and is a poor fit in general.

The next model includes only the “seasonal” (a.k.a. daily) term.

png

The prediction of this model is only informed by the value at the same time the day before, so while it doesn’t show the lagged behavior that the previous plot does, it is not able to react to any information more recent than 24 hours ago, and consequently is not very good.
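To make the contrast between the two simple models concrete, here is a toy sketch of their one-step predictions on a synthetic signal: a daily sine cycle with a step increase at the start of the third day. The signal, the mean, and both coefficients are made up for illustration, and the predictions are written out by hand rather than taken from the fitted statsmodels results.

```python
import numpy as np

# Synthetic hourly signal: a daily sine cycle for three days, with a step
# increase at the start of day 3. All numbers here are made up for illustration.
hours = np.arange(72)
mean = 20000.0
actual = mean + 5000.0 * np.sin(2 * np.pi * hours / 24)
actual[48:] += 3000.0

phi, seasonal_phi = 0.95, 0.90  # hypothetical AR and seasonal AR coefficients

# One-step predictions for day 3 only (hours 48-71), so both lags are available.
t = np.arange(48, 72)
ar1_pred = mean + phi * (actual[t - 1] - mean)                 # uses the previous hour
seasonal_pred = mean + seasonal_phi * (actual[t - 24] - mean)  # uses the same hour yesterday

ar1_mae = np.mean(np.abs(actual[t] - ar1_pred))
seasonal_mae = np.mean(np.abs(actual[t] - seasonal_pred))
print(f"AR(1) MAE: {ar1_mae:.0f}, seasonal-only MAE: {seasonal_mae:.0f}")
```

In this toy setup the seasonal-only prediction misses the step for the whole day, because yesterday's values carry no trace of it. The AR(1) prediction picks up the step after one hour, but it trails the daily cycle because it only ever sees the previous sample.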

Finally, let’s plot the residuals of these models together with those of the AR(1)xSAR(1)xVar.L1 model. Both the green and red curves perform significantly better than the blue and orange curves, showing that having at least 4 terms (not including the constant term) in the model is important. Red and green are quite similar, but we know from the bar chart above that the more complex model (red) has a lower AIC than green. Since it also has more parameters, its likelihood must be higher, and therefore it should have smaller residuals on average.

png
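The AIC argument above rests on the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the log likelihood. As a sanity check, the sketch below recomputes the AIC for the two summaries printed earlier, from their log likelihoods and a parameter count read off the coefficient tables. The labels are shorthand of my own, and counting `const` and `sigma2` toward k is an assumption that the numbers happen to bear out.

```python
# Log likelihoods and reported AICs copied from the two summaries printed above.
# k counts every row of the coefficient table, including const and sigma2.
models = {
    "AR(2)xSAR(1)xVar.L1xVar.L2": {"llf": -68275.361, "k": 7, "reported_aic": 136564.722},
    "AR(4)xSAR(1)": {"llf": -68073.396, "k": 7, "reported_aic": 136160.792},
}

for name, m in models.items():
    # AIC = 2k - 2 ln L
    m["aic"] = 2 * m["k"] - 2 * m["llf"]
    print(f"{name}: computed {m['aic']:.3f}, reported {m['reported_aic']:.3f}")
```

These two models have the same parameter count, so their AIC gap comes entirely from the log likelihood. When the lower-AIC model has more parameters, as in the red versus green comparison, its log likelihood has to be higher by more than the added penalty.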

It would be interesting to fit an RNN to this data.