Automatic weather station examples¶
Eric Gagliano (egagli@uw.edu)
Updated: April 1st, 2024
Thanks for checking out these examples! The automatic_weather_stations module is intended to make it easier to retrieve daily SNOTEL and CCSS data without having to do clunky downloads and conversions. Snow depth / SWE / PRCPSA are in meters, and temperatures are in Celsius. This module is built on my snotel_ccss_stations repository, which uses a GitHub Action to auto-update the station data daily.
%load_ext autoreload
%autoreload 2
%aimport easysnowdata
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import datetime
import tqdm
import contextily as ctx
import xarray as xr
import glob
import requests
from io import StringIO
View all SNOTEL & CCSS stations¶
- the SNOwpack TELemetry (SNOTEL) network includes over 800 automated weather stations in the Western U.S. for mountain snowpack observation
- the CCSS program manages a network of 130 automated snow sensors located in the Sierra Nevada and Shasta-Trinity Mountains
Get an up to date GeoDataFrame of all active SNOTEL and CCSS stations¶
# bbox_gdf = gpd.read_file('https://github.com/egagli/sar_snowmelt_timing/raw/main/input/shapefiles/mt_rainier.geojson')
StationCollection = easysnowdata.automatic_weather_stations.StationCollection()
Geodataframe with all stations has been added to the Station object. Please use the .all_stations attribute to access. Use the .get_data(stations=geodataframe/string/list,variables=string/list,start_date=str,end_date=str) method to fetch data for specific stations and variables.
StationCollection.all_stations
code | name | network | elevation_m | latitude | longitude | state | HUC | mgrs | mountainRange | beginDate | endDate | csvData | geometry
---|---|---|---|---|---|---|---|---|---|---|---|---|---
301_CA_SNTL | Adin Mtn | SNOTEL | 1886.712036 | 41.235828 | -120.791924 | California | 180200021403 | 10TFL | Great Basin Ranges | 1983-10-01 | 2100-01-01 | True | POINT (-120.79192 41.23583) |
907_UT_SNTL | Agua Canyon | SNOTEL | 2712.719971 | 37.522171 | -112.271179 | Utah | 160300020301 | 12SUG | Colorado Plateau | 1994-10-01 | 2100-01-01 | True | POINT (-112.27118 37.52217) |
916_MT_SNTL | Albro Lake | SNOTEL | 2529.840088 | 45.597229 | -111.959023 | Montana | 100200050701 | 12TVR | Central Montana Rocky Mountains | 1996-09-01 | 2100-01-01 | True | POINT (-111.95902 45.59723) |
1267_AK_SNTL | Alexander Lake | SNOTEL | 48.768002 | 61.749668 | -150.889664 | Alaska | 190205051106 | 05VPJ | None | 2014-08-28 | 2100-01-01 | True | POINT (-150.88966 61.74967) |
908_WA_SNTL | Alpine Meadows | SNOTEL | 1066.800049 | 47.779572 | -121.698471 | Washington | 171100100501 | 10TET | Cascade Range | 1994-09-01 | 2100-01-01 | True | POINT (-121.69847 47.77957) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
SLT | Slate Creek | CCSS | 1737.360000 | 41.043980 | -122.480103 | California | 180200050304 | 10TEL | Klamath Mountains | 2004-10-01 | 2024-04-02 | True | POINT (-122.48010 41.04398) |
SLI | Slide Canyon | CCSS | 2804.160000 | 38.091234 | -119.431881 | California | 180400090501 | 11SKC | Sierra Nevada | 2005-10-01 | 2024-04-02 | True | POINT (-119.43188 38.09123) |
SLK | South Lake | CCSS | 2926.080000 | 37.175903 | -118.562660 | California | 180901020601 | 11SLB | Sierra Nevada | 2004-10-01 | 2024-04-02 | True | POINT (-118.56266 37.17590) |
STL | State Lakes | CCSS | 3169.920000 | 36.926483 | -118.573250 | California | 180300100305 | 11SLA | Sierra Nevada | 2018-11-01 | 2024-04-02 | True | POINT (-118.57325 36.92648) |
TMR | Tamarack Summit | CCSS | 2301.240000 | 37.163750 | -119.200531 | California | 180400060903 | 11SLB | Sierra Nevada | 2011-01-01 | 2024-04-02 | True | POINT (-119.20053 37.16375) |
969 rows × 13 columns
Use geopandas GeoDataFrame.explore() on the all_stations geodataframe to interactively view the stations¶
- color by network: red is SNOTEL, blue is CCSS.
StationCollection.all_stations.astype(dict(beginDate=str, endDate=str)).explore(
column="network", cmap="bwr"
)
Get data for a single site: Is our winter on track with the historical record at the Paradise, WA SNOTEL station?¶
- check out information about the SNOTEL station near Mt. Rainier at Paradise, WA
- cool plots available at the Northwest River Forecast Center website
Select a station code (which you can find in this interactive plot, or by other means)¶
- for SNOTEL stations, this will be of the form {unique number}_{two letter state abbreviation}_SNTL (e.g. 679_WA_SNTL).
- for CCSS stations, this will be a three letter code (e.g. BLK).
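As a quick sanity check, you can look up any code directly in the all_stations geodataframe (the station code is its index), e.g. using the StationCollection object created earlier:
StationCollection.all_stations.loc["679_WA_SNTL"]  # Paradise, WA SNOTEL row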
ParadiseSNOTEL = easysnowdata.automatic_weather_stations.StationCollection()
Geodataframe with all stations has been added to the Station object. Please use the .all_stations attribute to access. Use the .get_data(stations=geodataframe/string/list,variables=string/list,start_date=str,end_date=str) method to fetch data for specific stations and variables.
ParadiseSNOTEL.get_data("679_WA_SNTL")
Only one station chosen with variables=None. Default behavior fetches all variables for this station. Dataframe has been added to the Station object. Please use the .data attribute to access.
f, ax = plt.subplots()
ParadiseSNOTEL.stations.plot(ax=ax, color="red")
ax.set_xlim(-122, -121.5)
ax.set_ylim(46.6, 47.1)
ctx.add_basemap(
ax, crs=ParadiseSNOTEL.stations.crs, source=ctx.providers.Esri.WorldImagery
)
ParadiseSNOTEL.data
datetime | TAVG | TMIN | TMAX | SNWD | WTEQ | PRCPSA
---|---|---|---|---|---|---
1980-10-01 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 |
1980-10-02 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 |
1980-10-03 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 |
1980-10-04 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 |
1980-10-05 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 |
... | ... | ... | ... | ... | ... | ... |
2024-03-28 | -1.0 | -2.2 | 1.9 | 3.0480 | 1.4376 | 0.0102 |
2024-03-29 | 0.2 | -3.5 | 5.7 | 3.1750 | 1.4478 | 0.0000 |
2024-03-30 | 1.5 | -3.2 | 8.3 | 3.1242 | 1.4478 | 0.0025 |
2024-03-31 | 1.1 | -3.4 | 6.9 | 3.1242 | 1.4503 | 0.0000 |
2024-04-01 | NaN | NaN | NaN | 3.0734 | 1.4453 | NaN |
15889 rows × 6 columns
Try a simple plot of snow depth and SWE¶
- select the column of interest and use pandas' built-in Series.plot()
f, ax = plt.subplots(figsize=(12, 5))
ParadiseSNOTEL.data["SNWD"].plot(ax=ax, label="snow depth")
ParadiseSNOTEL.data["WTEQ"].plot(ax=ax, label="snow water equivalent")
ax.set_xlim(pd.to_datetime(["2017-10-01", "2018-09-30"]))
ax.grid()
ax.legend()
ax.set_xlabel("time")
ax.set_ylabel("snow depth / SWE [meters]")
ax.set_title("Snow depth and SWE at Paradise, WA \n(water year 2018)")
f.tight_layout()
Try a more complex plot that shows current snow depth against statistics calculated from the entire time series for each day of water year¶
- a water year is a conceptual 12-month period used to bracket the season in which the bulk of precipitation falls, used mostly in hydrology
- in the northern hemisphere, we usually define the water year to start October 1st and run through September 30th (e.g. water year 2017: October 1st, 2016 - September 30th, 2017)
- so October 1st is DOWY 1
- try a function like easysnowdata.utils.datetime_to_DOWY() to convert datetimes to day of water year and add a dedicated DOWY column; this function should account for leap years (a minimal sketch follows this list)
- then use pandas groupby functionality to calculate stats per DOWY
- plot these stats
- thanks David Shean for the plot inspiration!
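For reference, here is a minimal sketch of what such conversion functions could look like (hypothetical implementations; the actual easysnowdata.utils versions may differ):
def sketch_datetime_to_DOWY(dt):
    # day of water year: Oct 1 -> DOWY 1; date arithmetic handles leap years
    wy_start_year = dt.year if dt.month >= 10 else dt.year - 1
    return (dt - datetime.datetime(wy_start_year, 10, 1)).days + 1

def sketch_datetime_to_WY(dt):
    # water year label: October 1st, 2016 - September 30th, 2017 -> WY 2017
    return dt.year + 1 if dt.month >= 10 else dt.year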
ParadiseSNOTEL.data["DOWY"] = ParadiseSNOTEL.data.index.map(
easysnowdata.utils.datetime_to_DOWY
)
ParadiseSNOTEL.data["WY"] = ParadiseSNOTEL.data.index.map(
easysnowdata.utils.datetime_to_WY
)
ParadiseSNOTEL.data
datetime | TAVG | TMIN | TMAX | SNWD | WTEQ | PRCPSA | DOWY | WY
---|---|---|---|---|---|---|---|---
1980-10-01 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 | 1 | 1981 |
1980-10-02 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 | 2 | 1981 |
1980-10-03 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 | 3 | 1981 |
1980-10-04 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 | 4 | 1981 |
1980-10-05 | NaN | NaN | NaN | NaN | 0.0000 | 0.0000 | 5 | 1981 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-03-28 | -1.0 | -2.2 | 1.9 | 3.0480 | 1.4376 | 0.0102 | 180 | 2024 |
2024-03-29 | 0.2 | -3.5 | 5.7 | 3.1750 | 1.4478 | 0.0000 | 181 | 2024 |
2024-03-30 | 1.5 | -3.2 | 8.3 | 3.1242 | 1.4478 | 0.0025 | 182 | 2024 |
2024-03-31 | 1.1 | -3.4 | 6.9 | 3.1242 | 1.4503 | 0.0000 | 183 | 2024 |
2024-04-01 | NaN | NaN | NaN | 3.0734 | 1.4453 | NaN | 184 | 2024 |
15889 rows × 8 columns
stat_list = ["min", "max", "mean", "std", "median"]
paradise_snotel_DOWY_snwd_stats = ParadiseSNOTEL.data.groupby("DOWY").agg(stat_list)[
"SNWD"
]
paradise_snotel_DOWY_snwd_stats
DOWY | min | max | mean | std | median
---|---|---|---|---|---
1 | 0.0 | 0.2286 | 0.014111 | 0.053862 | 0.0 |
2 | 0.0 | 0.2032 | 0.011289 | 0.047895 | 0.0 |
3 | 0.0 | 0.2286 | 0.014111 | 0.053862 | 0.0 |
4 | 0.0 | 0.1270 | 0.012700 | 0.032888 | 0.0 |
5 | 0.0 | 0.1270 | 0.012700 | 0.034022 | 0.0 |
... | ... | ... | ... | ... | ... |
362 | 0.0 | 0.0254 | 0.001411 | 0.005987 | 0.0 |
363 | 0.0 | 0.0254 | 0.001411 | 0.005987 | 0.0 |
364 | 0.0 | 0.0254 | 0.002822 | 0.008214 | 0.0 |
365 | 0.0 | 0.0762 | 0.005644 | 0.018595 | 0.0 |
366 | 0.0 | 0.0000 | 0.000000 | 0.000000 | 0.0 |
366 rows × 5 columns
today = datetime.datetime.today()
# the water year starts October 1st: in Oct-Dec the current WY began this calendar year
wy_start_year = today.year if today.month >= 10 else today.year - 1
current_WY = slice(f"{wy_start_year}-10-01", today.strftime("%Y-%m-%d"))
current_WY_paradise_snotel = ParadiseSNOTEL.data[current_WY.start : current_WY.stop]
f, ax = plt.subplots(figsize=(12, 7))
for stat, stat_color in zip(
["min", "max", "mean", "median"], ["red", "blue", "mediumpurple", "mediumseagreen"]
):
ax.plot(
paradise_snotel_DOWY_snwd_stats.index,
paradise_snotel_DOWY_snwd_stats[stat],
label=stat,
color=stat_color,
linewidth=3,
)
ax.fill_between(
paradise_snotel_DOWY_snwd_stats.index,
paradise_snotel_DOWY_snwd_stats["mean"] - paradise_snotel_DOWY_snwd_stats["std"],
paradise_snotel_DOWY_snwd_stats["mean"] + paradise_snotel_DOWY_snwd_stats["std"],
color="mediumpurple",
alpha=0.3,
label="mean +/- 1 std",
)
ax.scatter(
current_WY_paradise_snotel.DOWY,
current_WY_paradise_snotel.SNWD,
marker="o",
color="black",
label="Current WY",
)
ax.set_xlim([0, 366])
ax.set_ylim([0, 6])
ax.grid()
ax.legend()
ax.set_title(
f"Current snow depth against historical snow depth stats by DOWY at Paradise, WA\n({ParadiseSNOTEL.data.index.min().date()} - {ParadiseSNOTEL.data.index.max().date()})"
)
ax.set_xlabel("Day of Water Year [Oct 1 - Sept 30]")
ax.set_ylabel("Snow depth [meters]")
f.tight_layout()
Looks like we're slightly below the mean for snow depth for today's DOWY.
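To put a number on that, we can compare the latest snow depth observation to the historical median for the same DOWY (a quick follow-up sketch using the objects above):
latest = ParadiseSNOTEL.data.dropna(subset=["SNWD"]).iloc[-1]
dowy_median = paradise_snotel_DOWY_snwd_stats.loc[latest["DOWY"], "median"]
print(f"DOWY {latest['DOWY']:.0f}: {100 * latest['SNWD'] / dowy_median:.0f}% of median snow depth")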
Read a variable from a list of stations: Do the SNOTEL and CCSS networks list the same station?¶
- no controversy here, but I noticed that a station name was repeated
- perhaps both networks provide data from the same physical station
- that might make sense if they are co-managed in some way?
Create a list of the stations we are interested in and pass it to our get_data() function¶
station_list = ["356_CA_SNTL", "BLK"]
TwoStations = easysnowdata.automatic_weather_stations.StationCollection()
Geodataframe with all stations has been added to the Station object. Please use the .all_stations attribute to access. Use the .get_data(stations=geodataframe/string/list,variables=string/list,start_date=str,end_date=str) method to fetch data for specific stations and variables.
TwoStations.get_data(station_list, variables="WTEQ")
100%|██████████| 2/2 [00:00<00:00, 2.31it/s]
WTEQ dataframe has been added to the Station object. Please use the .WTEQ attribute to access the dataframe. Full ['WTEQ'] dataset has been added to the station object. Please use the .data attribute to access the dataset.
TwoStations.WTEQ
datetime | 356_CA_SNTL | BLK
---|---|---
1980-07-09 | NaN | NaN |
1980-07-10 | NaN | NaN |
1980-07-11 | NaN | NaN |
1980-07-12 | NaN | NaN |
1980-07-13 | NaN | NaN |
... | ... | ... |
2024-03-29 | 0.8204 | 0.8204 |
2024-03-30 | 0.8433 | 0.8433 |
2024-03-31 | 0.8484 | 0.8484 |
2024-04-01 | 0.8534 | 0.8534 |
2024-04-02 | NaN | 0.8534 |
15974 rows × 2 columns
Plot the two stations on the same axis¶
- if we properly set the index as the datetime and used parse_dates=True when reading the CSVs, these should line up! (see the toy illustration below)
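A quick self-contained demonstration of that datetime-index alignment (toy data, not station data): two series covering different date ranges still combine date-by-date.
a = pd.Series(1.0, index=pd.date_range("2020-01-01", periods=5))
b = pd.Series(2.0, index=pd.date_range("2020-01-03", periods=5))
pd.concat([a, b], axis=1)  # rows align on date; NaN where a series has no data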
f, ax = plt.subplots(figsize=(20, 7))
TwoStations.WTEQ.plot(ax=ax, color=["red", "blue"])
ax.legend()
ax.set_xlabel("time")
ax.set_ylabel("snow water equivalent [meters]")
ax.set_title("SNOTEL and CCSS SWE at Blue Lakes, CA \nmaybe these are the same? :)")
f.tight_layout()
These look oddly similar... let's check out their correlation¶
- use the convenient built-in DataFrame.corr()
TwoStations.WTEQ.corr()
 | 356_CA_SNTL | BLK
---|---|---
356_CA_SNTL | 1.000000 | 0.999056 |
BLK | 0.999056 | 1.000000 |
These correlation values, along with the time series above, make me think these are way too similar... no way they would agree this much even if the stations were right next to each other!
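To probe those tiny differences directly, look at the distribution of the daily differences between the two series (a quick sketch):
swe_diff = (TwoStations.WTEQ["356_CA_SNTL"] - TwoStations.WTEQ["BLK"]).dropna()
swe_diff.abs().describe()  # how big are the disagreements, and how often?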
Let's see where they exist spatially¶
- select the stations by index and reproject to UTM 11N
- use contextily for a basemap
TwoStations.stations
code | name | network | elevation_m | latitude | longitude | state | HUC | mgrs | mountainRange | beginDate | endDate | csvData | geometry
---|---|---|---|---|---|---|---|---|---|---|---|---|---
356_CA_SNTL | Blue Lakes | SNOTEL | 2458.821533 | 38.608002 | -119.92437 | California | 180400120101 | 11SKC | Sierra Nevada | 1980-10-01 | 2100-01-01 | True | POINT (-119.92437 38.60800) |
BLK | Blue Lakes | CCSS | 2438.400000 | 38.613000 | -119.93100 | California | 180400120101 | 11SKC | Sierra Nevada | 2004-10-01 | 2024-04-02 | True | POINT (-119.93100 38.61300) |
f, ax = plt.subplots(figsize=(7, 7))
TwoStations.stations.to_crs("EPSG:32611").loc[['356_CA_SNTL']].plot(ax=ax, color='red',label='356_CA_SNTL')
TwoStations.stations.to_crs("EPSG:32611").loc[['BLK']].plot(ax=ax, color='blue',label='BLK')
ax.set_xlim([244200, 245700])
ax.set_ylim([4276900, 4278700])
ctx.add_basemap(ax, crs="EPSG:32611", source=ctx.providers.Esri.WorldImagery)
ax.legend() # add labels here
f.tight_layout()
Interesting locations :) Based on correlation and location, I'm going to say these are the same! Wonder what those tiny differences are about...
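For the record, we can compute the separation between the two listed coordinates in a metric CRS (a sketch using UTM 11N):
pts = TwoStations.stations.to_crs("EPSG:32611").geometry
print(f"separation: {pts.loc['356_CA_SNTL'].distance(pts.loc['BLK']):.0f} m")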
Looks like some of the CCSS stations have "Natural Resources Conservation Service" as their operator¶
- Let's check which stations these are!
- Let's plot these shared stations in red, and not shared in blue
# CDEC's list of all CCSS snow sensors, including the operating agency
url = "https://cdec.water.ca.gov/misc/SnowSensors.html"
response = requests.get(url)
# first HTML table on the page, indexed by station ID
same_stations_df = pd.read_html(StringIO(response.content.decode("utf-8")))[0].set_index("ID").sort_index()
# drop filler rows in which every cell holds the same value
same_stations_df = same_stations_df[same_stations_df.nunique(axis=1) > 1]
same_stations_gdf = gpd.GeoDataFrame(
    same_stations_df,
    geometry=gpd.points_from_xy(same_stations_df["Longitude"], same_stations_df["Latitude"]),
    crs="EPSG:4326",
)
same_stations_gdf = same_stations_gdf[same_stations_gdf["Operator Agency"] == "Natural Resources Conservation Service"]
same_stations_gdf
same_stations_gdf
ID | Name | Elev (feet) | Latitude | Longitude | April 1 Avg (inches) | Operator Agency | geometry
---|---|---|---|---|---|---|---
ADM | ADIN MOUNTAIN | 6200 | 41.237 | -120.792 | 13.6 | Natural Resources Conservation Service | POINT (-120.79200 41.23700) |
BLK | BLUE LAKES | 8000 | 38.613 | -119.931 | 33.1 | Natural Resources Conservation Service | POINT (-119.93100 38.61300) |
BMW | BIG MEADOWS (SCS) | 8700 | 39.458 | -119.946 | 25.7 | Natural Resources Conservation Service | POINT (-119.94600 39.45800) |
BSK | BURNSIDE LAKE | 8129 | 38.719 | -119.894 | -9999.0 | Natural Resources Conservation Service | POINT (-119.89400 38.71900) |
CDP | CEDAR PASS | 7100 | 41.583 | -120.303 | 18.1 | Natural Resources Conservation Service | POINT (-120.30300 41.58300) |
CSL | CENT SIERRA SNOW LAB | 6900 | 39.325 | -120.367 | 33.6 | Natural Resources Conservation Service | POINT (-120.36700 39.32500) |
CWF | CROWDER FLAT | 5100 | 41.893 | -120.752 | -9999.0 | Natural Resources Conservation Service | POINT (-120.75200 41.89300) |
CXS | CARSON PASS | 8353 | 38.692 | -120.002 | -9999.0 | Natural Resources Conservation Service | POINT (-120.00200 38.69200) |
DSS | DISMAL SWAMP | 7050 | 41.993 | -120.165 | 29.2 | Natural Resources Conservation Service | POINT (-120.16500 41.99300) |
EBB | EBBETTS PASS | 8700 | 38.561 | -119.808 | 38.8 | Natural Resources Conservation Service | POINT (-119.80800 38.56100) |
EP5 | ECHO PEAK 5 | 7800 | 38.849 | -120.079 | 39.5 | Natural Resources Conservation Service | POINT (-120.07900 38.84900) |
FLL | FALLEN LEAF LAKE | 6250 | 38.932 | -120.056 | 7.0 | Natural Resources Conservation Service | POINT (-120.05600 38.93200) |
HGM | HAGANS MEADOW | 8000 | 38.853 | -119.940 | 16.5 | Natural Resources Conservation Service | POINT (-119.94000 38.85300) |
HVN | HEAVENLY VALLEY | 8800 | 38.929 | -119.917 | 28.1 | Natural Resources Conservation Service | POINT (-119.91700 38.92900) |
IDC | INDEPENDENCE CAMP | 7000 | 39.453 | -120.299 | 21.8 | Natural Resources Conservation Service | POINT (-120.29900 39.45300) |
IDP | INDEPENDENCE LAKE (SCS) | 8450 | 39.435 | -120.322 | 41.4 | Natural Resources Conservation Service | POINT (-120.32200 39.43500) |
INN | INDEPENDENCE CREEK | 6500 | 39.494 | -120.293 | 12.7 | Natural Resources Conservation Service | POINT (-120.29300 39.49400) |
LBD | LOBDELL LAKE | 9200 | 38.440 | -119.377 | 17.3 | Natural Resources Conservation Service | POINT (-119.37700 38.44000) |
LVM | LEAVITT MEADOWS | 7200 | 38.305 | -119.552 | 8.0 | Natural Resources Conservation Service | POINT (-119.55200 38.30500) |
LVT | LEAVITT LAKE | 9600 | 38.282 | -119.621 | -9999.0 | Natural Resources Conservation Service | POINT (-119.62100 38.28200) |
MNT | MONITOR PASS | 8350 | 38.670 | -119.615 | -9999.0 | Natural Resources Conservation Service | POINT (-119.61500 38.67000) |
MRL | MARLETTE LAKE | 8000 | 39.173 | -119.905 | 21.1 | Natural Resources Conservation Service | POINT (-119.90500 39.17300) |
MSK | MOUNT ROSE SKI AREA | 8900 | 39.326 | -119.902 | 38.5 | Natural Resources Conservation Service | POINT (-119.90200 39.32600) |
PSN | POISON FLAT | 7900 | 38.501 | -119.631 | 16.2 | Natural Resources Conservation Service | POINT (-119.63100 38.50100) |
RP2 | RUBICON PEAK 2 | 7500 | 39.001 | -120.140 | 29.1 | Natural Resources Conservation Service | POINT (-120.14000 39.00100) |
SDW | SUMMIT MEADOW | 9313 | 38.398 | -119.536 | -9999.0 | Natural Resources Conservation Service | POINT (-119.53600 38.39800) |
SPS | SONORA PASS BRIDGE | 8750 | 38.318 | -119.601 | 26.0 | Natural Resources Conservation Service | POINT (-119.60100 38.31800) |
SPT | SPRATT CREEK | 6150 | 38.666 | -119.817 | 4.5 | Natural Resources Conservation Service | POINT (-119.81700 38.66600) |
SQV | SQUAW VALLEY GOLD COAST | 8200 | 39.194 | -120.276 | 46.5 | Natural Resources Conservation Service | POINT (-120.27600 39.19400) |
TCC | TAHOE CITY CROSS | 6750 | 39.171 | -120.155 | 16.0 | Natural Resources Conservation Service | POINT (-120.15500 39.17100) |
TK2 | TRUCKEE 2 | 6400 | 39.300 | -120.194 | 14.3 | Natural Resources Conservation Service | POINT (-120.19400 39.30000) |
VRG | VIRGINIA LAKES RIDGE | 9300 | 38.077 | -119.234 | 20.3 | Natural Resources Conservation Service | POINT (-119.23400 38.07700) |
WC3 | WARD CREEK 3 | 6750 | 39.136 | -120.219 | 39.4 | Natural Resources Conservation Service | POINT (-120.21900 39.13600) |
f, ax = plt.subplots(figsize=(10, 10))
same_stations_gdf.to_crs("EPSG:32611").plot(
ax=ax, color="red", label="CCSS station operated by NRCS"
)
ccss_stations_gdf = TwoStations.all_stations[
TwoStations.all_stations["network"] == "CCSS"
].to_crs("EPSG:32611")
ccss_stations_gdf[~ccss_stations_gdf.index.isin(same_stations_gdf.index)].plot(
ax=ax, color="blue", label="CCSS station not operated by NRCS"
)
ctx.add_basemap(ax, crs="EPSG:32611", source=ctx.providers.Esri.WorldImagery)
ax.legend()
f.tight_layout()
There are 33 of these! These are likely also listed as SNOTEL stations...
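One hedged way to check: count how many of these NRCS-operated sensors have a SNOTEL station within about a kilometer, using geopandas' sjoin_nearest (assumes geopandas >= 0.10; the 1 km threshold is an arbitrary choice):
snotel_utm = TwoStations.all_stations.query("network == 'SNOTEL'").to_crs("EPSG:32611")
nrcs_utm = same_stations_gdf.to_crs("EPSG:32611")
len(gpd.sjoin_nearest(nrcs_utm, snotel_utm, max_distance=1000))  # sensors with a SNOTEL twin nearby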
Read a variable using a geodataframe: How extraordinary was the California 2023 snowpack?¶
- the Sierra Nevada received a historic amount of snow in 2023
- let's explore the magnitude of this season by comparing it to the median snowpack
Let's bring in only CCSS stations¶
- filter the all_stations geodataframe with network == "CCSS", and pass the filtered geodataframe to the get_data() function
ccssStations = easysnowdata.automatic_weather_stations.StationCollection()
Geodataframe with all stations has been added to the Station object. Please use the .all_stations attribute to access. Use the .get_data(stations=geodataframe/string/list,variables=string/list,start_date=str,end_date=str) method to fetch data for specific stations and variables.
ccss_only = ccssStations.all_stations[ccssStations.all_stations.network == "CCSS"]
ccssStations.get_data(stations=ccss_only,variables="WTEQ")
0%| | 0/130 [00:00<?, ?it/s]
100%|██████████| 130/130 [00:41<00:00, 3.17it/s]
WTEQ dataframe has been added to the Station object. Please use the .WTEQ attribute to access the dataframe. Full ['WTEQ'] dataset has been added to the station object. Please use the .data attribute to access the dataset.
Let's check out the percent of normal SWE for April 1st, 2023¶
- add a DOWY column to the ccssStations.WTEQ dataframe like before, groupby DOWY, apply a median, and divide the April 1st, 2023 observation by the DOWY 183 (April 1st) median
- add these percent of normal values back to ccssStations.stations
- we will need to do some slight cleaning to get rid of NaNs, Infs, and physically impossible values...
ccssStations.WTEQ["DOWY"] = ccssStations.WTEQ.index.map(
easysnowdata.utils.datetime_to_DOWY
)
ccssStations.WTEQ["WY"] = ccssStations.WTEQ.index.map(easysnowdata.utils.datetime_to_WY)
ccssStations.stations.loc[:, "april2023_percent_norm"] = 100 * (
ccssStations.WTEQ["2023-04-01":"2023-04-01"].squeeze()
/ ccssStations.WTEQ.groupby("DOWY").median().loc[183]
)
/home/eric/miniconda3/envs/easysnowdata/lib/python3.12/site-packages/geopandas/geodataframe.py:1525: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy super().__setitem__(key, value)
ccssStations.stations = ccssStations.stations.dropna(subset="april2023_percent_norm")
ccssStations.stations = ccssStations.stations[
ccssStations.stations["april2023_percent_norm"] < 7500
]
View the values in order...¶
- use the .sort_values() function to get an idea of the percent of normal SWE values
- use DataFrame.head() and DataFrame.tail() to see the highest and lowest values
ccssStations.stations.sort_values("april2023_percent_norm", ascending=False).head()
code | name | network | elevation_m | latitude | longitude | state | HUC | mgrs | mountainRange | beginDate | endDate | csvData | geometry | april2023_percent_norm
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
BCH | Beach Meadows | CCSS | 2331.7200 | 36.126095 | -118.293457 | California | 180300010403 | 11SLV | Sierra Nevada | 2004-10-01 | 2024-04-03 | True | POINT (-118.29346 36.12609) | 7365.789474 |
FLL | Fallen Leaf Lake | CCSS | 1905.0000 | 38.932000 | -120.056000 | California | 160501010401 | 10SGJ | None | 2004-10-01 | 2024-04-03 | True | POINT (-120.05600 38.93200) | 4525.764192 |
LVM | Leavitt Meadows | CCSS | 2194.5600 | 38.305000 | -119.552000 | California | 160503020104 | 11SKC | Sierra Nevada | 2004-10-01 | 2024-04-03 | True | POINT (-119.55200 38.30500) | 706.728704 |
GNF | Giant Forest (Usace) | CCSS | 1950.7200 | 36.562867 | -118.770283 | California | 180300070402 | 11SLA | Sierra Nevada | NaT | 2024-04-03 | True | POINT (-118.77028 36.56287) | 700.401606 |
DPO | Devils Postpile | CCSS | 2307.0312 | 37.629410 | -119.084671 | California | 180400060402 | 11SLB | Sierra Nevada | 2007-10-01 | 2024-04-03 | True | POINT (-119.08467 37.62941) | 642.842557 |
ccssStations.stations.sort_values("april2023_percent_norm", ascending=False).tail()
code | name | network | elevation_m | latitude | longitude | state | HUC | mgrs | mountainRange | beginDate | endDate | csvData | geometry | april2023_percent_norm
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
CDP | Cedar Pass | CCSS | 2164.0800 | 41.583000 | -120.303000 | California | 180200020603 | 10TGM | Great Basin Ranges | NaT | 2024-04-03 | True | POINT (-120.30300 41.58300) | 165.316340 |
SQV | Palisades Tahoe Snotel | CCSS | 2499.3600 | 39.194000 | -120.276000 | California | 160501020202 | 10SGJ | Sierra Nevada | 2004-10-01 | 2024-04-03 | True | POINT (-120.27600 39.19400) | 165.202219 |
DSS | Dismal Swamp | CCSS | 2148.8400 | 41.993000 | -120.165000 | California | 171200070304 | 10TGM | Great Basin Ranges | NaT | 2024-04-03 | True | POINT (-120.16500 41.99300) | 148.136711 |
LLP | Lower Lassen Peak | CCSS | 2541.4224 | 40.466602 | -121.508110 | California | 180201560301 | 10TFK | Cascade Range | 2004-10-01 | 2024-04-03 | True | POINT (-121.50811 40.46660) | 146.590403 |
SHM | Shimmy Lake | CCSS | 1950.7200 | 41.005299 | -122.801598 | California | 180102110501 | 10TEL | Klamath Mountains | 2005-10-01 | 2024-01-23 | True | POINT (-122.80160 41.00530) | 123.362639 |
Plot the percent of normal SWE for April 1st, 2023¶
- add an elevation plot on the right with a horizontal dashed line at 100% (normal)
f, ax = plt.subplots(1, 2, figsize=(12, 9), gridspec_kw={"width_ratios": [2, 1]})
ccssStations.stations.to_crs("EPSG:32611").plot(
ax=ax[0],
column="april2023_percent_norm",
legend=True,
vmin=0,
vmax=500,
cmap="gnuplot",
edgecolor="black",
s=100,
)
ctx.add_basemap(
ax[0], crs="EPSG:32611", source=ctx.providers.Esri.WorldImagery, attribution=""
)
ax[0].set_title("station locations")
ax[1].scatter(
ccssStations.stations.elevation_m,
ccssStations.stations.april2023_percent_norm,
c=ccssStations.stations.april2023_percent_norm,
cmap="gnuplot",
vmin=0,
vmax=500,
edgecolors="black",
s=100,
)
ax[1].axhline(y=100, linestyle="--", color="black")
ax[1].set_title("station percent of normal SWE vs elevation")
ax[1].set_xlabel("elevation [m]")
ax[1].set_ylabel("percent of normal SWE [%]")
ax[1].set_ylim([0, 1000])
f.suptitle("California percent of normal SWE for April 1st, 2023")
f.tight_layout()
Looks like a lot more snow than usual to me!
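A one-number summary of the same story: per the table above, even the lowest station sat above 100% of its normal April 1st SWE, so the median is well above normal.
ccssStations.stations["april2023_percent_norm"].median()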
Read a variable from all CSVs by looping over the entire geodataframe: Has the date of maximum SWE changed in the Western US?¶
- Snowmelt timing can be an important indicator of regional climate change, and the snowmelt timing of the Western U.S. is projected to shift earlier in the year by up to 1 month by 2050 (Barnett et al., 2005; Stewart, 2009), with a corresponding snowpack loss equivalent to a 25% decrease in streamflow from snowmelt (Siirila-Woodburn et al., 2021).
- Let's explore trends in maximum SWE timing using SNOTEL and CCSS stations
This time we will get SWE at every station¶
- this can take several minutes to load in almost 1,000 CSVs...
- store SWE for all stations in all_stations_swe_df
%%time
WesternUS = easysnowdata.automatic_weather_stations.StationCollection()
WesternUS.get_data(WesternUS.all_stations,variables="WTEQ")
Geodataframe with all stations has been added to the Station object. Please use the .all_stations attribute to access. Use the .get_data(stations=geodataframe/string/list,variables=string/list,start_date=str,end_date=str) method to fetch data for specific stations and variables.
100%|██████████| 969/969 [09:40<00:00, 1.67it/s]
WTEQ dataframe has been added to the Station object. Please use the .WTEQ attribute to access the dataframe. Full ['WTEQ'] dataset has been added to the station object. Please use the .data attribute to access the dataset.
CPU times: user 14.4 s, sys: 1.24 s, total: 15.6 s
Wall time: 9min 44s
all_stations_swe_df = WesternUS.WTEQ
all_stations_swe_df
datetime | 301_CA_SNTL | 907_UT_SNTL | 916_MT_SNTL | 1267_AK_SNTL | 908_WA_SNTL | 1189_AK_SNTL | 1062_AK_SNTL | 1070_AK_SNTL | 302_OR_SNTL | 1000_OR_SNTL | ... | QUA | LLP | FOR | GEM | WTM | SLT | SLI | SLK | STL | TMR
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1909-04-13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1954-12-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0635 | NaN |
1954-12-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0635 | NaN |
1954-12-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.0635 | NaN |
1954-12-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.1143 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-03-29 | 0.4267 | 0.2896 | 0.3175 | 0.2464 | 0.8839 | 0.1448 | 0.3251 | 0.3327 | 0.3937 | 1.1354 | ... | NaN | 2.4755 | 0.4846 | NaN | 0.5893 | 1.0119 | 0.8110 | 0.3155 | 0.6106 | 0.7503 |
2024-03-30 | 0.4318 | 0.2896 | 0.3175 | 0.2489 | 0.8865 | 0.1448 | 0.3251 | 0.3378 | 0.3962 | 1.1379 | ... | NaN | 2.5098 | 0.4953 | NaN | 0.6017 | 1.0363 | 0.8364 | 0.3338 | 0.6243 | 0.7777 |
2024-03-31 | 0.4318 | 0.2972 | 0.3200 | 0.2692 | 0.8865 | 0.1448 | 0.3327 | 0.3454 | 0.3988 | 1.1379 | ... | NaN | 2.5240 | 0.5080 | NaN | 0.6078 | 1.0516 | 0.8407 | 0.3399 | 0.6314 | 0.7930 |
2024-04-01 | 0.4191 | 0.3048 | 0.3353 | 0.2692 | 0.8839 | 0.1448 | 0.3378 | 0.3556 | 0.3962 | 1.1328 | ... | NaN | 2.5250 | 0.5182 | NaN | 0.6081 | 1.0668 | 0.8461 | 0.3459 | 0.6342 | 0.7960 |
2024-04-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | 2.5385 | 0.5182 | NaN | 0.6081 | 1.0729 | 0.8471 | 0.3459 | 0.6345 | 0.7991 |
23842 rows × 969 columns
Prepare the data¶
- prepare and clean all_stations_swe_df:
  - filter to start in WY 1967 (the first year with more than one station) and end with WY 2023
  - add a water year column
  - remove any negative SWE measurements
- for consistency with similar analyses, following the methodology of Evan 2019 and US EPA 2021:
  - set any day-to-day change greater in magnitude than 20 cm to NaN
  - if there are more than 30 days of missing data during November-April, don't use that water year
  - if SWE is zero during every day of Jan/Feb/March, don't use that water year
  - only use stations with continuous data from WY 1982
- I impose the WY 1982 rule only for the bulk all-station calculation so we have a common reference frame; for the station-level analysis I instead require a 30 year or longer record
all_stations_swe_df = all_stations_swe_df.loc[slice("1966-10-01", "2023-09-30")]
all_stations_swe_df["WY"] = all_stations_swe_df.index.map(easysnowdata.utils.datetime_to_WY)
/tmp/ipykernel_25570/369461795.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy all_stations_swe_df["WY"] = all_stations_swe_df.index.map(easysnowdata.utils.datetime_to_WY)
all_stations_swe_df = all_stations_swe_df[all_stations_swe_df >= 0]  # mask negative SWE values
all_stations_swe_diff_df = all_stations_swe_df.diff().abs()
all_stations_swe_df[all_stations_swe_diff_df > 0.20] = np.nan  # mask day-to-day jumps > 20 cm
def check_missing_data(group):
    # NaN out any station (column) with more than 30 missing days during Nov-Apr of this WY
    nov_to_apr_mask = group.index.month.isin([11, 12, 1, 2, 3, 4])
    filtered_group = group[nov_to_apr_mask]
    missing_data_counts = filtered_group.isnull().sum()
    columns_to_nan = missing_data_counts[missing_data_counts > 30].index
    group[columns_to_nan] = np.nan
    return group

def check_zero_swe(group):
    # NaN out any station (column) reporting zero SWE on every day of Jan, Feb, or Mar
    for month in [1, 2, 3]:
        month_mask = group.index.month == month
        zero_swe_columns = group[month_mask].eq(0).all()
        columns_to_nan = zero_swe_columns[zero_swe_columns].index
        group[columns_to_nan] = np.nan
    return group
all_stations_swe_df = (
all_stations_swe_df.groupby("WY").apply(check_missing_data).droplevel(0)
)
all_stations_swe_df = (
all_stations_swe_df.groupby("WY").apply(check_zero_swe).droplevel(0)
)
all_stations_swe_df
/tmp/ipykernel_25570/72793946.py:2: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning. all_stations_swe_df.groupby("WY").apply(check_missing_data).droplevel(0) /tmp/ipykernel_25570/72793946.py:5: DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning. all_stations_swe_df.groupby("WY").apply(check_zero_swe).droplevel(0)
datetime | 301_CA_SNTL | 907_UT_SNTL | 916_MT_SNTL | 1267_AK_SNTL | 908_WA_SNTL | 1189_AK_SNTL | 1062_AK_SNTL | 1070_AK_SNTL | 302_OR_SNTL | 1000_OR_SNTL | ... | LLP | FOR | GEM | WTM | SLT | SLI | SLK | STL | TMR | WY
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1966-10-01 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1967.0 |
1966-10-02 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1967.0 |
1966-10-03 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1967.0 |
1966-10-04 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1967.0 |
1966-10-05 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1967.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2023-09-26 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0051 | 0.0 | ... | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | 2023.0 |
2023-09-27 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0051 | 0.0 | ... | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | 2023.0 |
2023-09-28 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0051 | 0.0 | ... | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | 2023.0 |
2023-09-29 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0051 | 0.0 | ... | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | 2023.0 |
2023-09-30 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0051 | 0.0 | ... | 0.0 | 0.0 | NaN | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | 2023.0 |
20763 rows × 970 columns
Calculate and plot the trend in DOWY of max SWE for all data¶
- use DataFrame.groupby() to find the date of max SWE per year, convert it to a DOWY value, and store the result in all_stations_dowy_max_swe_df
- calculate the slope and intercept of a linear regression using np.polyfit() on a melted version of all_stations_dowy_max_swe_df, restricted to stations whose records begin before WY 1982
- obtain statistics for each year using DataFrame.describe()
all_stations_dowy_max_swe_df = (
all_stations_swe_df.groupby("WY").idxmax().map(easysnowdata.utils.datetime_to_DOWY)
)
stations_before_WY1982 = WesternUS.all_stations[WesternUS.all_stations.beginDate < "1981-10-01"]
dowy_max_swe_melted = pd.melt(
all_stations_dowy_max_swe_df.reset_index(), id_vars="WY"
).dropna()
dowy_max_swe_melted_before_WY1982 = dowy_max_swe_melted[
dowy_max_swe_melted["variable"].isin(stations_before_WY1982.index)
]
slope, intercept = np.polyfit(
dowy_max_swe_melted_before_WY1982.WY, dowy_max_swe_melted_before_WY1982.value, 1
)
lr_years = np.unique(dowy_max_swe_melted.WY)
describe = all_stations_dowy_max_swe_df.T.describe()
describe
WY | 1967.0 | 1968.0 | 1969.0 | 1970.0 | 1971.0 | 1972.0 | 1973.0 | 1974.0 | 1975.0 | 1976.0 | ... | 2014.0 | 2015.0 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | 2020.0 | 2021.0 | 2022.0 | 2023.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 15.000000 | 23.00000 | 31.000000 | 39.000000 | 40.000000 | 40.000000 | 48.000000 | 50.000000 | 53.000000 | 55.000000 | ... | 891.000000 | 870.000000 | 896.000000 | 926.000000 | 922.000000 | 929.000000 | 932.000000 | 928.000000 | 928.000000 | 920.000000 |
mean | 211.600000 | 196.73913 | 185.741935 | 197.641026 | 188.050000 | 186.375000 | 195.729167 | 195.720000 | 213.000000 | 194.800000 | ... | 179.783389 | 156.452874 | 175.334821 | 179.018359 | 183.367679 | 185.812702 | 182.918455 | 180.346983 | 176.734914 | 192.546739 |
std | 13.113788 | 19.17199 | 14.906302 | 25.243537 | 29.203047 | 33.502918 | 18.525129 | 17.996644 | 11.560876 | 18.673015 | ... | 28.288029 | 36.552523 | 27.632212 | 25.712310 | 18.666580 | 19.338004 | 26.485426 | 18.764201 | 36.686443 | 13.905714 |
min | 188.000000 | 170.00000 | 158.000000 | 148.000000 | 106.000000 | 94.000000 | 134.000000 | 161.000000 | 187.000000 | 154.000000 | ... | 55.000000 | 33.000000 | 88.000000 | 104.000000 | 84.000000 | 95.000000 | 60.000000 | 119.000000 | 89.000000 | 86.000000 |
25% | 208.500000 | 179.50000 | 176.000000 | 167.000000 | 179.000000 | 159.500000 | 184.750000 | 186.250000 | 207.000000 | 184.500000 | ... | 164.000000 | 133.000000 | 167.000000 | 159.000000 | 175.000000 | 170.000000 | 183.000000 | 175.000000 | 162.000000 | 187.000000 |
50% | 215.000000 | 204.00000 | 182.000000 | 210.000000 | 191.500000 | 202.000000 | 197.000000 | 193.000000 | 216.000000 | 202.000000 | ... | 188.000000 | 156.000000 | 181.000000 | 180.000000 | 180.000000 | 186.000000 | 190.000000 | 178.000000 | 174.000000 | 189.000000 |
75% | 218.500000 | 209.00000 | 191.500000 | 212.500000 | 209.250000 | 211.250000 | 207.000000 | 200.000000 | 221.000000 | 211.000000 | ... | 190.000000 | 177.000000 | 185.000000 | 202.000000 | 200.000000 | 199.000000 | 200.000000 | 189.250000 | 205.000000 | 205.000000 |
max | 228.000000 | 238.00000 | 219.000000 | 227.000000 | 235.000000 | 226.000000 | 225.000000 | 238.000000 | 240.000000 | 218.000000 | ... | 238.000000 | 239.000000 | 232.000000 | 235.000000 | 224.000000 | 247.000000 | 229.000000 | 228.000000 | 250.000000 | 234.000000 |
8 rows × 57 columns
f, ax = plt.subplots(
2, 1, figsize=(10, 6), sharex=True, gridspec_kw={"height_ratios": [3, 2]}
)
describe.loc["50%"].plot(ax=ax[0], label="median")
ax[0].fill_between(
describe.columns, describe.loc["25%"], describe.loc["75%"], alpha=0.3, label="IQR"
)
ax[0].plot(
lr_years,
np.array(lr_years) * slope + intercept,
"k--",
label=f"Trend (slope={slope:.2f} Days/Year)",
)
# ax[0].set_xlim([1967,2023])
ax[0].legend()
describe.loc["count"].plot(ax=ax[1])
ax[0].set_title("Trend in DOWY of max SWE")
ax[1].set_title("Number of active stations")
Text(0.5, 1.0, 'Number of active stations')
Check out the trend at each station separately¶
- calculate the linear trend in DOWY of max SWE only for stations with over 30 years of data, and store it in WesternUS.all_stations
- out of 969 stations with SWE data, 645 meet this criterion
- plot the trend for each station, and plot the distribution of trends as a histogram
WesternUS.all_stations.loc[:, "dowy_max_swe_trend"] = all_stations_dowy_max_swe_df.apply(
lambda y: (
np.polyfit(y.dropna().index.values, y.dropna(), 1)[0]
if len(y.dropna()) >= 30
else np.nan
)
)
f, ax = plt.subplots(figsize=(10, 5.5))
WesternUS.all_stations.plot(
column="dowy_max_swe_trend",
ax=ax,
legend=True,
cmap="RdBu_r",
edgecolor="k",
markersize=20,
vmin=-1,
vmax=1,
legend_kwds={"label": "[days/year]\n(Red is later in the year, blue is earlier)"},
)
ctx.add_basemap(
ax, crs="EPSG:4326", source=ctx.providers.Esri.WorldImagery, attribution=""
)
ax.set_title("Trend in DOWY of max SWE\n(only stations with 30+ years of data)")
Text(0.5, 1.0, 'Trend in DOWY of max SWE\n(only stations with 30+ years of data)')
f, ax = plt.subplots()
ax.hist(WesternUS.all_stations["dowy_max_swe_trend"], bins=50)
ax.axvline(x=0, color="red")
ax.set_xlim([-1.5, 1.5])
ax.set_xlabel("trend [days/year]")
ax.set_ylabel("count")
ax.set_title("Distribution of trends in DOWY of max SWE")
Text(0.5, 1.0, 'Distribution of trends in DOWY of max SWE')
Let's analyze these trends by mountain range¶
- we can DataFrame.groupby() our geodataframe by mountain range to calculate mountain-range-specific stats, stored in mountain_range_trend_df
- mountain ranges with more stations (and more spatial coverage) are probably more trustworthy
# note: .dropna() removes stations with NaN in any column, including a NaN trend
mountain_range_trend_df = (
    WesternUS.all_stations.dropna()
    .groupby("mountainRange")["dowy_max_swe_trend"]
    .agg(["count", "median", "mean", "std"])
    .rename(columns={"count": "station_count"})
)
mountain_range_trend_df
mountainRange | station_count | median | mean | std
---|---|---|---|---
Alaska Intermountain Ranges | 5 | -0.298031 | -0.266592 | 0.112821 |
Cascade Range | 63 | 0.072125 | 0.138312 | 0.346018 |
Central Montana Rocky Mountains | 48 | -0.083318 | -0.100606 | 0.193399 |
Colorado Plateau | 37 | -0.282213 | -0.348372 | 0.205635 |
Columbia Mountains | 7 | 0.143695 | 0.082937 | 0.164431 |
Columbia Plateau | 24 | -0.138161 | -0.029927 | 0.309255 |
Great Basin Ranges | 37 | -0.209381 | -0.238700 | 0.206061 |
Greater Yellowstone Rockies | 51 | -0.044327 | -0.064588 | 0.175889 |
Idaho-Bitterroot Rocky Mountains | 57 | -0.036092 | -0.021299 | 0.242944 |
Klamath Mountains | 5 | -0.122516 | 0.019894 | 0.407079 |
Olympic Mountains | 1 | 0.267235 | 0.267235 | NaN |
Oregon Coast Range | 1 | 0.163264 | 0.163264 | NaN |
Sierra Nevada | 81 | -0.177622 | -0.176293 | 0.283223 |
South-Central Alaska | 7 | -0.073497 | -0.029532 | 0.230334 |
Southern Rocky Mountains | 90 | -0.275773 | -0.277181 | 0.258862 |
Southwest Basins and Ranges | 4 | -0.752793 | -0.740468 | 0.252450 |
Western Rocky Mountains | 58 | -0.191897 | -0.171475 | 0.199642 |
f, ax = plt.subplots(1, 2, sharey=True, gridspec_kw={"width_ratios": [1, 3]})
mountain_range_trend_df["station_count"].plot.barh(ax=ax[0])
mountain_range_trend_df["median"].plot.barh(ax=ax[1], cmap="RdBu")
ax[0].set_xlabel("[#]")
ax[1].set_xlabel("[days/year]")
ax[0].set_ylabel("")
ax[0].set_title("station count")
ax[1].set_title("trend")
f.suptitle("Trend in DOWY of max SWE by mountain range")
Text(0.5, 0.98, 'Trend in DOWY of max SWE by mountain range')
Let's visualize this on a map¶
- add spatial information to mountain_range_trend_df using DataFrame.join() with mountain range geometries from the GMBA Mountain Inventory v2
- create both a static plot with counts and medians, and an interactive plot so we can explore the trends across mountain ranges and stations
url = "https://data.earthenv.org/mountains/standard/GMBA_Inventory_v2.0_standard_300.zip"
gmba_gdf = gpd.read_file("zip+" + url)
mountain_range_trend_gdf = gpd.GeoDataFrame(
mountain_range_trend_df.join(gmba_gdf[["MapName", "geometry"]].set_index("MapName"))
)
f, ax = plt.subplots(1, 2, figsize=(14, 7))
mountain_range_trend_gdf.plot(
ax=ax[0],
column="station_count",
vmin=0,
vmax=100,
cmap="viridis",
legend=True,
edgecolor="k",
legend_kwds={"label": "[#]"},
)
mountain_range_trend_gdf.plot(
ax=ax[1],
column="median",
vmin=-0.3,
vmax=0.3,
cmap="RdBu_r",
legend=True,
edgecolor="k",
legend_kwds={"label": "[days/year]\n(Red is later in the year, blue is earlier)"},
)
for axs in ax:
    ctx.add_basemap(
        ax=axs,
        crs=mountain_range_trend_gdf.crs,
        source=ctx.providers.Esri.WorldImagery,
        attribution=False,
    )
    axs.set_xlim([-125, -104])
    axs.set_ylim([27, 55])
ax[0].set_title("count")
ax[1].set_title("trend")
f.suptitle(
"Trend in DOWY of max SWE by mountain range\n(only stations with 30+ years of data)"
)
Text(0.5, 0.98, 'Trend in DOWY of max SWE by mountain range\n(only stations with 30+ years of data)')
m = mountain_range_trend_gdf.explore(
column="median", cmap="RdBu_r", vmin=-0.5, vmax=0.5
)
WesternUS.all_stations.astype(dict(beginDate=str, endDate=str)).explore(
m=m, column="dowy_max_swe_trend", cmap="RdBu_r", vmin=-0.3, vmax=0.3
)
Looks like these trends differ by region but are relatively consistent within each region. The majority of regions show the timing of maximum SWE shifting earlier in the year; the notable exceptions are mountain ranges in the Pacific Northwest, which show a smaller-magnitude trend in the opposite direction. The number of stations and their spatial distribution in each region also matter: 2 of the 4 regions showing later maximum SWE timing (the Olympic Mountains and Oregon Coast Range) have only one station each with a 30+ year record.
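To summarize at the station level rather than by region, we can check what fraction of the 30+ year stations trend earlier:
trends = WesternUS.all_stations["dowy_max_swe_trend"].dropna()
print(f"{100 * (trends < 0).mean():.0f}% of {len(trends)} stations show max SWE trending earlier")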
What if we want to check out all available daily variables at all SNOTEL and CCSS stations for all time?¶
- Let's check the data out as an xarray object!
AllStations = easysnowdata.automatic_weather_stations.StationCollection()
Geodataframe with all stations has been added to the Station object. Please use the .all_stations attribute to access. Use the .get_data(stations=geodataframe/string/list,variables=string/list,start_date=str,end_date=str) method to fetch data for specific stations and variables.
AllStations.get_entire_data_archive()
Downloading compressed data to a temporary directory (/tmp/all_station_data.tar.lzma)...
Decompressing data...
Creating xarray.Dataset from the uncompressed data...
Done! Entire archive dataset has been added to the station object. Please use the .entire_data_archive attribute to access.
AllStations.entire_data_archive
<xarray.Dataset> Size: 1GB
Dimensions:      (time: 23826, station: 969)
Coordinates: (12/16)
  * time         (time) datetime64[ns] 191kB 1909-04-13 ... 2024-03-17
    name         (station) <U24 93kB 'New Crescent Lake' ... 'Paradise Meadow'
    network      (station) <U6 23kB 'SNOTEL' 'SNOTEL' ... 'CCSS' 'CCSS'
    elevation_m  (station) float64 8kB 1.497e+03 2.621e+03 ... 2.332e+03
    latitude     (station) float64 8kB 43.51 40.21 37.65 ... 38.48 39.62 38.05
    longitude    (station) float64 8kB -122.0 -105.6 -107.8 ... -120.7 -119.7
    ...           ...
    beginDate    (station) datetime64[ns] 8kB 1979-10-01 ... 2005-10-01
    endDate      (station) datetime64[ns] 8kB 2024-04-03 ... 2024-04-03
    csvData      (station) bool 969B True True True True ... True True True
    geometry     (station) object 8kB POINT (-121.97982025146484 43.5118484...
    WY           (time) int64 191kB 1909 1955 1955 1955 ... 2024 2024 2024
    DOWY         (time) int64 191kB 195 62 63 64 65 66 ... 165 166 167 168 169
Dimensions without coordinates: station
Data variables:
    TAVG         (station, time) float64 185MB nan nan nan ... -1.1 0.0 nan
    TMIN         (station, time) float64 185MB nan nan nan ... -3.9 -5.0 nan
    TMAX         (station, time) float64 185MB nan nan nan nan ... 3.3 7.2 nan
    SNWD         (station, time) float64 185MB nan nan nan ... 2.286 2.235
    WTEQ         (station, time) float64 185MB nan nan nan ... 0.8352 0.8354
    PRCPSA       (station, time) float64 185MB nan nan nan nan ... nan nan nan
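As a quick example of working with the archive, reduce over the station dimension (a sketch; since station has no coordinate labels in this Dataset, reductions are the simplest starting point):
ds = AllStations.entire_data_archive
ds["WTEQ"].mean(dim="station").plot()  # network-wide mean daily SWE (NaNs skipped)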