%load_ext watermark
import pandas as pd
import numpy as np
from typing import Type, Optional, Callable
from typing import List, Dict, Union

from review_methods_tests import collect_vitals, find_missing, find_missing_loc_dates
from review_methods_tests import use_gfrags_gfoams_gcaps, make_a_summary, combine_survey_files

import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.colors
from matplotlib.colors import LinearSegmentedColormap, ListedColormap

from setvariables import *

Testing data models#

The methods used in the version of the federal report were tested, but there was no specific set of validation criteria defined beforehand. Tests were done as the work progressed, which wasted a lot of time.

Here we test the land use and survey data models.

  1. is the land use data complete for each survey location?

  2. does the survey data aggregate correctly to sample level?

    • what happens to objects with a quantity of zero?

    • aggregating to cantonal, municipal or survey area

      • are all locations included?

      • are lakes and rivers distinguished?

  3. Does the aggregated data for IQAASL match the federal report?

Gfoams, Gfrags, Gcaps#

These are aggregate groups. It is difficult to infer how well a participant differentiates between the size or use of the objects covered by the following codes.

  1. Gfrags: G79, G78, G75

  2. Gfoams: G81, G82, G76

  3. Gcaps: G21, G22, G23, G24

These aggregate groups are used when comparing values between sampling campaigns.
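
The conversion from individual codes to these group codes is handled by the imported use_gfrags_gfoams_gcaps; its implementation is not shown here. A minimal sketch of the idea, assuming a survey dataframe with a code column and using the code lists above, is a replacement map applied before aggregating:

import pandas as pd

# hypothetical mapping from individual codes to the aggregate group codes
code_to_group = {
    "G79": "Gfrags", "G78": "Gfrags", "G75": "Gfrags",
    "G81": "Gfoams", "G82": "Gfoams", "G76": "Gfoams",
    "G21": "Gcaps", "G22": "Gcaps", "G23": "Gcaps", "G24": "Gcaps",
}

def to_aggregate_codes(df: pd.DataFrame) -> pd.DataFrame:
    # replace the individual codes with their group label; other codes pass through unchanged
    out = df.copy()
    out["code"] = out["code"].replace(code_to_group)
    return out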

Sampling campaigns#

The dates of the sampling campaigns are expanded to include the surveys that happened between large organized campaigns. The start and end dates are defined below.

Attention! The codes used for each survey campaign are different. Different groups organized and conducted surveys using the MLW protocol; the data was then sent to us.

MCBP: November 2015 - November 2016. The initial sampling campaign. Fragmented plastics (Gfrags/G79/G78/G76) were not sorted by size. All unidentified hard plastic items were classified in this manner.

  • start_date = 2015-11-15

  • end_date = 2017-03-31

SLR: April 2017 - May 2018. Sampling campaign by the WWF. Objects less than 2.5 cm were not counted.

  • start_date = 2017-04-01

  • end_date = 2020-03-31

IQAASL: April 2020 - May 2021. Sampling campaign mandated by the Swiss Confederation. Additional codes were added for regional objects.

  • start_date = 2020-04-01

  • end_date = 2021-05-31

Plastock (not added yet): January 2022 - December 2022. Sampling campaign from the Association pour la Sauvegarde du Léman. Not all objects were counted; only a limited number of objects were identified.
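
The start and end dates above are carried in a period_dates mapping, which the cells below expect from the setvariables star import. A minimal sketch of that mapping, using the dates listed above (Plastock omitted because it is not added yet):

# sketch of the campaign boundaries; the working value comes from setvariables
period_dates = {
    "mcbp": ("2015-11-15", "2017-03-31"),
    "slr": ("2017-04-01", "2020-03-31"),
    "iqaasl": ("2020-04-01", "2021-05-31"),
}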

Feature name#

The feature name is the name of a river, lake, or other regional label that you would find on a map. People in the region know the name.

Feature type#

The feature type is a label that applies to the general conditions of use for the location and other locations in the region.

  • r: rivers: surveys on river banks

  • l: lake: surveys on the lake shore

  • p: parcs: surveys in recreational areas
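
A minimal lookup of these labels, as a hypothetical sketch of how the feature_type column can be read:

# hypothetical lookup for the feature-type labels described above
feature_types = {
    "r": "river: surveys on river banks",
    "l": "lake: surveys on the lake shore",
    "p": "parc: surveys in recreational areas",
}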

Parent boundary#

Designates the larger geographic region of the survey location. For lakes and rivers it is the name of the catchment area or river basin. For parcs it is the type of park, e.g. les Alpes. Recall that each feature has a name; for example, Alpes Lépontines is the name of a feature in the geographic region of Les Alpes.

Aggregate a set of data by sample (location and date)#

Use the loc_date column in the survey data. Use the IQAASL period and the four river basins to test against the federal report.
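
The date slicing below is done with slice_data_by_date; its definition is not shown here. A minimal sketch of an equivalent filter, assuming the survey data carries a date column named "date":

import pandas as pd

def slice_by_date(df: pd.DataFrame, start: str, end: str, date_column: str = "date") -> pd.DataFrame:
    # keep only the samples whose date falls between the campaign boundaries (inclusive)
    dates = pd.to_datetime(df[date_column])
    mask = (dates >= pd.Timestamp(start)) & (dates <= pd.Timestamp(end))
    return df[mask].copy()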

Before aggregating, do the number of locations, cities, samples and the quantity match the federal report?#

The feature types include lakes and rivers; the Alpes were considered separately.

From https://hammerdirt-analyst.github.io/IQAASL-End-0f-Sampling-2021/lakes_rivers.html#

  1. cities = yes

  2. samples = yes

  3. locations = yes

  4. quantity = No, it is short by 50 pieces

  5. start and end date = yes

# starting variables
period = "iqaasl"
survey_areas = ["rhone", "ticino", "linth", "aare"]
start, end = [*period_dates[period]]
# the survey data sliced by the start and end date
survey_data = slice_data_by_date(surveys.copy(), start, end)

# the surveys from the survey areas of interest
feature_d= survey_data[survey_data.parent_boundary.isin(survey_areas)].copy()

# convert codes to gfrags, gcaps and gfoams
feature_data = use_gfrags_gfoams_gcaps(feature_d.copy(), codes)

# check the numbers
feature_vitals = collect_vitals(feature_d)
print(make_a_summary(feature_vitals))
    Number of objects: 54694
    
    Median pieces/meter: 0.0
    
    Number of samples: 386
    
    Number of unique codes: 235
    
    Number of sample locations: 143
    
    Number of features: 28
    
    Number of cities: 77
    
    Start date: 2020-03-08
    
    End date: 2021-05-12
    
    
# when the codes are changed to gfrags, gfoams and gcaps there can be
# multiple results for the same code in the same sample
# note that code_result_columns does not include the groupname column
# because the code is changed, not the groupname
code_result_df = aggregate_dataframe(feature_data.copy(), code_result_columns, unit_agg)
code_result_df = code_result_df.merge(codes.groupname, left_on="code", right_index=True)
code_result_df = code_result_df.merge(beaches[["canton","feature_type"]], left_on='slug', right_index=True, validate="many_to_one")
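
aggregate_dataframe does the grouping in the cell above; its implementation is not shown here. A minimal sketch of the same idea, assuming code_result_columns is a list of grouping columns and unit_agg is a column-to-function mapping such as {"quantity": "sum", "pcs_m": "sum"}:

import pandas as pd

def aggregate_by_columns(df: pd.DataFrame, group_columns: list, agg_map: dict) -> pd.DataFrame:
    # group the survey records and apply the aggregation mapping to each column
    return df.groupby(group_columns, as_index=False).agg(agg_map)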

Number of lakes, rivers, parcs, cities and cantons#

  Échantillons Municipalités Lacs Rivières Parcs Quantité
St. Gallen 38 5 2 3 0 3'614
Aargau 4 4 0 2 0 101
Bern 88 21 4 4 0 8'786
Solothurn 3 2 0 1 0 66
Vaud 87 14 2 2 0 17'414
Tessin 28 7 2 3 0 3'023
Genève 20 2 1 1 0 4'962
Neuchâtel 16 4 2 0 0 2'375
Glarus 16 2 1 2 0 1'016
Valais 15 5 1 1 0 7'638
Zürich 49 6 1 3 0 4'543
Fribourg 14 2 1 0 0 930
Schwyz 1 1 1 0 0 104
Zug 4 2 1 1 0 64
Luzern 3 1 1 0 0 58

Aggregate to sample#

The assessments are made on a per-sample basis. That means we can look at the value of an individual object at each sample. The sum of all the individual objects in a survey is the total for that survey. Dividing the total by the length of the survey gives the assessment metric: pieces of trash per meter.
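
A minimal worked example of that metric, with hypothetical numbers:

# hypothetical sample: the object counts from one survey and the surveyed length
sample_quantities = [12, 3, 7, 0, 5]    # counts of the individual objects
survey_length_m = 50                    # length of shoreline surveyed, in meters

sample_total = sum(sample_quantities)           # 27 pieces in total
pcs_per_meter = sample_total / survey_length_m  # 27 / 50 = 0.54 pcs/m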

  1. Are the quantiles of the current data = to the federal report? Yes

  2. Are the material totals = to the federal report? No, plastic is off by 50 pcs

  3. Are the fail rates of the most common objects = to the federal report? Yes

  4. Is the % of total of the most common objects = to the federal report? Yes

  5. Is the median pieces/meter of the most common objects = to the federal report? Yes

  6. Is the quantity of the most common objects = to the federal report? Yes

The summary of survey totals#

fig 1.5 in IQAASL

# the sample is the basic unit
# loc_date is the unique identifier for each sample
unit_columns = ["loc_date", "slug", "parent_boundary"]

# the quantiles of the sample-total pcs/m  
vector_summary = a_summary_of_one_vector(code_result_df.copy(), unit_columns, unit_agg, describe='pcs_m')

translated_and_style_for_display(vector_summary,l_mapi, lang, gradient=False)
  Pcs/M
Échantillons 386
Moyenne 3,95
Écart-Type 7,06
Min 0,02
25% 0,82
50% 1,90
75% 3,87
Max 66,17
Total 54'694

Material totals and proportions#

fig 1.5 in IQAASL

# add the material label to each code
merged_result = merge_dataframes_on_column_and_index(code_result_df.copy(), codes["material"], 'code', how='inner', validate=True)

# sum the materials for the data frame
materials = aggregate_dataframe(merged_result.copy(), ["material"], {"quantity":"sum"})

# add % of total for display
materials["%"] = materials.quantity/materials.quantity.sum()

translated_and_style_for_display(materials.set_index('material', drop=True),l_mapi, lang, gradient=False)
  Quantité % Du Total
Chimique 140 0,00
Tissu 343 0,01
Verre 2'919 0,05
Métal 1'874 0,03
Papier 1'527 0,03
Plastique 47'093 0,86
Caoutchouc 390 0,01
Non-Identifié 2 0,00
Bois 406 0,01

Quantity, median pcs/m, fail rate, and % of total#

Summary results for all the codes in the parent_boundary

# sum the cumulative quantity for each code and calculate the median pcs/meter
code_totals = aggregate_dataframe(code_result_df.copy(), ["code"], {"quantity":"sum", "pcs_m":"median"})

# collect all the codes ranked by total quantity
abundant = get_top_x_records_with_max_quantity(code_totals.copy(), "quantity", "code", len(code_totals))

# identify the objects that were found in at least 50% of the samples
# calculate the quantity per sample for each code and sample
occurrences = aggregate_dataframe(code_result_df, ["loc_date", "code"], {"quantity":"sum"})

# count the number of times that an object was counted > 0
# and divide it by the total number of samples 
event_counts  = count_objects_with_positive_quantity(occurrences)

# calculate the rate of occurrence per unit of measure
rates = calculate_rate_per_unit(code_result_df, code_result_df.code.unique())

# add the unit rates and fail rates
abundance = merge_dataframes_on_column_and_index(abundant, rates["pcs_m"], left_column="code", validate="one_to_one")
abundance["fail rate"] = abundance.code.apply(lambda x: event_counts.loc[x])

# this is the complete inventory with summary
# statistics for each object
abundance.sort_values(by="quantity", inplace=True, ascending=False)
abundance.reset_index(inplace=True, drop=True)
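
count_objects_with_positive_quantity and calculate_rate_per_unit come from the project's helper code; their definitions are not shown here. A minimal sketch of the fail-rate calculation described in the comments above, assuming one row per (loc_date, code) with the summed quantity:

import pandas as pd

def fail_rate(per_sample: pd.DataFrame) -> pd.Series:
    # share of samples in which each code was counted at least once
    n_samples = per_sample["loc_date"].nunique()
    found = per_sample[per_sample["quantity"] > 0]
    return found.groupby("code")["loc_date"].nunique() / n_samples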

The most common objects#

fig 1.6 in IQAASL

# arguments to slice the data by column
column_one = {
    'column': 'quantity',
    'val': abundance.loc[10, 'quantity']
}

column_two = {
    'column':'fail rate',
    'val': 0.5
}

# use the inventory to find the most common objects
the_most_common = display_tabular_data_by_column_values(abundance.copy(), column_one, column_two, 'code')

translated_and_style_for_display(the_most_common.copy(),l_mapi, lang, gradient=False)
  Quantité % Du Total Pcs/M Taux D'Échec
Mégots Et Filtres À Cigarettes 8'485 0,16 0,20 0,88
Fragments De Plastique: G80, G79, G78, G75 7'400 0,14 0,18 0,86
Fragments De Polystyrène Expansé: G76, G81, G82, G83 5'559 0,10 0,05 0,69
Emballages De Bonbons, De Snacks 3'325 0,06 0,09 0,85
Bâche, Feuille Plastique Industrielle 2'534 0,05 0,05 0,70
Verre Brisé 2'136 0,04 0,03 0,65
Pellets Industriels (Gpi) 1'968 0,04 0,00 0,31
Couvercles En Plastique Bouteille: G21, G22, G23, G24 1'844 0,03 0,03 0,65
Mousse De Plastique Pour L'Isolation Thermique 1'656 0,03 0,01 0,53
Coton-Tige 1'406 0,03 0,01 0,51
Polystyrène < 5Mm 1'209 0,02 0,00 0,26
Déchets De Construction En Plastique 992 0,02 0,01 0,52
Bouchons De Bouteilles En Métal, Couvercles Et Tirettes 700 0,01 0,01 0,52
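
The selection above uses a quantity cutoff (column_one) and a fail-rate threshold of 0.5 (column_two). A minimal sketch of that filter, assuming the two criteria are combined as a union, i.e. an object qualifies if it meets either one:

import pandas as pd

def most_common(inventory: pd.DataFrame, quantity_cutoff: float, rate_cutoff: float = 0.5) -> pd.DataFrame:
    # objects with a large total quantity or found in at least half of the samples
    qualifies = (inventory["quantity"] >= quantity_cutoff) | (inventory["fail rate"] >= rate_cutoff)
    return inventory[qualifies].copy()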

Results by groupname and feature boundary#

cumulative_columns = ["loc_date", "groupname"]
unit_columns = ["parent_boundary", "loc_date", "groupname"]
object_labels = code_result_df.groupname.unique()
object_columns = ["groupname"]
boundary_labels = code_result_df.parent_boundary.unique()

args = {
    'cumulative_columns':cumulative_columns,
    'object_labels':object_labels,
    'boundary_labels':boundary_labels,
    'object_columns':object_columns,
    'unit_agg':unit_agg,
    'unit_columns':unit_columns,
    'agg_groups':agg_groups
}

tix = summary_of_parent_and_child_features(code_result_df.copy(), **args)
translated_and_style_for_display(tix,l_mapi, lang, gradient=True)
  Linth Aare Rhône Ticino Cumulé
Agriculture 0,03 0,06 0,14 0,06 0,07
Nourriture Et Boissons 0,28 0,25 0,70 0,28 0,34
Infrastructures 0,12 0,14 0,55 0,21 0,20
Micro-Plastiques (< 5Mm) 0,00 0,01 0,11 0,00 0,01
Emballage Non Alimentaire 0,13 0,09 0,21 0,08 0,13
Articles Personnels 0,04 0,04 0,10 0,07 0,06
Morceaux De Plastique 0,11 0,18 0,48 0,10 0,18
Loisirs 0,04 0,06 0,17 0,04 0,06
Tabac 0,27 0,15 0,50 0,18 0,25
Non Classé 0,00 0,00 0,02 0,00 0,00
Eaux Usées 0,01 0,03 0,19 0,02 0,03

Most common codes by feature boundary#

cumulative_columns = ["loc_date", "code"]
unit_columns = ["parent_boundary", "loc_date", "code"]
codes_of_interest = the_most_common.index
object_columns = ["code"]
boundary_labels = code_result_df.parent_boundary.unique()

data = code_result_df[code_result_df.code.isin(codes_of_interest)].copy()

args = {
    'cumulative_columns':cumulative_columns,
    'object_labels':codes_of_interest,
    'boundary_labels':boundary_labels,
    'object_columns':object_columns,
    'unit_agg':unit_agg,
    'unit_columns':unit_columns,
    'agg_groups':agg_groups
}

tix = summary_of_parent_and_child_features(data.copy(), **args)

translated_and_style_for_display(tix,l_mapi, lang, gradient=True)
  Linth Aare Rhône Ticino Cumulé
Pellets Industriels (Gpi) 0,00 0,00 0,00 0,00 0,00
Polystyrène < 5Mm 0,00 0,00 0,00 0,00 0,00
Bouchons De Bouteilles En Métal, Couvercles Et Tirettes 0,01 0,00 0,03 0,01 0,01
Verre Brisé 0,04 0,03 0,02 0,08 0,03
Mégots Et Filtres À Cigarettes 0,23 0,11 0,42 0,15 0,20
Emballages De Bonbons, De Snacks 0,06 0,08 0,19 0,04 0,09
Bâche, Feuille Plastique Industrielle 0,02 0,05 0,09 0,04 0,05
Mousse De Plastique Pour L'Isolation Thermique 0,00 0,00 0,07 0,03 0,01
Déchets De Construction En Plastique 0,00 0,00 0,06 0,03 0,01
Coton-Tige 0,00 0,00 0,11 0,00 0,01
Couvercles En Plastique Bouteille: G21, G22, G23, G24 0,03 0,02 0,10 0,00 0,03
Fragments De Polystyrène Expansé: G76, G81, G82, G83 0,03 0,04 0,17 0,05 0,05
Fragments De Plastique: G80, G79, G78, G75 0,11 0,18 0,48 0,10 0,18

Most common codes by canton#

unit_columns = ["canton", "loc_date", "code"]
object_columns = ["code"]
boundary_labels = code_result_df.canton.unique()

data = code_result_df[code_result_df.code.isin(codes_of_interest)].copy()

args = {
    'cumulative_columns':cumulative_columns,
    'object_labels':codes_of_interest,
    'boundary_labels':boundary_labels,
    'object_columns':object_columns,
    'unit_agg':unit_agg,
    'unit_columns':unit_columns,
    'agg_groups':agg_groups
}

tix = summary_of_parent_and_child_features(data, **args)

translated_and_style_for_display(tix.T,l_mapi, lang, gradient=True)
  Aargau Bern Fribourg Genève Glarus Luzern Neuchâtel Schwyz Solothurn St. Gallen Tessin Valais Vaud Zug Zürich Cumulé
Pellets Industriels (Gpi) 0,00 0,00 0,00 0,03 0,00 0,00 0,00 0,16 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
Polystyrène < 5Mm 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00
Bouchons De Bouteilles En Métal, Couvercles Et Tirettes 0,00 0,00 0,00 0,07 0,01 0,00 0,05 0,00 0,03 0,00 0,01 0,02 0,02 0,00 0,05 0,01
Verre Brisé 0,00 0,03 0,03 0,01 0,00 0,00 0,21 0,00 0,00 0,05 0,08 0,00 0,03 0,00 0,12 0,03
Mégots Et Filtres À Cigarettes 0,01 0,11 0,11 0,17 0,17 0,08 0,33 0,00 0,06 0,23 0,15 0,06 0,47 0,07 0,46 0,20
Emballages De Bonbons, De Snacks 0,03 0,08 0,06 0,16 0,07 0,05 0,12 0,20 0,02 0,10 0,04 0,50 0,12 0,01 0,07 0,09
Bâche, Feuille Plastique Industrielle 0,00 0,08 0,00 0,00 0,04 0,00 0,07 0,15 0,00 0,11 0,04 0,53 0,08 0,00 0,00 0,05
Mousse De Plastique Pour L'Isolation Thermique 0,00 0,01 0,00 0,01 0,01 0,00 0,01 0,00 0,00 0,00 0,03 0,28 0,06 0,00 0,00 0,01
Déchets De Construction En Plastique 0,00 0,01 0,00 0,00 0,01 0,00 0,04 0,00 0,00 0,01 0,03 0,44 0,04 0,01 0,00 0,01
Coton-Tige 0,00 0,00 0,00 0,02 0,00 0,00 0,02 0,01 0,03 0,00 0,00 0,68 0,08 0,00 0,00 0,01
Couvercles En Plastique Bouteille: G21, G22, G23, G24 0,00 0,03 0,00 0,04 0,01 0,00 0,02 0,08 0,00 0,05 0,00 0,96 0,09 0,03 0,03 0,03
Fragments De Polystyrène Expansé: G76, G81, G82, G83 0,03 0,06 0,04 0,01 0,08 0,04 0,02 0,01 0,00 0,08 0,05 1,06 0,18 0,00 0,00 0,05
Fragments De Plastique: G80, G79, G78, G75 0,04 0,25 0,06 0,20 0,07 0,00 0,21 0,08 0,06 0,22 0,10 2,04 0,42 0,00 0,10 0,18

Most common codes: canton-municipal#

Bern#

canton = "Bern"

with_cantons = code_result_df[code_result_df.canton == canton].copy()

unit_columns = ["city", "loc_date", "code"]
# the column that holds the labels of interest
object_columns = ["code"]
# the labels of interest for the boundary conditions
boundary_labels = with_cantons.city.unique()

ddata = with_cantons[(with_cantons.code.isin(codes_of_interest)) & (with_cantons.canton == "Bern")].copy()

args = {
    'cumulative_columns':cumulative_columns,
    'object_labels':codes_of_interest,
    'boundary_labels':boundary_labels,
    'object_columns':object_columns,
    'unit_agg':unit_agg,
    'unit_columns':unit_columns,
    'agg_groups':agg_groups
}

tix = summary_of_parent_and_child_features(ddata, **args)
translated_and_style_for_display(tix.T,l_mapi, lang, gradient=True)
  Beatenberg Bern Biel/Bienne Brienz (Be) Brügg Burgdorf Bönigen Erlach Gals Kallnach Köniz Ligerz Lüscherz Nidau Port Rubigen Spiez Thun Unterseen Vinelz Walperswil Cumulé
Pellets Industriels (Gpi) 0,03 0,00 0,04 0,00 0,00 0,00 0,04 0,00 0,00 0,10 0,00 0,00 0,00 0,08 0,01 0,00 0,00 0,00 0,00 0,10 0,00 0,00
Polystyrène < 5Mm 0,00 0,00 0,06 0,00 0,00 0,00 0,00 0,00 0,17 0,00 0,00 0,00 0,00 0,04 0,00 0,00 0,00 0,00 0,02 0,00 0,00 0,00
Bouchons De Bouteilles En Métal, Couvercles Et Tirettes 0,00 0,00 0,02 0,02 0,00 0,00 0,00 0,02 0,06 0,03 0,00 0,07 0,00 0,00 0,03 0,00 0,00 0,00 0,00 0,00 0,00 0,00
Verre Brisé 0,02 0,00 0,05 0,00 0,36 0,00 0,00 0,02 0,07 0,00 0,00 1,00 0,20 0,12 0,01 0,00 0,13 0,00 0,02 0,08 0,00 0,03
Mégots Et Filtres À Cigarettes 0,55 0,01 0,81 0,00 0,28 0,07 1,19 0,44 0,09 0,00 0,09 0,76 0,04 0,00 0,63 0,06 0,04 0,23 0,55 0,04 0,00 0,11
Emballages De Bonbons, De Snacks 0,12 0,01 0,34 0,39 0,00 0,02 0,06 0,05 0,12 0,08 0,00 0,78 0,02 0,60 0,07 0,11 0,00 0,09 0,11 0,18 0,00 0,08
Bâche, Feuille Plastique Industrielle 0,03 0,01 0,17 0,67 0,00 0,22 0,15 0,00 0,03 0,32 0,00 1,19 0,05 0,40 0,00 0,00 0,01 0,13 0,13 0,37 0,00 0,08
Mousse De Plastique Pour L'Isolation Thermique 0,42 0,00 0,05 0,02 0,00 0,00 0,10 0,00 0,00 0,00 0,00 0,21 0,00 0,00 0,01 0,00 0,00 0,00 0,09 0,00 0,19 0,01
Déchets De Construction En Plastique 0,00 0,00 0,06 0,00 0,00 0,00 0,04 0,02 0,00 0,00 0,00 0,00 0,00 0,04 0,00 0,00 0,00 0,00 0,01 0,07 0,00 0,01
Coton-Tige 0,04 0,00 0,05 0,06 0,00 0,00 0,06 0,12 0,06 0,07 0,00 0,12 0,00 0,00 0,00 0,00 0,00 0,01 0,02 0,00 0,00 0,00
Couvercles En Plastique Bouteille: G21, G22, G23, G24 0,12 0,00 0,08 0,28 0,00 0,00 0,04 0,06 0,00 0,06 0,00 0,06 0,00 0,04 0,03 0,06 0,00 0,02 0,04 0,08 0,03 0,03
Fragments De Polystyrène Expansé: G76, G81, G82, G83 0,18 0,00 0,16 0,22 0,00 0,00 0,07 0,02 0,00 0,14 0,00 0,06 0,00 0,04 0,00 0,00 0,07 0,16 0,11 0,00 0,00 0,06
Fragments De Plastique: G80, G79, G78, G75 0,44 0,01 0,48 0,39 0,03 0,02 1,01 0,49 0,28 0,12 0,00 1,94 0,14 0,64 0,22 0,00 0,04 0,24 0,21 1,28 0,00 0,25

Valais#

canton = "Valais"

with_cantons = code_result_df[code_result_df.canton == canton].copy()

unit_columns = ["city", "loc_date", "code"]
# the column that holds the labels of interest
object_columns = ["code"]
# the labels of interest for the boundary conditions
boundary_labels = with_cantons.city.unique()

ddata = with_cantons[(with_cantons.code.isin(codes_of_interest))].copy()

args = {
    'cumulative_columns':cumulative_columns,
    'object_labels':codes_of_interest,
    'boundary_labels':boundary_labels,
    'object_columns':object_columns,
    'unit_agg':unit_agg,
    'unit_columns':unit_columns,
    'agg_groups':agg_groups
}

tix = summary_of_parent_and_child_features(ddata, **args)
translated_and_style_for_display(tix,l_mapi, lang, gradient=True)
  Saint-Gingolph Riddes Sion Leuk Salgesch Cumulé
Pellets Industriels (Gpi) 0,03 0,00 0,00 0,00 0,00 0,00
Polystyrène < 5Mm 0,18 0,00 0,00 0,00 0,00 0,00
Bouchons De Bouteilles En Métal, Couvercles Et Tirettes 0,03 0,03 0,00 0,00 0,00 0,02
Verre Brisé 0,03 0,00 0,00 0,00 0,00 0,00
Mégots Et Filtres À Cigarettes 0,13 0,00 0,04 0,00 0,00 0,06
Emballages De Bonbons, De Snacks 0,77 0,00 0,02 0,00 0,00 0,50
Bâche, Feuille Plastique Industrielle 0,92 0,00 0,00 0,06 0,30 0,53
Mousse De Plastique Pour L'Isolation Thermique 0,58 0,00 0,00 0,00 0,00 0,28
Déchets De Construction En Plastique 0,85 0,00 0,00 0,08 0,00 0,44
Coton-Tige 1,25 0,00 0,00 0,00 0,00 0,68
Couvercles En Plastique Bouteille: G21, G22, G23, G24 1,66 0,00 0,00 0,00 0,00 0,96
Fragments De Polystyrène Expansé: G76, G81, G82, G83 3,34 0,00 0,02 0,02 0,00 1,06
Fragments De Plastique: G80, G79, G78, G75 2,56 0,00 0,06 0,02 0,00 2,04
Author: hammerdirt-analyst

conda environment: cantonal_report

numpy     : 1.25.2
matplotlib: 3.7.1
pandas    : 2.0.3