Exploring Echoregions Regions2D Functionality#

This notebook parses region values from an Echoview .evr file and creates a region mask for the corresponding Echogram data.

Installation#

Prior to running this notebook and all other notebooks, make sure to pip install Echoregions and Echopype Plotting Library.

Install Using PyPi:

pip install echoregions

pip install echopype[plot]

Install Using Latest Github Main Branch Commit:

pip install git+https://github.com/OSOceanAcoustics/echoregions.git

pip install git+https://github.com/OSOceanAcoustics/echopype.git@plot

# Importing Packages
import matplotlib.pyplot as plt
import urllib.request
import shutil
import xarray as xr
import numpy as np
from pandas.testing import assert_frame_equal
from echopype.visualize.cm import cmap_d

import echoregions as er

Regions Data Reading#

To start this tutorial, we first download evr data from Echoregions’ Github Repository and parse the .evr file using Echoregions’ read_evr function.

The parsing is based off of the .evr data description shown on Echoview’s website: Region Attributes.

# Set path to test data
TEST_DATA_PATH = 'https://raw.githubusercontent.com/OSOceanAcoustics/echoregions/contains_transect_zip/echoregions/test_data'

# Download example EVR File
urllib.request.urlretrieve(f"{TEST_DATA_PATH}/transect.evr","transect.evr")

# Read EVR file
regions2d = er.read_evr('transect.evr')

Regions2D as a DataFrame#

regions2d is a specialized object but it has a data attribute which is a simple dataframe.

# Grab regions2d dataframe
regions2d_df = regions2d.data
regions2d_df.head(3)
file_name file_type evr_file_format_number echoview_version region_id region_structure_version region_point_count region_selected region_creation_type dummy ... region_bbox_right region_bbox_top region_bbox_bottom region_class region_type region_name time depth region_notes region_detection_settings
0 transect.evr EVRG 7 13.0.378.44817 1 13 4 0 6 -1 ... 2019-07-02 08:10:09.425500 -9999.99 9999.99 Log 2 COM [2019-07-02T03:50:54.629500000, 2019-07-02T03:... [-9999.99, 9999.99, 9999.99, -9999.99] [Switched from recording to "night" folder to ... []
1 transect.evr EVRG 7 13.0.378.44817 2 13 4 0 6 -1 ... 2019-07-02 12:32:31.740500 -9999.99 9999.99 Log 2 Com [2019-07-02T12:32:30.175500000, 2019-07-02T12:... [-9999.99, 9999.99, 9999.99, -9999.99] [Recording 10 min of passive data] []
2 transect.evr EVRG 7 13.0.378.44817 3 13 4 0 6 -1 ... 2019-07-02 12:43:10.758500 -9999.99 9999.99 Log 2 COM [2019-07-02T12:43:06.273000000, 2019-07-02T12:... [-9999.99, 9999.99, 9999.99, -9999.99] [End passive data collection and back to active] []

3 rows × 22 columns

The regions2d object can be subsetted using the select_region function and with parameters related to region class, time, and depth. For this example let us select a trawl region based on the region_class parameter:

trawl_regions = regions2d.select_region(region_class="Trawl")
trawl_regions
file_name file_type evr_file_format_number echoview_version region_id region_structure_version region_point_count region_selected region_creation_type dummy ... region_bbox_right region_bbox_top region_bbox_bottom region_class region_type region_name time depth region_notes region_detection_settings
17 transect.evr EVRG 7 13.0.378.44817 18 13 4 0 4 -1 ... 2019-07-02 19:52:21.531 9.244758 758.973217 Trawl 0 AWT20 [2019-07-02T18:40:51.809700000, 2019-07-02T18:... [200.0, 300.0, 300.0, 200.0] [] []
18 transect.evr EVRG 7 13.0.378.44817 19 13 4 0 4 -1 ... 2019-07-02 19:52:21.531 9.244758 758.973217 Trawl 0 AWT20 [2019-07-02T19:22:21.531000000, 2019-07-02T19:... [220.0, 350.0, 350.0, 220.0] [] []

2 rows × 22 columns

Now notice that these regions are not closed:

for _, point in trawl_regions.iterrows():
    plt.plot(point["time"], point["depth"])
_images/0094ca8d84a3e4d193ba7f2f2ae1ed5d5963726e93d2e2ac78deabb4bd50692b.png

We can close these regions and re-plot:

trawl_regions_closed = regions2d.close_region(region_class="Trawl")

for _, row in trawl_regions_closed.iterrows():
    plt.plot(row["time"], row["depth"])
_images/90392fc53bad52a5841b0390e39dbec5df19e18c01116cd0c0b622886479f691.png

To select Trawl regions, close regions, and plot regions, one can also just run the following using the object’s plot function:

regions2d.plot(region_class="Trawl", close_regions=True)
_images/90392fc53bad52a5841b0390e39dbec5df19e18c01116cd0c0b622886479f691.png

Echogram Data Reading and Plotting#

Let’s now download and plot an echogram created using Echopype. Echopype is a comprehensive software designed for parsing sonar backscatter and conducting scientific computations. It offers a wide array of functionalities, including the creation of echograms, which are visual representations formed by the echoes received from sonar signals. Additionally, Echopype seamlessly integrates with Echoregions, a specialized software for parsing echogram annotations. For now, we will primarily be working with the ds_Sv["Sv"] backscatter volume data variable, which when plotted becomes an echogram, but there are many more data variables that can be used when working with Echopype.

# Download example Echopype Sv Zarr File
urllib.request.urlretrieve(f"{TEST_DATA_PATH}/transect.zip","transect.zip")

# Extract the ZIP file
shutil.unpack_archive("transect.zip", "")

# Read volume backscattering strength (Sv) data
ds_Sv = xr.open_dataset('transect.zarr', engine="zarr")
ds_Sv["Sv"]
<xarray.DataArray 'Sv' (depth: 3955, ping_time: 1681, channel: 3)>
[19945065 values with dtype=float64]
Coordinates:
  * channel       (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11' ... '...
  * depth         (depth) float64 9.15 9.34 9.529 9.719 ... 758.1 758.3 758.5
  * ping_time     (ping_time) datetime64[ns] 2019-07-02T18:40:00 ... 2019-07-...
    range_sample  (depth) int64 ...
Attributes:
    actual_range:  [-156.0, 37.68]
    long_name:     Volume backscattering strength (Sv re 1 m-1)
    units:         dB
# Plot the 38 kHz backscatter channel
ds_Sv["Sv"].isel(channel=1).plot.pcolormesh(y="depth", yincrease=False, vmin=-70, vmax=-30, cmap=cmap_d["ek500"])
<matplotlib.collections.QuadMesh at 0x7f09711b93f0>
_images/40f2d5e2d98d00bd1fa1702c1a9cd717bdae8ab36a37ab9e68aefc1056087855.png

Plotting Echogram and Region#

From the two previous plots, one can kind of see how they’re related on both the depth and time dimensions. Now let’s see a region annotation overlayed on top of the Echogram dataset.

# Plotting the echogram data and the trawl region
plt.figure(figsize=(20, 6))
for _, point in trawl_regions_closed.iterrows():
    color = 'red' if point['region_id'] == 18 else 'black'  # Assign red for region ID 18 and black for region ID 19
    plt.plot(point["time"], point["depth"], fillstyle='full', markersize=1, color=color)

ds_Sv.Sv.isel(channel=1).T.plot(y="depth", yincrease=False, vmin=-70, vmax=-30, cmap=cmap_d["ek500"])
<matplotlib.collections.QuadMesh at 0x7f096f846770>
_images/9957a0c96155164d1a6d2bac659dd22d9490ea1b5a4342873b026f1894fb830f.png

Masking Echogram and Region#

From the previous plot, one can kind of see how they’re related on both the depth and time dimensions. Now let’s see a region annotation overlayed on top of the Echogram dataset.

For machine learning training purposes, and more specifically, for computer vision segmentation model training, a region polygon is not enough and so we require masks of the echogram data, which indicate which points are within the region.

The Regions2D class has a function to do this specific task: mask. This mask function is essentially a wrapper over the mask_3D function from regionmask. The 3D in mask_3D is important here since the masks we get out of regions2d will be 3D masks by default (when collapse_to_2d = False in regions2d.mask, which is the default value). This means that the mask will have three dimensions: ping_time, depth, and region_id, where region_id is the unique identifier of a region. The separation of regions via the region_id dimension allows the storage of ‘within region’ and ‘outside region’ information as 1 and 0 respectively.

If the user passes in collapse_to_2d = True they will produce a 2D mask where the mask will have dimensions ping_time, and depth, and the ‘within region’ values will be the region_id values and ‘outside region’ values will be NaN.

Let’s look first at the default collapse_to_2d = False case:

# Use the built in mask function to create a mask
region_mask_ds, region_points = regions2d.region_mask(
    ds_Sv["Sv"].isel(channel=1).drop_vars("channel").compute(),
    region_class="Trawl",
    #collapse_to_2d= False, mask defaults to this so we can comment it out
)

Note that selecting 1 channel here is to ensure that our output is a single channel mask. The number of channels of the data array that is passed into the mask will match the number of channels of the output bottom mask data array.

The output should be the region mask itself and the points that constitute the region.

Note that even though we use region_class here to select the region to be masked, what is stored in the dataset as a dimension is the region_id since it is unique and region_class is not:

region_mask_ds
<xarray.Dataset>
Dimensions:      (region_id: 2, depth: 3955, ping_time: 1681)
Coordinates:
  * region_id    (region_id) int64 18 19
  * depth        (depth) float64 9.15 9.34 9.529 9.719 ... 758.1 758.3 758.5
  * ping_time    (ping_time) datetime64[ns] 2019-07-02T18:40:00 ... 2019-07-0...
Data variables:
    mask_labels  (region_id) int64 0 1
    mask_3d      (region_id, depth, ping_time) int64 0 0 0 0 0 0 ... 0 0 0 0 0 0

The values in the region mask should also just be 1s (within the region) and 0s (outside of the region):

print("Unique Values in Region Mask:", np.unique(region_mask_ds["mask_3d"].data))
Unique Values in Region Mask: [0 1]

As a sanity check, let us check that the ping_time and depth dimensions in the region mask region_mask_ds are equal to that in the backscatter data ds_Sv["Sv"]:

print(
    "Region Mask Ping Time Dimension Length:",
    len(region_mask_ds["ping_time"])
)
print(
    "Region Mask Depth Dimension Length:",
    len(region_mask_ds["depth"])
)
print(
    "Echogram Ping Time Dimension Length:",
    len(ds_Sv["Sv"]["ping_time"])
)
print(
    "Echogram Depth Dimension Length:",
    len(ds_Sv["Sv"]["depth"])
)
Region Mask Ping Time Dimension Length: 1681
Region Mask Depth Dimension Length: 3955
Echogram Ping Time Dimension Length: 1681
Echogram Depth Dimension Length: 3955

A plot of region id 18:

region_mask_ds["mask_3d"].sel(region_id=18).plot(y="depth", yincrease=False)
<matplotlib.collections.QuadMesh at 0x7f09449b46a0>
_images/ac9ea2c13c744003025ccfca6e39e708fc98b70da11491a9a8f7d88f30295acf.png

A plot of region id 19:

region_mask_ds["mask_3d"].sel(region_id=19).plot(y="depth", yincrease=False)
<matplotlib.collections.QuadMesh at 0x7f0967f7a7a0>
_images/0812db969ee04c80ef9179b2a8aaddc32981bedbbe83e16e3a8200464b145ea2.png

A plot of the 38 kHz channel where the mask is 1 and masked region’s region id is 19:

# Get only 38 kHz channel values where the mask is 1 and masked region's region id is 19
mask_exists_Sv = xr.where(
    region_mask_ds["mask_3d"].sel(region_id=19) == 1,
    ds_Sv["Sv"].isel(channel=1),
    np.nan,
)

# Plot the masked Sv
mask_exists_Sv.plot(y="depth", yincrease=False, vmin=-70, vmax=-30, cmap=cmap_d["ek500"])
<matplotlib.collections.QuadMesh at 0x7f0967e56f80>
_images/d78dc6c83f3d83c2a46d86ce7cca376e622aff2d218ba2f304ee6a8db037784e.png

Now let us look at what is produced in the collapse_to_2d = True case:

# Set collapse_to_2d = True
region_mask_2d_ds, region_points_2d = regions2d.region_mask(
    ds_Sv["Sv"].isel(channel=1).drop_vars("channel").compute(),
    region_class="Trawl",
    collapse_to_2d=True,
)
region_mask_2d_ds["mask_2d"]
<xarray.DataArray 'mask_2d' (depth: 3955, ping_time: 1681)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * depth      (depth) float64 9.15 9.34 9.529 9.719 ... 757.9 758.1 758.3 758.5
  * ping_time  (ping_time) datetime64[ns] 2019-07-02T18:40:00 ... 2019-07-02T...

The values in the 2d region mask should be 18, 19, and NaN:

print("Unique Values in Region Mask:", np.unique(region_mask_2d_ds["mask_2d"].data))
Unique Values in Region Mask: [18. 19. nan]

Note that ds_Sv is the entire dataset containing many data variables that Echoregions does not work with, and so passing in the entire dataset to regions2d.mask will produce an error since it expects a single data array ds_Sv["Sv"]:

# Test ds_Sv
regions2d.region_mask(
    ds_Sv,
    region_class="Trawl",
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[24], line 2
      1 # Test ds_Sv
----> 2 regions2d.mask(
      3     ds_Sv,
      4     region_class="Trawl",
      5 )

File ~/echoregions/echoregions/regions2d/regions2d.py:489, in Regions2D.mask(self, da_Sv, region_id, region_class, mask_name, mask_labels, collapse_to_2d)
    451 """
    452 Mask data from Data Array containing Sv data based off of a Regions2D object
    453 and its regions ids.
   (...)
    486     DataFrame containing region_id, depth, and time.
    487 """
    488 if not isinstance(da_Sv, DataArray):
--> 489     raise TypeError(
    490         f"Input da_Sv must be of type DataArray. da_Sv was instead of type {type(da_Sv)}"
    491     )
    493 # Dataframe containing region information.
    494 region_df = self.select_region(region_id, region_class)

TypeError: Input da_Sv must be of type DataArray. da_Sv was instead of type <class 'xarray.core.dataset.Dataset'>

Saving to “.csv” and Reading From “.csv”#

So now that we have our mask and our region polygon points, how do we save them?

We can use the Echoregions read_regions_csv function to first load it onto a region2d object and use the region2d object’s to_csv function to save the regions2d dataframe as a .csv.

# Create new regions2d object
from_mask_regions = er.read_regions_csv(region_points)

# Save to .csv
from_mask_regions.to_csv("from_mask_regions.csv", index=False)

Now if you need to load this .csv into a regions object we can again use read_regions_csv since it takes in both file locations (Path/str objects) and Pandas DataFrames:

# Create another new regions2d object
from_csv_regions = er.read_regions_csv("from_mask_regions.csv")

As another sanity check, let’s check if these dataframes are equal (with index reset):

try:
    assert_frame_equal(from_mask_regions.data.reset_index(drop=True), from_csv_regions.data.reset_index(drop=True))
    print("The two DataFrames are equal.")
except AssertionError:
    print("The two DataFrames are not equal.")
The two DataFrames are equal.