Exploring Echoregions Lines Functionality#

This notebook parses bottom values from an Echoview .evl file and creates a bottom mask for the corresponding Echogram data.

Installation#

Prior to running this notebook and all other notebooks, make sure to pip install Echoregions and Echopype Plotting Library.

Install Using PyPi:

pip install echoregions

pip install echopype[plot]

Install Using Latest Github Main Branch Commit:

pip install git+https://github.com/OSOceanAcoustics/echoregions.git

pip install git+https://github.com/OSOceanAcoustics/echopype.git@plot

# Importing Packages
import matplotlib.pyplot as plt
import urllib.request
import shutil
import xarray as xr
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal
from echopype.visualize.cm import cmap_d

import echoregions as er

Bottom Data Reading#

To start this tutorial, we first download evl data from Echoregions’ Github Repository and parse the .evl file using Echoregions’ read_evl function.

The parsing is based off of the .evl data description shown on Echoview’s website: Line Attributes.

# Set path to test data
TEST_DATA_PATH = 'https://raw.githubusercontent.com/OSOceanAcoustics/echoregions/contains_transect_zip/echoregions/test_data'

# Download example EVL File
urllib.request.urlretrieve(f"{TEST_DATA_PATH}/transect.evl","transect.evl")

# Read EVL file
lines = er.read_evl('transect.evl')

Lines as a DataFrame#

lines is a specialized object but it has a data attribute which is a simple dataframe.

# Grab lines dataframe
lines_df = lines.data
lines_df
file_name file_type evl_file_format_version echoview_version time depth status
0 transect.evl EVBD 3 13.0.378.44817 2019-07-02 18:39:41.321000 442.996834 3
1 transect.evl EVBD 3 13.0.378.44817 2019-07-02 18:39:42.679000 437.818405 3
2 transect.evl EVBD 3 13.0.378.44817 2019-07-02 18:39:44.031000 445.194735 1
3 transect.evl EVBD 3 13.0.378.44817 2019-07-02 18:39:45.380000 451.168987 3
4 transect.evl EVBD 3 13.0.378.44817 2019-07-02 18:39:46.728000 442.551006 3
... ... ... ... ... ... ... ...
3166 transect.evl EVBD 3 13.0.378.44817 2019-07-02 21:04:47.146000 760.707803 3
3167 transect.evl EVBD 3 13.0.378.44817 2019-07-02 21:04:47.147000 762.196532 3
3168 transect.evl EVBD 3 13.0.378.44817 2019-07-02 21:10:40.095000 766.613696 3
3169 transect.evl EVBD 3 13.0.378.44817 2019-07-02 21:10:40.096000 763.976879 3
3170 transect.evl EVBD 3 13.0.378.44817 2019-07-02 21:17:22.145000 764.867052 0

3171 rows × 7 columns

Note the rightmost column status. Status values are generally described by the following:

0 = none

1 = unverified

2 = bad

3 = good

The good and bad values are assigned via the specific EVL line picking formula used to generate the initial EVL file. Generally, we only want the rows with good/3 status.

More information on Echoview Status can be found here: Line Status.

Let’s now plot good points.

# Status 3 are good points so we select those
good_lines_df = lines_df[lines_df['status'] == '3']
good_bottom = good_lines_df[['time', 'depth']]

plt.plot(good_bottom['time'], good_bottom['depth'], 'black')
plt.xlabel('Ping Time')
plt.ylabel('Depth')
plt.gca().invert_yaxis()

plt.show()
_images/2cfaf4ab438d19477fef2e111657d12e7af6a35d2ccf6a49f143a5f66d14b4ec.png

For usage later on, set this good dataframe as the current line dataframe:

lines.data = good_lines_df

Echogram Data Reading and Plotting#

Let’s now download and plot an echogram created using Echopype. Echopype is a comprehensive software designed for parsing sonar backscatter and conducting scientific computations. It offers a wide array of functionalities, including the creation of echograms, which are visual representations formed by the echoes received from sonar signals. Additionally, Echopype seamlessly integrates with Echoregions, a specialized software for parsing echogram annotations. For now, we will primarily be working with the ds_Sv["Sv"] backscatter volume data variable, which when plotted becomes an echogram, but there are many more data variables that can be used when working with Echopype.

# Download example Echopype Sv Zarr File
urllib.request.urlretrieve(f"{TEST_DATA_PATH}/transect.zip","transect.zip")

# Extract the ZIP file
shutil.unpack_archive("transect.zip", "")

# Read volume backscattering strength (Sv) data
ds_Sv = xr.open_dataset('transect.zarr', engine="zarr")
ds_Sv["Sv"]
<xarray.DataArray 'Sv' (depth: 3955, ping_time: 1681, channel: 3)>
[19945065 values with dtype=float64]
Coordinates:
  * channel       (channel) <U37 'GPT  18 kHz 009072058c8d 1-1 ES18-11' ... '...
  * depth         (depth) float64 9.15 9.34 9.529 9.719 ... 758.1 758.3 758.5
  * ping_time     (ping_time) datetime64[ns] 2019-07-02T18:40:00 ... 2019-07-...
    range_sample  (depth) int64 ...
Attributes:
    actual_range:  [-156.0, 37.68]
    long_name:     Volume backscattering strength (Sv re 1 m-1)
    units:         dB
# Plot the 38 kHz backscatter channel
ds_Sv["Sv"].isel(channel=1).plot.pcolormesh(y="depth", yincrease=False, vmin=-70, vmax=-30, cmap=cmap_d["ek500"])
<matplotlib.collections.QuadMesh at 0x7f18c8c61ed0>
_images/40f2d5e2d98d00bd1fa1702c1a9cd717bdae8ab36a37ab9e68aefc1056087855.png

Plotting Echogram and Bottom#

From the two previous plots, one can kind of see how they’re related on both the depth and time dimensions. Now let’s see bottom annotations overlayed on top of the Echogram dataset.

# Plotting the Echogram data and the bottom
plt.figure(figsize = (20, 6))
plt.plot(lines.data['time'], lines.data['depth'],'black',fillstyle='full', markersize=1)
ds_Sv.Sv.isel(channel=1).T.plot.pcolormesh(y="depth", yincrease=False, vmin=-70, vmax=-30, cmap=cmap_d["ek500"])
<matplotlib.collections.QuadMesh at 0x7f18c378e110>
_images/85ab609f9cb589b111b765d23e9542cc7301b66b50d7909c14a27f4cbc942ef2.png

Masking Echogram and Bottom#

We can see that there is clear overlap between the two pieces of data here. However, for machine learning training purposes, or for biomass estimation, it is necessary to mask out the pixels below the bottom line.

The Lines class has a function to do this specific task: mask. The mask function interpolates the bottom points found in the DataFrame to generate new bottom points, and from these new points, generates an Xarray bottom mask. There are a few different Pandas interpolation schemes one can pass in, but for this example we stick with something simple slinear which refers to a first order spline.

# Use the built in mask function
bottom_mask_da, bottom_points = lines.bottom_mask(
    ds_Sv["Sv"].isel(channel=1).drop_vars("channel").compute(),
    operation="above_below",
    method="slinear",
    limit_area=None,
    limit_direction="both"
)

Note that selecting 1 channel here is to ensure that our output is a single channel mask. The number of channels of the data array that is passed into the mask will match the number of channels of the output bottom mask data array.

The output should be both the bottom mask itself and interpolated points that constitute the bottom mask.

Let’s take a look at the mask bottom_mask_da first:

bottom_mask_da
<xarray.DataArray (depth: 3955, ping_time: 1681)>
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0]])
Coordinates:
  * depth         (depth) float64 9.15 9.34 9.529 9.719 ... 758.1 758.3 758.5
    range_sample  (depth) int64 0 3 4 5 6 7 8 ... 3951 3952 3953 3954 3955 3956
  * ping_time     (ping_time) datetime64[ns] 2019-07-02T18:40:00 ... 2019-07-...

The values in the bottom mask should also just be 1s (at and below bottom) and 0s (above bottom):

print("Unique Values in Bottom Mask:", np.unique(bottom_mask_da.data))
Unique Values in Bottom Mask: [0 1]

As a sanity check, let us check that the ping_time and depth dimensions in the bottom mask bottom_mask_da are equal to that in the backscatter data ds_Sv["Sv"]:

print(
    "Bottom Mask Ping Time Dimension Length:",
    len(bottom_mask_da["ping_time"])
)
print(
    "Bottom Mask Depth Dimension Length:",
    len(bottom_mask_da["depth"])
)
print(
    "Echogram Ping Time Dimension Length:",
    len(ds_Sv["Sv"]["ping_time"])
)
print(
    "Echogram Depth Dimension Length:",
    len(ds_Sv["Sv"]["depth"])
)
Bottom Mask Ping Time Dimension Length: 1681
Bottom Mask Depth Dimension Length: 3955
Echogram Ping Time Dimension Length: 1681
Echogram Depth Dimension Length: 3955

A plot of the mask itself:

bottom_mask_da.plot(y="depth", yincrease=False)
<matplotlib.collections.QuadMesh at 0x7f18c50d7520>
_images/befdb421617143356ecae31350a506592bfddfa5a6db02805fd9a47d614dfb92.png

A plot of the 38 kHz channel where the mask is 1:

# Get only 38 kHz channel values where the mask is 1
mask_exists_Sv = xr.where(
    bottom_mask_da == 1,
    ds_Sv["Sv"].isel(channel=1),
    np.nan,
)

# Plot the masked Sv
mask_exists_Sv.plot.pcolormesh(y="depth", yincrease=False, vmin=-70, vmax=-30, cmap=cmap_d["ek500"])
<matplotlib.collections.QuadMesh at 0x7f18c5146e60>
_images/58a71d4b64b98ba20412cee1267ac75bb3e3b8703d23fc217b62b554a3c257e6.png

Let’s now take a look at the interpolated points bottom_points:

Let’s compare its number of rows to the number of rows from the lines dataframe it was created from.

print("Interpolated Rows:", len(bottom_points))
print("Original Rows:", len(lines.data))
Interpolated Rows: 1681
Original Rows: 2756

Why are there more rows in the original rows than there are in the intepolated rows? It’s because the interpolated bottom data has to match the ping time dimension of the passed in dataset.

Let’s now look at the Echogram ping time dimension length:

print("Echogram Ping Time Dimension Length:", len(ds_Sv["ping_time"]))
Echogram Ping Time Dimension Length: 1681

Let’s now plot original bottom points vs interpolated points:

# Plot both original and interpolated
plt.subplot(1, 2, 1)
plt.plot(lines.data['time'], lines.data['depth'], 'r.', markersize=0.5)
plt.gca().invert_yaxis()
plt.title('Original')
plt.xticks(rotation=90)
plt.xlabel('Ping Time')
plt.ylabel('Depth')
plt.subplot(1, 2, 2)
plt.plot(bottom_points['time'], bottom_points['depth'], 'b.', markersize=0.5)
plt.gca().invert_yaxis()
plt.title('Interpolated: First Order Spline')
plt.xticks(rotation=90)
plt.xlabel('Ping Time')
plt.ylabel('Depth')

# Adjust layout
plt.tight_layout()

# Show the plots
plt.show()
_images/4ad555e01a35a17e55dafd084ecf683687107c636aca1cede016b946415e2c5e.png

Let’s now look at a 30 minute subset of original and interpolated data between 2019-07-02 19:25:00 and 2019-07-02 19:55:00:

# Filter the DataFrames based on time range
start_time = pd.to_datetime('2019-07-02 19:25:00')
end_time = pd.to_datetime('2019-07-02 19:55:00')
subset_lines_data = lines.data[(lines.data['time'] >= start_time) & (lines.data['time'] <= end_time)]
subset_bottom_points = bottom_points[(bottom_points['time'] >= start_time) & (bottom_points['time'] <= end_time)]

# Plot both original and interpolated
plt.subplot(1, 2, 1)
plt.plot(subset_lines_data['time'], subset_lines_data['depth'], 'r.', markersize=5)
plt.gca().invert_yaxis()
plt.title('Subset Original')
plt.xticks(rotation=90)
plt.xlabel('Ping Time')
plt.ylabel('Depth')
plt.subplot(1, 2, 2)
plt.plot(subset_bottom_points['time'], subset_bottom_points['depth'], 'b.', markersize=5)
plt.gca().invert_yaxis()
plt.title('Subset Interpolated: First Order Spline')
plt.xticks(rotation=90)
plt.xlabel('Ping Time')
plt.ylabel('Depth')

# Adjust layout
plt.tight_layout()

# Show the plots
plt.show()
_images/435e61901e20482366291878507ada1b79d54d70039c86014085be2c96e46537.png

Note that ds_Sv is the entire dataset containing many data variables that Echoregions does not work with, and so passing in the entire dataset to Echoregions lines.mask will produce an error since it expects a single data array ds_Sv["Sv"]:

# Test ds_Sv
lines.bottom_mask(
    ds_Sv,
    method="slinear",
    limit_area=None,
    limit_direction="both"
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 2
      1 # Test ds_Sv
----> 2 lines.mask(
      3     ds_Sv,
      4     method="slinear",
      5     limit_area=None,
      6     limit_direction="both"
      7 )

File ~/echoregions/echoregions/lines/lines.py:193, in Lines.mask(self, da_Sv, **kwargs)
    164 """
    165 Subsets a bottom dataset to the range of an Sv dataset. Create a mask of
    166 the same shape as data found in the Echogram object:
   (...)
    189 from each other.
    190 """
    192 if not isinstance(da_Sv, DataArray):
--> 193     raise TypeError(
    194         f"Input da_Sv must be of type DataArray. da_Sv was instead of type {type(da_Sv)}"
    195     )
    197 def filter_bottom(bottom, start_date, end_date):
    198     """
    199     Selects the values of the bottom between two dates.
    200     """

TypeError: Input da_Sv must be of type DataArray. da_Sv was instead of type <class 'xarray.core.dataset.Dataset'>

Saving to “.csv” and Reading From “.csv”#

So now that we have our mask and our new interpolated bottom points, how do we save them?

We can use the Echoregions read_lines_csv function to first load it onto a lines object and use the lines object’s to_csv function to save the lines dataframe as a .csv.

# Create new lines object
from_mask_lines = er.read_lines_csv(bottom_points)

# Save to .csv
from_mask_lines.to_csv("from_mask_lines.csv")

Now if you need to load this .csv into a lines object we can again use read_lines_csv since it takes in both file locations (Path/str objects) and Pandas DataFrames:

# Create another new lines object
from_csv_lines = er.read_lines_csv("from_mask_lines.csv")
from_csv_lines.data = from_csv_lines.data.drop("Unnamed: 0", axis=1) # TODO: Fix this need to drop

Now let’s check if these dataframes are equal:

try:
    assert_frame_equal(from_mask_lines.data, from_csv_lines.data)
    print("The two DataFrames are equal.")
except AssertionError:
    print("The two DataFrames are not equal.")
The two DataFrames are equal.