Queries

Query

class ocean_taco.dataset.queries.Query(lon_min, lon_max, lat_min, lat_max, time_start, time_end)

Bases: object

Spatio-temporal query specification: a longitude/latitude bounding box plus a time range.

Parameters:
  • lon_min (float)

  • lon_max (float)

  • lat_min (float)

  • lat_max (float)

  • time_start (str | Timestamp)

  • time_end (str | Timestamp)

lon_min: float
lon_max: float
lat_min: float
lat_max: float
time_start: str | Timestamp
time_end: str | Timestamp
property bbox: tuple[float, float, float, float]
to_geoslice()

Convert to GeoSlice format for dataset indexing.

to_dict()
Return type:

dict

classmethod from_dict(d)
Return type:

Query

Parameters:

d (dict)
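The to_dict() / from_dict() pair suggests a plain field-by-field round trip. The following is a minimal sketch of that idea, not the library's implementation: `QuerySketch` is a hypothetical stand-in class, times are kept as strings for simplicity, and the `bbox` ordering (lon_min, lon_max, lat_min, lat_max) is an assumption borrowed from the `bbox_constraint` parameter documented for QueryGenerator.

```python
from dataclasses import asdict, dataclass


@dataclass
class QuerySketch:
    """Hypothetical stand-in for Query; not the library's implementation."""

    lon_min: float
    lon_max: float
    lat_min: float
    lat_max: float
    time_start: str
    time_end: str

    @property
    def bbox(self) -> tuple[float, float, float, float]:
        # Assumed ordering: (lon_min, lon_max, lat_min, lat_max).
        return (self.lon_min, self.lon_max, self.lat_min, self.lat_max)

    def to_dict(self) -> dict:
        # One dictionary key per dataclass field.
        return asdict(self)

    @classmethod
    def from_dict(cls, d: dict) -> "QuerySketch":
        # Rebuild the object from the keys written by to_dict().
        return cls(**d)


q = QuerySketch(-10.0, 0.0, 30.0, 40.0, "2020-01-01", "2020-01-02")
restored = QuerySketch.from_dict(q.to_dict())
```

In this sketch `restored` compares equal to `q`, which is the property a serialisation round trip should preserve.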

PatchSize

class ocean_taco.dataset.queries.PatchSize(value, unit='deg')

Bases: object

Patch size with unit conversion support.

Parameters:
  • value (float)

  • unit (Literal['deg', 'km'])

value: float
unit: Literal['deg', 'km'] = 'deg'
to_degrees(center_lat=0.0)

Convert to a (lon_degrees, lat_degrees) pair, accounting for the latitude center_lat.

Return type:

tuple[float, float]

Parameters:

center_lat (float)

to_km(center_lat=0.0)

Convert to an approximate size in kilometres.

Return type:

float

Parameters:

center_lat (float)
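The latitude dependence in to_degrees() and to_km() comes from the fact that a degree of longitude covers less ground away from the equator. The sketch below illustrates the usual spherical approximation; the constant `KM_PER_DEG` and the helper names `km_to_degrees` and `degrees_to_km` are assumptions made for illustration, and the library may use different constants or formulas.

```python
import math

# Assumed approximation: kilometres per degree of latitude on a spherical Earth.
KM_PER_DEG = 111.32


def km_to_degrees(size_km: float, center_lat: float = 0.0) -> tuple[float, float]:
    """Return (lon_degrees, lat_degrees) spanning roughly size_km at center_lat."""
    lat_deg = size_km / KM_PER_DEG
    # A degree of longitude shrinks with cos(latitude), so more degrees are needed.
    lon_deg = size_km / (KM_PER_DEG * math.cos(math.radians(center_lat)))
    return lon_deg, lat_deg


def degrees_to_km(size_deg: float, center_lat: float = 0.0) -> float:
    """Approximate east-west extent in km of size_deg degrees of longitude."""
    return size_deg * KM_PER_DEG * math.cos(math.radians(center_lat))


lon_eq, lat_eq = km_to_degrees(111.32, center_lat=0.0)
lon_60, lat_60 = km_to_degrees(111.32, center_lat=60.0)
```

Under these assumptions the same physical size needs about twice as many degrees of longitude at 60° latitude as at the equator, while the latitude extent is unchanged. The approximation breaks down near the poles, where cos(latitude) approaches zero.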

QueryGenerator

class ocean_taco.dataset.queries.QueryGenerator(land_mask_path=None)

Bases: object

Generate queries for training (random) and evaluation (grid).

Parameters:

land_mask_path (str | Path | None)

generate_training_queries(n_queries, patch_size, date_range, bbox_constraint=(-180, 180, -60, 60), time_window_days=1, max_land_fraction=0.3, seed=42, oversample_factor=2.0, verbose=True, max_spatial_overlap=1.0)

Generate random training queries over ocean regions.

Parameters:
  • n_queries (int) – Number of queries to generate.

  • patch_size (PatchSize | float) – Spatial extent of each query, given as a PatchSize or as a float in degrees.

  • date_range (tuple[str, str]) – (start_date, end_date) strings.

  • bbox_constraint (tuple[float, float, float, float]) – Region to sample from (lon_min, lon_max, lat_min, lat_max).

  • time_window_days (int) – Temporal extent of each query.

  • max_land_fraction (float) – Maximum allowed land fraction (0-1).

  • seed (int) – Random seed for reproducibility.

  • oversample_factor (float) – Factor by which extra candidates are generated to compensate for rejections.

  • verbose (bool) – Whether to print progress.

  • max_spatial_overlap (float) – Maximum allowed IoU (intersection over union, 0-1) with existing queries.

Return type:

list[Query]

Returns:

List of Query objects.
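The parameters above describe a rejection-sampling scheme: draw candidate patches at random, discard those that violate a constraint, and oversample to make up for the discards. The sketch below illustrates only the spatial-overlap part under stated assumptions: `iou` and `sample_boxes` are hypothetical helpers, boxes are (lon_min, lon_max, lat_min, lat_max) tuples in degrees, the IoU is computed on a flat lon/lat plane, and the land-mask and time-window steps are omitted. It is not the library's algorithm.

```python
import random

Box = tuple[float, float, float, float]  # (lon_min, lon_max, lat_min, lat_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two lon/lat boxes (planar approximation)."""
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    if w <= 0 or h <= 0:
        return 0.0  # the boxes do not overlap
    inter = w * h
    area_a = (a[1] - a[0]) * (a[3] - a[2])
    area_b = (b[1] - b[0]) * (b[3] - b[2])
    return inter / (area_a + area_b - inter)


def sample_boxes(n, size_deg, region, max_overlap=1.0, oversample_factor=2.0, seed=42):
    """Draw up to n square boxes inside region, rejecting overly overlapping ones."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    lon_min, lon_max, lat_min, lat_max = region
    accepted = []
    # Try at most n * oversample_factor candidates, stopping once n are accepted.
    for _ in range(int(n * oversample_factor)):
        if len(accepted) == n:
            break
        lon = rng.uniform(lon_min, lon_max - size_deg)
        lat = rng.uniform(lat_min, lat_max - size_deg)
        box = (lon, lon + size_deg, lat, lat + size_deg)
        if all(iou(box, other) <= max_overlap for other in accepted):
            accepted.append(box)
    return accepted


boxes = sample_boxes(20, 5.0, (-180, 180, -60, 60), max_overlap=0.1)
```

With a rule of the form IoU <= max_overlap, as assumed here, a threshold of 1.0 never rejects a candidate, since an IoU cannot exceed 1. The sketch may also return fewer than n boxes if too many candidates are rejected.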

generate_eval_queries(bbox, patch_size, date_range, spatial_overlap=0.0, temporal_stride_days=1, time_window_days=1, max_land_fraction=0.5, verbose=True)

Generate a systematic grid of evaluation queries.

Parameters:
  • bbox (tuple[float, float, float, float]) – Region to cover (lon_min, lon_max, lat_min, lat_max).

  • patch_size (PatchSize | float) – Spatial extent of each query.

  • date_range (tuple[str, str]) – (start_date, end_date) strings.

  • spatial_overlap (float) – Overlap fraction (0 = no overlap, 0.5 = 50% overlap).

  • temporal_stride_days (int) – Days between query start times.

  • time_window_days (int) – Temporal extent of each query.

  • max_land_fraction (float) – Skip patches whose land fraction exceeds this value.

  • verbose (bool) – Whether to print progress.

Return type:

list[Query]

Returns:

List of Query objects covering the region.
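One way to read spatial_overlap and temporal_stride_days is as strides on a regular grid: neighbouring patches are offset by patch size times (1 - overlap), and query start times advance by the temporal stride. The sketch below illustrates that reading; `axis_starts` and `eval_grid` are hypothetical helpers, and the edge handling (only patches that fit fully inside the box, and windows that end on or before the end date) is an assumption rather than documented behaviour. Land filtering is omitted.

```python
from datetime import date, timedelta


def axis_starts(lo: float, hi: float, size: float, overlap: float) -> list[float]:
    """Start coordinates of patches of width size that fit fully inside [lo, hi]."""
    stride = size * (1.0 - overlap)  # assumes 0 <= overlap < 1, so stride > 0
    starts = []
    i = 0
    # The small tolerance guards against floating-point round-off at the edge.
    while lo + i * stride + size <= hi + 1e-9:
        starts.append(lo + i * stride)
        i += 1
    return starts


def eval_grid(bbox, size_deg, date_range, spatial_overlap=0.0,
              temporal_stride_days=1, time_window_days=1):
    """Return (lon_min, lon_max, lat_min, lat_max, time_start, time_end) tuples."""
    lon_min, lon_max, lat_min, lat_max = bbox
    start, end = (date.fromisoformat(d) for d in date_range)
    window = timedelta(days=time_window_days)
    queries = []
    t = start
    while t + window <= end:  # assumed: each window must end on or before end
        for lat in axis_starts(lat_min, lat_max, size_deg, spatial_overlap):
            for lon in axis_starts(lon_min, lon_max, size_deg, spatial_overlap):
                queries.append((lon, lon + size_deg, lat, lat + size_deg,
                                t.isoformat(), (t + window).isoformat()))
        t += timedelta(days=temporal_stride_days)
    return queries


no_overlap = eval_grid((0, 10, 0, 10), 5.0, ("2020-01-01", "2020-01-03"))
half_overlap = eval_grid((0, 10, 0, 10), 5.0, ("2020-01-01", "2020-01-03"),
                         spatial_overlap=0.5)
```

In this sketch a 10° × 10° box with 5° patches yields a 2 × 2 spatial grid without overlap and a 3 × 3 grid at 50% overlap, each repeated for every time step.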

static save_queries(queries, path, metadata=None)

Save queries to a Parquet file with JSON metadata.

Parameters:
  • queries (list[Query])

  • path (str | Path)

  • metadata (dict | None)

static load_queries(path)

Load queries from a Parquet file.

Return type:

tuple[list[Query], dict]

Parameters:

path (str | Path)