API

Parameters

class roocs_utils.parameter.area_parameter.AreaParameter(input)[source]

Bases: _BaseParameter

Class for area parameter used in subsetting operation.

Area can be input as:
A string of comma separated values: “0.,49.,10.,65”
A sequence of strings: (“0”, “-10”, “120”, “40”)
A sequence of numbers: [0, 49.5, 10, 65]

An area must have 4 values.

Validates the area input and parses the values into numbers.

allowed_input_types = [<class 'collections.abc.Sequence'>, <class 'str'>, <class 'roocs_utils.parameter.param_utils.Series'>, <class 'NoneType'>]
asdict()[source]

Returns a dictionary of the area values
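The three accepted input forms can be illustrated with a minimal, library-independent sketch. The helper name parse_area is hypothetical and is not part of roocs_utils; it only mirrors the behaviour described above (four values, parsed into numbers).

```python
from collections.abc import Sequence


def parse_area(value):
    """Hypothetical helper: turn an area input into a tuple of four floats."""
    if value is None:
        return None
    if isinstance(value, str):
        # "0.,49.,10.,65" -> ["0.", "49.", "10.", "65"]
        items = value.split(",")
    elif isinstance(value, Sequence):
        items = list(value)
    else:
        raise TypeError(f"Unsupported area input type: {type(value).__name__}")
    if len(items) != 4:
        raise ValueError(f"An area must have 4 values, got {len(items)}")
    # float() accepts both numbers and numeric strings such as "-10"
    return tuple(float(item) for item in items)


print(parse_area("0.,49.,10.,65"))
print(parse_area(("0", "-10", "120", "40")))
print(parse_area([0, 49.5, 10, 65]))
```

All three calls produce a 4-tuple of floats; an input with any other number of values raises ValueError in this sketch.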

class roocs_utils.parameter.collection_parameter.CollectionParameter(input)[source]

Bases: _BaseParameter

Class for collection parameter used in operations.

A collection can be input as:
A string of comma separated values: “cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga,cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”
A sequence of strings: e.g. (“cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”, “cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”)
A sequence of roocs_utils.utils.file_utils.FileMapper objects

Validates the input and parses the items.

allowed_input_types = [<class 'collections.abc.Sequence'>, <class 'str'>, <class 'roocs_utils.parameter.param_utils.Series'>, <class 'roocs_utils.utils.file_utils.FileMapper'>]
class roocs_utils.parameter.level_parameter.LevelParameter(input)[source]

Bases: _BaseIntervalOrSeriesParameter

Class for level parameter used in subsetting operation.

Level can be input as:
A string of slash separated values: “1000/2000”
A sequence of strings: e.g. (“1000.50”, “2000.60”)
A sequence of numbers: e.g. (1000.50, 2000.60)

A level input must be 2 values.

If using a string input, a leading or trailing slash indicates you want to use the lowest or highest level of the dataset, e.g. “/2000” will subset from the lowest level in the dataset to 2000.

Validates the level input and parses the values into numbers.

asdict()[source]

Returns a dictionary of the level values
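The open-ended interval rule can be sketched without the library. The function parse_level_interval below is a hypothetical stand-in, and using None to mark "the dataset limit" is an assumption made for illustration.

```python
def parse_level_interval(value):
    """Hypothetical helper: return (start, end) as floats; None marks an open end."""
    if isinstance(value, str):
        # "1000/2000" -> ["1000", "2000"], "/2000" -> ["", "2000"]
        parts = value.split("/")
    else:
        parts = list(value)
    if len(parts) != 2:
        raise ValueError("A level input must be 2 values")

    def to_number(item):
        # An empty string (from "/2000" or "1000/") or None means
        # "use the lowest/highest level of the dataset"
        if item is None or (isinstance(item, str) and item.strip() == ""):
            return None
        return float(item)

    return to_number(parts[0]), to_number(parts[1])


print(parse_level_interval("1000/2000"))
print(parse_level_interval("/2000"))
print(parse_level_interval(("1000.50", "2000.60")))
```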

class roocs_utils.parameter.time_parameter.TimeParameter(input)[source]

Bases: _BaseIntervalOrSeriesParameter

Class for time parameter used in subsetting operation.

Time can be input as:
A string of slash separated values: “2085-01-01T12:00:00Z/2120-12-30T12:00:00Z”
A sequence of strings: e.g. (“2085-01-01T12:00:00Z”, “2120-12-30T12:00:00Z”)

A time input must be 2 values.

If using a string input, a leading or trailing slash indicates you want to use the earliest or latest time of the dataset, e.g. “2085-01-01T12:00:00Z/” will subset from 01/01/2085 to the final time in the dataset.

Validates the times input and parses the values into isoformat.

asdict()[source]

Returns a dictionary of the time values

get_bounds()[source]

Returns a tuple of the (start, end) times, calculated from the value of the parameter. Either will default to None.

class roocs_utils.parameter.time_components_parameter.TimeComponentsParameter(input)[source]

Bases: _BaseParameter

Class for time components parameter used in subsetting operation.

The Time Components are any, or none of:
  • year: [list of years]

  • month: [list of months]

  • day: [list of days]

  • hour: [list of hours]

  • minute: [list of minutes]

  • second: [list of seconds]

month is special: you can use either strings or values:

“feb”, “mar” == 2, 3 == “02,03”

Validates the times input and parses them into a dictionary.

allowed_input_types = [<class 'dict'>, <class 'str'>, <class 'roocs_utils.parameter.param_utils.TimeComponents'>, <class 'NoneType'>]
asdict()[source]
get_bounds()[source]

Returns a tuple of the (start, end) times, calculated from the value of the parameter. Either will default to None.

class roocs_utils.parameter.dimension_parameter.DimensionParameter(input)[source]

Bases: _BaseParameter

Class for dimensions parameter used in averaging operation.

Dimensions can be input as:
A string of comma separated values: “time,latitude,longitude”
A sequence of strings: (“time”, “longitude”)

Dimensions can be None or any number of options from time, latitude, longitude and level provided these exist in the dataset being operated on.

Validates the dims input and parses the values into a sequence of strings.

allowed_input_types = [<class 'collections.abc.Sequence'>, <class 'str'>, <class 'roocs_utils.parameter.param_utils.Series'>, <class 'NoneType'>]
asdict()[source]

Returns a dictionary of the dimensions

class roocs_utils.parameter.param_utils.Interval(*data)[source]

Bases: object

A simple class for handling an interval of any type. It holds a start and end but does not try to resolve the range, it is just a container to be used by other tools. The contents can be of any type, such as datetimes, strings etc.

class roocs_utils.parameter.param_utils.Series(*data)[source]

Bases: object

A simple class for handling a series selection, created by any sequence as input. It has a value that holds the sequence as a list.

class roocs_utils.parameter.param_utils.TimeComponents(year=None, month=None, day=None, hour=None, minute=None, second=None)[source]

Bases: object

A simple class for parsing and representing a set of time components. The components are stored in a dictionary of {time_comp: values}, such as:

{“year”: [2000, 2001], “month”: [1, 2, 3]}

Note that you can provide months as strings or numbers, e.g.:

“feb”, “Feb”, “February”, 2
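One way to normalise the month forms listed above is to compare the first three letters case-insensitively. This is a sketch under that assumption; month_to_int and MONTHS are hypothetical names and do not describe how TimeComponents is implemented.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_to_int(value):
    """Hypothetical helper: map "feb", "Feb", "February", "02" or 2 to 2."""
    if isinstance(value, int):
        month = value
    elif value.strip().isdigit():
        # Numeric strings such as "02"
        month = int(value)
    else:
        # Compare only the first three letters, case-insensitively;
        # list.index raises ValueError for an unknown name
        month = MONTHS.index(value.strip().lower()[:3]) + 1
    if not 1 <= month <= 12:
        raise ValueError(f"Month out of range: {value!r}")
    return month


print([month_to_int(m) for m in ("feb", "Feb", "February", 2, "02")])
```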

roocs_utils.parameter.param_utils.area

alias of Series

roocs_utils.parameter.param_utils.collection

alias of Series

roocs_utils.parameter.param_utils.dimensions

alias of Series

roocs_utils.parameter.param_utils.interval

alias of Interval

roocs_utils.parameter.param_utils.level_interval

alias of Interval

roocs_utils.parameter.param_utils.level_series

alias of Series

roocs_utils.parameter.param_utils.parse_datetime(dt, defaults=None)[source]

Parses string to datetime and returns isoformat string for it. If defaults is set, use that in case dt is None.

roocs_utils.parameter.param_utils.parse_range(x, caller)[source]
roocs_utils.parameter.param_utils.parse_sequence(x, caller)[source]
roocs_utils.parameter.param_utils.series

alias of Series

roocs_utils.parameter.param_utils.string_to_dict(s, splitters=('|', ':', ','))[source]

Convert a string to a dictionary of dictionaries, based on splitting rules: splitters.
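The sketch below shows one plausible reading of the splitting rules. It assumes the first splitter separates entries, the second separates a key from its values, and the third separates the values. The name split_to_dict is hypothetical, and the sketch maps each key to a list of strings, which may differ from the structure the library function returns.

```python
def split_to_dict(s, splitters=("|", ":", ",")):
    """Hypothetical stand-in: "a:1,2|b:3" -> {"a": ["1", "2"], "b": ["3"]}."""
    entry_sep, key_sep, value_sep = splitters
    result = {}
    for entry in s.strip().split(entry_sep):
        # Split on the first key separator only, so values may contain it
        key, values = entry.split(key_sep, 1)
        result[key.strip()] = [v.strip() for v in values.split(value_sep)]
    return result


print(split_to_dict("year:2000,2001|month:feb,mar"))
```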

roocs_utils.parameter.param_utils.time_components

alias of TimeComponents

roocs_utils.parameter.param_utils.time_interval

alias of Interval

roocs_utils.parameter.param_utils.time_series

alias of Series

roocs_utils.parameter.param_utils.to_float(i, allow_none=True)[source]
roocs_utils.parameter.parameterise.parameterise(collection=None, area=None, level=None, time=None, time_components=None)[source]

Parameterises inputs to instances of parameter classes which allows them to be used throughout roocs. For supported formats for each input please see their individual classes.

Parameters:
  • collection – Collection input in any supported format.

  • area – Area input in any supported format.

  • level – Level input in any supported format.

  • time – Time input in any supported format.

  • time_components – Time Components input in any supported format.

Returns:

Parameters as instances of their respective classes.

Project Utils

class roocs_utils.project_utils.DatasetMapper(dset, project=None, force=False)[source]

Bases: object

Class to map to data path, dataset ID and files from any dataset input.

dset must be a string and can be input as:
A dataset ID: e.g. “cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”
A file path: e.g. “/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas/tas_Amon_HadGEM2-ES_rcp85_r1i1p1_200512-203011.nc”
A path to a group of files: e.g. “/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas/*.nc”
A directory e.g. “/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas”
An instance of the FileMapper class (that represents a set of files within a single directory)

When force=True, if the project cannot be identified, any attempt to use the base_dir of a project to resolve the data path will be ignored. Any of data_path, ds_id and files that can be set will be set.

SUPPORTED_EXTENSIONS = ('.nc', '.gz')
property base_dir

The base directory of the input dataset.

property data_path

Dataset input converted to a data path.

property ds_id

Dataset input converted to a ds id.

property files

The files found from the input dataset.

property project

The project of the dataset input.

property raw

Raw dataset input.

roocs_utils.project_utils.datapath_to_dsid(datapath)[source]

Switches from dataset path to ds id.

Parameters:

datapath – dataset path.

Returns:

dataset id of input dataset path.

roocs_utils.project_utils.derive_ds_id(dset)[source]

Derives the dataset id of the provided dset.

Parameters:

dset – dset input of type described by DatasetMapper.

Returns:

ds id of input dataset.

roocs_utils.project_utils.derive_dset(dset)[source]

Derives the dataset path of the provided dset.

Parameters:

dset – dset input of type described by DatasetMapper.

Returns:

dataset path of input dataset.

roocs_utils.project_utils.dset_to_filepaths(dset, force=False)[source]

Gets filepaths deduced from input dset.

Parameters:
  • dset – dset input of type described by DatasetMapper.

  • force – When True and if the project of the input dset cannot be identified, DatasetMapper will attempt to find the files anyway. Default is False.

Returns:

File paths deduced from input dataset.

roocs_utils.project_utils.dsid_to_datapath(dsid)[source]

Switches from ds id to dataset path.

Parameters:

dsid – dataset id.

Returns:

dataset path of input dataset id.
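The example ds ids and paths shown under DatasetMapper suggest that a dataset path is the project base directory followed by the ds id with dots replaced by slashes. The following sketch illustrates that relationship only. BASE_DIRS, dsid_to_path and path_to_dsid are hypothetical names, and the base directory is inferred from the example paths rather than read from the roocs configuration.

```python
# Hypothetical project config: project name -> base directory
BASE_DIRS = {"cmip5": "/badc/cmip5/data"}


def dsid_to_path(ds_id):
    """Hypothetical helper: ds id -> dataset path."""
    # The first facet of the ds id is taken to be the project name
    project = ds_id.split(".")[0]
    return BASE_DIRS[project] + "/" + ds_id.replace(".", "/")


def path_to_dsid(path):
    """Hypothetical helper: dataset path -> ds id."""
    for base_dir in BASE_DIRS.values():
        prefix = base_dir + "/"
        if path.startswith(prefix):
            # Drop the base directory, then turn path separators into dots
            return path[len(prefix):].strip("/").replace("/", ".")
    raise ValueError(f"No known project base directory in: {path}")


ds_id = "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.tas"
path = dsid_to_path(ds_id)
print(path)
print(path_to_dsid(path) == ds_id)
```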

roocs_utils.project_utils.get_data_node_dirs_dict()[source]

Get a dictionary of the data node roots used for retrieving original files.

roocs_utils.project_utils.get_facet(facet_name, facets, project)[source]

Get facet from project config

roocs_utils.project_utils.get_project_base_dir(project)[source]

Get the base directory of a project from the config.

roocs_utils.project_utils.get_project_from_data_node_root(url)[source]

Identify the project from the data node root by identifying the data node root in the input url.

roocs_utils.project_utils.get_project_from_ds(ds)[source]

Gets the project from an xarray Dataset/DataArray.

Parameters:

ds – xarray Dataset/DataArray.

Returns:

The project derived from the input dataset.

roocs_utils.project_utils.get_project_name(dset)[source]

Gets the project from an input dset.

Parameters:

dset – dset input of type described by DatasetMapper.

Returns:

The project derived from the input dataset.

roocs_utils.project_utils.get_projects()[source]

Gets all the projects available in the config.

roocs_utils.project_utils.map_facet(facet, project)[source]

Return mapped facet value from config or facet name if not found.

roocs_utils.project_utils.switch_dset(dset)[source]

Switches between dataset path and ds id.

Parameters:

dset – either dataset path or dataset ID.

Returns:

either dataset path or dataset ID - switched from the input.

roocs_utils.project_utils.url_to_file_path(url)[source]

Convert input url of an original file to a file path

Xarray Utils

roocs_utils.xarray_utils.xarray_utils.convert_coord_to_axis(coord)[source]

Converts coordinate type to its single character axis identifier (tzyx).

Parameters:

coord – (str) The coordinate to convert.

Returns:

(str) The single character axis identifier of the coordinate (tzyx).
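The conversion amounts to a lookup from coordinate type to axis letter. In this sketch the names AXIS_BY_COORD and coord_to_axis are hypothetical, and returning None for an unrecognised coordinate is an assumption.

```python
# Coordinate type -> single character axis identifier (tzyx)
AXIS_BY_COORD = {
    "time": "t",
    "level": "z",
    "latitude": "y",
    "longitude": "x",
}


def coord_to_axis(coord):
    """Hypothetical stand-in: return the axis letter, or None if unknown."""
    return AXIS_BY_COORD.get(coord)


print([coord_to_axis(c) for c in ("time", "level", "latitude", "longitude")])
```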

roocs_utils.xarray_utils.xarray_utils.get_coord_by_attr(ds, attr, value)[source]

Returns a coordinate based on a known attribute of a coordinate.

Parameters:
  • ds – Xarray Dataset or DataArray

  • attr – (str) Name of attribute to look for.

  • value – Expected value of attribute you are looking for.

Returns:

Coordinate of xarray dataset if found.

roocs_utils.xarray_utils.xarray_utils.get_coord_by_type(ds, coord_type, ignore_aux_coords=True)[source]

Returns the xarray Dataset or DataArray coordinate of the specified type.

Parameters:
  • ds – xarray Dataset or DataArray

  • coord_type – (str) Coordinate type to find.

  • ignore_aux_coords – (bool) If True then coordinates that are not dimensions are ignored. Default is True.

Returns:

Xarray Dataset coordinate (ds.coords[coord_id])

roocs_utils.xarray_utils.xarray_utils.get_coord_type(coord)[source]

Gets the coordinate type.

Parameters:

coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]

Returns:

The type of coordinate as a string. Either longitude, latitude, time, level or None

roocs_utils.xarray_utils.xarray_utils.get_main_variable(ds, exclude_common_coords=True)[source]

Finds the main variable of an xarray Dataset

Parameters:
  • ds – xarray Dataset

  • exclude_common_coords – (bool) If True then common coordinates are excluded from the search for the main variable. Common coordinates are time, level, latitude, longitude and bounds. Default is True.

Returns:

(str) The main variable of the dataset e.g. ‘tas’

roocs_utils.xarray_utils.xarray_utils.is_kerchunk_file(dset)[source]

Returns a boolean based on reading the file extension.

roocs_utils.xarray_utils.xarray_utils.is_latitude(coord)[source]

Determines if a coordinate is latitude.

Parameters:

coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]

Returns:

(bool) True if the coordinate is latitude.

roocs_utils.xarray_utils.xarray_utils.is_level(coord)[source]

Determines if a coordinate is level.

Parameters:

coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]

Returns:

(bool) True if the coordinate is level.

roocs_utils.xarray_utils.xarray_utils.is_longitude(coord)[source]

Determines if a coordinate is longitude.

Parameters:

coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]

Returns:

(bool) True if the coordinate is longitude.

roocs_utils.xarray_utils.xarray_utils.is_realization(coord)[source]

Determines if a coordinate is realization.

Parameters:

coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]

Returns:

(bool) True if the coordinate is realization.

roocs_utils.xarray_utils.xarray_utils.is_time(coord)[source]

Determines if a coordinate is time.

Parameters:

coord – coordinate of xarray dataset e.g. coord = ds.coords[coord_id]

Returns:

(bool) True if the coordinate is time.

roocs_utils.xarray_utils.xarray_utils.open_xr_dataset(dset, **kwargs)[source]

Opens an xarray dataset from a dataset input.

Parameters:
  • dset – (Str or Path) ds_id, directory path or file path ending in *.nc.

  • kwargs – Any further keyword arguments to include when opening the dataset. use_cftime=True and decode_timedelta=False are used by default, along with combine=”by_coords” for open_mfdataset only.

Any list will be interpreted as a list of files.

Other utilities

roocs_utils.utils.common.parse_size(size)[source]

Parse size string into number of bytes.

Parameters:

size – (str) size to parse in any unit

Returns:

(int) number of bytes
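As an illustration of what parsing a size string involves, the sketch below splits the input into a number and a unit and multiplies them. The unit table is an assumption (decimal KB/MB/GB/TB and binary KiB/MiB/GiB/TiB), and size_to_bytes is a hypothetical name; the units the library accepts may differ.

```python
import re

# Assumed unit table: decimal (KB, MB, ...) and binary (KiB, MiB, ...) multiples
UNITS = {
    "B": 1,
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40,
}


def size_to_bytes(size):
    """Hypothetical helper: "50MB" or "1.5 KiB" -> number of bytes as int."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*", size)
    if match is None:
        raise ValueError(f"Cannot parse size: {size!r}")
    number, unit = match.groups()
    # Unit lookup is case-insensitive; an unknown unit raises KeyError
    return int(float(number) * UNITS[unit.upper()])


print(size_to_bytes("50MB"))
print(size_to_bytes("1.5 KiB"))
```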

class roocs_utils.utils.time_utils.AnyCalendarDateTime(year, month, day, hour, minute, second)[source]

Bases: object

A class to represent a datetime that could be of any calendar.

Has the ability to add and subtract a day from the input based on MAX_DAY, MIN_DAY, MAX_MONTH and MIN_MONTH

DAY_RANGE = range(1, 32)
HOUR_RANGE = range(0, 24)
MINUTE_RANGE = range(0, 60)
MONTH_RANGE = range(1, 13)
SECOND_RANGE = range(0, 60)
add_day()[source]

Add a day to the input datetime.

sub_day(n=1)[source]

Subtract a day from the input datetime.

validate_input(input, name, range)[source]
property value
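Because the class is calendar-agnostic, day arithmetic can only rely on the fixed ranges listed above (days 1 to 31, months 1 to 12). The sketch below shows one way such rollover could work. SimpleDateTime is a hypothetical stand-in, and its rollover logic is an assumption rather than the library's implementation.

```python
class SimpleDateTime:
    """Hypothetical stand-in that treats every month as having days 1..31."""

    MIN_DAY, MAX_DAY = 1, 31
    MIN_MONTH, MAX_MONTH = 1, 12

    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def add_day(self):
        # Roll over to the first day of the next month, and to the next year
        self.day += 1
        if self.day > self.MAX_DAY:
            self.day = self.MIN_DAY
            self.month += 1
        if self.month > self.MAX_MONTH:
            self.month = self.MIN_MONTH
            self.year += 1

    def sub_day(self):
        # Roll back to the last day of the previous month, and to the previous year
        self.day -= 1
        if self.day < self.MIN_DAY:
            self.day = self.MAX_DAY
            self.month -= 1
        if self.month < self.MIN_MONTH:
            self.month = self.MAX_MONTH
            self.year -= 1


d = SimpleDateTime(2000, 12, 31)
d.add_day()
print(d.year, d.month, d.day)
```

Under these assumptions a result such as day 31 of a 30-day month is possible, so a caller would still need to reconcile the value with the dataset's actual calendar.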
roocs_utils.utils.time_utils.str_to_AnyCalendarDateTime(dt, defaults=None)[source]

Takes a string representing a date/time and returns an AnyCalendarDateTime object. String formats should start with the year and go through to the second, but you can omit anything from the month onwards.

Parameters:
  • dt – (str) string representing a date/time.

  • defaults – (list) The default values to use for year, month, day, hour, minute and second if they cannot be parsed from the string. A default value must be provided for each component. If defaults=None, [-1, 1, 1, 0, 0, 0] is used.

Returns:

AnyCalendarDateTime object
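The role of defaults can be sketched as follows: components found in the string are used, and any missing trailing components are filled from the defaults. The function parse_components is hypothetical and returns a plain list rather than an AnyCalendarDateTime object; extracting components as runs of digits is an assumption.

```python
import re


def parse_components(dt, defaults=None):
    """Hypothetical helper: return [year, month, day, hour, minute, second]."""
    if defaults is None:
        defaults = [-1, 1, 1, 0, 0, 0]
    # Pull out runs of digits:
    # "2085-01-01T12:00:00Z" -> [2085, 1, 1, 12, 0, 0]
    found = [int(part) for part in re.findall(r"\d+", dt)][:6]
    # Fill any missing trailing components from the defaults
    return found + list(defaults[len(found):])


print(parse_components("2085-01-01T12:00:00Z"))
print(parse_components("2085"))
```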

roocs_utils.utils.time_utils.to_isoformat(tm)[source]

Returns an ISO 8601 string from a time object (of different types).

Parameters:

tm – Time object

Returns:

(str) ISO 8601 time string

class roocs_utils.utils.file_utils.FileMapper(file_list, dirpath=None)[source]

Bases: object

Class to represent a set of files that exist in the same directory as one object.

Parameters:
  • file_list – The list of files to represent. If dirpath is not provided, these should be full file paths.

  • dirpath – The directory path where the files exist. Default is None.

If dirpath is not provided it will be deduced from the file paths provided in file_list.

file_list

list of file names of the files represented.

file_paths

list of full file paths of the files represented.

dirpath

The directory path where the files exist. Either deduced or provided.
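The relationship between file_list, file_paths and dirpath can be shown with a small stand-in class. SimpleFileMapper is a hypothetical name, and raising an error when files come from different directories is an assumption based on the "same directory" wording above.

```python
import os


class SimpleFileMapper:
    """Hypothetical stand-in for a set of files living in one directory."""

    def __init__(self, file_list, dirpath=None):
        if dirpath is None:
            # Deduce the directory from the full file paths
            dirs = {os.path.dirname(path) for path in file_list}
            if len(dirs) != 1:
                raise ValueError("All files must be in the same directory")
            dirpath = dirs.pop()
        self.dirpath = dirpath
        # File names only, without the directory part
        self.file_list = [os.path.basename(path) for path in file_list]
        # Full paths rebuilt from the directory and the file names
        self.file_paths = [os.path.join(dirpath, name) for name in self.file_list]


fm = SimpleFileMapper(["/data/tas/a.nc", "/data/tas/b.nc"])
print(fm.dirpath, fm.file_list, fm.file_paths)
```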

roocs_utils.utils.file_utils.is_file_list(coll)[source]

Checks whether a collection is a list of files.

Parameters:

coll – (list) collection to check.

Returns:

True if collection is a list of files, else returns False.