{ "cells": [ { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import roocs_utils" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['AreaParameter',\n", " 'CONFIG',\n", " 'CollectionParameter',\n", " 'LevelParameter',\n", " 'TimeParameter',\n", " '__author__',\n", " '__builtins__',\n", " '__cached__',\n", " '__contact__',\n", " '__copyright__',\n", " '__doc__',\n", " '__file__',\n", " '__license__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__path__',\n", " '__spec__',\n", " '__version__',\n", " 'area_parameter',\n", " 'base_parameter',\n", " 'collection_parameter',\n", " 'config',\n", " 'exceptions',\n", " 'get_config',\n", " 'level_parameter',\n", " 'parameter',\n", " 'parameterise',\n", " 'roocs_utils',\n", " 'time_parameter',\n", " 'utils',\n", " 'xarray_utils']" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(roocs_utils)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Parameters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Parameter classes parse the collection, area, time and level inputs that are passed as arguments to the subsetting operation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The area values can be input as:\n", "* A string of comma-separated values: “0.,49.,10.,65” \n", "* A sequence of strings: (“0”, “-10”, “120”, “40”) \n", "* A sequence of numbers: [0, 49.5, 10, 65]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'lon_bnds': (0.0, 10.0), 'lat_bnds': (49.0, 65.0)}\n", "(0.0, 49.0, 10.0, 65.0)\n" ] } ], "source": [ "area = roocs_utils.AreaParameter(\"0.,49.,10.,65\")\n", "\n", "# the lat/lon bounds can be returned in a dictionary\n", "print(area.asdict())\n", "\n", "# the values can be returned as a tuple\n", "print(area.tuple)" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "A collection can be input as:\n", "* A string of comma-separated values: “cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga,cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga” \n", "* A sequence of strings: e.g. (“cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”,“cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga”)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga', 'cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga')\n" ] } ], "source": [ "collection = roocs_utils.CollectionParameter(\"cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga,cmip5.output1.MPI-M.MPI-ESM-LR.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga\")\n", "\n", "# the collection ids can be returned as a tuple\n", "print(collection.tuple)" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Level can be input as:\n", "* A string of slash-separated values: “1000/2000” \n", "* A sequence of strings: e.g. (“1000.50”, “2000.60”) \n", "* A sequence of numbers: e.g. (1000.50, 2000.60)\n", "\n", "Level inputs should be the range of levels you want to subset over." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'first_level': 1000.5, 'last_level': 2000.6}\n", "(1000.5, 2000.6)\n" ] } ], "source": [ "level = roocs_utils.LevelParameter((1000.50, 2000.60))\n", "\n", "# the first and last level in the range provided can be returned in a dictionary\n", "print(level.asdict())\n", "\n", "# the values can be returned as a tuple\n", "print(level.tuple)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Time can be input as:\n", "* A string of slash-separated values: “2085-01-01T12:00:00Z/2120-12-30T12:00:00Z” \n", "* A sequence of strings: e.g. (“2085-01-01T12:00:00Z”, “2120-12-30T12:00:00Z”)\n", "\n", "Time inputs should be the start and end of the time range you want to subset over." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'start_time': '2085-01-01T12:00:00+00:00', 'end_time': '2120-12-30T12:00:00+00:00'}\n", "('2085-01-01T12:00:00+00:00', '2120-12-30T12:00:00+00:00')\n" ] } ], "source": [ "time = roocs_utils.TimeParameter(\"2085-01-01T12:00:00Z/2120-12-30T12:00:00Z\")\n", "\n", "# the first and last time in the range provided can be returned in a dictionary\n", "print(time.asdict())\n", "\n", "# the values can be returned as a tuple\n", "print(time.tuple)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`parameterise` converts inputs into instances of the parameter classes, which allows them to be used throughout roocs." 
] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'collection': Datasets to analyse:\n", " cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga,\n", " 'area': Area to subset over:\n", " (0.0, 49.0, 10.0, 65.0),\n", " 'level': Level range to subset over\n", " first_level: 1000.5\n", " last_level: 2000.6,\n", " 'time': Time period to subset over\n", " start time: 2085-01-01T12:00:00+00:00\n", " end time: 2120-12-30T12:00:00+00:00}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "roocs_utils.parameter.parameterise(\"cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga\", \"0.,49.,10.,65\", (1000.50, 2000.60), \"2085-01-01T12:00:00Z/2120-12-30T12:00:00Z\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Xarray utils" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Xarray utils can be used to identify the main variable in a dataset, to identify the type of a coordinate, and to return a coordinate based on an attribute or a type." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "from roocs_utils.xarray_utils import xarray_utils as xu\n", "import xarray as xr" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "ds = xr.open_mfdataset(\"../tests/mini-esgf-data/test_data/badc/cmip5/data/cmip5/output1/MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/latest/tas/*.nc\", use_cftime=True, combine=\"by_coords\")" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "main var = tas\n" ] }, { "data": { "text/html": [ "
<xarray.DataArray 'tas' (time: 3530, lat: 2, lon: 2)>\n",
       "dask.array<concatenate, shape=(3530, 2, 2), dtype=float32, chunksize=(300, 2, 2), chunktype=numpy.ndarray>\n",
       "Coordinates:\n",
       "    height   float64 1.5\n",
       "  * lat      (lat) float64 -90.0 35.0\n",
       "  * lon      (lon) float64 0.0 187.5\n",
       "  * time     (time) object 2005-12-16 00:00:00 ... 2299-12-16 00:00:00\n",
       "Attributes:\n",
       "    standard_name:     air_temperature\n",
       "    long_name:         Near-Surface Air Temperature\n",
       "    comment:           near-surface (usually, 2 meter) air temperature.\n",
       "    units:             K\n",
       "    original_name:     mo: m01s03i236\n",
       "    cell_methods:      time: mean\n",
       "    cell_measures:     area: areacella\n",
       "    history:           2010-12-04T13:50:30Z altered by CMOR: Treated scalar d...\n",
       "    associated_files:  baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation...
" ], "text/plain": [ "<xarray.DataArray 'tas' (time: 3530, lat: 2, lon: 2)>\n", "dask.array<concatenate, shape=(3530, 2, 2), dtype=float32, chunksize=(300, 2, 2), chunktype=numpy.ndarray>\n", "Coordinates:\n", "    height   float64 1.5\n", "  * lat      (lat) float64 -90.0 35.0\n", "  * lon      (lon) float64 0.0 187.5\n", "  * time     (time) object 2005-12-16 00:00:00 ... 2299-12-16 00:00:00\n", "Attributes:\n", "    standard_name:     air_temperature\n", "    long_name:         Near-Surface Air Temperature\n", "    comment:           near-surface (usually, 2 meter) air temperature.\n", "    units:             K\n", "    original_name:     mo: m01s03i236\n", "    cell_methods:      time: mean\n", "    cell_measures:     area: areacella\n", "    history:           2010-12-04T13:50:30Z altered by CMOR: Treated scalar d...\n", "    associated_files:  baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation..." ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# find the main variable of the dataset\n", "main_var = xu.get_main_variable(ds)\n", "\n", "print(\"main var =\", main_var)\n", "\n", "ds[main_var]" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "coord name = height \n", "coord type = level\n", "\n", "coord name = lat \n", "coord type = latitude\n", "\n", "coord name = lon \n", "coord type = longitude\n", "\n", "coord name = time \n", "coord type = time\n", "\n", " There is a level, time, latitude and longitude coordinate in this dataset\n" ] } ], "source": [ "# to get the coord types\n", "\n", "for coord in ds.coords:\n", "    print(\"\\ncoord name =\", coord, \"\\ncoord type =\", xu.get_coord_type(ds[coord]))\n", "\n", "print(\"\\n There is a level, time, latitude and longitude coordinate in this dataset\")" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "None\n" ] } ], "source": [ "# to check the type of a coord\n", "\n", "print(xu.is_level(ds[\"height\"]))\n", "print(xu.is_latitude(ds[\"lon\"]))" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "time = \n", "array([cftime.Datetime360Day(2005, 12, 16, 0, 0, 0, 0),\n", " cftime.Datetime360Day(2006, 1, 16, 0, 0, 0, 0),\n", " cftime.Datetime360Day(2006, 2, 16, 0, 0, 0, 0), ...,\n", " cftime.Datetime360Day(2299, 10, 16, 0, 0, 0, 0),\n", " cftime.Datetime360Day(2299, 11, 16, 0, 0, 0, 0),\n", " cftime.Datetime360Day(2299, 12, 16, 0, 0, 0, 0)], dtype=object)\n", "Coordinates:\n", " height float64 1.5\n", " * time (time) object 2005-12-16 00:00:00 ... 2299-12-16 00:00:00\n", "Attributes:\n", " bounds: time_bnds\n", " axis: T\n", " long_name: time\n", " standard_name: time\n", "\n", "level = \n", "array(1.5)\n", "Coordinates:\n", " height float64 1.5\n", "Attributes:\n", " units: m\n", " axis: Z\n", " positive: up\n", " long_name: height\n", " standard_name: height\n" ] } ], "source": [ "# to find a coordinate of a specific type\n", "\n", "print(\"time =\", xu.get_coord_by_type(ds, \"time\"))\n", "\n", "# to find the level coordinate, set ignore_aux_coords to False\n", "\n", "print(\"\\nlevel =\", xu.get_coord_by_type(ds, \"level\", ignore_aux_coords=False))" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<xarray.DataArray 'lat' (lat: 2)>\n",
       "array([-90.,  35.])\n",
       "Coordinates:\n",
       "    height   float64 1.5\n",
       "  * lat      (lat) float64 -90.0 35.0\n",
       "Attributes:\n",
       "    bounds:         lat_bnds\n",
       "    units:          degrees_north\n",
       "    axis:           Y\n",
       "    long_name:      latitude\n",
       "    standard_name:  latitude
" ], "text/plain": [ "<xarray.DataArray 'lat' (lat: 2)>\n", "array([-90.,  35.])\n", "Coordinates:\n", "    height   float64 1.5\n", "  * lat      (lat) float64 -90.0 35.0\n", "Attributes:\n", "    bounds:         lat_bnds\n", "    units:          degrees_north\n", "    axis:           Y\n", "    long_name:      latitude\n", "    standard_name:  latitude" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# to find a coordinate based on an attribute you expect it to have\n", "\n", "xu.get_coord_by_attr(ds, \"standard_name\", \"latitude\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Other utilities" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Other utilities parse a memory size given in any unit into bytes and convert a time object into an ISO 8601 string." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "from roocs_utils.utils.common import parse_size\n", "from roocs_utils.utils.time_utils import to_isoformat\n", "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "52428800.0" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# to parse a size into bytes\n", "size = '50MiB'\n", "size_in_b = parse_size(size)\n", "size_in_b" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2005-07-14T12:30:00'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# to convert a time object into a time string\n", "time = datetime(2005, 7, 14, 12, 30)\n", "time_str = to_isoformat(time)\n", "time_str" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", 
"version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }