Loading and Analyzing Argo Float Data

Last updated on 2026-02-06

Estimated time: 30 minutes

Overview

Questions

Objectives

Explain what a library is and what libraries are used for.
Import a Python library and use the functions it contains.
Select individual values from data.
Perform operations on arrays of data.

Words are useful, but what’s more useful are the sentences and stories we build with them. Similarly, while a lot of powerful, general tools are built into Python, specialized tools built up from these basic units live in libraries that can be called upon when needed.

Loading data with ArgoPy

All of the data recorded by Argo floats is sent to a Data Assembly Centre (DAC). After some checks of the data have been made, it is sent to a Global Data Assembly Centre (GDAC). There are two of these, one in the USA and one in France, and they both hold a copy of all of the Argo data ever received. To make accessing the data easy from Python, a special library called argopy has been developed. This can load data directly from one of the GDACs and turn it into a NumPy array. This saves us having to search through the GDAC, pick the data we want and download it to a file on our computer.

To tell Python that we’d like to start using argopy, we need to import it:

PYTHON

import argopy

Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs - so we only import what we need for each program.
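
As a minimal sketch of the same import-then-use pattern, the example below uses math, a library that ships with Python. It is chosen here only because it needs no installation or internet connection; it is not part of the Argo workflow.

```python
# Import the library once, near the top of the program.
import math

# Functions inside the library are reached with the library name and a dot.
root = math.sqrt(16)
print(root)
```

The same two steps, an import followed by library_name.function_name(...), apply to argopy and NumPy later in this episode.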

The argopy library has a lot of different features, but we want to use DataFetcher, which gets data from a GDAC. Calling DataFetcher() returns an object that has more functions we can call. One of these is called profile, and it gets an individual profile given a float number and a profile number. We’re going to look at data from profile 12 of float number 6902746.

PYTHON

argopy.DataFetcher().profile(6902746, 12)

The expression argopy.DataFetcher().profile(...) is a function call that asks Python to run the function profile, which belongs to the DataFetcher class, which in turn belongs to the argopy library. Dot notation is used in Python to reach something that belongs to an object: object_name.property gives the value of that property, and object_name.method() runs one of the object’s methods.
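
A chain of dots can be read one step at a time: each call returns an object, and the next name is looked up on that object. The sketch below imitates this pattern with a made-up class. ToyFetcher, its profile method and its to_list method are hypothetical stand-ins invented for this illustration and are not part of argopy.

```python
class ToyFetcher:
    """Hypothetical stand-in for a data fetcher; not part of argopy."""

    def __init__(self):
        self.selection = None  # nothing has been selected yet

    def profile(self, float_number, profile_number):
        # Remember what was asked for, then return the object itself
        # so that another method can be called on the result.
        self.selection = (float_number, profile_number)
        return self

    def to_list(self):
        # Return the selection as a plain list.
        return list(self.selection)


# Written as one chain...
chained = ToyFetcher().profile(6902746, 12).to_list()

# ...or as separate steps that do exactly the same thing.
fetcher = ToyFetcher()
selected = fetcher.profile(6902746, 12)
stepwise = selected.to_list()

print(chained)
print(stepwise)
```

Both forms give the same result. The chained form is shorter, while the stepwise form makes it easier to inspect each intermediate object.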

Callout

Functions, Parameters and Return Values

In the last episode we looked at using the print and type functions which are built into Python.
We “call” a function by writing its name followed by a (, then we can give the values of any parameters that the function might need. If there is more than one of these we separate each of them with a comma. Finally we write a closing ) to end the function call.

PYTHON

function_name(first_parameter, second_parameter)

Parameters have to be given in the order the function expects them. Alternatively, we can put a name in front of each parameter, followed by an = sign and the parameter value or the name of the variable we are sending.

PYTHON

function_name(parameter_name=first_parameter_value)

Functions can also send data back to the code which called them; this is known as “returning” data from a function.
We can save this return data into a variable to use it again later. If we don’t save it into a variable then its value is displayed on the screen.

PYTHON

my_variable = function_name(first_parameter)
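
The three patterns above can be tried out together in a short sketch. The function profile_label is a hypothetical helper written only for this illustration; it is not an argopy function, and its parameter names are assumptions chosen to mirror the float number and profile number used in this episode.

```python
def profile_label(float_number, profile_number):
    """Hypothetical helper that returns a short text label."""
    return "float " + str(float_number) + ", profile " + str(profile_number)


# Positional: values are matched to parameters by their order.
by_position = profile_label(6902746, 12)

# Keyword: values are matched by name, so the order no longer matters.
by_name = profile_label(profile_number=12, float_number=6902746)

# Both return values were saved in variables, so we print them to see them.
print(by_position)
print(by_name)
```

Naming the parameters makes a call longer, but it also makes it harder to swap two numbers by accident.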

When we import a library like argopy more functions become available to us.

argopy.DataFetcher().profile has two parameters: a float number and a profile number.

If we run the profile function with the float number and profile number we get back a datafetcher.erddap object.

OUTPUT

<datafetcher.erddap>
Name: Ifremer erddap Argo data fetcher for floats
API: https://erddap.ifremer.fr/erddap/
Domain: phy;WMO6902746
Performances: cache=False, parallel=False
User mode: standard
Dataset: phy

This doesn’t contain much useful data, although it does tell us which GDAC supplied the data. To get the actual data we need to call yet another function that the datafetcher.erddap object provides, called to_xarray. This gets the data ready for processing using another library called Xarray, which works well with array-based data and is very good at working with really big datasets.

PYTHON

argopy.DataFetcher().profile(6902746, 12).to_xarray()

Now we get a lot more information including a list of what data variables this float has.

Callout

Not All Functions Have Input

Generally, a function uses inputs to produce outputs. However, some functions produce outputs without needing any input. These functions don’t need any parameters, so we just write () after the function name.

PYTHON

function_name()

For example, checking the current time doesn’t require any input.

PYTHON

import time
print(time.ctime())

OUTPUT

Sat Mar 26 13:07:33 2016

We still need parentheses (()) to tell Python to go and do something for us.

To get one of the data variables listed in the to_xarray output, we add its name to the end of the command; for example, to get temperature we add .TEMP.

PYTHON

argopy.DataFetcher().profile(6902746, 12).to_xarray().TEMP

Now, to access just the array of temperature values, we add .values on the end (note that there are no brackets on this, because values is an attribute that holds data, not a function).

PYTHON

argopy.DataFetcher().profile(6902746, 12).to_xarray().TEMP.values

Since we haven’t told it to do anything else with the function’s output, the notebook displays it. In this case, that output is the data we just loaded.
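
The difference between an attribute such as .values and a function call such as .to_xarray() can be seen on any Python object. The sketch below uses a complex number, which is built into Python, purely as a convenient stand-in that needs no download.

```python
z = 3 + 4j  # a complex number, built into Python

# An attribute is a stored value: no brackets are needed.
real_part = z.real

# A method is a function attached to the object: the brackets make it run.
flipped = z.conjugate()

# Leaving the brackets off a method does not run it; we get the function itself.
not_called = z.conjugate

print(real_part)
print(flipped)
print(callable(not_called))
```

If a command displays something like a description of a function instead of data, missing brackets are a likely cause.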

Our call to argopy.DataFetcher().profile fetched the data but didn’t save it in memory. To do that, we need to assign the array to a variable. In a similar manner to how we assign a single value to a variable, we can also assign an array of values to a variable using the same syntax. Let’s capture this into a variable called temp_data.

PYTHON

temp_data = argopy.DataFetcher().profile(6902746, 12).to_xarray().TEMP.values

This statement doesn’t produce any output because we’ve assigned the output to the variable temp_data. If we want to check that the data have been loaded, we can print the variable’s value:

PYTHON

print(temp_data)

Now we have an array with our temperature data.

Let’s check its type.

PYTHON

print(type(temp_data))

OUTPUT

<class 'numpy.ndarray'>

The output tells us that temp_data currently refers to a NumPy array, the functionality for which is provided by the NumPy library. (The type of argopy.DataFetcher().profile(6902746, 12).to_xarray().TEMP is xarray.DataArray.)

NumPy, like argopy, is a Python library. Its name stands for Numerical Python. In general, you should use this library when you want to do fancy things with lots of numbers, especially if you have matrices or arrays.

These data are the temperature readings from a single Argo float profile. The array has one dimension, and each element is one reading.

Callout

Data Type

A NumPy array contains one or more elements of the same type. The type function will only tell you that a variable is a NumPy array; it won’t tell you the type of thing inside the array. To find that out, we can look at the array’s dtype.

PYTHON

print(temp_data.dtype)

OUTPUT

float32

This tells us that the NumPy array’s elements are floating-point numbers.

With the following command, we can see the array’s shape:

PYTHON

print(temp_data.shape)

OUTPUT

(108,)

The output tells us that the temp_data array variable contains 108 elements in a 1D array.
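
The three checks used so far (type, dtype and shape) can be tried without downloading anything. The sketch below builds a small array by hand; its five values are made up for illustration and are not real Argo measurements.

```python
import numpy

# Made-up temperature values, used only so the example runs without a download.
sample = numpy.array([28.9, 27.5, 15.2, 9.9, 3.7], dtype=numpy.float32)

print(type(sample))       # the kind of container
print(sample.dtype)       # the kind of value stored inside it
print(sample.shape)       # the number of elements along each dimension
print(len(sample.shape))  # the number of dimensions
```

A shape with a single number followed by a comma, such as the one here, means the array has one dimension.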

If we want to get a single number from the array, we must provide an index in square brackets after the variable name, just as we do in math when referring to an element of a matrix.

PYTHON

print('first temperature value:', temp_data[0])

OUTPUT

first temperature value: 28.898

PYTHON

print('middle temperature value:', temp_data[53])

OUTPUT

middle temperature value: 9.876

The expression temp_data[53] accesses the 54th element, not the 53rd as you might think. Programming languages like Fortran, MATLAB and R start counting at 1 because that’s what human beings have done for thousands of years. Languages in the C family (including C++, Java, Perl, and Python) count from 0 because it represents an offset from the first value in the array (the second value is offset by one index from the first value). This is closer to the way that computers represent arrays (if you are interested in the historical reasons behind counting indices from zero, you can read Mike Hoye’s blog post).
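
Counting from zero also determines the index of the last element. The sketch below uses a short list of made-up values in place of real data; the same square-bracket indexing works on NumPy arrays.

```python
# Made-up values standing in for a short temperature profile.
temps = [28.9, 27.5, 15.2, 9.9, 3.7]

first = temps[0]                        # index 0 is the first element
last_by_length = temps[len(temps) - 1]  # the last index is the length minus one
last_by_negative = temps[-1]            # a negative index counts back from the end

print(first, last_by_length, last_by_negative)
```

Using -1 avoids having to know the length in advance, which is handy when profiles contain different numbers of readings.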

Challenge

Explore the data

If you haven’t already, write some Python code to load in the data from profile 12 of float number 6902746. What is the last salinity value? (Salinity is called PSAL, as temperature was called TEMP.)

Show me the solution

PYTHON

import argopy
sal_data = argopy.DataFetcher().profile(6902746, 12).to_xarray().PSAL.values
# There are 108 elements in the data, so the last index is 107 because counting starts from 0.
print(sal_data[107])

OUTPUT

34.988

Analyzing data

NumPy has several useful functions that take an array as input to perform operations on its values. If we want to find the average temperature in our profile, for example, we can ask NumPy to compute temp_data’s mean value:

PYTHON

import numpy
print(numpy.mean(temp_data))

OUTPUT

13.058639

mean is a function that takes an array as an argument.

Let’s use two other NumPy functions to get some descriptive values about the temperature range.

PYTHON

maxval = numpy.max(temp_data)
minval = numpy.min(temp_data)

print('Max temperature:', maxval)
print('Min temperature:', minval)

Here we’ve assigned the return value from numpy.max(temp_data) to the variable maxval and the value from numpy.min(temp_data) to minval. Note that we used maxval, rather than just max - it’s not good practice to use variable names that are the same as Python keywords or function names.

OUTPUT

Max temperature: 28.907
Min temperature: 3.693
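
To check that these functions do what we expect, it can help to run them on an array small enough to work out by hand. The values below are made up for illustration; temp_range is a hypothetical variable name introduced here for the difference between the two extremes.

```python
import numpy

# Four made-up values that are easy to check by hand.
sample = numpy.array([2.0, 4.0, 6.0, 8.0])

meanval = numpy.mean(sample)  # (2 + 4 + 6 + 8) / 4
maxval = numpy.max(sample)    # largest value
minval = numpy.min(sample)    # smallest value
temp_range = maxval - minval  # spread between the extremes

print(meanval, maxval, minval, temp_range)
```

Once the results match the hand calculation, the same calls can be applied to temp_data with more confidence.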

Callout

Getting help on functions

How did we know what functions NumPy has and how to use them? If you are working in IPython or in a Jupyter Notebook, there is an easy way to find out. If you type the name of something followed by a dot, then you can use tab completion (e.g. type numpy. and then press Tab) to see a list of all functions and attributes that you can use. After selecting one, you can also add a question mark (e.g. numpy.abs?), and IPython will return an explanation of the method! This is the same as doing help(numpy.abs).
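
Tab completion and the question mark only work in IPython and Jupyter. In a plain Python script, similar information is available through the built-in dir function and the __doc__ attribute; the sketch below shows both on the standard math library, chosen only because it is always installed.

```python
import math

# dir() lists the names a library provides, much like tab completion does.
names = dir(math)
print("sqrt" in names)

# A documented function stores its description in __doc__;
# help(math.sqrt) would show this text along with the function's signature.
description = math.sqrt.__doc__
print(description)
```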

Challenge

Find the temperature range for an Arctic float

The float 5906983 has been deployed in the Arctic by NOC for the Met Office. You can see a map of where it’s been at https://fleetmonitoring.euro-argo.eu/float/5906983.

Adapt the code above to load profile number 33 from float 5906983. Calculate its minimum, maximum, mean and median temperature.

We haven’t calculated the median before. Search on the internet or look at the NumPy documentation (https://numpy.org/devdocs/reference/routines.statistics.html) to find out how to calculate it.

Show me the solution

PYTHON

temperatures = argopy.DataFetcher().profile(5906983, 33).to_xarray().TEMP.values

maxval = numpy.max(temperatures)
minval = numpy.min(temperatures)
meanval = numpy.mean(temperatures)
medianval = numpy.median(temperatures)

print('Max temperature:', maxval)
print('Min temperature:', minval)
print('Mean Temperature:', meanval)
print('Median Temperature:', medianval)

OUTPUT

Max temperature: 7.463699817657471
Min temperature: -0.6674000024795532
Mean Temperature: 2.9974963312872487
Median Temperature: 3.9305999279022217

Key Points

Import a library into a program using import libraryname.
The argopy library can load Argo float data over the internet from the GDAC.
Use the numpy library to work with arrays in Python.
The expression array.shape gives the shape of an array.
Use array[x] to select a single element from a 1D array.
Array indices start at 0, not 1.
Use numpy.mean(array), numpy.max(array), and numpy.min(array) to calculate simple statistics.