This lesson is in the early stages of development (Alpha version)

Requests

Overview

Teaching: 40 min
Exercises: 20 min
Questions
  • How can I send HTTP requests to a web server from Python?

  • How can I interact with web services that require authentication?

  • What are the data formats that are used in HTTP messages?

Objectives
  • Use the Python requests library for GET and POST requests.

  • Understand how to deal with common authentication mechanisms.

  • Understand what else the requests library can do for you.

So far, we have been interacting with web APIs by using curl to send HTTP requests and then inspecting the responses at the command line. This is very useful for running quick checks that we are able to access the API, and debugging if we’re not. However, to integrate web APIs into our software and analyses, we’d like to be able to make requests of web APIs from within Python, and work with the results.

In principle we could make subprocess calls to curl, and capture and parse the results, but this would be very cumbersome. Fortunately, other people thought the same thing, and have made libraries available to help with this. Basic functionality for making and processing requests is built into the Python standard library, but a far more popular choice is the requests package, which is available from PyPI.

First off, let’s check that we have requests installed.

$ python -c "import requests"

If you do not see any message, then requests is already installed. If, on the other hand, you see a message like

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'requests'

then install requests using pip:

$ pip install requests

Recap: Requests, Responses and JSON

As a reminder, communication with web APIs is done through the HTTP protocol, and happens through messages, which are of two kinds: requests and responses.

A request is composed of a start line, a number of headers and an optional body.

Practically, a request needs to specify one of the HTTP verbs and a URL in the start line and an optional payload (the body).

A response is composed of a status line, a number of headers and an optional body.

The data to be transferred with the body of a request needs to be represented in some way. “Unstructured” text representations are used, e.g., to transmit CSV data. A popular text-based format for transmitting structured data is JavaScript Object Notation (JSON). The Python standard library includes a module to deal with JSON, for serialisation (i.e. representing Python objects as JSON strings):

import json
data = dict(a=1, b=dict(c=(2,3,4)))
representation = json.dumps(data)
representation
'{"a": 1, "b": {"c": [2, 3, 4]}}'

And for parsing (i.e. recovering Python objects from their JSON string representation):

data_reparsed = json.loads(representation)
data_reparsed
{'a': 1, 'b': {'c': [2, 3, 4]}}

You can see that for dicts containing strings, integers, and lists, at least, the JSON representation looks very similar to the Python representation. The two are not always directly interchangeable, however.
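A quick way to see where the two representations diverge is to round-trip a dictionary that uses Python-specific features. The sketch below uses a made-up dictionary (it does not come from any API) containing a tuple, a boolean, None and an integer key:

```python
import json

# A made-up dictionary using features that JSON does not have.
original = {"values": (2, 3, 4), "flag": True, "missing": None, 1: "integer key"}

encoded = json.dumps(original)   # Python object -> JSON string
decoded = json.loads(encoded)    # JSON string -> Python object

print(encoded)
print(decoded)
print(decoded == original)
```

The tuple comes back as a list and the integer key comes back as the string "1", so the round trip does not reproduce the original object exactly. In the JSON string itself, True and None are written as true and null.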

The Python requests library can parse JSON and serialise the objects, so that you don’t have to deal with this aspect on your own.

Another text-based format that is used with APIs is the eXtensible Markup Language (XML), which is much more complex to deal with than JSON. Facilities to deal with the XML format are provided by the xml.etree.ElementTree module in the Python standard library.
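As a small illustration of what working with XML looks like, the sketch below parses a short, hand-written XML document. The element and attribute names (forecast, temperature, location, unit) are invented for this example and do not correspond to any real API:

```python
import xml.etree.ElementTree as ET

# A hand-written XML document, invented for illustration only.
xml_text = """<forecast location="Southampton">
  <temperature unit="celsius">11.5</temperature>
  <temperature unit="celsius">12.1</temperature>
</forecast>"""

root = ET.fromstring(xml_text)          # parse the string into an element tree
print(root.tag, root.attrib["location"])

# Element text is always a string, so numbers must be converted explicitly.
temperatures = [float(element.text) for element in root.findall("temperature")]
print(temperatures)
```

Unlike json.loads, which returns ordinary dicts and lists, the XML parser returns element objects that have to be navigated, and values have to be converted by hand.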

Another markup language widely used in HTTP message bodies is the HyperText Markup Language, HTML.

HTTP verbs

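Up until now we have exclusively used GET requests to retrieve information from a server. In fact, the HTTP protocol has a number of such verbs, each associated with an operation falling in one of four categories: Create, Read, Update, or Delete (sometimes called the CRUD categories). The most common verbs are:

  • GET: read a resource (Read)

  • POST: create a new resource (Create)

  • PUT: replace an existing resource (Update)

  • PATCH: modify part of an existing resource (Update)

  • DELETE: remove a resource (Delete)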

In this lesson we will focus on GET and POST requests only.

A GET request example

Let’s take the first example we looked at earlier, now with the Python requests library:

import requests
response = requests.get("http://carpentries.org")

requests gives us access to both the headers and the body of the response. Looking at the headers first, we can check what type of data is in the body. As this is the URL of a website, we expect the response to contain a web page:

response.headers["Content-Type"]
text/html

Our expectations are confirmed. We can also check the Content-Length header to see how much data we expect to find in the body:

response.headers["Content-Length"]
29741

And, as expected, the length of the body of the response is the same:

len(response.text)
29741
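The two numbers will not always agree. Content-Length counts the bytes that were transmitted, whereas len(response.text) counts characters after decoding, so they can differ when the body contains non-ASCII characters (or when the server compresses the body). A minimal sketch of the byte/character difference, using a made-up string rather than a real response:

```python
# "caf\u00e9" is the word "café": four characters, one of them non-ASCII.
text = "caf\u00e9"
body = text.encode("utf-8")   # the bytes as they would be sent in a UTF-8 body

print(len(text))   # number of characters
print(len(body))   # number of bytes
```

For a real response, the decoded body is available as bytes through response.content and as text through response.text.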

We can look at the content of the body:

response.text

Another GET request example

APIs, like other pieces of code, need documentation. We’ve already seen some examples of API documentation, such as NASA’s API documentation.

One popular way of creating API documentation is by generating it from the API specification (essentially a means of providing metadata for the API). One of the specification languages you are likely to hear about is called OpenAPI, a specification language for HTTP APIs. This is a machine readable format, meaning that a lot of tooling has been developed around it for tasks like the generation of API documentation.

The documentation that can be generated from an OpenAPI description can be interactive, even allowing you to test API endpoints without ever leaving the documentation page. We’ll see an example of this in the exercise below, as well as another example of a GET request.

EDS Citation API using the documentation

Look at BODC’s EDS citation API. (See this page for background information.) Can you find out the total citation count for the Polar Data Centre (PDC), without leaving the page?

Solution

There are multiple ways to do this. One is to use the /centre endpoint. You could also use the /centre/{centre_name} endpoint, where centre_name is Polar Data Centre (PDC). To interact with the documentation on the page, click the Try it out button, enter any desired parameters, then click the Execute button.

EDS Citation API using the requests library

Can you now make exactly the same request but using the requests library, rather than just using the interactive documentation?

Solution

For example:

import requests
response = requests.get("https://www.bodc.ac.uk/eds-citation/centre")
response.json()

GET with parameters

As we have seen when talking about curl, some endpoints accept parameters in GET requests. Using Python’s requests library, the call to NASA’s APOD endpoint that we previously made

$ curl -i "https://api.nasa.gov/planetary/apod?date=2005-04-01&api_key=<your-api-key>"

can be expressed in a more human-friendly format:

response = requests.get(url="https://api.nasa.gov/planetary/apod",
                        params={"date":"2005-04-01",
                                "api_key":"<your-api-key>"})

using a dictionary to contain all the arguments.
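To check that the dictionary form really produces the same URL as the curl command, we can ask requests to build the request without sending it. This is a sketch that needs no network access; the value PLACEHOLDER stands in for a real API key:

```python
from urllib.parse import parse_qs, urlsplit

import requests

# Build the request but do not send it.
request = requests.Request(
    method="GET",
    url="https://api.nasa.gov/planetary/apod",
    params={"date": "2005-04-01", "api_key": "PLACEHOLDER"},  # placeholder key
)
prepared = request.prepare()

# The parameters have been encoded into the query string of the URL.
print(prepared.url)

# The standard library can split the query string back into its parts.
query = parse_qs(urlsplit(prepared.url).query)
print(query)
```

requests also takes care of escaping characters that are not allowed in a URL, which is easy to get wrong when building the string by hand.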

Get a list of GitHub repositories

The CDT-AIMLAC GitHub organisation (cdt-aimlac) has a number of repositories. Using the official API documentation of GitHub, can you list their names, sorted in ascending order of last updated time? (Look at the examples in the documentation!)

Solution

The URL to use is https://api.github.com/orgs/cdt-aimlac/repos. In addition to that, we need to use the parameters sort with value updated and direction with value asc.

response = requests.get(url="https://api.github.com/orgs/cdt-aimlac/repos",
                        params={'sort':'updated',
                                'direction':'asc'})
response
<Response [200]> 

Once we verify that there are no errors, we can extract the data, which is available via the json() method:

for repo in response.json():
    print(repo["name"], ':', repo["updated_at"])
testing_exercise : 2020-04-28T13:56:42Z
docker-introduction-2021 : 2021-01-26T19:20:19Z
grid : 2021-03-10T11:59:09Z
training-cloud-vm : 2021-03-23T13:43:03Z
ccintro-2021 : 2021-09-21T13:57:35Z
git-novice : 2021-11-24T10:21:58Z
docker-introduction-2022 : 2022-01-24T17:31:39Z
blogs : 2022-09-07T15:56:33Z
ccintro-2022 : 2022-09-15T15:51:29Z
aber-pubs : 2022-11-23T13:41:57Z
agile_snails_coding_challenge : 2022-11-23T15:45:05Z
team_7564616d_models : 2022-11-23T15:45:42Z
coding-challenge-2022_23-task1 : 2023-02-08T17:02:07Z
pl_curves : 2023-03-29T22:38:10Z
ccintro-2023 : 2023-09-18T13:58:06Z
marketintro-2023 : 2023-11-16T17:12:00Z
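Rather than inspecting the status code by eye, the check can be built into the code with the raise_for_status method, which raises an exception for 4xx and 5xx responses. The helper below is a minimal sketch: the name get_json and the 10 second timeout are choices made for this example, not part of the requests API.

```python
import requests


def get_json(url, params=None, timeout=10):
    """Send a GET request and return the parsed JSON body.

    Raises requests.HTTPError if the server answers with an error status.
    """
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()   # does nothing for successful responses
    return response.json()


# Example usage (requires network access):
# repos = get_json("https://api.github.com/orgs/cdt-aimlac/repos",
#                  params={"sort": "updated", "direction": "asc"})
```

Passing a timeout is worthwhile in scripts, because by default requests will wait indefinitely for a server that never answers.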

Another GET request with parameters example - Open-Meteo API

As an additional example of using requests to connect to an API rather than a plain web site we’ll use the Open-Meteo API. (This is free for non-commercial use and does not require an API key.)

Looking at the documentation for this API (specifically, the API URL under the API Response section), we can build a URL to access a temperature forecast for the next three days for NOC Southampton. This URL has four parameters (latitude, longitude, variable we are accessing and number of days we’re interested in).

As we saw in the previous episode, with curl from the command line, we would have to use the following command

curl "https://api.open-meteo.com/v1/forecast?latitude=50.89&longitude=-1.39&hourly=temperature_2m&forecast_days=3"

building the parameter string explicitly. This is also the syntax that is used in a browser address bar:

"protocol://host/resource/path?parname1=value1&parname2=value2..."

However, using the requests library allows us to use a nicer syntax:

response = requests.get(url="https://api.open-meteo.com/v1/forecast",
                        params={"latitude": "50.89",
                                "longitude": "-1.39",
                                "hourly": "temperature_2m",
                                "forecast_days": "3"})
response
<Response [200]>

As we saw previously, the code 200 means “success”. To make sure the response contains what we expect, let’s quickly print its headers (which have the structure of a dictionary):

for key, value in response.headers.items():
    print((key, value))
('Date', 'Wed, 23 Apr 2025 12:57:34 GMT')
('Content-Type', 'application/json; charset=utf-8')
('Transfer-Encoding', 'chunked')
('Connection', 'keep-alive')
('Content-Encoding', 'deflate')

As expected, the Content-Type is application/json. We can now look at the body of the response:

response.text[:100]
'{"latitude":50.86,"longitude":-1.3800001,"generationtime_ms":0.024437904357910156,"utc_offset_second'

As mentioned, the requests library can parse this JSON representation and return a more convenient Python object, using which we can access the inner data:

data = response.json()
data["hourly"]["temperature_2m"]
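To make use of the values, it helps to pair each temperature with its timestamp. The sketch below assumes that the hourly object also contains a time list of the same length as temperature_2m. So that it runs without network access, it uses a hand-written stand-in for response.json(); the numbers are illustrative, not real forecast data.

```python
# Hand-written stand-in for response.json(); the values are illustrative only.
data = {
    "hourly": {
        "time": ["2025-04-23T00:00", "2025-04-23T01:00", "2025-04-23T02:00"],
        "temperature_2m": [9.5, 9.1, 8.8],
    }
}

hourly = data["hourly"]

# Walk through the two parallel lists together.
for time, temperature in zip(hourly["time"], hourly["temperature_2m"]):
    print(time, temperature)

# Find the warmest hour by comparing on the temperature (second item of each pair).
warmest_time, warmest = max(
    zip(hourly["time"], hourly["temperature_2m"]), key=lambda pair: pair[1]
)
print("Warmest:", warmest_time, warmest)
```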

Another location

Refer back to the API reference. Can you produce a fortnight’s forecast of precipitation probability at NOC Liverpool?

Solution

We query the Open-Meteo API using something similar to the following:

response = requests.get(url="https://api.open-meteo.com/v1/forecast",
                        params={"latitude": "53.40",
                                "longitude": "-2.97",
                                "hourly": "precipitation_probability",
                                "forecast_days": "14"})

Authentication and POST

As mentioned above, thus far we have only used GET requests. GET requests are intended to be used for retrieving data, without modifying any state—effectively, “look, but don’t touch”. To modify state, other HTTP verbs should be used instead. Most commonly used for this purpose in web APIs are POST requests.

As such, we’ll switch to using the GitHub API to look at how POST requests can be used.

This will require a GitHub Personal Access Token. If you don’t already have one, then the instructions in the Setup walk through how to obtain one.

Take care with access tokens!

This access token identifies your individual user account, rather than just the application you’re developing, so anyone with this token can impersonate you and manage your account. Be very sure not to commit this (or any other personal access token) to a public repository (or any repository that might be made public in the future), as it will very rapidly be discovered and used against you.

A common mistake is committing tokens for a cloud service. This has allowed unscrupulous individuals to take over cloud computing services and spend hundreds of thousands of pounds on activities such as mining cryptocurrency.

To send POST requests, we can use the function requests.post.

For this example, we are going to post a comment on an issue on GitHub. Issues on GitHub are a simple way to keep track of bugs, and a great way to manage focused discussions on the code.

In order to do so, we need to authenticate. We will now create an object of the HTTPBasicAuth class provided by requests, and pass it to requests.post.

First of all, let’s load the GitHub access token:

with open("github-access-token.txt", "r") as file:
  ghtoken = file.read().strip()

Let’s then create the HTTPBasicAuth object:

from requests.auth import HTTPBasicAuth
auth = HTTPBasicAuth("your-github-username", ghtoken)

We will now create the body of the comment, as a JSON string:

import json
body = json.dumps({"body": "A test comment"})

Finally, we will post the comment on GitHub and make sure we get a success code:

response = requests.post(url="https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments",
                         data=body,
                         auth=auth)
response
<Response [201]>

The code 201 is the typical success response for a POST request, signaling that the creation of a resource has been successful. We can go to the issue page and check that our new comment is there.
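Instead of calling json.dumps ourselves, we could also hand the dictionary to requests through the json keyword argument and let the library serialise it. The sketch below compares the two approaches by building both requests without sending them, so it needs neither a token nor network access:

```python
import json

import requests

url = "https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments"
payload = {"body": "A test comment"}

# Serialising by hand and passing the string through data=...
manual = requests.Request("POST", url, data=json.dumps(payload)).prepare()

# ...or letting requests serialise the dictionary through json=.
automatic = requests.Request("POST", url, json=payload).prepare()

print(manual.body)
print(automatic.body)
print(automatic.headers["Content-Type"])
```

Both bodies contain the same JSON document. The json keyword additionally sets the Content-Type header to application/json, which the data keyword does not do when given a string.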

Curl and POST

curl can also be used for POST requests, which can be useful for shell-based workflows. One needs to use the --data option.

What have I asked you?

The request that generated a given response object can be retrieved as response.request. Can you see the headers of that request? And what about the body of the message? What is the type of the request object?

Solution

To print the headers:

print(response.request.headers)
{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

The body of the request is accessible just with

response.request.body
'{"body": "A test comment"}'

And the type is PreparedRequest:

type(response.request)
requests.models.PreparedRequest

For better control, one could in principle create a Request object beforehand, call the prepare method on it to obtain a PreparedRequest, and then send it through a Session object.
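A minimal sketch of that workflow is shown below. The helper name send and the 10 second timeout are choices made for this example. Authentication is left out for brevity, so actually sending this particular request would be rejected by GitHub.

```python
import requests

# Step 1: describe the request.
request = requests.Request(
    method="POST",
    url="https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments",
    json={"body": "A test comment"},
)

# Step 2: prepare it. Nothing has been sent yet, so it can still be inspected.
prepared = request.prepare()
print(type(prepared).__name__)
print(prepared.method, prepared.url)
print(prepared.headers["Content-Length"])


# Step 3: send it through a Session (defined here, but not called).
def send(prepared_request):
    """Send a prepared request and return the response."""
    with requests.Session() as session:
        return session.send(prepared_request, timeout=10)
```

Separating preparation from sending makes it possible to check, log or modify the exact headers and body before anything goes over the network.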

Forgot the key

What error code do we get if we just forget to add the auth? How do the headers of the request change?

Solution

r = requests.post(url="https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments",data=body)
r
<Response [401]>

The request headers are:

('User-Agent', 'python-requests/2.25.1')
('Accept-Encoding', 'gzip, deflate')
('Accept', '*/*')
('Connection', 'keep-alive')
('Content-Length', '26')

Most notably, the “Authorization” header is missing.

Authentication is a vast topic. The requests library implements a number of authentication mechanisms that you can use. To handle authentication for multiple requests, one could also use a Session object from the requests library (see Advanced Usage).
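As a sketch of the Session approach, the example below attaches an HTTPBasicAuth object to a session and shows the Authorization header it produces. The username and token are placeholders, and the request is only prepared, never sent:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

session = requests.Session()
# Placeholder credentials; every request made through this session will use them.
session.auth = HTTPBasicAuth("example-user", "example-token")

request = requests.Request(
    "GET",
    "https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments",
)
prepared = session.prepare_request(request)   # merges in the session's settings
print(prepared.headers["Authorization"])

# Basic authentication is just "username:password" encoded with Base64.
expected = "Basic " + base64.b64encode(b"example-user:example-token").decode("ascii")
print(prepared.headers["Authorization"] == expected)
```

Base64 is an encoding, not encryption: anyone who can read the header can recover the token, which is one reason to use https URLs whenever credentials are sent.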

Key Points

  • GET requests are used to read data from a particular resource.

  • POST requests are used to write data to a particular resource.

  • GET and POST methods may require some form of authentication (POST usually does).

  • The Python requests library offers various ways to deal with authentication.

  • curl can be used instead for shell-based workflows and debugging purposes.