Requests
Overview
Teaching: 40 min
Exercises: 20 min
Questions
How can I send HTTP requests to a web server from Python?
How to interact with web services that require authentication?
What are the data formats that are used in HTTP messages?
Objectives
Use the Python requests library for GET and POST requests.
Understand how to deal with common authentication mechanisms.
Understand what else the requests library can do for you.
So far, we have been interacting with web APIs by using curl to send HTTP requests and then inspecting the responses at the command line. This is very useful for running quick checks that we are able to access the API, and for debugging if we’re not. However, to integrate web APIs into our software and analyses, we’d like to be able to make requests of web APIs from within Python, and work with the results.
In principle we could make subprocess calls to curl, and capture and parse the results, but this would be very cumbersome. Fortunately, other people thought the same thing, and have made libraries available to help with this. Basic functionality around making and processing requests is built into the Python standard library, but it is far more common to use a package called requests, which is available from PyPI.
First off, let’s check that we have requests installed.
$ python -c "import requests"
If you do not see any message, then requests is already installed. If, on the other hand, you see a message like
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'requests'
then install requests using pip:
$ pip install requests
Recap: Requests, Responses and JSON
As a reminder, communication with web APIs is done through the HTTP protocol, and happens through messages, which are of two kinds: requests and responses.
A request is composed of a start line, a number of headers and an optional body.
Practically, a request needs to specify one of the HTTP verbs and a URL in the start line and an optional payload (the body).
A response is composed of a status line, a number of headers and an optional body.
The data to be transferred with the body of a request needs to be represented in some way. “Unstructured” text representations are used, e.g., to transmit CSV data. A popular text-based (ASCII) format to transmit data is the JavaScript Object Notation (JSON) format. The Python standard library includes a module to deal with JSON, for serialisation (i.e. representing Python objects as JSON strings):
import json
data = dict(a=1, b=dict(c=(2,3,4)))
representation = json.dumps(data)
representation
'{"a": 1, "b": {"c": [2, 3, 4]}}'
And for parsing (i.e. recovering Python objects from their JSON string representation):
data_reparsed = json.loads(representation)
data_reparsed
{'a': 1, 'b': {'c': [2, 3, 4]}}
You can see that for dicts containing strings, integers, and lists, at least, the JSON representation looks very similar to the Python representation. The two are not always directly interchangeable, however.
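To make the last point concrete, here is a small sketch (using only the standard library json module) of a round trip that does not give back the original object. The dictionary is made up for illustration.

```python
import json

# A Python dict using types that JSON has no direct equivalent for:
# a tuple, a boolean, None, and a non-string key.
original = {"point": (1, 2), "flag": True, "missing": None, 3: "three"}

encoded = json.dumps(original)   # Python object -> JSON text
decoded = json.loads(encoded)    # JSON text -> Python object

print(encoded)
print(decoded)
print(decoded == original)
```

The tuple comes back as a list and the integer key comes back as the string "3", so the final comparison prints False. In the JSON text itself, True and None appear as true and null.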
The Python requests library can parse JSON responses and serialise Python objects to JSON for you, so that you don’t have to deal with this aspect on your own.
Another text-based format that is used with APIs is the eXtensible Markup Language (XML), which is much more complex to deal with than JSON. Facilities to deal with the XML format are available in the xml.etree.ElementTree module of the Python standard library.
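As a brief illustration of xml.etree.ElementTree, the sketch below parses a tiny XML document. The document, its element names and its values are invented for this example and do not come from any real API.

```python
import xml.etree.ElementTree as ET

# A made-up XML document, used only for illustration.
xml_text = """
<forecast location="Southampton">
  <temperature hour="0">11.2</temperature>
  <temperature hour="1">10.8</temperature>
</forecast>
""".strip()

root = ET.fromstring(xml_text)          # parse the text into an element tree
location = root.attrib["location"]      # attributes behave like a dictionary
# Element text is always a string, so we convert it ourselves.
temperatures = [float(node.text) for node in root.findall("temperature")]

print(location, temperatures)
```

Note that, unlike JSON, XML carries no type information for values: everything arrives as text and has to be converted explicitly.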
Another markup language widely used in HTTP message bodies is the HyperText Markup Language, HTML.
HTTP verbs
Up until now we have exclusively used GET requests, to retrieve information from a server. In fact, the HTTP protocol has a number of such verbs, each associated with an operation falling in one of four categories: Create, Read, Update, or Delete (sometimes called the CRUD categories). The most common verbs are:
- GET: to read resources (these requests have no body);
- POST: to create new resources;
- PUT: to update/replace existing resources;
- PATCH: to update/modify existing resources;
- DELETE: to delete resources.
In this lesson we will focus on GET and POST requests only.
A GET request example
Let’s take the first example we looked at earlier, now with the Python requests library:
import requests
response = requests.get("http://carpentries.org")
The requests library gives us access to both the headers and the body of the response. Looking at the headers first, we can see what type of data is in the body. As this is the URL of a website, we expect the response to contain a web page:
response.headers["Content-Type"]
text/html
Our expectations are confirmed. We can also check the Content-Length header to see how much data we expect to find in the body:
response.headers["Content-Length"]
29741
And, as expected, the length of the body of the response is the same:
len(response.text)
29741
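The two numbers agree for this page, but that need not hold in general. Content-Length gives the size in bytes of the body as it was sent (which may be compressed), whereas len(response.text) counts characters after decoding. The sketch below shows the difference between bytes and characters with a made-up string, without contacting any server.

```python
# Content-Length counts bytes on the wire; len() of a str counts characters.
text = "Caf\u00e9"                 # "Café": four characters
encoded = text.encode("utf-8")     # the bytes that would actually be sent

print(len(text))      # number of characters
print(len(encoded))   # number of bytes: the accented letter needs two
```

With requests, the body as raw bytes is available as response.content, which is usually the better thing to compare against Content-Length.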
We can look at the content of the body:
response.text
Another GET request example
APIs, like other pieces of code, need documentation. We’ve already seen some examples of API documentation, such as NASA’s API documentation.
One popular way of creating API documentation is by generating it from the API specification (essentially a means of providing metadata for the API). One of the specification languages you are likely to hear about is called OpenAPI, a specification language for HTTP APIs. This is a machine readable format, meaning that a lot of tooling has been developed around it for tasks like the generation of API documentation.
The documentation that can be generated from an OpenAPI description can be interactive, even allowing you to test API endpoints without ever leaving the documentation page. We’ll see an example of this in the exercise below, as well as another example of a GET request.
EDS Citation API using the documentation
Look at BODC’s EDS citation API. (See this page for background information.) Can you find out the total citation count for the Polar Data Centre (PDC), without leaving the page?
Solution
There are multiple ways to do this. One is to use the /centre endpoint. You could also use the /centre/{centre_name} endpoint, where centre_name is Polar Data Centre (PDC). To interact with the documentation on the page, click the Try it out button, enter any desired parameters, then click the Execute button.
EDS Citation API using the requests library
Can you now make exactly the same request but using the requests library, rather than just using the interactive documentation?
Solution
For example:
import requests
response = requests.get("https://www.bodc.ac.uk/eds-citation/centre")
response.json()
GET with parameters
As we have seen when talking about curl, some endpoints accept parameters in GET requests. Using Python’s requests library, the call to NASA’s APOD endpoint that we previously made
$ curl -i "https://api.nasa.gov/planetary/apod?date=2005-04-01&api_key=<your-api-key>"
can be expressed in a more human-friendly format:
response = requests.get(url="https://api.nasa.gov/planetary/apod",
params={"date":"2005-04-01",
"api_key":"<your-api-key>"})
using a dictionary to contain all the arguments.
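To see what the dictionary turns into, the sketch below builds the same kind of URL by hand with the standard library. It only approximates what requests does internally and is not its actual code. DEMO_KEY stands in for your own key, and the second dictionary is invented to show escaping.

```python
from urllib.parse import urlencode

base_url = "https://api.nasa.gov/planetary/apod"
# "DEMO_KEY" is a placeholder here; substitute your own API key.
params = {"date": "2005-04-01", "api_key": "DEMO_KEY"}

# urlencode joins the name=value pairs with "&" and escapes unsafe characters.
query_string = urlencode(params)
full_url = f"{base_url}?{query_string}"
print(full_url)

# Values containing spaces are escaped for us (hypothetical parameter).
escaped = urlencode({"q": "polar data"})
print(escaped)
```

Passing a dictionary means we never have to remember the escaping rules or where the ? and & separators go.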
Get a list of GitHub repositories
The CDT-AIMLAC GitHub organisation (cdt-aimlac) has a number of repositories. Using the official API documentation of GitHub, can you list their names, in ascending order of last updated time? (Look at the examples in the documentation!)
Solution
The URL to use is https://api.github.com/orgs/cdt-aimlac/repos. In addition to that, we need to use the parameters sort with value updated and direction with value asc.
response = requests.get(url="https://api.github.com/orgs/cdt-aimlac/repos", params={'sort':'updated', 'direction':'asc'})
response
<Response [200]>
Once we verify that there are no errors, we can extract the data, which is available via the json() method:
for repo in response.json():
    print(repo["name"], ':', repo["updated_at"])
testing_exercise : 2020-04-28T13:56:42Z
docker-introduction-2021 : 2021-01-26T19:20:19Z
grid : 2021-03-10T11:59:09Z
training-cloud-vm : 2021-03-23T13:43:03Z
ccintro-2021 : 2021-09-21T13:57:35Z
git-novice : 2021-11-24T10:21:58Z
docker-introduction-2022 : 2022-01-24T17:31:39Z
blogs : 2022-09-07T15:56:33Z
ccintro-2022 : 2022-09-15T15:51:29Z
aber-pubs : 2022-11-23T13:41:57Z
agile_snails_coding_challenge : 2022-11-23T15:45:05Z
team_7564616d_models : 2022-11-23T15:45:42Z
coding-challenge-2022_23-task1 : 2023-02-08T17:02:07Z
pl_curves : 2023-03-29T22:38:10Z
ccintro-2023 : 2023-09-18T13:58:06Z
marketintro-2023 : 2023-11-16T17:12:00Z
Another GET request with parameters example - Open-Meteo API
As an additional example of using requests to connect to an API rather than a plain web site we’ll use the Open-Meteo API. (This is free for non-commercial use and does not require an API key.)
Looking at the documentation for this API (specifically, the API URL under the API Response section), we can build a URL to access a temperature forecast for the next three days for NOC Southampton. This URL has four parameters (latitude, longitude, variable we are accessing and number of days we’re interested in).
As we saw in the previous episode, with curl from the command line, we would have to use the following command
curl "https://api.open-meteo.com/v1/forecast?latitude=50.89&longitude=-1.39&hourly=temperature_2m&forecast_days=3"
building the parameter string explicitly. This is also the syntax that is used in a browser address bar:
"protocol://host/resource/path?parname1=value1&parname2=value2..."
However, using the requests
library allows us to use a nicer syntax:
response = requests.get(url="https://api.open-meteo.com/v1/forecast", params={"latitude": "50.89", "longitude": "-1.39", "hourly": "temperature_2m", "forecast_days": "3"})
response
<Response [200]>
As we saw previously, the code 200 means “success”. To make sure the response contains what we expect, let’s quickly print its headers (which have the structure of a dictionary):
for key, value in response.headers.items():
    print((key, value))
('Date', 'Wed, 23 Apr 2025 12:57:34 GMT')
('Content-Type', 'application/json; charset=utf-8')
('Transfer-Encoding', 'chunked')
('Connection', 'keep-alive')
('Content-Encoding', 'deflate')
As expected, the Content-Type is application/json.
We can now look at the body of the response:
response.text[:100]
'{"latitude":50.86,"longitude":-1.3800001,"generationtime_ms":0.024437904357910156,"utc_offset_second'
As mentioned, the requests library can parse this JSON representation and return a more convenient Python object, through which we can access the inner data:
data = response.json()
data["hourly"]["temperature_2m"]
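A list of temperatures is more useful when paired with the times it refers to. The sketch below assumes that the hourly block also carries a time list of the same length, and uses a shortened, made-up payload in place of a live response, so the values are purely illustrative.

```python
import json

# A shortened, invented payload laid out like the "hourly" block of the
# response. Values are illustrative only, not real forecast data.
sample_body = """
{"hourly": {"time": ["2025-04-23T00:00", "2025-04-23T01:00", "2025-04-23T02:00"],
            "temperature_2m": [9.8, 9.5, 9.1]}}
"""
sample = json.loads(sample_body)
hourly = sample["hourly"]

# Pair each timestamp with the temperature at the same position.
readings = dict(zip(hourly["time"], hourly["temperature_2m"]))
warmest_time = max(readings, key=readings.get)

for time, temperature in readings.items():
    print(time, temperature)
print("Warmest:", warmest_time, readings[warmest_time])
```

With a real response, the same pairing could be applied to response.json()["hourly"], provided the layout matches this assumption.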
Another location
Refer back to the API reference. Can you produce a fortnight’s forecast of precipitation probability at NOC Liverpool?
Solution
We query the Open-Meteo API using something similar to the following:
response = requests.get(url="https://api.open-meteo.com/v1/forecast", params={"latitude": "53.40", "longitude": "-2.97", "hourly": "precipitation_probability", "forecast_days": "14"})
Authentication and POST
As mentioned above, thus far we have only used GET requests. GET requests are intended to be used for retrieving data, without modifying any state—effectively, “look, but don’t touch”. To modify state, other HTTP verbs should be used instead. Most commonly used for this purpose in web APIs are POST requests.
To see how POST requests can be used, we’ll switch to using the GitHub API.
This will require a GitHub Personal Access Token. If you don’t already have one, then the instructions in the Setup walk through how to obtain one.
Take care with access tokens!
This access token identifies your individual user account, rather than just the application you’re developing, so anyone with this token can impersonate you and manage your account. Be very sure not to commit this (or any other personal access token) to a public repository (or any repository that might be made public in the future), as it will very rapidly be discovered and used against you.
A common mistake here is committing tokens for a cloud service. This has allowed unscrupulous individuals to take over cloud computing services and spend hundreds of thousands of pounds on activities such as mining cryptocurrency.
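One way to reduce the risk of committing a token is to keep it out of the project directory altogether, for instance in an environment variable. The sketch below is one possible approach. The helper load_token and the variable name GITHUB_TOKEN are our own choices for illustration, not something GitHub or requests requires.

```python
import os

def load_token(variable_name="GITHUB_TOKEN"):
    """Return the token stored in an environment variable, or fail loudly."""
    # Both the helper and the default variable name are hypothetical.
    token = os.environ.get(variable_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {variable_name} environment variable first")
    return token
```

After setting the variable in your shell, calling load_token() returns the token without it ever appearing in a file that could be committed by accident.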
To send POST requests, we can use the function requests.post.
For this example, we are going to post a comment on an issue on GitHub. Issues on GitHub are a simple way to keep track of bugs, and a great way to manage focused discussions on the code.
In order to do so, we need to authenticate.
We will now create an object of the HTTPBasicAuth class provided by requests, and pass it to requests.post.
First of all, let’s load the GitHub access token:
with open("github-access-token.txt", "r") as file:
    ghtoken = file.read().strip()
Let’s then create the HTTPBasicAuth object:
from requests.auth import HTTPBasicAuth
auth = HTTPBasicAuth("your-github-username", ghtoken)
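It can help to know what HTTP Basic authentication actually sends. The username and token are joined with a colon, Base64-encoded, and placed in an Authorization header. The sketch below reproduces that encoding with the standard library only. The helper basic_auth_header and the token value are made up for illustration.

```python
import base64

def basic_auth_header(username, token):
    """Build the value of the Authorization header for HTTP Basic auth."""
    # Hypothetical helper, shown only to illustrate the encoding.
    credentials = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header = basic_auth_header("your-github-username", "not-a-real-token")
print(header)
```

Base64 is an encoding, not encryption: anyone who sees the header can decode it. This is one reason why authenticated requests should only be sent over HTTPS.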
We will now create the body of the comment, as a JSON string:
import json
body = json.dumps({"body": "Another test comment"})
Finally, we will post the comment on GitHub and make sure we get a success code:
response = requests.post(url="https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments",
data=body,
auth=auth)
response
<Response [201]>
The code 201 is the typical success response for a POST request, signaling that the creation of a resource has been successful. We can go to the issue page and check that our new comment is there.
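If you want to check what a numeric status code means without leaving Python, the standard library has a table of them. The sketch below looks up a few codes and defines a small helper, is_success, which is our own name rather than part of any library.

```python
from http import HTTPStatus

def is_success(code):
    """Return True for codes in the 2xx (success) range."""
    return 200 <= code < 300

for code in (200, 201, 401, 404):
    status = HTTPStatus(code)
    print(code, status.phrase, "success" if is_success(code) else "failure")
```

With requests, calling response.raise_for_status() raises an exception for 4xx and 5xx codes, which saves checking the number by hand.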
Curl and POST
curl can also be used for POST requests, which can be useful for shell-based workflows. One needs to use the --data option.
What have I asked you?
The request that generated a given response object can be retrieved as response.request. Can you see the headers of that request? And what about the body of the message? What is the type of the request object?
Solution
To print the headers:
print(response.request.headers)
{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
The body of the request is accessible with
response.request.body
'{"body": "A test comment"}'
And the type is PreparedRequest:
type(response.request)
requests.models.PreparedRequest
For better control, one could in principle create a Request object beforehand, call the prepare method on it to obtain a PreparedRequest, and then send it through a Session object.
Forgot the key
What error code do we get if we just forget to add the auth? How do the headers of the request change?
Solution
r = requests.post(url="https://api.github.com/repos/mmesiti/web-novice-test-repo/issues/1/comments", data=body)
r
<Response [401]>
The request headers are:
('User-Agent', 'python-requests/2.25.1')
('Accept-Encoding', 'gzip, deflate')
('Accept', '*/*')
('Connection', 'keep-alive')
('Content-Length', '26')
Most notably, the “Authorization” header is missing.
Authentication is a vast topic.
The requests library implements a number of authentication mechanisms that you can use. To handle authentication for multiple requests, one could also use a Session object from the requests library (see Advanced Usage).
Key Points
GET requests are used to read data from a particular resource.
POST requests are used to write data to a particular resource.
GET and POST methods may require some form of authentication (POST usually does).
The Python requests library offers various ways to deal with authentication.
curl can be used instead for shell-based workflows and debugging purposes.