Labs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
- Before class reading
-
Virtual environment for Python (a.k.a.
virtualenv
orvenv
) - How does it work?
-
Installing Python-specific packages with
pip
- Packaging Python Projects
- Building Python Package
- Publishing Python Package
- Creating distribution packages (e.g. for DNF)
- Higher-level tools
- Other languages
- Excercise
- Graded tasks (deadline: May 12)
- Learning outcomes
This lab is devoted to the basic principles of reproducible and isolated development. You will see how you can ensure that working on your project – that may require installation of many dependencies – can be set up without modifying anything system-wide on your machine.
Do not forget that the Before class reading is mandatory and there is a quiz that you are supposed to complete before coming to the labs.
Virtual environment for Python (a.k.a. virtualenv
or venv
)
To try installing Python packages safely, we will first setup a virtual environment for our project. Fortunately, Python has built-in support for creating a virtual environment.
We will demonstrate this on following example:
#!/usr/bin/env python3
import sys
import dateparser
def main():
input_date = ' '.join(sys.argv[1:])
if input_date == '':
input_date = 'now'
date = dateparser.parse(input_date)
if not date:
print(f"Invalid date specification (`{input_date}').", file=sys.stderr)
sys.exit(1)
print(date.strftime('%Y-%m-%dT%H:%M:%S'))
if __name__ == '__main__':
main()
Save this snippet into timestamp2iso.py
and set the executable bit.
Note that dateparser.parse()
is able to parse various time specification
into the native Python date format.
The time specification can be even text such as three days ago
.
Make sure you understand the whole program before continuing.
Try running the timestamp2iso.py
program.
Unless you have already installed the python3-dateparser
package system-wide,
it should fail with ModuleNotFoundError: No module named 'dateparser'
.
The chances are that you do not have that module installed.
If you have installed the python3-dateparser
, uninstall it now and try again
(just for this demo).
But double-check that you would not remove some other program that may require it.
We could now install the python3-dateparser
with DNF but we already described
why that is a bad idea.
We could also install it with pip
globally but that is not the best course
of action either.
Instead, we will create a new virtual environment for it.
python -m venv my-venv
The above command creates a new directory my-venv
that contains a bare installation
of Python.
Feel free to investigate the contents of this directory.
We now need to activate the environment.
source my-venv/bin/activate
Your prompt should have changed: it is prefixed by (my-venv)
now.
Running timestamp2iso.py
will still terminate with ModuleNotFoundError
.
We will now install the dependency:
pip install dateparser
This will take some time as Python will also download transitive dependencies of this
library (and their dependencies etc.).
Once the installation finishes, run timestamp2iso.py
again.
This time, it should work.
./timestamp2iso.py three days ago
Once we are finished with the development, we can deactivate the
environment by calling deactivate
(this time, without sourcing anything).
Running timestamp2iso.py
outside the environment shall again terminate
with ModuleNotFoundError
.
Installing Python-specific packages with pip
We have already seen one usage of pip
in practice, but pip
can do much more.
The nice walkthrough over all pip
capabilities can be found in
Using Python’s pip to Manage Your Projects’ Dependencies.
Here we provide a brief summary of the most important concepts and commands.
By default pip install
is searching through the package registry PyPI,
in order to install package specified in command-line. We wouldn’t be far from truth,
by saying that all packages inside this registry are just archived directories, which
contains Python source code organized in a prescribed way.
If you would like to change this default package registry you can use --index-url
argument.
In later section, we will learn how to turn a directory with code into proper Python package.
Assuming that we have already done it, we can that package directly (without archiving/packing)
by running pip install /path/to/python_package
.
For example, imagine a situation where you are interested in third-party open-source package.
This package is available in remote git repository (typically on GitHub or GitLab),
but it is NOT packed and published in PyPI. You can simply clone the repository
and run pip install .
. However, thanks to
pip VCS Support you
can avoid the cloning phase and install the package directly with:
pip install git+https://git.example.com/MyProject
In order to upgrade a specific package you run pip install --upgrade [packages]
.
Finally, for removing package you run pip uninstall [packages]
.
Dependency versioning
We have already mentioned Semantic Versioning 2.0.0. Python uses more or less compatible versioning, which is described in PEP 440 – Version Identification and Dependency Specification.
When you install dependencies from package registry, you can specify this version.
pkgname # latest version
pkgname == 4.2 # specific version
pkgname >= 4.2 # minimal version
pkgname ~= 4.2 # equivalent to >= 4.2, == 4.*
Truth is that a version specifier consists of a series of version clauses, separated by commas. Therefore you can type:
pkgname >= 1.0, != 1.3.4.*, < 2.0
Dependency versioning
Sometimes it is helpful to save a list of all currently installed packages (including transitive dependencies). For example, you have recently noticed a new bug in you project and you would like to keep record of precise version of currently installed dependencies, so you co-worker can reproduce it.
In order to do that, it is possible to use pip freeze
and create a list
that sets specific versions, ensuring the same environment for every developer.
It is recommended to store these in requirements.txt
file.
# Generationg requirements file
pip freeze > requirements.txt`
# Installing package from it
pip install -r requirements.txt
Packaging Python Projects
Let’s say that you come up with a super cool algorithm and you want to enrich the world by sharing it. Python official documentation offers step-by-step tutorial how to achieve it.
Python Package Directory Structure
The very first step, before you can publish it, is to
transform it into a proper Python package. We need to files called pyproject.toml
and setup.cfg
. These files contain information about the project,
a list of dependencies, and also information for project installation.
In timestamp2iso
you can find Python package with the same functionality as our previous
timestamp2iso.py
script.
Please study carefully the directory structure as well as the content of setup.cfg
.
Try to install this package with VCS Support with following command:
pip install git+http://gitlab.mff.cuni.cz/teaching/nswi177/2022/common/timestamp2iso.git
You perhaps noticed that the setup.cfg
contained section
[options.entry_points]
.
This section specifies what are actual scripts of your project.
Note that after running the above command, you can execute timestamp2iso
command directly.
Pip created a wrapper script for you and added it to the sandbox $PATH
.
timestamp2iso three days ago
Now uninstall the package with:
pip uninstall matfyz-nswi177-timestamp2iso
Clone the repository to you local machine and change directory to it. Now run:
pip install -e .
pip install -e
produces an editable installation
for easy debugging. Instead of copying your code to the virtual environment,
it installs only a symlink-like thing (actually, an timestamp2iso.egg-link
file which has a similar effect on Python’s mechanism for finding modules)
referring to the directory with your source files.
Add some nice prefix just before the ISO print statement and run timestamp2iso three days ago
again.
Building Python Package
Now, when we already have the proper directory structure, we are only two step from publishing it to Package Registry.
Now, we prepare distribution packages for our code. Firstly, we install the build
package by invoking pip install build
. Then we can run
python -m build
Two files are created in the dist
subdirectory:
-
matfyz-nswi177-timestamp2iso-0.0.1.tar.gz
– a source code archive -
matfyz_nswi177_timestamp2iso-0.0.1-py3-none-any.whl
– a wheel file, which is the built package (py3
is the Python version required,none
andany
tell that this is a platform-independent package).
You can now switch to a different virtualenv and install the package
using pip install
package.whl.
Publishing Python Package
If you think that the package could be useful to other people, you can publish it in the Python Package Index. This is usually accomplished using the twine tool. The precise steps are described in Uploading the distribution archives.
Higher-level tools
We can think of the pip
and virtualenv
as low-level tools. However, there
are also tools that combine both of them and bring more comfort to package
management. In Python there are at least two favorite choices, namely
Poetry and
Pipenv.
Internally, these tools use pip
and venv
, so you are still able to
have independent working spaces as well as the possibility to install a
specific package from the Python Package Index (PyPI).
The complete introduction of these tools is out of the scope for this course. Generally, they follow the same principles, but they add some extra functions that are nice to have. Briefly, the major differences are:
- They can freeze specific versions of dependencies, so that the project
builds the same on all machines (using
poetry.lock
file). - Packages can be removed together with their dependencies.
- It is easier to initialize a new project.
Other languages
Other languages have their own tools with similar functions:
Excercise
Setup program from examples repository
(11/last_commit
) to be a proper Python project.
Graded tasks (deadline: May 12)
11/tapsum2json
(100 points)
Write a program that produces summary of TAP results in a JSON format.
TAP – or Test Anything Protocol – is a universal format for test results. It is used by BATS and the GitLab pipeline, too.
1..4
ok 1 One
ok 2 Two
ok 3 Three
not ok 4 Four
#
# -- Report --
# filename:77:26: note: Something is wrong here.
# --
#
Your program will accept a list of arguments – filenames – and read them
using appropriate consumer.
Each of the files would be a standalone TAP result (i.e., what a BATS produces
with -t
).
Nonexistent files will be skipped and recorded as having zero tests.
The program will then print summary of the tests in the following format.
{
"summary": [
{
"filename": "filename1.tap",
"total": 12,
"passed": 8,
"skipped": 3,
"failed": 1
},
{
...
}
]
}
You have to use a library for reading TAP files: tap.py is certainly a good option, but feel free to find a better alternative.
Your solution must contain a pyproject.toml
, setup.cfg
and requirements.txt
with a list of library dependencies that can be passed to pip install
.
Your solution must also be installable via setup.cfg
and
create a tapsum2json
executable in the $PATH
.
This is mandatory as we will test your solution like this
(see the tests for details).
Save your implementation into 11/tapsum2json
subdirectory.
If you want to run the automated tests on your machine, you need to have
the json_reformat
utility from the yajl
DNF package (sudo dnf install yajl
).
The tests reformat the JSON output to allow easy visual inspection of the result.
We do not require you to format JSON by yourself, although passing
indent=True
to json.dump
certainly helps debugging.
Learning outcomes
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
-
explain what are dependencies (in the sense of requirements)
-
explain why installing project dependencies system-wide may not work for multiple projects
-
explain how the sandboxing works (high-level points of view)
-
explain pros and cons of specifying transitive dependencies or top-level ones only and of specifying exact versions vs minimal requirement
Practical skills
Practical skills is usually about usage of given programs to solve various tasks. Therefore, you should be able to …
-
create a new virtual environment (for Python)
-
active and deactive an existing virtual environment
-
run/test Python project using virtualenv (with
setup.cfg
andpyproject.toml
) -
install Python project that uses
setup.cfg
andpyproject.toml
-
install new project dependencies
-
update list of dependencies
-
configure project for installation (optional)