Please see the latest news in issue #112 (from April 15).
In this lab we will start with a brief look at SSH port forwarding.
However, most of the lab is devoted to developing Python projects in a sandboxed environment that can be easily distributed among individual developers in a big software team. We will also see how Python programs can be prepared for further distribution.
Note that knowledge of sandboxed Python development will be useful for the last homework task, although AI tools can guide you a lot.
Preflight checklist
- You know what SSH is.
- You know what TCP ports are.
- You know how Python modules are created and organized.
- You are ready for the shell exam.
SSH port forwarding
Generally, services provided by a machine should not be exposed over the network for random “security researchers” to play with. Therefore, a firewall is usually configured to control access to your machine from the network.
If a service should be provided only locally, it is even easier to let it listen on the loopback device only. This way, only local users (including users connected to the machine via SSH) can access it.
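To make the idea concrete, here is a minimal Python sketch of a service that listens on the loopback device only. The handler class and the greeting text are made up for illustration; this is not the web server actually running on linux.ms.mff.cuni.cz.

```python
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with a short plain-text body."""

    def do_GET(self):
        body = b"Hello from loopback\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging on stderr


def start_loopback_server(port=8080):
    """Start the server in a background thread, bound to 127.0.0.1 only."""
    # Binding to "0.0.0.0" instead would expose the service on every network interface.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Run on a remote machine, such a server answers curl http://localhost:8080 issued there, while connection attempts from other machines fail because nothing listens on the public address. Passing port=0 lets the operating system pick a free port.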
As an example, you will find that there is a web server listening on port 8080 of linux.ms.mff.cuni.cz. This web server is not available when you try to access it as linux.ms.mff.cuni.cz, but accessing it locally (when logged in to linux.ms.mff.cuni.cz) works.
you@laptop$ curl http://linux.ms.mff.cuni.cz:8080 # Fails
you@laptop$ ssh LOGIN@linux.ms.mff.cuni.cz curl --silent http://localhost:8080 # Works
While using cURL to access this web server is possible, it is not the most user-friendly way to browse a web page.
Local Port Forwarding
SSH can be used to create a secure tunnel, through which a local port is forwarded to a port accessible from the remote machine. In essence, you will connect to a loopback device on your machine and SSH will forward that communication to the remote server, effectively making the remote port accessible.
The following command will make local port 8888 behave as port 8080 on the remote machine. The 127.0.0.1 part refers to the loopback on the remote server (you can write localhost there, too).
ssh -L 8888:127.0.0.1:8080 -N linux.ms.mff.cuni.cz
You always first specify which local port to forward (8888) and then the destination as if you were connecting from the remote machine (127.0.0.1:8080).
The -N option makes this connection usable only for forwarding (without it, you will log in to the remote machine, too); use Ctrl-C to terminate it.
Open http://localhost:8888 in your browser to check that you can see the same content as with the ssh LOGIN@linux.ms.mff.cuni.cz curl http://localhost:8080 command above.
Remote/Reverse Port Forwarding
SSH also allows you to create a so-called remote port forward.
It basically allows you to open a connection from the remote server to your local machine (in the direction opposite to the SSH connection).
In practice, you can set up a remote port forward by connecting from your desktop at home to a machine in IMPAKT/Rotunda, for example, and then use it to connect from IMPAKT/Rotunda back to your desktop.
This feature will work even if your machine is behind NAT, which makes direct connections from the outside impossible.
The following command sets up the remote port forward such that connecting to port 2222 on the remote machine will be translated to a connection to port 22 (ssh) on the local machine:
ssh -N -R 2222:127.0.0.1:22 u-plN.ms.mff.cuni.cz
You first specify the remote port to forward (2222) and then the destination as if you were connecting from the local machine (127.0.0.1:22).
When trying this, ensure that your sshd daemon is running (recall lab 08 and the systemctl command) and use a port other than 2222 to prevent collisions.
To connect to your desktop via this port forward, run the following command on the IMPAKT/Rotunda machine:
ssh -p 2222 your-desktop-login@localhost
We use localhost because the forwarded port is bound only to the loopback interface, not to the actual network adapter of the lab computer. (Actually, ssh can bind the port forward to the public IP address, but this is often disabled by the administrator for security reasons.)
Sandboxed software development
In one of the previous labs, we showed that the preferred way of installing applications (and libraries and data files) on Linux is via the package manager. It installs the application for all users, it allows system-wide upgrades, and it generally keeps your system in a much cleaner state.
However, system-wide installation may not always be suitable. A typical example is project-specific dependencies. These are often not installed system-wide, mainly for the following reasons:
- You need different versions of dependencies for different projects.
- You do not want to remember to uninstall them when you stop working on the project.
- You want to control when you upgrade them: an upgrade of the OS should not affect your project.
- The versions you need are different from those available through the package manager.
- Or they may not be packaged at all.
For the above reasons, it is much better to create a project-specific installation that is isolated from the system.
Note that installing the dependencies per user (i.e., somewhere into $HOME) may not provide the isolation you wish to achieve.
Such an approach is supported by most reasonable programming languages and can usually be found under names such as virtual environment, local repository, sandbox, or similar (note that the concepts do not map 1:1 across languages and tools, but the general idea remains the same).
With a virtual environment, your dependencies are usually installed into a specific directory inside your project, kept outside version control. The compiler/interpreter is then told to use this location.
The directory-local installation then keeps your system clean. It also allows working on multiple projects with incompatible dependencies, because they are completely isolated.
Each developer can then recreate the environment without polluting the main repository with distribution-specific or even OS-dependent files. Yet a configuration file stored in the repository ensures that all developers will be working in the same environment (i.e., with the same versions of all the dependencies).
It also means that new members of software teams can easily set up their environment using the provided configuration file.
Dependency installation
Inside the virtual environment, the project usually does not use generic package managers (such as DNF). Instead, dependencies are installed using language-specific package managers.
These are usually cross-platform and use their own software repository, which hosts only libraries for that particular language. Again, there can be multiple such repositories, and it is up to the developers how they configure their projects.
In our scenario, the language-specific managers would install only into the virtual environment directory without ever touching the system itself.
Python Package Index (PyPI)
The rest of the text will focus mostly on Python tools supporting the above-mentioned principles. Similar tools are available for other languages, but we believe that demonstrating them on Python is sufficient to understand the principles in practice.
Python has a repository called the Python Package Index (PyPI) where anyone can publish their Python programs and/or libraries.
The repository can be used through a web browser, but also through a command-line client called pip.
pip behaves rather similarly to DNF: you can use it to install, upgrade, or uninstall Python modules.
Typical workflow in practice
While the actual tools will differ across programming languages, the general steps for developing a project in some kind of sandbox are usually the same.
1. The developer clones the project (e.g., from a Git repository).
2. The sandbox (virtual environment) is initialized. Usually this means that a new directory with a fresh language environment is created.
3. The virtual environment must be activated. Often the virtual environment needs to modify $PATH (or rather some language-specific variant of such a path that is used to search for libraries or modules), so the developer must source (with source or its synonym .) an activation script that modifies the path.
4. Then the developer can install the dependencies of the project. They are usually listed in a file that can be passed to the package manager of the given programming language.
5. Only now can the developer actually work on the project. The project is fully isolated: removing the virtual environment directory removes all traces of the installed packages.
Everyday work then often involves only step 3 (some kind of activation) and step 5 (actual development).
Note that activation of the virtual environment typically removes access to libraries installed globally. That is, inside the virtual environment, the developer starts with a fresh and clean environment with a bare compiler. That is actually a very sane decision as it ensures that system-wide installation does not affect the project-specific environment.
In other words, it improves the reproducibility of the whole setup. It also means that the developer needs to list every dependency in the configuration file, even a dependency that is usually present everywhere.
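One way to observe this isolation from within Python is to compare sys.prefix with sys.base_prefix: the standard venv module makes them differ inside a virtual environment. The helper names below are our own, and the site-packages filter is a rough heuristic (Debian-based systems use dist-packages for system-wide modules).

```python
import sys


def in_virtualenv():
    """True when this interpreter runs from a venv (its prefix differs from the base installation)."""
    return sys.prefix != sys.base_prefix


def site_package_dirs():
    """Entries of the module search path where third-party packages are looked up."""
    return [p for p in sys.path if p.endswith(("site-packages", "dist-packages"))]


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    for directory in site_package_dirs():
        print(directory)
```

Running it with the interpreter of a virtual environment should report True and, unless the environment was created with --system-site-packages, list only the environment's own site-packages directory.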
Virtual environment for Python (a.k.a. virtualenv or venv)
To try installing Python packages safely, we will first set up a virtual environment for our project. Fortunately, Python has built-in support for creating virtual environments.
We will demonstrate this on the following example:
#!/usr/bin/env python3

import argparse
import shutil
import sys

import fs


class FsCatException(Exception):
    pass


def fs_cat(filesystem, filename, target):
    try:
        with fs.open_fs(filesystem) as my_fs:
            try:
                with my_fs.open(filename, 'rb') as my_file:
                    shutil.copyfileobj(my_file, target)
            except fs.errors.FileExpected as e:
                raise FsCatException(f"{filename} on {filesystem} is not a regular file") from e
            except fs.errors.ResourceNotFound as e:
                raise FsCatException(f"{filename} does not exist on {filesystem}") from e
    except FsCatException:
        # Our own errors from the inner block are passed through unchanged.
        raise
    except Exception as e:
        raise FsCatException(f"unable to read {filesystem}, perhaps misspelled path or protocol ({e})?") from e


def main():
    args = argparse.ArgumentParser(description='Filesystem cat')
    args.add_argument(
        'filesystem',
        nargs=1,
        metavar='FILESYSTEM',
        help='Filesystem specification, e.g. tar://path/to/file.tar'
    )
    args.add_argument(
        'filename',
        nargs=1,
        metavar='FILENAME',
        help='File path on FILESYSTEM, e.g. /README.md'
    )
    config = args.parse_args()

    try:
        fs_cat(config.filesystem[0], config.filename[0], sys.stdout.buffer)
    except FsCatException as e:
        print(f"Fatal: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
Save this snippet into fscat.py and set the executable bit.
Note that fs.open_fs is able to open various filesystems and access files on them as if you were using the built-in Python open.
In our program, we provide the path to a filesystem and to a file (residing on this filesystem) to print to the screen (hence the name fscat, as it simulates cat inside a different filesystem).
Try running the fscat.py program. Unless you have already installed the python3-fs package system-wide, it should fail with ModuleNotFoundError: No module named 'fs'. Chances are that you do not have that module installed.
If you have installed python3-fs, uninstall it now and try again (just for this demo). But double-check that you would not remove some other program that may require it.
We could now install python3-fs with DNF, but we have already described why that is a bad idea. We could also install it globally with pip, but that is not the best course of action either. Instead, we will create a new virtual environment for it.
python3 -m venv my-venv
The above command creates a new directory my-venv that contains a bare installation of Python. Feel free to investigate the contents of this directory.
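The same step can also be performed from Python through the standard venv module. This sketch (the function name is our own) creates an environment without pip to keep it offline and fast, whereas python3 -m venv installs pip by default, and then asks the new interpreter whether it considers itself sandboxed. It assumes the POSIX layout with a bin directory.

```python
import subprocess
import tempfile
import venv
from pathlib import Path


def create_and_probe(env_dir):
    """Create a bare venv in env_dir and check that its interpreter reports being inside a venv."""
    # with_pip=False skips the pip bootstrap; symlinks=True mirrors the Linux default of `python3 -m venv`.
    venv.create(env_dir, with_pip=False, symlinks=True)
    python = Path(env_dir) / "bin" / "python"  # POSIX layout; Windows uses Scripts\python.exe
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_and_probe(Path(tmp) / "my-venv"))
```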
We now need to activate the environment.
source my-venv/bin/activate
Your prompt should have changed: it is now prefixed by (my-venv).
Running fscat.py will still terminate with ModuleNotFoundError.
We will now install the dependency:
pip install fs
This will take some time, as pip will also download the transitive dependencies of this library (and their dependencies, etc.).
Once the installation finishes, run fscat.py again. This time, it should work.
./fscat.py
Okay, it printed an error message about required arguments. Download this tarball and run the script as follows:
./fscat.py tar://test.tar.gz testdir/test.txt
It should print Test string, as fs can even handle tarballs as filesystems and print files stored in them (verify that the file is really there using atool, MC, or tar directly).
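For tarballs alone, the standard library is enough. The following sketch of a hypothetical tar_cat function mimics what fscat.py does for tar:// filesystems, using tarfile instead of fs; it does not support any of the other filesystems that fs.open_fs understands.

```python
import shutil
import sys
import tarfile


def tar_cat(tarball, member, target):
    """Copy one regular file stored in tarball to the binary stream target."""
    name = member.lstrip("/")  # accept fscat-style paths such as /README.md
    with tarfile.open(tarball) as archive:  # default mode "r" detects gzip/bzip2/xz compression
        try:
            extracted = archive.extractfile(name)
        except KeyError as e:
            raise FileNotFoundError(f"{member} does not exist in {tarball}") from e
        if extracted is None:  # e.g. a directory entry
            raise IsADirectoryError(f"{member} in {tarball} is not a regular file")
        with extracted:
            shutil.copyfileobj(extracted, target)


if __name__ == "__main__":
    tar_cat(sys.argv[1], sys.argv[2], sys.stdout.buffer)
```

Saved as, say, tarcat.py, it could be run as python3 tarcat.py test.tar.gz testdir/test.txt, with no virtual environment needed, because it has no third-party dependencies.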
Once we are finished with the development, we can deactivate the environment by calling deactivate (this time, without sourcing anything).
Running fscat.py outside the environment will again terminate with ModuleNotFoundError.
Installing Python-specific packages with pip
We have already seen one usage of pip in practice, but pip can do much more.
A nice walkthrough of all pip capabilities can be found in Using Python’s pip to Manage Your Projects’ Dependencies.
Here we provide a brief summary of the most important concepts and commands.
By default, pip install searches the PyPI package registry for the package specified on the command line. We would not be far from the truth if we said that all packages in this registry are just archived directories containing Python source code organized in a prescribed way.
If you would like to use a different package registry, you can pass the --index-url argument.
In a later section, we will learn how to turn a directory with code into a proper Python package. Assuming that we have already done so, we can install that package directly (without archiving/packing) by running pip install /path/to/python_package.
For example, imagine a situation where you are interested in a third-party open-source package. This package is available in a remote Git repository (typically on GitHub or GitLab), but it is NOT packed and published in PyPI. You can simply clone the repository and run pip install . (the dot denotes the current directory) inside it. However, thanks to pip VCS Support, you can avoid the cloning phase and install the package directly with:
pip install git+https://git.example.com/MyProject
To upgrade a specific package, run pip install --upgrade [packages].
Finally, to remove a package, run pip uninstall [packages].
Dependency versioning
You might have heard about semantic versioning. Python uses a more or less compatible versioning scheme, which is described in PEP 440 (Version Identification and Dependency Specification).
When you install dependencies from the package registry, you can specify which versions are acceptable:
pkgname # latest version
pkgname == 4.2 # specific version
pkgname >= 4.2 # minimal version
pkgname ~= 4.2 # equivalent to >= 4.2, == 4.*
In fact, a version specifier consists of a series of version clauses separated by commas. Therefore, you can write:
pkgname >= 1.0, != 1.3.4.*, < 2.0
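To make the clause semantics concrete, here is a toy checker (all names are hypothetical) that only understands purely numeric versions such as 1.3.4. For the operators it supports, it follows PEP 440, including zero-padding (4.2 equals 4.2.0) and .* prefix matching, but it ignores pre-releases, post-releases, epochs, and local versions. Real tools rely on the packaging library (packaging.specifiers.SpecifierSet) instead.

```python
import operator

# Comparison functions for plain (non-wildcard) clauses.
_OPERATORS = {
    ">=": operator.ge, "<=": operator.le, ">": operator.gt,
    "<": operator.lt, "==": operator.eq, "!=": operator.ne,
}


def parse_release(text):
    """Turn a purely numeric release such as '1.3.4' into (1, 3, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def _padded(a, b):
    """Pad the shorter release with zeros, so that 4.2 compares equal to 4.2.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def clause_matches(version, clause):
    """Check one clause such as '>= 1.0' or '!= 1.3.4.*' against a parsed version."""
    clause = clause.strip()
    # Two-character operators come first so that '>=' is not read as '>'.
    for op in ("~=", ">=", "<=", "==", "!=", ">", "<"):
        if clause.startswith(op):
            target = clause[len(op):].strip()
            break
    else:
        raise ValueError(f"unsupported clause: {clause!r}")

    if op == "~=":
        release = parse_release(target)
        if len(release) < 2:
            raise ValueError("~= needs at least two release components")
        prefix = ".".join(str(part) for part in release[:-1])
        # ~= 4.2 means >= 4.2 together with == 4.*
        return (clause_matches(version, f">= {target}")
                and clause_matches(version, f"== {prefix}.*"))

    if target.endswith(".*"):
        if op not in ("==", "!="):
            raise ValueError("wildcards are only allowed with == and !=")
        prefix = parse_release(target[:-2])
        candidate = version + (0,) * max(0, len(prefix) - len(version))
        matched = candidate[:len(prefix)] == prefix
        return matched if op == "==" else not matched

    left, right = _padded(version, parse_release(target))
    return _OPERATORS[op](left, right)


def satisfies(version_text, specifier):
    """True if the version meets every comma-separated clause of the specifier."""
    version = parse_release(version_text)
    return all(clause_matches(version, clause) for clause in specifier.split(","))
```

For instance, satisfies("4.5", "~= 4.2") holds while satisfies("5.0", "~= 4.2") does not, and 1.3.4.1 is rejected by ">= 1.0, != 1.3.4.*, < 2.0" because of the wildcard exclusion.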
Sometimes it is helpful to save a list of all currently installed packages (including transitive dependencies). For example, you may have noticed a new bug in your project and want to record the precise versions of the currently installed dependencies so that your co-worker can reproduce the bug.
To do that, you can use pip freeze to create a list that pins exact versions, ensuring the same environment for every developer.
It is recommended to store this list in a requirements.txt file.
# Generating requirements file
pip freeze > requirements.txt
# Installing package from it
pip install -r requirements.txt
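pip freeze reads the metadata of the installed distributions. A rough approximation (the helper name is hypothetical) can be built on importlib.metadata from the standard library. It is only a sketch: pip freeze additionally omits some packaging tools such as pip itself (unless --all is given) and records editable or VCS installs in a different form.

```python
from importlib import metadata


def freeze_lines():
    """Return sorted 'name==version' lines for the distributions this interpreter can see."""
    pins = set()  # a set, because one distribution may be visible via several sys.path entries
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```

Run inside an activated virtual environment, it lists only the packages installed into that environment, which is exactly what makes the resulting file useful for recreating it elsewhere.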
Packaging Python Projects
Let’s say that you come up with a super cool algorithm and you want to enrich the world by sharing it. The official Python documentation offers a step-by-step tutorial on how to achieve it.
Python Package Directory Structure
The very first step, before you can publish your code, is to transform it into a proper Python package. We need to create files called pyproject.toml and setup.cfg. These files contain information about the project, a list of its dependencies, and information needed for installing the project.
In the fscat project, you can find a Python package with the same functionality as our previous fscat.py script. Please have a look at the directory structure as well as the content of setup.cfg.
Try to install this package using pip VCS support with the following command:
pip install git+http://gitlab.mff.cuni.cz/teaching/nswi177/2025/common/fscat.git
You have perhaps noticed that the setup.cfg file contains the section [options.entry_points]. This section specifies which scripts your project provides.
Note that after running the above command, you can execute the fscat command directly: pip created a wrapper script for you and placed it into the bin directory of the sandbox, which is on $PATH.
fscat tar://tests/test.tar.gz testdir/test.txt
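An entry in [options.entry_points] names a function, and the generated wrapper does little more than import that function and call it. The sketch below parses a hypothetical setup.cfg excerpt with configparser and resolves a module:function reference with importlib. The fscat_pkg.cli module path is invented for illustration and is not taken from the real repository.

```python
import configparser
import importlib

# Hypothetical setup.cfg excerpt; the module path is made up.
SAMPLE_SETUP_CFG = """
[options.entry_points]
console_scripts =
    fscat = fscat_pkg.cli:main
"""


def console_scripts(cfg_text):
    """Map script names to 'module:function' targets from a setup.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options.entry_points", "console_scripts", fallback="")
    scripts = {}
    for line in raw.strip().splitlines():
        name, _, target = line.partition("=")
        scripts[name.strip()] = target.strip()
    return scripts


def resolve(target):
    """Import the module part of 'module:attr.path' and return the named object."""
    module_name, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_name.strip())
    for attr in attr_path.strip().split("."):
        obj = getattr(obj, attr)
    return obj
```

A wrapper generated by pip does roughly the equivalent of sys.exit(resolve("fscat_pkg.cli:main")()), except that it uses a plain from ... import ... statement.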
Now uninstall the package with:
pip uninstall matfyz-nswi177-fscat
Clone the repository to your local machine and change into its directory. Now run:
pip install -e .
pip install -e produces an editable installation for easy debugging. Instead of copying your code into the virtual environment, it installs only a symlink-like reference to the directory with your source files (older setuptools versions create an fscat.egg-link file, newer ones a .pth file; both have a similar effect on Python’s mechanism for finding modules).
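The effect of an editable installation can be imitated by putting a source directory on the module search path and re-importing after an edit. This sketch uses a throwaway module with a made-up name; a real editable install achieves the same effect permanently, through the file described above, instead of modifying sys.path inside the program.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "editable_demo_module"  # hypothetical throwaway module


def demo_editable_import():
    """Import a module from a source directory, edit the file, and reload it without reinstalling."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # no .pyc cache, so the reload always reads the edited source
    with tempfile.TemporaryDirectory() as src_dir:
        source = Path(src_dir) / f"{MODULE_NAME}.py"
        source.write_text("GREETING = 'original'\n")
        sys.path.insert(0, src_dir)  # the source directory itself is on the search path
        try:
            importlib.invalidate_caches()  # let the import system notice the new file
            module = importlib.import_module(MODULE_NAME)
            before = module.GREETING
            source.write_text("GREETING = 'edited in place'\n")
            after = importlib.reload(module).GREETING
        finally:
            sys.path.remove(src_dir)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = old_flag
    return before, after
```

The second value reflects the edit even though nothing was copied or reinstalled, which is why editable installs are convenient during development.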
Building a Python package
Publishing a Python package
If you think that the package could be useful to other people, you can publish it on the Python Package Index. This is usually done using the twine tool. The precise steps are described in Uploading the distribution archives.
Higher-level tools
We can think of pip and virtualenv as low-level tools. However, there are also tools that combine both of them and make package management more comfortable. In Python, there are at least two popular choices, namely Poetry and Pipenv.
Internally, these tools use pip and venv, so you still get independent working spaces as well as the possibility to install a specific package from the Python Package Index (PyPI).
A complete introduction to these tools is out of scope for this course. Generally, they follow the same principles, but they add some extra features that are nice to have. Briefly, the major differences are:
- They can freeze specific versions of dependencies so that the project builds the same on all machines (using the poetry.lock file in Poetry, or Pipfile.lock in Pipenv).
- Packages can be removed together with their dependencies.
- It is easier to initialize a new project.
Other languages
Other languages have their own tools with similar functions:
Tasks to check your understanding
We expect you to solve the following tasks before attending the labs so that we can discuss your solutions during the lab.
Learning outcomes and after class checklist
This section offers a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning of given terms and putting them into context. Therefore, you should be able to …
- explain the difference between a normal SSH port forward and a reverse port forward
- explain what requirements (library dependencies) are
- explain the fundamentals of semantic versioning
- explain the pros and cons of installing dependencies system-wide vs installing them in a sandboxed environment
- provide a high-level overview of a sandbox environment
- explain the pros and cons of specifying transitive requirements vs specifying top-level ones only
- explain the pros and cons of using exact versions vs minimal requirements
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- use an SSH port forward to access a service available on the loopback device
- use a reverse SSH port forward to connect to a machine behind NAT
- create a new virtual environment for Python using python3 -m venv
- activate and deactivate a virtual environment
- install project dependencies in a virtual environment with pip
- develop a program inside a virtual environment (with projects using setup.cfg and pyproject.toml files)
- install a Python project from its setup.cfg
- optional: set up a Python project for installation