In this lab we will have a look at two useful utilities, namely xargs and find. We will also extend our knowledge about SSH by learning about port forwarding. But in the majority of the lab we will explore how to develop Python projects in a sandboxed environment that is easily distributed among individual developers in a big software team. We will also see how Python programs can be prepared for further distribution.

The parts about find and xargs are somewhat connected; otherwise the three topics (find/xargs, SSH port forwarding and sandboxed development) are independent and can be read in any order. Note that the sandboxed Python development knowledge will be useful for the last homework task.
Preflight checklist
- You know what SSH is.
- You know what TCP ports are.
- You know what a disk image is.
- You remember shell wildcards.
- You know that C strings are terminated with a zero byte.
- You know how Python modules are created and organized.
xargs (and parallel) utilities

xargs in its simplest form reads standard input and converts it to program arguments for a user-specified program.
Assume we have the following files in a directory:
2024-04-16.txt 2024-04-24.txt 2024-05-02.txt 2024-05-10.txt
2024-04-17.txt 2024-04-25.txt 2024-05-03.txt 2024-05-11.txt
2024-04-18.txt 2024-04-26.txt 2024-05-04.txt 2024-05-12.txt
2024-04-19.txt 2024-04-27.txt 2024-05-05.txt 2024-05-13.txt
2024-04-20.txt 2024-04-28.txt 2024-05-06.txt 2024-05-14.txt
2024-04-21.txt 2024-04-29.txt 2024-05-07.txt 2024-05-15.txt
2024-04-22.txt 2024-04-30.txt 2024-05-08.txt
2024-04-23.txt 2024-05-01.txt 2024-05-09.txt
As a mini-task, write a shell one-liner to create these files.
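One possible solution (a sketch assuming GNU date; any equivalent loop works):

for i in $(seq 0 29); do
    touch "$( date -d "2024-04-16 +$i days" '+%Y-%m-%d' ).txt"
done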
Our task is to remove files that are older than 20 days. In this version, we only echo the command so that we do not need to recreate the files when debugging our solution.
cutoff_date="$( date -d "20 days ago" '+%Y%m%d' )"
for filename in 202[0-9]-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo rm "$filename"
    fi
done
This means that the program rm would be called several times, always removing just one file. The overhead of starting a new process could become a serious bottleneck for larger scripts (think about thousands of files, for example). It would be much better if we called rm just once, giving it a list of files to remove (i.e., as multiple arguments).
xargs is the solution here. Let’s modify the program a little bit:

cutoff_date="$( date -d "20 days ago" '+%Y%m%d' )"
for filename in 202[0-9]-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo "$filename"
    fi
done | xargs echo rm
Instead of removing the file right away, we just print its name and pipe the whole loop to xargs, whose arguments specify the program to be launched. Instead of many lines with rm ... we will see just one long line with a single invocation of rm.
Another situation where xargs can come in handy is when you are building a complex command line or when using command substitution ($( ... )) would make the script unreadable.

Of course, tricky filenames can still cause issues as xargs assumes that arguments are delimited by whitespace. (Note that above, we were safe as the filenames were reasonable.) That can be changed with --delimiter.
If you are piping input to xargs from your program, consider delimiting items with the zero byte (i.e., the C string terminator, \0). Recall what you have heard about C strings – and how they are terminated – in your Arduino course. That is the safest option as this character cannot appear anywhere inside any argument. Tell xargs about it via -0 or --null.
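For example, a dry run of a null-delimited removal could look like this (a sketch; printf repeats its format for every argument, emitting each filename terminated by \0):

printf '%s\0' 2024-*.txt | xargs -0 echo rm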
Note that xargs is smart enough to realize when the command line would be too long and split it automatically (see the manual for details).

It is also good to remember that xargs can execute the command in parallel (i.e., split the stdin into multiple chunks and call the program multiple times with different chunks) via -P. If your shell scripts are getting slow but you have plenty of CPU power, this may speed things up quite a lot for you.
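As a sketch (this really compresses the files, so try it only on scratch data): the following compresses the text files from the example above, running up to four gzip processes at once, each receiving at most eight filenames.

printf '%s\0' 2024-*.txt | xargs -0 -n 8 -P 4 gzip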
parallel
This program can be used to execute multiple commands in parallel, hence speeding up the execution. parallel behaves almost exactly like xargs but has much better support for concurrent execution of individual jobs (not mixing their output, execution on a remote machine, etc.). The differences are rather well described in the parallel documentation. Please also refer to parallel_tutorial(1) (yes, that is a man page) and to parallel(1) for more details.
find
While ls(1) and wildcard expansion are powerful, sometimes we need to select files using more sophisticated criteria. That is where the find(1) program comes in useful. Without any arguments, it lists all files in the current directory, including files in nested directories. Do not run it on the root directory (/) unless you know what you are doing (and definitely not on the shared linux.ms.mff.cuni.cz machine).
With the -name parameter you can limit the search to files matching a given wildcard pattern. The following command finds all alpha.txt files in the current directory and in any subdirectory (regardless of depth).

find -name alpha.txt
Why would the following command for finding all *.txt files not work?
find -name *.txt
find has many options – we will not duplicate its manpage here but mention those that are worth remembering.

-delete immediately deletes the found files. Very useful and very dangerous.

-exec runs a given program on every found file. You have to use {} to specify the found filename and terminate the command with ; (since ; terminates commands in shell too, you need to escape it).

find -name '*.md' -exec wc -l {} \;
Note that for each found file, a new invocation of wc happens. This can be altered by changing the command terminator (\;) to +. See the difference between the invocations of the following two commands (the former runs echo once per file, the latter passes as many filenames as possible to a single echo):
find -name '*.md' -exec echo {} \;
find -name '*.md' -exec echo {} +
Caveats
By default, find prints one filename per line. However, a filename can even contain the newline character (!) and thus the following idiom is not 100% safe.

find -options-for-find | while read filename; do
    do_some_complicated_things_with "$filename"
done
If you want to be really safe, use -print0 and IFS= read -r -d $'\0' filename, as that uses the only safe delimiter – \0. Alternatively, you can pipe the output of find -print0 to xargs --null.
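Put together, the safe variant of the loop above would look like this (same placeholder names as before):

find -options-for-find -print0 | while IFS= read -r -d $'\0' filename; do
    do_some_complicated_things_with "$filename"
done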
However, if you are working with your own files or the pattern is safe, the above loop is fine (just do not forget that directories are files too and they can contain \n in their names as well).

Shell also allows you to export a function and call back to it from inside xargs. The invocation pattern looks awful, but it is a safe approach if you want to execute a complex operation on top of the found files.
my_callback_function() {
    echo ""
    echo "\$0 = $0"
    echo "\$@ =" "$@"
}
export -f my_callback_function

find . -print0 | xargs -0 -n 1 bash -c 'my_callback_function "$@"' arg_zero arg_one
Recall that you can define functions directly in shell; the above can actually be typed interactively without storing it as a script.
SSH port forwarding
Generally, services provided by a machine should not be exposed over the network for random “security researchers” to play with. Therefore, a firewall is usually configured to control access to your machine from the network.
If a service should be provided only locally, it is even easier to let it listen on the loopback device only. This way, only local users (including users connected to the machine via SSH) can access it.
As an example, you will find that there is a web server listening on port 8080 of linux.ms.mff.cuni.cz. This web server is not available when you try to access it remotely (as linux.ms.mff.cuni.cz:8080), but accessing it locally (when logged in to linux.ms.mff.cuni.cz) works.
you@laptop$ curl http://linux.ms.mff.cuni.cz:8080 # Fails
you@laptop$ ssh linux.ms.mff.cuni.cz curl --silent http://localhost:8080 # Works
While using cURL to access this web server is possible, it is not the most user-friendly way to browse a web page.
Local Port Forwarding
SSH can be used to create a secure tunnel, through which a local port is forwarded to a port accessible from the remote machine. In essence, you will connect to a loopback device on your machine and SSH will forward that communication to the remote server, effectively making the remote port accessible.
The following command will make local port 8888 behave as port 8080 on the remote machine.
The 127.0.0.1 part refers to the loopback on the remote server (you can write localhost there, too).
ssh -L 8888:127.0.0.1:8080 -N linux.ms.mff.cuni.cz
You always first specify which local port to forward (8888) and then the destination as if you were connecting from the remote machine (127.0.0.1:8080).
The -N makes this connection usable only for forwarding – use Ctrl-C to terminate it (without it, you will also log in to the remote machine).

Open http://localhost:8888 in your browser to check that you can see the same content as with the ssh linux.ms.mff.cuni.cz curl http://localhost:8080 command above.
You will often forward (local) port N to the same (remote) port N, hence it is very easy to forget the proper order. However, the ordering of the -L parameters is important and switching the numbers (e.g. 8888:127.0.0.1:9090 instead of 9090:127.0.0.1:8888) will forward different ports (usually, you will learn about it pretty quickly, though).
But do not worry if you are unable to remember it. That is why you have manual pages, and even everyday users of Linux use them. It is not something to be ashamed of or afraid of :-).
Remote/Reverse Port Forwarding
SSH also allows creating a so-called remote port forward. It basically allows you to open a connection from the remote server to your local machine (in the reverse direction of the SSH connection).
Practically, you can set up a remote port forwarding by connecting from your desktop you have at home to a machine in IMPAKT/Rotunda, for example, and then use it to connect from IMPAKT/Rotunda back to your desktop.
This feature will work even if your machine is behind NAT, which makes direct connections from the outside impossible.
The following command sets up remote port forwarding such that connecting to port 2222 on the remote machine will be translated to a connection to port 22 (SSH) on the local machine:
ssh -N -R 2222:127.0.0.1:22 u-plN.ms.mff.cuni.cz
You first specify the remote port to forward (2222) and then the destination as if you were connecting from the local machine (127.0.0.1:22).
When trying this, ensure that your sshd daemon is running (recall lab 10 and the systemctl command) and use a different port than 2222 to prevent collisions.

In order to connect to your desktop via this port forward, you have to do so from the IMPAKT/Rotunda lab via the following command.
ssh -p 2222 your-desktop-login@localhost
We use localhost as the connection is only bound to the loopback interface, not to the actual network adapter available on the lab computers. (Actually, ssh allows binding the port forward to the public IP address, but this is often disabled by the administrator for security reasons.)
Sandboxed software development
In one of the previous labs, we showed that the preferred way of installing applications (and libraries and data files) on Linux is via the package manager. It installs the application for all users, it allows system-wide upgrades, and it generally keeps your system in a much cleaner state.
However, a system-wide installation may not always be suitable. A typical example is project-specific dependencies. These are often not installed system-wide, mainly for the following reasons:
- You need different versions of dependencies for different projects.
- You do not want to remember to uninstall them when you stop working on the project.
- You want to control when you upgrade them: an upgrade of the OS should not affect your project.
- The versions you need are different from those available through the package manager.
- Or they may not be packaged at all.
For the above reasons, it is much better to create a project-specific installation that is better isolated from the system. Note that installing the dependency per-user (i.e., somewhere into $HOME) may not provide the isolation you wish to achieve.
Such an approach is supported by most reasonable programming languages and can usually be found under names such as virtual environment, local repository, sandbox or similar (note that the concepts do not map 1:1 across languages and tools, but the general idea remains the same).
With a virtual environment, your dependencies are usually installed into a specific directory inside your project, kept outside version control. The compiler/interpreter is then told to use this location.
The directory-local installation then keeps your system clean. It also allows working on multiple projects with incompatible dependencies, because they are completely isolated.
Each developer can then recreate the environment without polluting the main repository with distribution-specific or even OS-dependent files. Yet the configuration file ensures that all developers will be working in the same environment (i.e., same versions of all the dependencies).
It also means that new members of software teams can easily set up their environment using the provided configuration file.
Dependency installation
Inside the virtual environment, the project usually does not use generic package managers (such as DNF). Instead, dependencies are installed using language-specific package managers. These are usually cross-platform and use their own software repository. Such a repository then hosts only libraries for that particular language. Again, there can be multiple such repositories and it is up to the developers how they configure their projects.
In our scenario, the language-specific managers would install only into the virtual environment directory without ever touching the system itself.
Installation directories
On a typical Linux system, there are multiple places where software can be installed:
- /usr – system packages handled by the distribution’s package manager
- /usr/local – software installed locally by the administrator; language-specific managers usually install system-wide packages there
- /opt/$PACKAGE – large packages installed outside the distribution’s package manager often live in their own sub-directory inside /opt
- $HOME (usually /home/$USER/) – language-specific managers run by non-root users can install packages locally to their home directory (to language-specific sub-directories). $HOME/.local is a favourite place for local installations that generally mirrors /usr/local but for a single user only (executables are then placed inside $HOME/.local/bin)
- per-project virtual environments
Python Package Index (PyPI)
The rest of the text will focus mostly on Python tools supporting the above-mentioned principles. Similar tools are available for other languages, but we believe that demonstrating them on Python is sufficient to understand the principles in practice.
Python has a repository called the Python Package Index (PyPI) where anyone can publish their Python programs and/or libraries.
The repository can be used through a web browser, but also through a command-line client called pip. pip behaves rather similarly to DNF. You can use it to install, upgrade, or uninstall Python modules.
Issues of trust
In your distribution’s upstream package repository, all packages typically have to be reviewed by someone from the distribution’s security team. This is sadly not true for PyPI or similar repositories. That said, you as a developer must be more cautious when installing from such sources.
Not all packages do what they claim to. Some are just innocently buggy, but some are outright malicious. Re-using other people’s code is generally a good practice, but you should give a thought to the trustworthiness of the author. After all, the code will be executed under your account either when you run your program or as a part of the installation process.
In particular, criminals like to publish malicious packages, whose name differs from a well-known package by a single typo. This is called typosquatting. You might read more for example in this blogpost, but searching the web will yield more results.
On the other hand, many PyPI packages are also available as packages for your distribution (feel free to try dnf search python3- on your Fedora box). Hence they probably were reviewed by distribution maintainers and are probably safe to use.

For packages not available for your distribution natively, always look for tell-tale signs distinguishing a normal project from a malicious one: popularity of the source code repository, user activity, reactions to bug reports, documentation quality, and so on.
Recall that modern software is rarely built from scratch. Do not be afraid to explore what is available. Check it. And use it :-).
Typical workflow practically
While the actual tools differ across programming languages, the steps for developing a project in some kind of a sandbox are generally the same.
- The developer clones the project (e.g., from a Git repository).
- The sandbox (virtual environment) is initialized. Usually this means that a new directory with a fresh language environment is created.
- The virtual environment must be activated. Often the virtual environment needs to modify $PATH (or rather some language-specific variant of such a path that is used to search for libraries or modules), so the developer must source (or .) some activation script that modifies the path.
- Then the developer can install the dependencies of the project. They are usually stored in a file that can be passed to the package manager (of the given programming language).
- Only now the developer can actually work on the project. The project is fully isolated, removing the virtual environment directory removes all traces of the installed packages.
Everyday work then often involves only step 3 (some kind of activation) and step 5 (actual development).
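In Python terms (covered in detail below), such a workflow might look like this sketch (the repository URL is hypothetical):

git clone https://git.example.com/team/project.git
cd project
python3 -m venv my-venv
source my-venv/bin/activate
pip install -r requirements.txt
# ... actual development ...
deactivate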
Note that activation of the virtual environment typically removes access to libraries installed globally. That is, inside the virtual environment, the developer starts with a fresh and clean environment with a bare compiler. That is actually a very sane decision as it ensures that system-wide installation does not affect the project-specific environment.
In other words, it improves the reproducibility of the whole setup. It also means that the developer needs to specify every dependency in the configuration file, even dependencies that are usually present everywhere.
Virtual environment for Python (a.k.a. virtualenv or venv)

To try installing Python packages safely, we will first set up a virtual environment for our project. Fortunately, Python has built-in support for creating virtual environments.
We will demonstrate this on the following example:
#!/usr/bin/env python3

import argparse
import shutil
import sys

import fs


class FsCatException(Exception):
    pass


def fs_cat(filesystem, filename, target):
    try:
        with fs.open_fs(filesystem) as my_fs:
            try:
                with my_fs.open(filename, 'rb') as my_file:
                    shutil.copyfileobj(my_file, target)
            except fs.errors.FileExpected as e:
                raise FsCatException(f"{filename} on {filesystem} is not a regular file") from e
            except fs.errors.ResourceNotFound as e:
                raise FsCatException(f"{filename} does not exist on {filesystem}") from e
    except Exception as e:
        if isinstance(e, FsCatException):
            raise e
        raise FsCatException(f"unable to read {filesystem}, perhaps misspelled path or protocol ({e})?") from e


def main():
    args = argparse.ArgumentParser(description='Filesystem cat')
    args.add_argument(
        'filesystem',
        nargs=1,
        metavar='FILESYSTEM',
        help='Filesystem specification, e.g. tar://path/to/file.tar'
    )
    args.add_argument(
        'filename',
        nargs=1,
        metavar='FILENAME',
        help='File path on FILESYSTEM, e.g. /README.md'
    )
    config = args.parse_args()

    try:
        fs_cat(config.filesystem[0], config.filename[0], sys.stdout.buffer)
    except FsCatException as e:
        print(f"Fatal: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
Save this snippet into fscat.py and set the executable bit.
Note that fs.open_fs is able to open various filesystems and access files on them as if you used the built-in Pythonic open. In our program, we provide a path to a filesystem and to a file (residing on this filesystem) to print to the screen (hence the name fscat, as it simulates cat inside a different filesystem).
Try running the fscat.py program. Unless you have already installed the python3-fs package system-wide, it should fail with ModuleNotFoundError: No module named 'fs'. The chances are that you do not have that module installed. If you have python3-fs installed, uninstall it now and try again (just for this demo). But double-check that you would not remove some other program that may require it.
We could now install python3-fs with DNF, but we have already described why that is a bad idea. We could also install it with pip globally, but that is not the best course of action either. Instead, we will create a new virtual environment for it.
python3 -m venv my-venv
The above command creates a new directory my-venv that contains a bare installation of Python. Feel free to investigate the contents of this directory.
We now need to activate the environment.
source my-venv/bin/activate
Your prompt should have changed: it is now prefixed by (my-venv). Running fscat.py will still terminate with ModuleNotFoundError.
We will now install the dependency:
pip install fs
This will take some time as Python will also download transitive dependencies of this library (and their dependencies, etc.). Once the installation finishes, run fscat.py again. This time, it should work.
./fscat.py
Okay, it printed an error message about required arguments. Download this tarball and run the script as follows:
./fscat.py tar://test.tar.gz testdir/test.txt
It should print Test string as it is able to handle even tarballs as filesystems and print files on them (verify that the file is really there using either atool, MC, or tar directly).
Once we are finished with the development, we can deactivate the environment by calling deactivate (this time, without sourcing anything). Running fscat.py outside the environment shall again terminate with ModuleNotFoundError.
How does it work?
The Python virtual environment uses two tricks in its implementation.
First, the activate script extends $PATH with the my-venv/bin directory. That means that calling python3 will prefer the application from the virtualenv’s directory (e.g. my-venv/bin/python3). Try this yourself: print $PATH before and after you activate a virtualenv.
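For example, the effect might look like this (paths will differ on your machine):

$ which python3
/usr/bin/python3
$ source my-venv/bin/activate
(my-venv) $ which python3
/home/you/project/my-venv/bin/python3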
This also explains why we should always specify /usr/bin/env python3 in the shebang instead of /usr/bin/python3: env consults the $PATH that was modified by the activation of the virtualenv.
You can also view the activate script and see how this is implemented. Note that deactivate is actually a function.

Why is the activate script not executable?
The second trick is that Python searches for modules (i.e., for files implementing an imported module) relative to the path of the python3 binary. Hence, when python3 is inside my-venv/bin, Python will look for the modules inside my-venv/lib. That is the location where your locally installed files will be placed.
You can check this by executing the following one-liner that prints Python search directories (again, before and after activation):
python3 -c 'import sys; print(sys.path)'
This behaviour is actually not hard-wired in the Python interpreter. When Python starts up, it automatically imports a module called site. This module contains site-specific setup: it adjusts sys.path to include all directories where your distribution installs Python modules. It also detects virtual environments by looking for the pyvenv.cfg file in the grandparent directory of the python3 binary. In our case, this configuration file contains include-system-site-packages=false, which tells the site module to skip the distribution’s module directories. You can see that the principle is very simple and the interpreter itself needs to know nothing about virtual environments.
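For illustration, a pyvenv.cfg might look something like this (a sketch; the exact fields and values depend on your Python version):

home = /usr/bin
include-system-site-packages = false
version = 3.12.2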
Installing Python-specific packages with pip
pip vs. python3 -m pip?

Generally, it is recommended to use python3 -m pip rather than raw pip. The reasons behind these additional 10 keystrokes are well described in Why you should use python3 -m pip. However, in order to make the following text more readable, we will use the shorter pip variant.
We have already seen one usage of pip in practice, but pip can do much more. A nice walkthrough of all pip capabilities can be found in Using Python’s pip to Manage Your Projects’ Dependencies. Here we provide a brief summary of the most important concepts and commands.
By default, pip install searches the PyPI package registry in order to install the package specified on the command line. We would not be far from the truth by saying that all packages inside this registry are just archived directories, which contain Python source code organized in a prescribed way. If you would like to change this default package registry, you can use the --index-url argument.
In a later section, we will learn how to turn a directory with code into a proper Python package. Assuming that we have already done so, we can install that package directly (without archiving/packing) by running pip install /path/to/python_package.
For example, imagine a situation where you are interested in a third-party open-source package. This package is available in a remote Git repository (typically on GitHub or GitLab), but it is NOT packed and published in PyPI. You can simply clone the repository and run pip install . inside it. However, thanks to pip VCS Support, you can avoid the cloning phase and install the package directly with:
pip install git+https://git.example.com/MyProject
In order to upgrade a specific package, you run pip install --upgrade [packages]. Finally, to remove a package you run pip uninstall [packages].
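For example, with the fs library from the earlier demo:

# Upgrade to the newest allowed version
pip install --upgrade fs
# Remove the package
pip uninstall fs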
Dependency versioning
You might have heard about semantic versioning. Python uses a more or less compatible versioning, which is described in PEP 440 – Version Identification and Dependency Specification.
When you install dependencies from the package registry, you can specify this version.
pkgname # latest version
pkgname == 4.2 # specific version
pkgname >= 4.2 # minimal version
pkgname ~= 4.2 # equivalent to >= 4.2, == 4.*
In fact, a version specifier consists of a series of version clauses, separated by commas. Therefore you can write:
pkgname >= 1.0, != 1.3.4.*, < 2.0
Sometimes it is helpful to save a list of all currently installed packages (including transitive dependencies). For example, you may have recently noticed a new bug in your project and would like to keep a record of the precise versions of the currently installed dependencies, so that your co-worker can reproduce the bug.
To do that, you can use pip freeze to create a list that pins specific versions, ensuring the same environment for every developer. It is recommended to store this list in a requirements.txt file.
# Generating requirements file
pip freeze > requirements.txt
# Installing package from it
pip install -r requirements.txt
Packaging Python Projects
Let’s say that you have come up with a super cool algorithm and you want to enrich the world by sharing it. The official Python documentation offers a step-by-step tutorial on how to achieve that.
Python Package Directory Structure
The very first step, before you can publish it, is to transform it into a proper Python package. We need to create files called pyproject.toml and setup.cfg. These files contain information about the project, a list of dependencies, and also information for project installation.
Historically, this information was stored in an executable setup.py script, rather than in setup.cfg and pyproject.toml. Therefore, in many repositories/tutorials you can still find usage of it. The content is more or less 1:1, but there are certain cases in which you are forced to use setup.py. Fortunately, this is not applicable to our use case, so we have decided to describe the modern variant with static configuration files.
setuptools offers experimental support for using only a pyproject.toml. This approach is also used by Poetry, but in the following text we will stay with the stable combination of setup.cfg and pyproject.toml.
In fscat, you can find a Python package with the same functionality as our previous fscat.py script. Have a closer look at its setup.cfg.
One may notice that the necessary dependencies are duplicated in setup.cfg and in requirements.txt. Actually, this is not a mistake. In setup.cfg, you should use the most relaxed version of the dependency possible, whereas in requirements.txt we need to specify all dependencies with precise versions. Transitive dependencies should NOT be present in setup.cfg at all. For more details, see install_requires vs requirements file.
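As an illustration (a hypothetical sketch with made-up version numbers, not the actual project files), the difference might look like this:

# setup.cfg: direct dependencies only, versions as relaxed as possible
[options]
install_requires =
    fs >= 2.4

# requirements.txt: all dependencies (including transitive ones), pinned
fs==2.4.16
appdirs==1.4.4
six==1.16.0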
Try to install this package via VCS Support with the following command:
pip install git+http://gitlab.mff.cuni.cz/teaching/nswi177/2024/common/fscat.git
You perhaps noticed that the setup.cfg file contains the section [options.entry_points]. This section specifies what the actual scripts of your project are. Note that after running the above command, you can execute the fscat command directly. pip created a wrapper script for you and added it to the sandbox $PATH.
fscat tar://tests/test.tar.gz testdir/test.txt
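For reference, such an entry-point section typically looks like this (a sketch; the exact module path in the real fscat repository may differ):

[options.entry_points]
console_scripts =
    fscat = fscat.main:main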
Now uninstall the package with:
pip uninstall matfyz-nswi177-fscat
Clone the repository to your local machine and change directory to it. Now run:
pip install -e .
pip install -e produces an editable installation for easy debugging. Instead of copying your code to the virtual environment, it installs only a symlink-like thing (actually, an fscat.egg-link file, which has a similar effect on Python’s mechanism for finding modules) referring to the directory with your source files.
Building a Python package
Now that we have the proper directory structure, we are only two steps away from publishing the package to a package registry. First, we prepare distribution packages for our code: we install the build package by invoking pip install build. Then we can run
python3 -m build
Two files are created in the dist subdirectory:

- matfyz-nswi177-fscat-0.0.1.tar.gz – a source code archive
- matfyz_nswi177_fscat-0.0.1-py3-none-any.whl – a wheel file, which is the built package (py3 is the Python version required, none and any tell that this is a platform-independent package)
Note that the wheel file is nothing more than a simple Zip archive.
$ file dist/matfyz_nswi177_fscat-0.0.1-py3-none-any.whl
dist/matfyz_nswi177_fscat-0.0.1-py3-none-any.whl: Zip archive data, at least v2.0 to extract, compression method=deflate
$ unzip -l dist/matfyz_nswi177_fscat-0.0.1-py3-none-any.whl
Archive: dist/matfyz_nswi177_fscat-0.0.1-py3-none-any.whl
Length Date Time Name
--------- ---------- ----- ----
51 2024-04-24 10:48 fscat/__init__.py
837 2024-04-24 10:48 fscat/fscat.py
777 2024-04-24 10:48 fscat/main.py
1075 2024-04-24 10:53 matfyz_nswi177_fscat-0.0.1.dist-info/LICENSE
1173 2024-04-24 10:53 matfyz_nswi177_fscat-0.0.1.dist-info/METADATA
92 2024-04-24 10:53 matfyz_nswi177_fscat-0.0.1.dist-info/WHEEL
42 2024-04-24 10:53 matfyz_nswi177_fscat-0.0.1.dist-info/entry_points.txt
6 2024-04-24 10:53 matfyz_nswi177_fscat-0.0.1.dist-info/top_level.txt
769 2024-04-24 10:53 matfyz_nswi177_fscat-0.0.1.dist-info/RECORD
--------- -------
4822 9 files
You may wonder why there are two archives with very similar content. The answer can be found in What Are Python Wheels and Why Should You Care?.
You can now switch to a different virtualenv and install the package using pip install package.whl.
Publishing a Python package
If you think that the package could be useful to other people, you can publish it in the Python Package Index. This is usually accomplished using the twine tool. The precise steps are described in Uploading the distribution archives.
Creating distribution packages (e.g. for DNF)
While the work for creating the project files may seem to complicate things a lot, it actually saves time in the long run.
Virtually any Python developer would be now able to install your program and have a clear starting point when investigating other details.
Note that if you have installed some program via DNF system-wide and that program was written in Python, somewhere inside it there was a setup.cfg that looked very similar to the one you have just seen. Only instead of installing the script into your virtual environment, it was installed globally. There is really no other magic behind it.
Note that, for example, Ranger is written in Python and this script describes its installation (it is a script for creating packages for DNF). Note that %py3_install is a macro that actually calls setup.py install.
Higher-level tools
We can think of pip and virtualenv as low-level tools. However, there are also tools that combine both of them and bring more comfort to package management. In Python, there are at least two favorite choices, namely Poetry and Pipenv.

Internally, these tools use pip and venv, so you are still able to have independent working spaces as well as the possibility to install a specific package from the Python Package Index (PyPI).
The complete introduction of these tools is out of the scope for this course. Generally, they follow the same principles, but they add some extra functions that are nice to have. Briefly, the major differences are:
- They can freeze specific versions of dependencies, so that the project builds the same on all machines (using a poetry.lock file).
- Packages can be removed together with their dependencies.
- It is easier to initialize a new project.
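Just for a taste, a Poetry session might look like this (a sketch; consult the Poetry documentation before relying on it):

poetry new myproject       # initialize a new project skeleton
cd myproject
poetry add fs              # install a dependency and record it in the project files
poetry install             # recreate the environment from the lock file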
Other languages

Other languages have their own tools with similar functions.
Tasks to check your understanding
We expect you will solve the following tasks before attending the labs so that we can discuss your solutions during the lab.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain the difference between a normal SSH port forward and a reverse port forward
- explain what requirements (library dependencies) are
- explain the fundamentals of semantic versioning
- explain the pros and cons of installing dependencies system-wide vs. installing them in a sandboxed environment
- provide a high-level overview of a sandbox environment
- explain the pros and cons of specifying transitive requirements vs. specifying only top-level ones
- explain the pros and cons of using exact versions vs. minimal requirements
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- use the xargs program
- use find with basic predicates (-name, -type) and actions (-exec, -delete)
- use an SSH port forward to access a service available on the loopback device
- use a reverse SSH port forward to connect to a machine behind a NAT
- create a new virtual environment for Python using python3 -m venv
- activate and deactivate a virtual environment
- install project dependencies in a virtual environment with pip
- develop a program inside a virtual environment (with projects using setup.cfg and pyproject.toml files)
- install a Python project from its setup.cfg
- optional: set up a Python project for installation