13  Packaging

13.1 Packaging in Python

Packaging your Python code is an effective way to distribute and reuse it. By turning your code into a library and hosting it on a platform like PyPI (Python Package Index), you can significantly broaden your project’s reach. This approach not only improves the quality and sustainability of your software but also invites contributions from external collaborators.

13.1.1 pyproject.toml

The pyproject.toml file has become the standard configuration file for packaging tools. It contains metadata about the project and specifies which build tools should be used. The file consists of TOML tables and can include [build-system], [project], and [tool] tables.

[build-system]

The [build-system] table is essential because it defines which build backend you will use and which dependencies are required to build your project. This is needed because frontend tools like pip do not transform your source code into a distributable package themselves; they delegate that job to a build backend (e.g. Hatchling, setuptools).

[build-system]
requires = ["setuptools>=64.0"]
build-backend = "setuptools.build_meta"

[project]

Under the [project] table you describe your project's metadata. It can become quite extensive, but this is where you list the name of your project, its version, authors, license, dependencies, and other requirements, as well as optional information. For a detailed list of what can be included under [project], check the Declaring project metadata section of the Python Packaging User Guide.

[project]

name = "exampleproject"
# Define the name of your project here. This is mandatory. Once you publish your package for the first time,
# this name will be locked and associated with your project. It affects how users will
# install your package via pip, like so:
#
# $ pip install exampleproject
#
# Your project will be accessible at: https://pypi.org/project/exampleproject/
#
version = "2.0.0"
# Version numbers should conform to PEP 440 and are also mandatory (although the version can be declared dynamic)
# https://www.python.org/dev/peps/pep-0440/
#
description = "Short description of your project"
# Provide a short, one-line description of what your project does. This is known as the
# "Summary" metadata field:
# https://packaging.python.org/specifications/core-metadata/#summary
#
readme = "README.md"
# Here, you can include a longer description which often mirrors your README file.
# This description will appear on PyPI when your project is published.
# This corresponds to the "Description" metadata field:
# https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#readme
#
requires-python = ">=3.8"
# Indicate the versions of Python your project is compatible with. Unlike the
# 'Programming Language' classifiers, 'pip install' will verify this field
# and prevent installation if the Python version does not match.
#
license = {file = "LICENSE.txt"}
# This specifies the license.
# It can be a text (e.g. license = {text = "MIT License"}) or a reference to a file with the license text as shown above.
#
keywords = ["wind-energy", "simulation"]
# Keywords that describe your project. These assist users in discovering your project on PyPI searches.
# Provide them as a list of strings reflecting the nature or domain of the project.
#
authors = [
  {name = "A. Doe", email = "author@tudelft.nl" }
]
# Information about the original authors of the project and their contact details.
#
maintainers = [
  {name = "B. Smith", email = "maintainer@tudelft.nl" }
]
# Information about the current maintainers of the project and their contact details.
#
#
#
# Classifiers help categorize the project on PyPI and aid in discoverability.
# For a full list of valid classifiers, see https://pypi.org/classifiers/
classifiers = [
  # Indicate the development status of your project (maturity). Commonly, this is
  #   3 - Alpha
  #   4 - Beta
  #   5 - Production/Stable
  #   6 - Mature
  "Development Status :: 4 - Beta",

  # Target audience
  "Intended Audience :: Developers",
  "Topic :: Scientific/Engineering",

  # License type
  "License :: OSI Approved :: MIT License",

  # Python versions your software supports. This is not checked by pip install, and is different from "requires-python".
  "Programming Language :: Python :: 3",
  "Programming Language :: Python :: 3.8",
  "Programming Language :: Python :: 3.9",
  "Programming Language :: Python :: 3.10",
  "Programming Language :: Python :: 3.11",
  "Programming Language :: Python :: 3 :: Only",
]

# Dependencies needed by your project. These packages will be installed by pip when
# your project is installed. Ensure these are existing, valid packages.
#
# For more on how this field compares to pip's requirements files, see:
# https://packaging.python.org/discussions/install-requires-vs-requirements/
dependencies = [
  "numpy", 
  "pandas>=1.5.3", 
  "matplotlib>=3.7.1"
]
#
# You can define additional groups of dependencies here (e.g., development dependencies).
# These can be installed using the "extras" feature of pip, like so:
#
#   $ pip install exampleproject[test]
#
# These are often referred to as "extras" and provide optional functionality.
[project.optional-dependencies]
test = ["coverage"]
#
[project.urls]
"Homepage" = "https://github.com/awegroup"
"Source" = "https://github.com/awegroup/MegAWES"
#
# List of relevant URLs for your project. These are displayed on the left sidebar of your PyPI page.
# This can include links to the homepage, source code, changelog, funding, etc.
#
#
# This [project] example was adopted from https://github.com/pypa/sampleproject/blob/main/pyproject.toml

[tool]

The [tool] table contains subtables specific to each tool. For example, Poetry uses the [tool.poetry] table instead of the [project] table.

Example: Poetry project setup

Difference between [build-system] and [project]

The [build-system] and [project] tables serve distinct roles. The [build-system] table should always be included, whichever backend you choose, because it tells frontends which build tool to use. The [project] table, on the other hand, is the standard place for project metadata and is recognized by most build backends, although some backends use their own format instead.

Before shifting to pyproject.toml, a common approach was to use a setup.py build script. You might encounter them in legacy projects.

13.1.2 Package structuring

If you want to distribute your Python code as a package, you will need to have an __init__.py file in the root directory of your package. This allows Python to treat that directory as a package that can be imported. Every subfolder should also contain an __init__.py file.

When you import a package, Python searches the directories on sys.path for the package subdirectory. The presence of an __init__.py file tells Python that a directory should be treated as a package. This mechanism prevents directories with common names from accidentally hiding valid modules that appear later on the search path.

While __init__.py can simply be an empty file that only marks a directory as a package, it can also contain code that runs when the package is imported. This code can initialize package-level variables, import submodules, and perform other setup tasks.
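
The following sketch shows this behavior in a self-contained way: it writes a throwaway package to a temporary directory, imports it, and reads the names that its __init__.py defined. The package name demo_pkg, the function greet, and the version string are all made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
    pkg_dir.mkdir()

    # A regular module inside the package.
    (pkg_dir / "module.py").write_text("def greet():\n    return 'hello'\n")

    # __init__.py sets a package-level variable and re-exports greet().
    (pkg_dir / "__init__.py").write_text(
        "__version__ = '0.1.0'\n"
        "from .module import greet\n"
    )

    # Make the temporary directory importable, then import the package.
    sys.path.insert(0, tmp)
    try:
        demo_pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(tmp)

    version = demo_pkg.__version__
    greeting = demo_pkg.greet()

print(version, greeting)  # 0.1.0 hello
```

Because __init__.py imports greet, users can call demo_pkg.greet() without knowing which module defines it.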

We can build on the project organization described in our Software Development Workflow guide.

In a flat layout, the project’s root directory directly contains the package directories and modules. This layout is straightforward and works well for simple projects.

your_project/

├── mypkg/
   ├── __init__.py
   ├── module.py
   └── subpkg1/
       └── __init__.py

...

The src layout places the package directory inside a top-level src directory. This layout helps prevent accidental imports from the current working directory, ensuring that you always import from the installed package rather than the source directory.

your_project/

├── src/
   └── mypkg/
       ├── __init__.py
       ├── module.py
       └── subpkg1/
           └── __init__.py

...

So our example package structure would now look like this:

your_project/

├── docs/                     # documentation directory
├── notebooks/                # Jupyter notebooks or MATLAB Live Editor scripts
├── src/                      # your project's source code, including the main script
   └── yourpkg_name/         # Package
       ├── __init__.py       # Package initializer
       ├── module.py         # nested module
       └── subpkg1/          # sub-package
           └── __init__.py   # Sub-package initializer
├── tests/                    # your test directory  

├── data/                     # data files used in the project (if applicable)
├── processed_data/           # files from your analysis (if applicable)
├── results/                  # results (if applicable)

├── .gitignore                # untracked files 
├── pyproject.toml            # pyproject.toml
├── README.md                 # overview
└── LICENSE                   # license information
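
If you want to try this layout out, a small script can create the skeleton for you. The sketch below is one possible way to do it with pathlib; the scaffold function is a hypothetical helper, it creates only empty files and a subset of the directories shown above, and it writes into a temporary directory so that nothing on your system is touched.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path, package: str) -> list[str]:
    """Create an empty src-layout skeleton under root and return the created paths."""
    pkg = root / "src" / package
    (pkg / "subpkg1").mkdir(parents=True)
    (root / "tests").mkdir()
    (root / "docs").mkdir()

    files = [
        pkg / "__init__.py",
        pkg / "module.py",
        pkg / "subpkg1" / "__init__.py",
        root / "pyproject.toml",
        root / "README.md",
        root / "LICENSE",
    ]
    for file in files:
        file.touch()

    # Report everything that now exists, relative to the project root.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp), "yourpkg_name")

for path in created:
    print(path)
```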

You might notice that requirements.txt is absent from our updated structure. If you have a pyproject.toml file, you often no longer need a requirements.txt file, because pyproject.toml is part of the standardized Python packaging format (introduced in PEP 518, with the [project] metadata table specified in PEP 621) and can declare your dependencies.

However, some deployment and CI/CD pipelines still expect a requirements.txt file, because a fixed set of dependency versions makes pipelines more stable. For simple projects, you may also still prefer requirements.txt for its simplicity and wide adoption.

It is not considered best practice to use pyproject.toml to pin dependencies to specific versions or to specify sub-dependencies (i.e. dependencies of your dependencies). Doing so is overly restrictive and prevents users from benefiting from dependency upgrades. For more info, see this discussion.

We also do not include lib/ and build/ directories:

  • The build/ directory is typically used to store compiled or built artifacts of your project, such as binary executables, wheels, or other distribution files. This directory is usually not part of your source code repository and is generated during the (automated) build or packaging process.
  • The lib/ directory stores third-party libraries or dependencies that are not installed through a package manager. If you specify your project’s dependencies in the pyproject.toml file and use a package manager like pip or Poetry to install and manage them, they are downloaded and installed automatically in the appropriate location (usually the site-packages directory).

13.1.3 Local package installation

By installing a Python package locally during development you can test your changes in an environment that mimics how the package will be used once it’s deployed. This process allows you to ensure that your package works correctly when installed and imported by others.

You can use pip to install your package in editable mode (-e). This way, changes you make to the source code are immediately available without needing to reinstall the package.

pip install -e .
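
After the editable install, you can confirm from Python that the package is registered in your environment. The sketch below uses importlib.metadata from the standard library; installed_version is a hypothetical helper, and "exampleproject" is the placeholder name from the pyproject.toml example, so replace it with your own project name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version string, or None if the distribution is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version from pyproject.toml if the install succeeded, otherwise None.
print(installed_version("exampleproject"))
```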

13.1.4 Testing packaging on TestPyPI before publishing to PyPI

By testing your package on TestPyPI before publishing it to PyPI, you can identify and address any issues with your package metadata, dependencies, or distribution files before making your package publicly available.

You’ll need to create an account for TestPyPI. The next step is to create distribution packages for your package. These packages are archives that can be uploaded to TestPyPI/PyPI and installed using pip. Afterward, you can use Twine to upload your package to TestPyPI.

  1. Register on TestPyPI.
  2. Make sure the PyPA build tool is installed and up to date:
    • pip install --upgrade build
  3. Run either python3 -m build (Linux/macOS) or py -m build (Windows) from the directory that contains pyproject.toml. This creates the distribution packages.
    • The command prints a substantial amount of output. On completion, it generates two files in the dist/ directory: a .tar.gz file, which is a source distribution, and a .whl file (a wheel), which is a built distribution. Recent versions of pip prefer built distributions and fall back to source distributions if necessary. It is advisable to always upload a source distribution and to include built distributions for the platforms your project supports.
  4. Install Twine (pip install twine).
  5. Upload to TestPyPI by specifying the --repository flag:
    • twine upload --repository testpypi dist/*
  6. Your package will be available at https://test.pypi.org/project/yourpackage. You can then install it with pip by passing the --index-url flag, followed by the package name:
    • pip install --index-url https://test.pypi.org/simple/ yourpackage
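
Before uploading, it can help to confirm that the build step produced both kinds of distribution. The sketch below sorts the files in a dist/ directory by extension; summarize_dist and ready_for_upload are hypothetical helpers, and the file names are fabricated placeholders created in a temporary directory. For validating the package metadata itself, Twine provides the twine check dist/* command.

```python
import tempfile
from pathlib import Path


def summarize_dist(dist_dir: Path) -> dict[str, list[str]]:
    """Group the files in dist_dir into source distributions and wheels."""
    files = sorted(p.name for p in dist_dir.iterdir() if p.is_file())
    return {
        "sdist": [f for f in files if f.endswith(".tar.gz")],
        "wheel": [f for f in files if f.endswith(".whl")],
    }


def ready_for_upload(dist_dir: Path) -> bool:
    """True if at least one source distribution and one wheel are present."""
    summary = summarize_dist(dist_dir)
    return bool(summary["sdist"]) and bool(summary["wheel"])


# Demonstration with empty placeholder files instead of a real build.
with tempfile.TemporaryDirectory() as tmp:
    dist = Path(tmp)
    (dist / "exampleproject-2.0.0.tar.gz").touch()
    (dist / "exampleproject-2.0.0-py3-none-any.whl").touch()
    summary = summarize_dist(dist)
    ready = ready_for_upload(dist)

print(summary)
print("Ready for upload:", ready)  # Ready for upload: True
```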

13.1.5 Publishing to PyPI

Publishing your package to PyPI makes it accessible to anyone in the Python community through a simple pip install your_package command.

You’ll also need an account for PyPI. TestPyPI and PyPI use separate databases so you need to register on both sites.

  1. Register on PyPI.
  2. Run pip install --upgrade build
  3. Then run python3 -m build (Linux/macOS) or py -m build (Windows) from the same directory where pyproject.toml is located.
  4. Use twine upload dist/* to upload your package to PyPI. When prompted, enter the credentials (typically an API token) of the account you registered on the official PyPI platform.
  5. Your package is live on PyPI.
  6. You can now install it with pip install yourpackage.

Tip

If you need a particular name for your package, check whether it is taken on PyPI and claim it as soon as possible if available.
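
When checking a name, keep in mind that package indexes compare names in normalized form (PEP 503): case is ignored, and runs of hyphens, underscores, and dots are treated as a single hyphen. The short sketch below applies that normalization rule so you can see which spellings count as the same name; the candidate names are made up.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


candidates = ["Example_Project", "example.project", "example-project"]

# All three spellings collapse to the same normalized name.
print({normalize(name) for name in candidates})  # {'example-project'}
```

If any spelling of your preferred name is already registered, the other spellings are unavailable as well.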

13.2 Packaging and publishing in MATLAB

MATLAB does not have a centralized repository similar to PyPI. Packaging in MATLAB involves creating a .mlappinstall file or a toolbox .mltbx file which can be shared directly with users or through MATLAB File Exchange.

A MATLAB app is a self-contained MATLAB program with a user interface that automates a task or calculation. When you package an app, the app packaging tool:

  • Performs a dependency analysis that helps you find and add the files your app requires.
  • Reminds you to add shared resources and helper files.
  • Stores information you provide about your app with the app package. This information includes a description, a list of additional MATLAB products required by your app, and a list of supported platforms.
  • Automates app updates (versioning).

Then, when others install your app:

  • It is a one-click installation.
  • Users do not need to manage the MATLAB search path or other installation details.
  • Your app appears alongside MATLAB toolbox apps in the apps gallery.