3 Development Workflow
It is important to create a development environment and workflow that not only allows effective collaboration but also sets a foundation for the growth and evolution of your project. In this guide, we discuss organizing your project in a repository and setting up a workflow for personal and collaborative projects.
3.1 Project organization
In software development, early choices shape the final outcome of your project. One of the most important is how you structure it: organizing your project systematically is a crucial first step toward reproducible work.
3.1.1 Essential principles
- Directory Structure: Employ a consistent and meaningful directory naming convention.
- Naming Files and Directories: Use underscores or hyphens rather than spaces.
- Handling Access Levels: Utilize different Git repositories for public and private parts of your project. Use .gitignore or a specific non-tracked folder for sensitive content and/or files that are too large.
- Clear Documentation: Include a README at the root to provide a project summary and add an appropriate LICENSE to your project. This establishes the terms under which others can engage, reuse, and modify it. Also, this ensures your work is legally safeguarded and the usage rights are clearly defined.
- Adhere to Coding Standards: Follow a consistent coding style to enhance code readability.
3.1.2 Other recommendations
- Code Reusability: Store reusable software elements in a separate repository for efficiency across projects and consider packaging them.
- Code Modularity: Aim for modular code design to improve maintainability and reusability, especially in larger projects.
- Dependency Management: Use virtual environments (Python) or similar tools to manage project dependencies, ensuring consistent environments.
- CI/CD Integration: Consider setting up Continuous Integration/Continuous Deployment pipelines to streamline testing and deployment processes.
A common repository structure that works well for MATLAB and Python projects:
your_project/
│
├── build/ # Compiled application for distribution (if applicable)
├── docs/ # documentation directory
├── lib/ # third-party libraries
├── notebooks/ # Jupyter notebooks or MATLAB Live Editor scripts
├── src/ # your project's source code, including the main script
│   └── mypkg/ # package
│       ├── module # nested module
│       └── subpkg1/ # sub-package
├── tests/ # your test directory
│
├── data/ # data files used in the project (if applicable)
├── processed_data/ # files from your analysis (if applicable)
├── results/ # results (if applicable)
│
├── .gitignore # untracked files
├── requirements.txt # software dependencies (Python)
├── README.md # overview
└── LICENSE # license information
This structure is a guideline and can be adapted based on the specific needs and practices of your project. Some additional observations:
- Naming convention: use lowercase for folders. Particular metadata files are often capitalized, such as README, LICENSE, CONTRIBUTING, CODE_OF_CONDUCT, CHANGELOG, CITATION.cff, NOTICE, and MANIFEST.
- Carefully consider how users will access your software. They may not have access to your repository structure when installing it as a library.
- Generally, all content that is generated at build time or runtime should be added to .gitignore. This likely includes the content of the processed_data and results folders.
- Git cannot track empty folders. If you want to add empty folders to enforce a folder structure, e.g., processed_data or results, add the file .gitkeep to the folder.
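The layout and the .gitkeep convention described above can be scripted. The following is a minimal sketch, not a prescribed tool: the function name `scaffold` and the `FOLDERS` list are illustrative assumptions, and the folder names are simply taken from the example structure.

```python
from pathlib import Path

# Folder names taken from the example layout; adapt them to your project.
FOLDERS = ["docs", "notebooks", "src", "tests", "data", "processed_data", "results"]


def scaffold(root):
    """Create the skeleton under `root` and return the .gitkeep files that were added."""
    root = Path(root)
    placeholders = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        # Git only tracks files, so an empty folder needs a placeholder file.
        if not any(folder.iterdir()):
            keep = folder / ".gitkeep"
            keep.touch()
            placeholders.append(keep)
    return placeholders
```

Calling `scaffold("your_project")` creates the folders and places a .gitkeep only in those that are still empty, so running it again on a populated project leaves existing content untouched.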
3.2 Project templates
Templates are versatile tools that aim to standardize the software development process across various domains.
3.2.1 GitHub repository templates
You can make an existing repository a template, so you and others can generate new repositories with the same directory structure, branches, and files. Note that a template repository cannot include files stored using Git LFS. For more information, see Creating a template repository in the GitHub documentation.
3.3 Reusing projects and repositories
Packaging
Create an installable package or library that can be installed as a dependency in the environment.
Git submodules
Git submodules allow you to keep a Git repository as a subdirectory of another Git repository. A submodule is a record that points to a specific commit in an external repository. Submodules are useful for incorporating external code or libraries into your project while keeping them separate and easily updatable.
Adding submodules
To add a new submodule to your repository, run: git submodule add <repo-url>
Cloning a repository with submodules
When you clone a repository that has submodules, you will have to initialize and fetch the submodules: git submodule init and then git submodule update.
To update the submodules to the latest commit, use: git submodule update --remote.
You can also point to a specific commit within a submodule by navigating to the submodule’s directory and using: git checkout <specific-commit>, and then committing the change to the main repository.
You can use the shorthand command that automatically clones, initializes, and updates all the submodules:
git clone --recurse-submodules <repo-url>
Check the status of your submodules
To check the status of your submodules, run: git submodule status
Your repository should also contain a file called .gitmodules, which records the path and URL of each submodule. Keep it under version control, just like .gitignore. Then commit and push your changes as you normally would.
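To see what .gitmodules records, the sketch below reads its contents with Python's standard configparser module. This is an illustration under assumptions: the example path and URL are placeholders, the helper name `list_submodules` is hypothetical, and configparser only handles simple files like this one, not every feature of Git's configuration syntax.

```python
import configparser

# Hypothetical .gitmodules content; the path and URL are placeholders.
EXAMPLE = (
    '[submodule "lib/tool"]\n'
    "\tpath = lib/tool\n"
    "\turl = https://example.com/tool.git\n"
)


def list_submodules(text):
    """Return {path: url} for every submodule section found in .gitmodules text."""
    # Disable interpolation so a '%' inside a URL is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        parser[section]["path"]: parser[section]["url"]
        for section in parser.sections()
        if section.startswith("submodule ")
    }


print(list_submodules(EXAMPLE))
```

For a real repository, the text could be read from the file at the repository root, for example with `pathlib.Path(".gitmodules").read_text()`.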
If you are using GitHub Desktop, be aware that there might be some limitations when working with submodules. While GitHub Desktop supports basic submodule functionality, some operations may require using the command line. Known issues include difficulties in initializing submodules, switching branches with submodules, and visualizing submodule changes. These limitations are acknowledged and tracked by the GitHub Desktop team. Although some issues have been addressed over time, there might still be case-by-case issues.
See this discussion as an example. For more details, refer to the official GitHub Desktop documentation or issue tracker.
- Simplified Git submodules tutorial
- Guide on Git submodules - comprehensive guide that covers everything from the basics to advanced workflows
Git subtree
Git subtree allows you to merge the history of one repository into another as a subdirectory. It essentially brings the contents of a repository into another as if it were part of the directory structure.
In summary, submodules are more suitable when you need to maintain separate histories and explicit references to specific commits of nested repositories, while subtrees are useful when you want to merge the history of nested repositories into a single repository without maintaining separate references.
Approaches to avoid when reusing code:
- Storing commonly used folders in a separate location on your system and adding that location to the Python path. Other users and developers will not have access to these folders.
- Copying and pasting code directly, because you lose any upstream changes made to the external repository.
3.4 Dependency management
Managing dependencies is a critical aspect of any software project. Efficient dependency management ensures that your project is reproducible, easy to set up, and less prone to conflicts between the different libraries that your code depends on.
3.4.1 Python
Ensuring that every contributor uses the same dependency versions is essential for project consistency and stability.
- Virtual Environments: Use venv or virtualenv to create isolated Python environments for your projects. This prevents package versions from interfering with each other across different projects.
- Requirements File: Use a requirements file to list all dependencies with their specific versions. You can generate this file using the command pip freeze > requirements.txt in an activated virtual environment.
- Dependency Management Tools: Tools like poetry and pipenv provide more sophisticated dependency management by handling virtual environment creation and dependency resolution in a more integrated manner.
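Because a requirements file only guarantees consistent environments when versions are pinned, it can help to check for loose entries. The sketch below is a simplified illustration: the function name `unpinned` is hypothetical, the sample package versions are only examples, and the parsing deliberately ignores parts of the requirements format such as URLs and environment markers.

```python
def unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    loose = []
    for raw in requirements_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            loose.append(line)
    return loose


sample = "numpy==1.26.4\npandas>=2.0  # too loose\n-r other.txt\nrequests\n"
print(unpinned(sample))
```

A file produced by pip freeze contains mostly `name==version` lines, so a check like this mainly helps with hand-written requirements files.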
Consider using Conda, a preferred choice within the research software community. Conda is a package and environment manager that is not limited to Python. It is well suited to projects that require specific Python versions, packages not available via pip, or other dependencies such as R libraries and C or C++ libraries.
3.4.2 MATLAB
MATLAB does not use virtual environments in the same sense as Python, but it allows for setting up paths and toolboxes that act similarly by organizing and encapsulating project-specific functions and scripts. Dependency management in MATLAB often involves ensuring the correct toolboxes are licensed and available, and using MATLAB’s Project feature to manage and share paths and environments with others.
MATLAB toolbox requirements can be found with the function matlab.codetools.requiredFilesAndProducts or with the Dependency Analyzer.