diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d43bb346db6f70f3284fb578b7073a8d26a5b7b6..4239d9c1db3f7eaf35af0199720ebf0b62bc6ae6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,8 +1,9 @@
 # Contributing
-Anyone is welcome to contribute.
+Anyone is welcome to contribute clarifications or additional material.
-Please note, that all data, code and text must be self-authored and [licensed as CC0](LICENSE.md) (or the material must be properly cited and licensed openly).
+Preferably, all data, code, and text here should be self-authored and [licensed as CC0](LICENSE.md).
+If you want to add material from another source, make sure it is openly licensed and add a proper citation.
-Also note that this project is publicly available to anyone.
+Note that this project is publicly available to anyone.
diff --git a/README.md b/README.md
index 1806c4c59cbaa31195099d90baf84a58d82319f5..887b03711567437cbc600745b65c52a689e1ff28 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
-# Git Large File Storage
+# Git Large File Storage: How To
 <img src="logo.svg" width="100px" />
-*How to use Git LFS on JLU GitLab (and if this is a good idea)*
+*Support information on how to use Git LFS on JLU GitLab, and whether this is a good idea.*
-__NOTE: This project is work in progress!__
+__Please note: This is a work in progress; any [contributions](CONTRIBUTING.md) are welcome!__
 [[_TOC_]]
@@ -22,32 +22,85 @@ There are technical reasons in Git's design to make this extra step necessary:
 Even if you work alone, this is a bit of a hassle; it gets much worse if other people have clones, because history change requires everyone involved to confirm the deletion with their clone.
+### Optional: More technical details
+With the Git LFS extension, you can version-control (large) files "in association" with a Git repository.
+Instead of storing a file within the Git repository as a *blob*, Git LFS stores only *pointer files* in the repository; the actual file contents are stored on a (separate) Git LFS server (and locally, in another folder).
+A file tracked by Git LFS gets downloaded only if needed, e.g. when you check out a Git branch containing the tracked file (once downloaded, it is also cached locally).
+
+LFS uses Git *filters* and *hooks* to coordinate between the normal repository and the tracked files.
+A *smudge filter* gets the file contents based on the pointer file.
+A *clean filter* creates a new version of the pointer file if the file changes.
+When you `push` a commit that contains a new/changed file tracked by LFS, a *pre-push hook* uploads the large file contents separately to the Git LFS server.
+
+
 ## Is it a good idea to use Git LFS for your project?
-You must consider several aspects before uploading any data to JLU GitLab.
+You must consider several aspects before uploading any research data to JLU GitLab.
 Please read [this information](https://gitlab.test.uni-giessen.de/jlugitlab/support/-/blob/master/en/Information.md#storage-of-research-data) on research data management.
 If in doubt about your specific situation, please consult the department for research data, [forschungsdaten@uni-giessen.de](mailto:forschungsdaten@uni-giessen.de).
-- FIXME: Clarify that we talk about LFS on JLU GitLab, but that people can run their own LFS server with different settings/policies!
+With this in mind, using Git LFS may be a natural choice if:
+
+1. you're already using Git to organize your project,
+2. data are an integral part of the project,
+3. the data files are binary and/or larger than a few KiB,
+4. all project members who need to access these data files can work with Git LFS, and
+5. all machines that need to access these files have network access to the LFS server.
-## Practical steps how to use Git LFS
-Assumptions:
-- You have Git installed on your machine and you know the basics how to use it (TODO, specify: `add`, `commit`, `pull`, `push`).
-  (If you don't, [here is a good starting point](https://git-scm.com/)).
+### Optional: External Git LFS server
+As an alternative to storing LFS data on JLU GitLab, you could store LFS data on an external server, while still using JLU GitLab to host your project.
+For example, your workgroup could run its own Git LFS server; you can choose from e.g. [this list](https://github.com/git-lfs/git-lfs/wiki/Implementations).
+The advantage of an external LFS server is independence from JLU GitLab; e.g. you could implement different policies, such as potentially more suitable security practices.
+
+
+## Practical tips on how to use Git LFS
+The tips below assume the following:
+
+- You have Git installed on your machine and you know the basics (if you don't, [here is a good starting point](https://git-scm.com/)).
 - You have a project on JLU GitLab that includes a Git repository, and you have a local clone of it on your machine.
 - You can type Git commands into a command line interface (terminal).
-  Below, example terminal commands are indicated with a different font and with a leading dollar sign, `$ like this`.
-- You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)).
+  Below, example commands are indicated with a different font and with a leading dollar sign, `$ like this`; to reproduce them, drop the dollar sign `$ `.
+- You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)); you can check this, e.g. by typing `$ git lfs version`.
+
-Practical steps:
-1. In your local repository clone, configure which types of files you want to track by LFS.
-   - For example, to let LFS keep track of `CSV` files, type: `$ git lfs track "*.csv"`.
+### Basic use
+1. Set up Git LFS; you only have to do this once per user account on each machine: `$ git lfs install`.
+2. In your local repository, choose what types of files to track with LFS.
+   - For example, to track all `CSV` files, type: `$ git lfs track "*.csv"`.
    - This will create/change the Git configuration file [`.gitattributes`](.gitattributes).
-     You should track this configuration change in Git, e.g. by the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking HDF files with LFS"`.
+     You should track this configuration change in the repository itself, with the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking CSV files with LFS"`.
      *Note that because the file name `.gitattributes` starts with a dot, it may be hidden from view (on Linux and macOS, use `$ ls -a` to see it; FIXME: What to do on Windows?).*
-2. FIXME: How to generally add files into LFS, and how to interact with them.
-3. FIXME: Clarify locking mechanism, relevant for people working in teams, see https://github.com/git-lfs/git-lfs/wiki/File-Locking.
+3. Now you can interact with the LFS-tracked files in the usual way to control versions with Git.
+   For example, to make a new snapshot with a file `some_data.csv`, use the usual commands `add`, `commit`, and `push`, as for any other file in the repository:
+   ```
+   $ git add some_data.csv
+   $ git commit -m "Add data to LFS"
+   $ git push
+   ```
+
+
+### Optional: File locking mechanism
+FIXME: Clarify the locking mechanism; it may be especially relevant for people working in teams, see https://github.com/git-lfs/git-lfs/wiki/File-Locking.
+
+
+### Optional: Ignore LFS files
+You may want to work with a Git repository but ignore the (large) files stored by LFS.
+For example, you may want to clone a repository on a machine that doesn't have access to the Git LFS server, or you simply don't require the large files.
+
+To temporarily ignore the LFS content, you can set the [environment variable](https://en.wikipedia.org/wiki/Environment_variable) called `GIT_LFS_SKIP_SMUDGE` to the value `1`.
+To stop ignoring LFS, set the variable to `0`.
+
+The syntax to set the variable depends on your command line interface:
+- On Windows (Command Prompt), type `$ set GIT_LFS_SKIP_SMUDGE=1`.
+- For Bash (e.g. Linux), type `$ export GIT_LFS_SKIP_SMUDGE=1`.
+
+After that, you can e.g. clone the repository without downloading the LFS files:
+
+```
+$ git clone <REMOTE-URL> <LOCAL-FOLDER>
+```
+TODO?: You can also ignore LFS files permanently, via Git configuration.
 ## Example(s)
 - [Here](example) you can find an example of an analysis script that relies on data stored in Git LFS.
@@ -55,5 +108,6 @@ Practical steps:
 ## Useful links
-- https://git-lfs.github.com/
-- https://docs.gitlab.com/ce/topics/git/lfs/
+- Main website about Git LFS: https://git-lfs.github.com/
+- Information on LFS on GitLab: https://docs.gitlab.com/ce/topics/git/lfs/
+- A list of LFS server implementations: https://github.com/git-lfs/git-lfs/wiki/Implementations
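
For the *pointer files* mentioned under "More technical details" in the README change above, a minimal sketch of what such a pointer contains may help readers. It assumes a Linux-like shell with `sha256sum` available (on macOS, `shasum -a 256` would be the likely substitute), and the function name `make_lfs_pointer` is hypothetical, introduced only for illustration. The three-line layout (`version`, `oid`, `size`) follows the Git LFS pointer format as far as we know it; this is not how Git LFS itself is implemented.

```shell
# Hypothetical helper (not part of Git LFS): print the pointer text that
# would stand in for the file given as the first argument.
make_lfs_pointer() {
    local file="$1"
    local oid size
    # Object ID: SHA-256 checksum of the file contents (first field of sha256sum output).
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
    # Size of the file in bytes; tr strips padding that some wc versions add.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

To try it, call `$ make_lfs_pointer some_data.csv` in a folder that contains such a file. If Git LFS is installed, `$ git lfs pointer --file=some_data.csv` should print a comparable pointer, which can serve as a cross-check.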