Commit 2d27beaf authored by Johannes Keyser

More information all around...

- More about decision for/against LFS.
- More technical description.
- Clarify LFS can be externalized.
- How to ignore LFS content.
- Describe links.
parent a164f004
# Contributing # Contributing
Anyone is welcome to contribute. Anyone is welcome to contribute clarifications or additional material.
Please note, that all data, code and text must be self-authored and [licensed as CC0](LICENSE.md) (or the material must be properly cited and licensed openly). Preferably, all data, code, and text here should be self-authored and [licensed as CC0](LICENSE.md).
In case you want to add material from another source, make sure it is openly licensed and you add a proper citation.
Also note that this project is publicly available to anyone. Note that this project is publicly available to anyone.
# Git Large File Storage # Git Large File Storage: How To
<img src="logo.svg" width="100px" /> <img src="logo.svg" width="100px" />
*How to use Git LFS on JLU GitLab (and if this is a good idea)* *Support information on how to use Git LFS on JLU GitLab, and whether this is a good idea.*
__NOTE: This project is work in progress!__ __Please note: This is work in progress; any [contributions](CONTRIBUTING.md) are welcome!__
[[_TOC_]] [[_TOC_]]
...@@ -22,32 +22,85 @@ There are technical reasons in Git's design to make this extra step necessary: ...@@ -22,32 +22,85 @@ There are technical reasons in Git's design to make this extra step necessary:
Even if you work alone, this is a bit of a hassle — it gets much worse if other people have clones, because history change requires everyone involved to confirm the deletion with their clone. Even if you work alone, this is a bit of a hassle — it gets much worse if other people have clones, because history change requires everyone involved to confirm the deletion with their clone.
### Optional: More technical details
With the Git LFS extension, you can version control (large) files "in association" with a Git repository.
Instead of storing a file within the Git repository as a *blob*, Git LFS only stores *pointer files* in the repository, but stores the actual file contents on a (separate) Git LFS server (and locally, in another folder).
A file tracked by Git LFS gets downloaded only if needed, e.g. when you check out a Git branch containing the tracked file (but it also gets cached locally, if you downloaded it before).
LFS uses Git *filters* and *hooks* to coordinate between the normal repository and the tracked files.
A *smudge filter* gets the file contents based on the pointer file.
A *clean filter* creates a new version of the pointer file if the file changes.
When you `push` a commit that contains a new/changed file tracked by LFS, a *pre-push hook* takes care to separately upload the large file contents to the Git LFS server.
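To make the idea of a *pointer file* concrete, here is a minimal sketch that prints what such a pointer looks like for a given file: a version line, the SHA-256 hash of the contents, and the size in bytes. The function name `make_lfs_pointer` is made up for this illustration, and the sketch assumes `sha256sum` is available (as on most Linux systems); Git LFS creates the real pointers itself through its clean filter.

```shell
# Print a Git LFS style pointer for the file given as $1.
# Illustration only: Git LFS writes real pointers via its clean filter.
# Assumes the `sha256sum` tool is installed (GNU coreutils).
make_lfs_pointer() {
    file=$1
    # SHA-256 hash of the file contents (the "object ID" of the file)
    oid=$(sha256sum "$file" | cut -d ' ' -f 1)
    # File size in bytes; tr strips padding that some wc versions add
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

For example, `$ make_lfs_pointer some_data.csv` prints three short lines, no matter how large `some_data.csv` is. If Git LFS is installed, `$ git lfs pointer --file=some_data.csv` should show the pointer that LFS itself would create.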
## Is it a good idea to use Git LFS for your project? ## Is it a good idea to use Git LFS for your project?
You must consider several aspects before uploading any data to JLU GitLab. You must consider several aspects before uploading any research data to JLU GitLab.
Please read [this information](https://gitlab.test.uni-giessen.de/jlugitlab/support/-/blob/master/en/Information.md#storage-of-research-data) on research data management. Please read [this information](https://gitlab.test.uni-giessen.de/jlugitlab/support/-/blob/master/en/Information.md#storage-of-research-data) on research data management.
If in doubt about your specific situation, please consult the department for research data, [forschungsdaten@uni-giessen.de](mailto:forschungsdaten@uni-giessen.de). If in doubt about your specific situation, please consult the department for research data, [forschungsdaten@uni-giessen.de](mailto:forschungsdaten@uni-giessen.de).
- FIXME: Clarify that we talk about LFS on JLU GitLab, but that people can run their own LFS server with different settings/policies! With this in mind, using Git LFS may be a natural choice, if:
1. You're already using Git to organize your project,
2. data are an integral part of the project,
3. the data files are binary and/or larger than a few KiB,
4. all project members who need to access these data files can work with Git LFS, and
5. all machines that need to access these files have network access to the LFS server.
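To get a feeling for point 3 in the list above, the following sketch lists files in a folder that exceed a size threshold. The function name `list_large_files` and the default threshold of 100 KiB are assumptions made for this illustration, not recommendations; choose a threshold that fits your project.

```shell
# List files larger than a threshold, skipping Git's internal folder.
# $1: folder to search (default: current folder)
# $2: threshold in KiB (default: 100, an arbitrary example value)
list_large_files() {
    dir=${1:-.}
    min_kib=${2:-100}
    # -prune keeps find from descending into any .git folder
    find "$dir" -name .git -prune -o -type f -size +"${min_kib}"k -print
}
```

For example, `$ list_large_files . 500` prints all files above 500 KiB in the current folder; these are candidates for tracking with LFS.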
## Practical steps how to use Git LFS ### Optional: External Git LFS server
Assumptions: As an alternative to storing LFS data on JLU GitLab, you could store LFS data on an external server, while still using JLU GitLab to host your project.
- You have Git installed on your machine and you know the basics how to use it (TODO, specify: `add`, `commit`, `pull`, `push`). For example, your workgroup could run their own Git LFS server; you can choose from e.g. [this list](https://github.com/git-lfs/git-lfs/wiki/Implementations).
(If you don't, [here is a good starting point](https://git-scm.com/)). The advantage of an external LFS server is independence from JLU GitLab; e.g. you could implement different policies, such as potentially more suitable security practices.
## Practical tips how to use Git LFS
The tips below make the following assumptions:
- You have Git installed on your machine and you know the basics (if you don't, [here is a good starting point](https://git-scm.com/)).
- You have a project on JLU GitLab that includes a Git repository, and you have a local clone of it on your machine. - You have a project on JLU GitLab that includes a Git repository, and you have a local clone of it on your machine.
- You can type Git commands into a command line interface (terminal). - You can type Git commands into a command line interface (terminal).
Below, example terminal commands are indicated with a different font and with a leading dollar sign, `$ like this`. Below, example commands are indicated with a different font and with a leading dollar sign, `$ like this`; to reproduce them, drop the dollar sign `$ `.
- You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)). - You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)); you can check e.g. by typing `$ git lfs version`.
Practical steps: ### Basic use
1. In your local repository clone, configure which types of files you want to track by LFS. 1. Set up Git LFS; you have to do this once per machine and repository: `$ git lfs install`.
- For example, to let LFS keep track of `CSV` files, type: `$ git lfs track "*.csv"`. 2. In your local repository, choose what types of files to track by LFS.
- For example, to track all `CSV` files, type: `$ git lfs track "*.csv"`.
- This will create/change the Git configuration file [`.gitattributes`](.gitattributes). - This will create/change the Git configuration file [`.gitattributes`](.gitattributes).
You should track this configuration change in Git, e.g. by the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking HDF files with LFS"`. You should track this configuration change in the repository itself, with the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking CSV files with LFS"`.
*Note that because the file name `.gitattributes` starts with a dot, it may be hidden from view (on Linux and macOS, use `$ ls -a` to see it; FIXME: What to do on Windows?).* *Note that because the file name `.gitattributes` starts with a dot, it may be hidden from view (on Linux and macOS, use `$ ls -a` to see it; FIXME: What to do on Windows?).*
2. FIXME: How to generally add files into LFS, and how to interact with them. 3. Now you can interact with the LFS-tracked files in the usual way to control versions with Git.
3. FIXME: Clarify locking mechanism, relevant for people working in teams, see https://github.com/git-lfs/git-lfs/wiki/File-Locking. For example, to make a new snapshot with a file `some_data.csv`, use the usual commands `add`, `commit`, and `push`, as for any other file in the repository:
```
$ git add some_data.csv
$ git commit -m "Add data to LFS"
$ git push
```
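To double-check which file types are tracked, you can inspect `.gitattributes`: the command `$ git lfs track "*.csv"` adds a line like `*.csv filter=lfs diff=lfs merge=lfs -text`. The sketch below prints all patterns that are routed through the LFS filter; the function name `lfs_tracked_patterns` is made up for this illustration.

```shell
# Print every pattern in a .gitattributes file that uses the LFS filter.
# $1: path to the attributes file (default: .gitattributes)
lfs_tracked_patterns() {
    awk '{
        # Field 1 is the pattern; the remaining fields are attributes.
        for (i = 2; i <= NF; i++)
            if ($i == "filter=lfs") { print $1; break }
    }' "${1:-.gitattributes}"
}
```

With Git LFS installed, `$ git lfs track` (without arguments) lists the tracked patterns as well, and `$ git lfs ls-files` lists the files currently stored in LFS.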
### Optional: File locking mechanism
FIXME: Clarify locking mechanism, may be especially relevant for people working in teams, see https://github.com/git-lfs/git-lfs/wiki/File-Locking.
### Optional: Ignore LFS files
You may want to work with a Git repository but ignore the (large) files stored by LFS.
For example, you may want to clone a repository on a machine that doesn't have access to the Git LFS server, or you simply don't require the large files.
To temporarily ignore the LFS content, you can set the [environment variable](https://en.wikipedia.org/wiki/Environment_variable) called `GIT_LFS_SKIP_SMUDGE` to the value `1`.
To stop ignoring LFS, set the variable to `0`.
The syntax to set the variable depends on your command line interface:
- On Windows, type `$ set GIT_LFS_SKIP_SMUDGE=1`.
- For Bash (e.g. Linux), type `$ export GIT_LFS_SKIP_SMUDGE=1`.
After that, you can e.g. clone the repository without downloading the LFS files:
```
$ git clone <REMOTE-URL> <LOCAL-FOLDER>
```
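In Bash and similar shells, you can also set the variable for a single command only, so that it does not stay active in your terminal. The sketch below wraps this in a small helper; the function name `clone_without_lfs` is made up for this illustration.

```shell
# Clone a repository without downloading the LFS content.
# $1: remote URL, $2: local folder (both supplied by the caller)
clone_without_lfs() {
    # The variable is set for this one git command only, so later
    # Git commands in the same terminal are not affected.
    GIT_LFS_SKIP_SMUDGE=1 git clone "$1" "$2"
}
```

The clone then contains only the small pointer files. If you need the actual file contents later, `$ git lfs pull` inside the clone should download them.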
TODO?: You can also ignore LFS files permanently, via Git configuration.
## Example(s) ## Example(s)
- [Here](example) you can find an example of an analysis script that relies on data stored in Git LFS. - [Here](example) you can find an example of an analysis script that relies on data stored in Git LFS.
...@@ -55,5 +108,6 @@ Practical steps: ...@@ -55,5 +108,6 @@ Practical steps:
## Useful links ## Useful links
- https://git-lfs.github.com/ - Main website about Git LFS: https://git-lfs.github.com/
- https://docs.gitlab.com/ce/topics/git/lfs/ - Information on LFS on GitLab: https://docs.gitlab.com/ce/topics/git/lfs/
- A list of LFS server implementations: https://github.com/git-lfs/git-lfs/wiki/Implementations