From 2d27beaf16c385476d8827c6cec62d564083fe03 Mon Sep 17 00:00:00 2001
From: Johannes Keyser <johannes.keyser@sport.uni-giessen.de>
Date: Wed, 24 Feb 2021 18:46:29 +0100
Subject: [PATCH] More information all around...

- More about decision for/against LFS.
- More technical description.
- Clarify LFS can be externalized.
- How to ignore LFS content.
- Describe links.
---
 CONTRIBUTING.md |  7 ++--
 README.md       | 92 +++++++++++++++++++++++++++++++++++++++----------
 2 files changed, 77 insertions(+), 22 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d43bb34..4239d9c 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,8 +1,9 @@
 # Contributing
 
-Anyone is welcome to contribute.
+Anyone is welcome to contribute clarifications or additional material.
 
-Please note, that all data, code and text must be self-authored and [licensed as CC0](LICENSE.md) (or the material must be properly cited and licensed openly).
+Preferably, all data, code, and text here should be self-authored and [licensed as CC0](LICENSE.md).
+If you want to add material from another source, make sure it is openly licensed and properly cited.
 
-Also note that this project is publicly available to anyone.
+Note that this project is publicly available to anyone.
 
diff --git a/README.md b/README.md
index 1806c4c..887b037 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
-# Git Large File Storage
+# Git Large File Storage: How To
 
 <img src="logo.svg" width="100px" />
 
-*How to use Git LFS on JLU GitLab (and if this is a good idea)*
+*Support information on how to use Git LFS on JLU GitLab, and whether this is a good idea.*
 
-__NOTE: This project is work in progress!__
+__Please note: This is work in progress; any [contributions](CONTRIBUTING.md) are welcome!__
 
 [[_TOC_]]
 
@@ -22,32 +22,85 @@ There are technical reasons in Git's design to make this extra step necessary:
   Even if you work alone, this is a bit of a hassle — it gets much worse if other people have clones, because history change requires everyone involved to confirm the deletion with their clone.
 
 
+### Optional: More technical details
+With the Git LFS extension, you can version control (large) files "in association" with a Git repository.
+Instead of storing a file within the Git repository as a *blob*, Git LFS only stores *pointer files* in the repository, but stores the actual file contents on a (separate) Git LFS server (and locally, in another folder).
+A file tracked by Git LFS gets downloaded only if needed, e.g. when you check out a Git branch containing the tracked file (but it also gets cached locally, if you downloaded it before).
+
+LFS uses Git *filters* and *hooks* to coordinate between the normal repository and the tracked files.
+A *smudge filter* gets the file contents based on the pointer file.
+A *clean filter* creates a new version of the pointer file if the file changes.
+When you `push` a commit that contains a new/changed file tracked by LFS, a *pre-push hook* takes care of uploading the large file contents separately to the Git LFS server.
+
+
 ## Is it a good idea to use Git LFS for your project?
-You must consider several aspects before uploading any data to JLU GitLab.
+You must consider several aspects before uploading any research data to JLU GitLab.
 Please read [this information](https://gitlab.test.uni-giessen.de/jlugitlab/support/-/blob/master/en/Information.md#storage-of-research-data) on research data management.
 If in doubt about your specific situation, please consult the department for research data, [forschungsdaten@uni-giessen.de](mailto:forschungsdaten@uni-giessen.de).
 
-- FIXME: Clarify that we talk about LFS on JLU GitLab, but that people can run their own LFS server with different settings/policies!
+With this in mind, using Git LFS may be a natural choice if:
+
+1. you're already using Git to organize your project,
+2. data are an integral part of the project,
+3. the data files are binary and/or larger than a few KiB,
+4. all project members who need to access these data files can work with Git LFS, and
+5. all machines that need to access these files have network access to the LFS server.
 
 
-## Practical steps how to use Git LFS
-Assumptions:
-- You have Git installed on your machine and you know the basics how to use it (TODO, specify: `add`, `commit`, `pull`, `push`).
-  (If you don't, [here is a good starting point](https://git-scm.com/)).
+### Optional: External Git LFS server
+As an alternative to storing LFS data on JLU GitLab, you could store LFS data on an external server, while still using JLU GitLab to host your project.
+For example, your workgroup could run their own Git LFS server; you can choose from e.g. [this list](https://github.com/git-lfs/git-lfs/wiki/Implementations).
+The advantage of an external LFS server is independence from JLU GitLab; e.g. you could implement different policies, such as potentially more suitable security practices.
+
+
+## Practical tips on how to use Git LFS
+The tips below assume the following:
+
+- You have Git installed on your machine and you know the basics (if you don't, [here is a good starting point](https://git-scm.com/)).
 - You have a project on JLU GitLab that includes a Git repository, and you have a local clone of it on your machine.
 - You can type Git commands into a command line interface (terminal).
-  Below, example terminal commands are indicated with a different font and with a leading dollar sign, `$ like this`.
-- You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)).
+  Below, example commands are indicated with a different font and with a leading dollar sign, `$ like this`; to reproduce them, drop the dollar sign `$ `.
+- You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)); you can check this, e.g., by typing `$ git lfs version`.
+
 
-Practical steps:
-1. In your local repository clone, configure which types of files you want to track by LFS.
-    - For example, to let LFS keep track of `CSV` files, type: `$ git lfs track "*.csv"`.
+### Basic use
+1. Set up Git LFS; you usually have to do this only once per user account on each machine: `$ git lfs install`.
+2. In your local repository, choose what types of files to track by LFS.
+    - For example, to track all `CSV` files, type: `$ git lfs track "*.csv"`.
     - This will create/change the Git configuration file [`.gitattributes`](.gitattributes).
-      You should track this configuration change in Git, e.g. by the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking HDF files with LFS"`.  
+      You should track this configuration change in the repository itself, with the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking CSV files with LFS"`.  
       *Note that because the file name `.gitattributes` starts with a dot, it may be hidden from view (on Linux and MacOS, use `$ ls -a` to see it; FIXME: What to do on Windows?).*
-2. FIXME: How to generally add files into LFS, and how to interact with them.
-3. FIXME: Clarify locking mechanism, relevant for people working in teams, see https://github.com/git-lfs/git-lfs/wiki/File-Locking.
+3. Now you can interact with the LFS-tracked files in the usual way to control versions with Git.
+   For example, to make a new snapshot with a file `some_data.csv`, use the commands `add`, `commit`, and `push`, just as for any other file in the repository:
+   ```
+   $ git add some_data.csv
+   $ git commit -m "Add data to LFS"
+   $ git push
+   ```
+
+
+### Optional: File locking mechanism
+FIXME: Clarify locking mechanism, may be especially relevant for people working in teams, see https://github.com/git-lfs/git-lfs/wiki/File-Locking.
+
+
+### Optional: Ignore LFS files
+You may want to work with a Git repository but ignore the (large) files stored by LFS.
+For example, you may want to clone a repository on a machine that doesn't have access to the Git LFS server, or you simply don't require the large files.
+
+To temporarily ignore the LFS content, you can set the [environment variable](https://en.wikipedia.org/wiki/Environment_variable) called `GIT_LFS_SKIP_SMUDGE` to the value `1`.
+To stop ignoring LFS, set the variable to `0`.
+
+The syntax to set the variable depends on your command line interface:
+- On Windows (Command Prompt), type `$ set GIT_LFS_SKIP_SMUDGE=1`.
+- For Bash (e.g. Linux), type `$ export GIT_LFS_SKIP_SMUDGE=1`.
+
+After that, you can e.g. clone the repository without downloading the LFS files:
+
+```
+$ git clone <REMOTE-URL> <LOCAL-FOLDER>
+```
 
+TODO?: You can also ignore LFS files permanently, via Git configuration.
 
 ## Example(s)
 - [Here](example) you can find an of an analysis script that relies on data stored in Git LFS.
@@ -55,5 +108,6 @@ Practical steps:
 
 
 ## Useful links
-- https://git-lfs.github.com/
-- https://docs.gitlab.com/ce/topics/git/lfs/
+- Main website about Git LFS: https://git-lfs.github.com/
+- Information on LFS on GitLab: https://docs.gitlab.com/ce/topics/git/lfs/
+- A list of LFS server implementations: https://github.com/git-lfs/git-lfs/wiki/Implementations
-- 
GitLab