From 0caf09ca88b4842440a7ee023cac69cc8cfdd34a Mon Sep 17 00:00:00 2001
From: Johannes Keyser <johannes.keyser@sport.uni-giessen.de>
Date: Tue, 2 Feb 2021 18:42:34 +0100
Subject: [PATCH] Add more explanations.

---
 CONTRIBUTING.md   |  8 +++++++
 README.md         | 57 ++++++++++++++++++++++++++++++-----------------
 example/README.md | 43 +++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 21 deletions(-)
 create mode 100644 CONTRIBUTING.md
 create mode 100644 example/README.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..d43bb34
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,8 @@
+# Contributing
+
+Anyone is welcome to contribute.
+
+Please note that all data, code, and text must be self-authored and [licensed as CC0](LICENSE.md) (or the material must be properly cited and licensed openly).
+
+Also note that this project is publicly available to anyone.
+
diff --git a/README.md b/README.md
index df1fd59..9e5ac7e 100644
--- a/README.md
+++ b/README.md
@@ -1,42 +1,57 @@
-# Git Large File Storage How-To
+# Git Large File Storage
 
 <img src="logo.svg" width="120px" />
 
 *How to use Git LFS on JLU GitLab (and if this is a good idea)*
 
-__NOTE: This project is work in progress!__
+__NOTE: This project is very much a work in progress!__
 
-## Initial ideas for this project
+[__TOC__]
 
-- Explain/discuss/elaborate if LFS on JLU GitLab is the best choice for your data/project.
-    - Contrast especially with [JLUdata](https://jlupub.ub.uni-giessen.de/handle/jlupub/1).
-    - Explain why data don't belong in a plain Git repository:
-        - Because it degrades performance and bloats the repository.
-        - Because it is useless, since data are not expected to come in "versions" that need to be controlled.
-        - Because it is impossible (or at least a huge hassle) to actually delete data from a Git repository, especially if others have a clone.
-- Provide simple, practical examples what LFS can offer.
-    - Tight integration of some data and analysis code.
-    - 1st idea: Python code to generate some random data, and plot them in a Jupyter notebook.
-    - Maybe add more examples in other languages, or different data (audio, video, images)?
-- Anyone is welcome to contribute, with these guidelines:
-    - All data, code and text must be self-authored and [licensed as CC0](LICENSE.md), or the material must be properly cited and licensed openly.
-    - This project is publicly available to anyone.
+## What problem is solved by Git LFS?
+The main purpose of Git LFS is to treat data files with the same convenience as files in your Git repository, while technically keeping them out of the Git repository.
+The main reasons behind Git LFS are technical:
+
+- Large files will bloat the Git repository for everyone who has a copy, and degrade the performance of Git operations.
+  Git is optimized for text-based content, not for binary files.
+- Git is designed to distribute the full history to everyone who has a clone.
+  If data are part of the Git repository (at any point in time!), it means they get replicated on every clone.
+- Git is designed to make it impossible to delete data from the repository's history.
+  All you can do to "delete data" is to explicitly force Git to rewrite the snapshot history.
+  Even if you work alone, this is a bit of a hassle.
+  This gets worse if other people have a clone of the repository, because everyone involved has to manually repeat the deletion in their own clone.
+
+## Is it a good idea to use Git LFS for your project?
+You should consider several aspects before uploading data to JLU GitLab.
+If in doubt about your specific situation, please ask the research data manager via email, at [forschungsdaten@uni-giessen.de](mailto:forschungsdaten@uni-giessen.de).
+
+- You should __never__ save data containing personally identifying information on JLU GitLab.
+- In principle, Git LFS is suitable for data with "normal protection requirements" (in German, "normalem Schutzbedarf"), FIXME: EXPLAIN.
+- Git LFS is most suitable if you want to integrate data and their analysis code in the same place.
+  If you just want a place to keep your data on their own, you should also consider options such as [JLUbox](https://www.uni-giessen.de/fbz/svc/hrz/svc/daten/jlubox) or [network drives](https://www.uni-giessen.de/fbz/svc/hrz/svc/daten/san/index_html).
+- Compared with [JLUdata](https://jlupub.ub.uni-giessen.de/handle/jlupub/1), you can store data privately among the members of your project.
+  JLUdata is the preferred choice to *publish* data (for example, you get a DOI).
 
 ## Practical steps how to use Git LFS (UNFINISHED)
 Assumptions:
-- You have Git installed on your machine and you know the basics how to use it.
+- You have Git installed on your machine and you know the basics of how to use it (TODO, specify: `add`, `commit`, `pull`, `push`).
   (If you don't, [here is a good starting point](https://git-scm.com/)).
 - You have a project on JLU GitLab that includes a Git repository, and you have a local clone of it on your machine.
 - You can type Git commands into a command line interface (terminal).
   Below, example terminal commands are indicated with a different font and with a leading dollar sign, `$ like this`.
+- You have the Git LFS extension installed on your machine (you can find [instructions here](https://git-lfs.github.com/)).
 
-1. On your machine, install the Git LFS extension ([here](https://git-lfs.github.com/) are some instructions).
-2. In your local repository clone, configure which types of files you want to track by LFS.
-    - For example, to let LFS keep track of [HDF files](https://www.hdfgroup.org/), type: `$ git lfs track "*.hdf"`.
+1. In your local repository clone, configure which types of files you want LFS to track.
+    - For example, to let LFS keep track of `CSV` files, type: `$ git lfs track "*.csv"`.
     - This will create/change the Git configuration file [`.gitattributes`](.gitattributes).
       You should track this configuration change in Git, e.g. by the usual Git commands `$ git add .gitattributes` and `$ git commit -m "start tracking HDF files with LFS"`.  
       *Note that because the file name `.gitattributes` starts with a dot, it may be hidden from view (on Linux and MacOS, use `$ ls -a` to see it; FIXME: What to do on Windows?).*
-3. FIXME: How to add files into LFS, and how to interact with them.
+2. FIXME: How to generally add files into LFS, and how to interact with them.
+3. FIXME: Clarify the locking mechanism.
+
+## Example(s)
+- [Here](example) you can find an example of an analysis script that relies on data stored in Git LFS.
+- Maybe add more examples with different data (audio, video, images)?
 
 ## Useful links
 - https://git-lfs.github.com/
diff --git a/example/README.md b/example/README.md
new file mode 100644
index 0000000..ac9d6e7
--- /dev/null
+++ b/example/README.md
@@ -0,0 +1,43 @@
+# Example: Analysis that relies on data in Git LFS
+
+## Main illustration
+This example illustrates how Git LFS enables tight integration of analysis code and its data:
+Researchers who want to reproduce the results just need to clone this repository to get both.
+
+The example consists of
+- an analysis script ([example_analyis_script.ipynb](example_analyis_script.ipynb)),
+- which relies on binary data ([example_data.hdf](example_data.hdf)) stored by Git LFS,
+- and saves its figures (as `.png` files) in the folder [plots](plots), which is also stored by Git LFS.
+
+The analysis script is stored as a normal Git snapshot, without LFS.
+Note the small "LFS" badge next to the files stored in LFS in GitLab's file overview.
+If you have cloned the repository on your machine (and you have LFS installed), you can type `$ git lfs ls-files` to get an overview of which files are stored in Git LFS.
+
+For example:
+```
+$ git lfs ls-files
+
+3cf63089c8 * example/example_data.hdf
+3d177cad03 * example/plots/plot_example_cumsum.png
+f4aa57ea83 * example/plots/plot_example_histogram.png
+82065f2929 * example/plots/plot_example_trace.png
+```
+
+To track these file types (PNG and HDF) with LFS, irrespective of their path in the repository, you can use these two commands:
+
+``` sh
+$ git lfs track "*.hdf"
+$ git lfs track "*.png"
+```
+
+This should result in a Git configuration file `.gitattributes` that contains these lines:
+
+```
+*.hdf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+```
+
+
+## Appendix information
+- For the sake of readability in the browser, the analysis script is a [Jupyter notebook](https://jupyter.org/).
+- The data are stored in the open data file format [HDF5](https://www.hdfgroup.org/).
-- 
GitLab