From b835eb01ad7e0353c5c0f912e25048720d8d81d7 Mon Sep 17 00:00:00 2001
From: Johannes Keyser <johannes.keyser@sport.uni-giessen.de>
Date: Tue, 2 Feb 2021 19:54:14 +0100
Subject: [PATCH] A few reformulations and fixes.

---
 README.md                            | 30 ++++++++++++++++------------
 example/README.md                    |  8 ++++----
 example/example_analyis_script.ipynb | 24 ++++++++--------------
 example/example_data.hdf             |  2 +-
 4 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/README.md b/README.md
index 9e5ac7e..2eb4913 100644
--- a/README.md
+++ b/README.md
@@ -1,25 +1,26 @@
 # Git Large File Storage
 
-<img src="logo.svg" width="120px" />
+<img src="logo.svg" width="100px" />
 
 *How to use Git LFS on JLU GitLab (and if this is a good idea)*
 
-__NOTE: This project is very much work in progress!__
+__NOTE: This project is a work in progress!__
 
-[__TOC__]
+[[_TOC_]]
 
-## What Problem is solved by Git LFS?
-The main purpose of Git LFS is to treat data files with the same convenience as files in your Git repository, while technically keeping them out of the Git repository.
-The main reasons behind Git LFS are technical:
 
-- Large files will bloat the Git repository for everyone who has a copy, and degrade the performance of Git operations.
+## What problem is solved by Git LFS?
+The main purpose of Git LFS is to treat **data** files *as conveniently as if they were inside* a Git repository, while *actually keeping them outside* of the repository.
+There are technical reasons in Git's design that make this extra step necessary:
+
+- Large files will bloat the Git repository for everyone who has a clone, and degrade the performance of Git operations.
   Git is optimized for text-based content, not for binary files.
-- Git is designed to distribute the full history to everyone who has a clone.
-  If data are part of the Git repository (at any point in time!), it means they get replicated on every clone.
+- Git is designed to distribute the entire snapshot history to every clone.
+  If data are part of the Git repository (at any point in history!), they get replicated on every clone, even if they are no longer needed.
 - Git is designed to make it impossible to delete data from the repository's history.
   All you can do to "delete data" is force Git to you explicitly re-write the snapshot history.
-  Even if you work alone, this is a bit of a hassle.
-  This gets worse if other people have a clone of the repository, because the deletion requires everyone involved to manually re-create the deletion with their clone.
+  Even if you work alone, this is a bit of a hassle; it gets much worse if other people have clones, because a history change requires everyone involved to confirm the deletion in their own clone.
+
 
 ## Is it a good idea to use Git LFS for your project?
 You should consider several aspects before uploading data to JLU GitLab.
@@ -32,7 +33,8 @@ If in doubt about your specific situation, please ask the research data manager
 - Compared with [JLUdata](https://jlupub.ub.uni-giessen.de/handle/jlupub/1), you can store data privately among the members of your project.
   JLUdata is the preferred choice to *publish* data (for example, you get a DOI).
 
-## Practical steps how to use Git LFS (UNFINISHED)
+
+## Practical steps how to use Git LFS
 Assumptions:
 - You have Git installed on your machine and you know the basics how to use it (TODO, specify: `add`, `commit`, `pull`, `push`).
   (If you don't, [here is a good starting point](https://git-scm.com/)).
@@ -49,9 +51,11 @@ Assumptions:
 2. FIXME: How to generally add files into LFS, and how to interact with them.
 3. FIXME: Clarify the locking mechanism.
 
+
 ## Example(s)
 - [Here](example) you can find an of an analysis script that relies on data stored in Git LFS.
-- Maybe add more examples with different data (audio, video, images)?
+- TODO: Maybe add more examples with different data (audio, video, images)?
+
 
 ## Useful links
 - https://git-lfs.github.com/
diff --git a/example/README.md b/example/README.md
index ac9d6e7..c9d42a2 100644
--- a/example/README.md
+++ b/example/README.md
@@ -9,11 +9,11 @@ The example consists of
 - which relies on binary data ([example_data.hdf](example_data.hdf)) stored by Git LFS,
 - and stores figure data (as `.png` files) in the folder [plots](plots), stored by Git LFS.
 
-The analysis script is stored as a normal Git snapshot, without LFS.
+The analysis script is stored as a normal Git snapshot (without LFS).
 Note the small badge "LFS" at the files stored in LFS in GitLab's file overview.
-If you cloned the repository on your machine (and you have LFS installed), you type `$ git lfs ls-files` to get an overview of which files are stored in Git LFS:
 
-For example:
+If you have cloned the repository on your machine (and you have LFS installed), you can type `$ git lfs ls-files` to get an overview of which files are stored in Git LFS.
+In this example, you would see something like this:
 ```
 $ git lfs ls-files
 
@@ -25,7 +25,7 @@ f4aa57ea83 * example/plots/plot_example_histogram.png
 
 To achieve tracking with LFS of these files types (PNG and HDF), irrespective of their path in the repository, you can use these two commands:
 
-``` sh
+```sh
 $ git lfs track "*.hdf"
 $ git lfs track "*.png"
 ```
diff --git a/example/example_analyis_script.ipynb b/example/example_analyis_script.ipynb
index 9a63b2e..ca421c9 100644
--- a/example/example_analyis_script.ipynb
+++ b/example/example_analyis_script.ipynb
@@ -5,7 +5,7 @@
    "id": "failing-rebecca",
    "metadata": {},
    "source": [
-    "# Example \"analysis\" script\n",
+    "# Example analysis script\n",
     "\n",
     "This example illustrates the integration of analysis code and data:\n",
     "\n",
@@ -20,7 +20,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 2,
    "id": "composite-secretariat",
    "metadata": {},
    "outputs": [],
@@ -79,8 +79,8 @@
     }
    ],
    "source": [
-    "# Main illustration: Create figure based on existing data stored with Git LFS,\n",
-    "# and save figures as SVG files, which are also tracked by Git LFS.\n",
+    "# Main illustration: Create figure based on data stored with Git LFS,\n",
+    "# and save figures as PNG files, which are also tracked by Git LFS.\n",
     "\n",
     "# The data file contains samples of the standard normal distribution.\n",
     "with h5py.File(DATA_FILE, 'r') as file_handle:\n",
@@ -113,24 +113,16 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 93,
+   "execution_count": 3,
    "id": "quiet-castle",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Wrote data into file \"example_data.hdf\".\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "# Appendix: Generate data set (for the sake of completeness).\n",
     "\n",
     "# Draw samples from the standard normal distribution.\n",
-    "np.random.seed(42)\n",
-    "Num_Data_Points = 150*1000\n",
+    "numpy.random.seed(42)\n",
+    "Num_Data_Points = 150000\n",
     "rand_normal_samples = numpy.random.randn(Num_Data_Points, 1)\n",
     "\n",
     "# Save the samples in a HDF-5 file.\n",
diff --git a/example/example_data.hdf b/example/example_data.hdf
index f0c9311..7b1ce99 100644
--- a/example/example_data.hdf
+++ b/example/example_data.hdf
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3cf63089c81a765c9f1e9436fdc7edacdae1b9af1015d2b1c5f9440f748db103
+oid sha256:5908419e1f7a2f9806fe0b4bff078c5edbb2d5406bed95fa8d956e95332933c3
 size 1202048
-- 
GitLab