Linked Open Development

Workshop at SWIB16

Fabian Steeg / @fsteeg & Adrian Pohl / @acka47
Linked Open Data, Hochschulbibliothekszentrum NRW (hbz)

Bonn, 2016-11-28

This presentation:
Creative Commons License

Software development in libraries

Individuals, in-house teams, cross-organization projects

In practice, many people are almost always involved in development

How do we get together?

Open, transparent & inclusive

Open: make your data and code accessible under an open license

Transparent: let others see what you're doing; make your roadmap, issue tracking, discussions, etc. public

Inclusive: Let others participate without excluding anybody; encourage participation, give up absolute power


Something like a social network, to
"learn, share, and work together to build software"

Lets you share your own projects

Lets you collaborate on projects you or other people have shared

GitHub helps a lot with being open, transparent, inclusive




Git (released 2005) ≠ GitHub (founded 2008)

Git is the technical core of GitHub

It's a version control system

Version control

Version control is like a time machine for your files

You commit files and edits to the version control system

The version control system tracks all files and changes by all people

How to version control?

You describe the commit with a comment

For each commit, Git stores which files changed and how, your description of the changes, and other metadata like the time and author of the changes
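To see what Git records for a commit, here is a minimal sketch that creates a throwaway repo and prints the stored metadata. The directory, the sample file content, and the identity "Workshop User" are placeholders made up for illustration.

```shell
# Work in a throwaway directory so nothing else is touched
demo=$(mktemp -d)
cd "$demo"
git init -q
# Identity that ends up in the commit metadata (hypothetical values)
git config user.name "Workshop User"
git config user.email "user@example.org"

# Sample content, made up for this sketch
echo "Bonn,50.73,7.10" > locations.csv
git add locations.csv
git commit -q -m "Add first location"

# Author, date, and description of the latest commit
git log -1 --format='%an | %ad | %s'
# Which files changed in that commit
files=$(git diff-tree --no-commit-id --name-only -r --root HEAD)
echo "$files"
```

The `--root` option is only needed here because the sketch inspects the very first commit of the repo.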

Why is that useful?

You have a history where you can look up how, why, and who changed the files to be the way they are right now

And a safety net that allows you to roll back to any previous state
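One possible way to use that safety net, sketched in a throwaway repo: commit two versions of a file, then bring back the first one. File name, contents, and identity are invented for the example.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"      # hypothetical identity
git config user.email "user@example.org"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
first=$(git rev-parse HEAD)   # remember the ID of the first commit

echo "version 2" > notes.txt
git commit -q -a -m "Update notes"

# Bring back the file as it was in the first commit
git checkout -q "$first" -- notes.txt
restored=$(cat notes.txt)
echo "$restored"
```

The history itself is not rewritten by this: both commits stay in the log, and the restored file can be committed as a new change.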

Not just for software

Version control and collaboration make sense for many things

Data, specifications, configuration, documentation, publication, etc.

What we'll learn

I. Share your own projects
(individual exercises)

Setting up and working on local projects with Git | Git basics: init, add, commit, log | 13:00-13:40
Sharing and working on projects on GitHub | Git remotes, GitHub web UI, Git push, pull | 13:40-14:00
Documenting your project on GitHub | readme, Markdown, GitHub pages | 14:00-15:00

II. Collaborate on projects
(team exercises)

Contributing to existing GitHub projects | GitHub flow: forking, cloning; issues; branching, pull requests | 15:30-17:00
Improving openness of our own projects | license, contributing, issue_template, pull_request_template | 17:15-18:00
GitHub API | using the GitHub issues API (with cURL & jq) | 18:00-18:45

Diverse audience

Librarians and developers, web and library systems,
with and without Git & GitHub experience

We hope there's something interesting for everybody

For exercises you're familiar with, maybe you'd like to help

For things you do differently, let's learn about that

I. Share your own projects

Setting up and working on local projects with Git

Prerequisite: Git installed

Exercise 1:
set up a local project with Git

Create a location for all your Git repos

$ cd ~
$ mkdir git
$ cd git
Create an empty Git repo

$ mkdir lodworkshop-project
$ cd lodworkshop-project
$ git init
Verify project setup

$ git status

('$' means you should type this in your terminal application or Git shell.)

Exercise 2:
add a file to your local project

With your text editor, create a new file with content:

Save it as 'lodworkshop-project/locations.csv' and add it

$ git add locations.csv
$ git status
Commit the file (triggers setup instructions, opens an editor)

$ git commit
Verify addition

$ git status
$ git log

You created your first commit

Sharing and working on GitHub projects

Prerequisite: GitHub account


Remote repo: a copy of your local Git repo on a server, e.g. on GitHub's servers

We push our local changes to remote repos, and pull changes by others from remote repos
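Push and pull can be tried without a server. In the sketch below, a bare repo in a temporary directory stands in for the remote on GitHub; all paths, names, and file contents are placeholders.

```shell
base=$(mktemp -d)
# A bare repo plays the role of the server-side remote
git init -q --bare "$base/remote.git"

# Local repo with one commit
git init -q "$base/local"
cd "$base/local"
git config user.name "Workshop User"      # hypothetical identity
git config user.email "user@example.org"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file"

# Register the remote and push the current branch to it
git remote add origin "$base/remote.git"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on the Git setup
git push -q origin "$branch"

# Someone else gets a copy from the remote
git clone -q -b "$branch" "$base/remote.git" "$base/colleague"

# A later change is pushed, and the colleague pulls it
echo "more" >> file.txt
git commit -q -a -m "Extend file"
git push -q origin "$branch"
git -C "$base/colleague" pull -q origin "$branch"
cat "$base/colleague/file.txt"
```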

Exercise 3:
Set up a remote repo

Create new repo ('+' sign, upper right), set the repository name to 'lodworkshop-project'
Copy repo HTTPS URL (under the 'Quick setup' header) and use it to add a new remote

$ git remote add origin <URL>
Verify the remote setup

$ git remote -v
Push your local repo content

$ git push origin master

Visit your repo on GitHub, verify your file is there

Open, transparent & inclusive

Documenting your project on GitHub


The README file is the entry point for your project's documentation

It enables readers to quickly orient themselves

GitHub displays the README's content on your repo's entry page

Exercise 4-a:
simple README file

Create a new file:

The locations.csv file maps 
location names to geo coordinates 
formatted as "latitude,longitude".
Save as 'README', add it

$ git add README
Verify the status

$ git status
Commit and push it to GitHub

$ git commit
$ git log
$ git push origin master

Visit your repo on GitHub to see it used


Markdown is a lightweight markup language

Plain, simple, readable text files

"a radically simplified and far more human-readable form of HTML" (Jeff Atwood)

Supported everywhere on GitHub: wiki, issues, README, etc.
(and on many other sites)
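To give an impression of the syntax, this sketch writes a small Markdown file and prints it again. The file name `sample.md` and the "one location per line" detail are assumptions made up for the example.

```shell
demo=$(mktemp -d)
# Headers start with '#', emphasis uses '**', list items start with '-'
cat > "$demo/sample.md" <<'EOF'
# Locations

The `locations.csv` file maps **location names** to geo coordinates.

## Format

- one location per line
- coordinates formatted as "latitude,longitude"
EOF

# The file stays readable as plain text
cat "$demo/sample.md"
```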

Exercise 4-b:
README formatting

Rename the README file to

$ git mv README
Add headers, styling, and details to your README content (see GitHub formatting docs)
Verify the status

$ git status
Commit and push it to GitHub

$ git commit
$ git log
$ git push origin master

Visit your repo on GitHub to see it used


A wiki can provide additional, long-form documentation

GitHub contains a Wiki for each repo

The Wiki is actually a Git repo with Markdown files

This means you can get a full local copy of the Wiki,
including its history

GitHub pages

GitHub also provides hosting of static HTML for users and repos

Pages are available at

We use it for this presentation:

Exercise 5:
Set up a GitHub page

Go to your repo's settings, 'Launch automatic page generator'
Load initial content from, select a theme
The pages are actually files in your repo, see details:

Visit your project's page at


II. Collaborating on GitHub projects

Team exercises

Mixed teams, different roles: developers, librarians, editors, etc.

Different tasks for each role

Form teams

an interactive map

Libraries in Bonn


Forks, clones, repos

Fork: your own copy of a repo on GitHub

Clone: a local copy of a repo

Original project, fork, and clone are all repos

Exercise 6:
fork and clone the repo

Fork the project repo ('Fork', upper right)
Copy the URL (green button 'clone or download') and clone your fork

$ cd ~
$ cd git
$ git clone <URL>
$ cd swib16-project
Look around a bit in the new repo

$ git status
$ git log
$ git remote -v

You now have a local clone of your own fork of the original project

Libraries in Bonn:

HTML & Javascript for the map

GitHub pages for hosting the site

Uses the Lobid API for organisation data

Using an API?

What does that even mean?

For Web APIs:

opening URLs!

Just like opening a web site

but we want structured data



We can edit and re-open the URL in the browser bar

We're using the API

Same URL, other usage

Call URL from command line

Call URL from code, e.g. Javascript

We want to get Libraries in Bonn from the API


Display that data on a map


Call URL from Javascript code





Libraries in Bonn:

Look at a sample organisation
Find field to query type AND location.address.addressLocality
Create query
Use location from response and location.geo.lon
Open HTML file in browser site, source

Look around a bit, get familiar with the implementation

GitHub issues

GitHub contains an integrated issue tracker

Issues are organized with colored labels and support Markdown

Easily link to code, commits, users, other issues, specific comments


References to users and issues

Mention users like on Twitter: @username

Reference issues with a hash: 'See #123'

If you reference an issue in your commit message the commit automatically appears in the issue
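A small sketch of such a commit message, made in a throwaway repo. The file name, the identity, and the issue number `#1` are hypothetical; the link to the issue only appears once the commit is pushed to GitHub.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"      # hypothetical identity
git config user.email "user@example.org"
echo "fixed" > map.js
git add map.js

# Reference the (hypothetical) issue number in the message
git commit -q -m "Fix marker position, see #1"
msg=$(git log -1 --format=%s)
echo "$msg"
```

When writing the message in an editor instead of using `-m`, avoid starting a line with `#`: Git treats such lines as comments by default and drops them.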


Linked open development

GitHub graphs

It's all connected







Exercise 7:
fix a bug

Exercise 7-a:
Implement your fix

Find a bug in the current project implementation (visit project site, view source)
Open an issue for the bug (open new GitHub issue)
De-duplicate bug reports

Duplicate of #123
Fix the bug locally

Parallel work

Our scenario: different teams worked in parallel on the same project

Common task: we want to work independently and integrate later


Git's internal data structure is a graph

Each commit is a node that points to its direct predecessor (a merge commit has more than one)

Branches happen when you have two different commits
with the same direct predecessor


Branches are separate, parallel lines of activity on the files

They allow parallel, independent work by different people

Parallel changes need to be merged together in the end
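The following sketch walks through that idea in a throwaway repo: two branches start from the same commit, receive independent changes, and are merged at the end. Branch names, file names, and the identity are invented for illustration.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Workshop User"      # hypothetical identity
git config user.email "user@example.org"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"
main=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on the Git setup
base=$(git rev-parse HEAD)

# Team A works on its own branch
git checkout -q -b team-a
echo "a" > a.txt
git add a.txt
git commit -q -m "Team A change"

# Team B starts from the same base commit
git checkout -q -b team-b "$base"
echo "b" > b.txt
git add b.txt
git commit -q -m "Team B change"

# Both branch tips have the same direct predecessor
git rev-parse team-a^ team-b^

# Integrate both lines of work into the main branch
git checkout -q "$main"
git merge -q team-a
git merge -q --no-edit team-b
git log --graph --oneline --all
```

The first merge only moves the main branch forward. The second one creates a merge commit with two predecessors, which shows up as a fork and join in the graph output.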

GitHub flow

Used by GitHub to develop GitHub

Git makes branching and merging easy

Implement new features and bugfixes in separate branches


Exercise 7-b:
Push to a branch

Create a new local branch

$ git checkout -b bugfix-issue-1
Verify the status and what you're changing

$ git status
$ git diff
Commit your changes to the branch

$ git commit -a
$ git log
Push your branch to the remote

$ git push origin bugfix-issue-1

Remember to reference the issue in your commit

Visit your repo on GitHub, notice the yellow box

Pull requests

Request that your branch is pulled into the master branch

Opening a pull request initiates a code review of suggested changes

Changes are visualized in the pull request (what's removed/added?), and can be commented on at specific lines

Can contain automatic checks like running tests

Basically like an issue with extra functionality





Our review process

Functional review: review actual functionality, associated with the issue, done by non-developer

Code review: review code changes, associated with the pull request, done by developer

Exercise 7-c:
Review your fix

Developer opens a pull request for the bugfix branch, includes reference to the fixed issue
Developer adds instructions on how to review the fix in the issue
Librarian reviews the functional side of the fix (functional review), comments in the issue, says +1 when done
Developer reviews the implementation of the fix (code review), comments in the pull request, says +1 when done
Both reviews are positive: maintainer merges the pull request

With GitHub pages created from the master branch, we're done


To continue working on the project, we want the current state

We need to get the bugfixes from other teams into our local repo

They have been merged into the original repo

The original repo is often called the upstream repo

Exercise 7-d:
get updates from upstream

Check your current branch

$ git status
Go back to your local master branch

$ git checkout master
Add the original repo as a new remote

$ git remote add upstream
Get the latest additions to the upstream master

$ git remote -v
$ git pull upstream master

We now have the bugfixes from every team in our local repo
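The fork, clone, and upstream relationship can be rehearsed locally. In this sketch, two bare repos in a temporary directory stand in for the original project and your fork on GitHub; every path, name, and file content is a placeholder.

```shell
base=$(mktemp -d)
# Stand-ins for the original repo and your fork on GitHub
git init -q --bare "$base/upstream.git"
git init -q --bare "$base/fork.git"

# Seed both with the same initial commit
git init -q "$base/seed"
cd "$base/seed"
git config user.name "Maintainer"         # hypothetical identity
git config user.email "maintainer@example.org"
echo "v1" > index.html
git add index.html
git commit -q -m "Initial map"
branch=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on the Git setup
git push -q "$base/upstream.git" "$branch"
git push -q "$base/fork.git" "$branch"

# Your local clone of your fork
git clone -q -b "$branch" "$base/fork.git" "$base/clone"

# Meanwhile, another team's bugfix lands in the original repo
echo "v2" > index.html
git commit -q -a -m "Fix bug from another team"
git push -q "$base/upstream.git" "$branch"

# Add the original repo as 'upstream' and pull its changes
cd "$base/clone"
git remote add upstream "$base/upstream.git"
git remote -v
git pull -q upstream "$branch"
cat index.html
```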

Exercise 8:
add a feature

"Libraries in X"

Exercise 8-a:
add a new map

Libraries in X, each team picks a city in Germany

Open an issue for creating the new map
Implement the feature locally, propose, and review it

Remember to reference the issue in commits and pull requests

But where are the other maps?

Exercise 8-b:
integrate with other maps

Open an issue for the integration:
add links to the other maps below your own map
Implement the links locally, propose, and review them

Remember to reference the issue in commits and pull requests

Process visualization

To get a unified view of a team's issues from different repos we use a Kanban board

A Kanban board visualizes the development workflow and the current status of each issue


(License: Creative Commons Attribution-Share Alike 3.0 Unported)


Waffle is a Kanban board with GitHub integration:
every issue is a card, columns are associated with labels

(GitHub now has an integrated board, but for now it's single-repo)



In general: left → right

Backlog: new, unlabelled issues
Ready: requirements and dependencies are clear enough to start working
Working: actively worked on issues
Review: issues under review
Deploy: can be deployed to production
Done: deployed and running in production

We move high priority issues to the top of each column

Improving openness & inclusivity


an important instrument that legally allows usage

Basically: just a file called LICENSE in your repo

Detected by GitHub to show license on top of repo page

Software licenses

Strong copyleft: GPL, AGPL

Weak copyleft: LGPL, EPL

Permissive: MIT, BSD

GitHub maintains a catalog of licenses and a tool to choose one at

" is intended to demystify license choices, not present or catalog all of them."

Data, specs, docs, etc.

For non-software, other licenses might make more sense

It lists all open Creative Commons licenses:
CC0, CC-BY-4.0 and CC-BY-SA-4.0

Exercise 9:
add a license

Option 1: local file, add, commit, push
Option 2: use GitHub web UI, add file, use template


Contributor guidelines

How to set up development environment, coding conventions, etc.

Basically: just a file called in your repo

Linked when creating new issues and pull requests


Code of conduct

“define community standards, signal a welcoming and inclusive project, and outline procedures for handling abuse”

Basically: just a file called in your repo


Issue templates

Template for new issues

Things you expect in an issue

Expected behavior, actual behavior, steps to reproduce, environment (browser, operating system, etc.)

Exercise 11:
issue templates

Create, add, commit, push file:
(Template for: expected behavior, actual behavior, steps to reproduce, environment like browser, operating system, etc.)
Create a new issue to see the template
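A possible starting point for such a template, written from the shell. The file name `ISSUE_TEMPLATE.md` is an assumption here; check GitHub's documentation for the naming and location it currently expects.

```shell
demo=$(mktemp -d)
cd "$demo"
# ISSUE_TEMPLATE.md is an assumed file name, see the note above
cat > ISSUE_TEMPLATE.md <<'EOF'
### Expected behavior

### Actual behavior

### Steps to reproduce

### Environment (browser, operating system, etc.)
EOF
cat ISSUE_TEMPLATE.md
```

In a real project the file would be created inside the repo, then added, committed, and pushed like any other file.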

Pull request templates

Template for new pull request

Things you expect in a pull request

Which issue is fixed? How to reproduce the fix?

Exercise 12:
pull request templates

Create, add, commit, push file:
(Template for: which issue is fixed? How to reproduce the fix?)
Open a pull request to see the template

Feedback welcome!
(with issue template)

Owning your content

GitHub is proprietary software

Almost all the things we saw are part of the Git repo
(actual files, readme, contributor guide, templates)

Except: issues and pull requests

GitHub API




(& pull requests)








"Command line tool and library for transferring data with URLs"

Exercise 13:

# Get your own profile
$ curl<user>

# Same, but authenticated and show some information:
$ curl -i -u <user>

# Get data for a repo
$ curl


Last cURL output was pretty long, we often want specific values

jq: "A lightweight and flexible command-line JSON processor"

Pick out specific data fields

Different sample: commits, the history

Pick out only author name and message



Exercise 14:
jq basics

# We start with cURL again:
$ curl ''

# We pipe the JSON to jq and pick out just the first element
$ curl [...] | jq '.[0]'

# Construct new JSON with the fields we care about
jq '.[0] | {message: .commit.message, author:}'

# Looks good, now let's feed each commit into that format
jq '.[] | {message: .commit.message, author:}'

# You can redirect any output to a file
$ curl [...] | jq '.[] |' > output.txt
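The same jq steps can be tried offline on inline sample data. The JSON below is made up, and its shape (`.commit.message`, `.commit.author.name`) is an assumption about how a commit list is structured; adjust the field paths to the actual response.

```shell
# Sample data in the shape assumed for a commit list (made-up values)
json='[
  {"commit": {"message": "Add map", "author": {"name": "Ada"}}},
  {"commit": {"message": "Fix typo", "author": {"name": "Bob"}}}
]'

if command -v jq >/dev/null 2>&1; then
  # Pick out just the first element
  echo "$json" | jq '.[0]'
  # One compact object per commit, with just message and author name
  result=$(echo "$json" | jq -c '.[] | {message: .commit.message, author: .commit.author.name}')
  echo "$result"
else
  echo "jq is not installed" >&2
fi
```

Working on a small inline sample like this makes it easier to get the filter right before pointing it at real API output.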

Exercise 15:
backup issues

# We start with cURL again:

# The task: get all issues, with all their comments, store in files
# jq supports compact output, "JSON lines"
jq -c

# How to delete quotes from a URL:
"" | tr -d '"'

# How to iterate over a list of URLs in a file:
while read -r URL; do
  echo "$URL"
done < urls.txt
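The two building blocks, deleting quotes and looping over a file of URLs, can be combined as in this sketch. The URLs are placeholders on example.org, and the loop only prints what it would fetch.

```shell
demo=$(mktemp -d)
# Two quoted placeholder URLs, the way jq prints JSON strings
printf '%s\n' '"https://example.org/a"' '"https://example.org/b"' > "$demo/quoted.txt"

# Delete the quotes
tr -d '"' < "$demo/quoted.txt" > "$demo/urls.txt"

# Iterate over the URLs, one per line
count=0
while read -r URL; do
  echo "Would fetch: $URL"
  count=$((count + 1))
done < "$demo/urls.txt"
echo "$count URLs"
```

As an alternative to `tr`, jq's `-r` option prints strings without the surrounding quotes.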

Exercise 15:
sample solution

# First, let's collect all issues with IDs and their comments URLs:
$ curl [...] | jq -c '.[] | {id: .number, body: .body, comments: .comments_url}' > issues.jsonl

# For each line, we then append the body and all comments to a file:
while read -r ISSUE; do
	ID=$(echo "$ISSUE" | jq '.id')
	echo "$ISSUE" | jq '.body' > "$ID.txt"
	URL=$(echo "$ISSUE" | jq '.comments' | tr -d '"')
	curl "$URL" | jq '.[] | .body' >> "$ID.txt"
done < issues.jsonl

Missing: error handling, additional info on comment author etc.


Run a script like that automatically

As a reaction to specific activity

Like creation of new issues or comments

Webhooks & events

You can register a webhook for your GitHub repo

Makes GitHub send events to a specified URL
e.g. when commits are pushed, issues opened, or comments added

Server running at given URL can handle these events

Webhook sample

We use GitHub Webhooks for the Metafacture project

Back up issues in the metafacture-documentation repo | details

Tweet project activity | details

Further reading

Time for questions, discussions, feedback