Linked Open Development

Workshop at SWIB16

Fabian Steeg / @fsteeg & Adrian Pohl / @acka47
Linked Open Data, Hochschulbibliothekszentrum NRW (hbz)


Bonn, 2016-11-28

This presentation:
http://hbz.github.io/swib16-workshop
Creative Commons License

Software development in libraries

Individuals, in-house teams, cross-organization projects

In practice, development almost always involves many people

How do we get together?

Open, transparent & inclusive

Open: make your data and code accessible under an open license

Transparent: let others see what you're doing; make your road map, issue tracking, discussions, etc. public

Inclusive: let others participate without excluding anybody; encourage participation, give up absolute power

GitHub

Something like a social network, to
"learn, share, and work together to build software"

Lets you share your own projects

Lets you collaborate on projects you or other people have shared

GitHub helps a lot with being open, transparent, and inclusive

Git

Git (released 2005) ≠ GitHub (founded 2008)

Git is the technical core of GitHub

It's a version control system

Version control

Version control is like a time machine for your files

You commit files and edits to the version control system

The version control system tracks all files and changes by all people

How does version control work?

You describe each commit with a message

For each commit, Git stores which files changed and how, your description of the changes, and other metadata like the time and author of the changes

Why is that useful?

You get a history that shows how, why, and by whom the files came to be the way they are right now

And a safety net that allows you to roll back to any previous state
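To see the history and the safety net in action, here is a minimal sketch that runs in a throwaway directory. The file follows the workshop's locations.csv; the author name, email, and the second line with deliberately wrong placeholder coordinates are assumptions for illustration. It makes two commits, prints the metadata Git stored for each, and restores the file as it was one commit earlier.

```shell
#!/bin/sh
# Sketch: Git history as a time machine, in a throwaway repo
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity, stored only in this repo's config
git config user.name "Jane Librarian"
git config user.email "jane@example.org"

# First commit: one location
echo 'bonn,"50.733992,7.099814"' > locations.csv
git add locations.csv
git commit -q -m "Add Bonn coordinates"

# Second commit: an edit we will want to undo (placeholder coordinates)
echo 'koeln,"0,0"' >> locations.csv
git commit -q -a -m "Add Cologne (wrong coordinates)"

# History: short hash, author, date, and message of each commit
git log --format='%h %an %ad %s' --date=short

# Safety net: restore the file as it was one commit ago (HEAD~1)
git checkout HEAD~1 -- locations.csv
cat locations.csv
```

`git checkout <commit> -- <file>` only restores and stages the old version; committing it records the rollback as a new commit, so the mistake stays visible in the history. `git revert` is an alternative that creates such an undo commit directly.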

Not just for software

Version control and collaboration make sense for many things

Data, specifications, configuration, documentation, publication, etc.

https://github.com/schemaorg/schemaorg

https://github.com/hbz/swib16-workshop

https://github.com/hbz/lookup-tables

https://github.com/libreas/ausgabe29

What we'll learn

I. Share your own projects
(individual exercises)

13:00-13:40 Setting up and working on local projects with Git (Git basics: init, add, commit, log)
13:40-14:00 Sharing and working on projects on GitHub (Git remotes, GitHub web UI, Git push, pull)
14:00-15:00 Documenting your project on GitHub (README, Markdown, GitHub pages)

II. Collaborate on projects
(team exercises)

15:30-17:00 Contributing to existing GitHub projects (GitHub flow: forking, cloning; issues; branching, pull requests)
17:15-18:00 Improving openness of our own projects (license, contributing, issue_template, pull_request_template)
18:00-18:45 GitHub API (using the GitHub issues API with cURL & jq)

Diverse audience

Librarians and developers, web and library systems,
with and without Git & GitHub experience

We hope there's something interesting for everybody

If you're already familiar with an exercise, maybe you'd like to help others

For things you do differently, let's learn about that

I. Share your own projects

Setting up and working on local projects with Git

Prerequisite: Git installed

Exercise 1:
set up a local project with Git

Create a location for all your Git repos

$ cd ~
$ mkdir git
$ cd git
										
Create an empty Git repo

$ mkdir lodworkshop-project
$ cd lodworkshop-project
$ git init
										
Verify project setup

$ git status
										

('$' means you should type this in your terminal application or Git shell.)

Exercise 2:
add a file to your local project

With your text editor, create a new file with content:

bonn,"50.733992,7.099814"
										
Save it as 'lodworkshop-project/locations.csv' and add it

$ git add locations.csv
$ git status
										
Commit the file (the first time, Git may ask you to set your name and email; it then opens an editor for the commit message)

$ git commit
										
Verify addition

$ git status
$ git log
										

You created your first commit
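The setup instructions printed by `git commit` ask for your name and email, which Git records with every commit. A minimal sketch of doing this up front and committing without opening an editor; the name and email are placeholders, and the sketch uses repo-local config in a throwaway directory so it does not touch your global settings (in the workshop you would typically set them once with `git config --global`).

```shell
#!/bin/sh
# Sketch: identity setup and a non-interactive commit in a throwaway repo
set -e
cd "$(mktemp -d)"
mkdir lodworkshop-project
cd lodworkshop-project
git init -q

# Placeholder identity; add --global to set it once for all your repos
git config user.name "Jane Librarian"
git config user.email "jane@example.org"

echo 'bonn,"50.733992,7.099814"' > locations.csv
git add locations.csv
# -m passes the commit message directly instead of opening an editor
git commit -q -m "Add Bonn coordinates"

# Author and message of the latest commit
git log -1 --format='%an <%ae>: %s'
```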

Sharing and working on GitHub projects

Prerequisite: GitHub account

Remotes

Remote repo: a copy of your local Git repo on a server, e.g. on GitHub's servers

We push our local changes to remote repos, and pull changes by others from remote repos
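A remote does not have to be on GitHub: any Git repo reachable by a path or URL works. This sketch uses a local bare repo (a repo without working files, similar to the copies on a server) as a stand-in for GitHub, so push and pull can be tried offline. All directory names, identities, and file contents are placeholders.

```shell
#!/bin/sh
# Sketch: a local bare repo stands in for the repo on GitHub's servers
set -e
cd "$(mktemp -d)"
git init -q --bare server.git

# Your local repo, pushing to the bare repo as remote 'origin'
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Librarian"
git config user.email "jane@example.org"
echo 'bonn,"50.733992,7.099814"' > locations.csv
git add locations.csv
git commit -q -m "Add Bonn coordinates"
git remote add origin ../server.git
git push -q origin master
cd ..

# Somebody else clones the remote ('origin' is set up automatically) and pushes a change
git clone -q -b master server.git theirs
cd theirs
git config user.name "Other Contributor"
git config user.email "other@example.org"
echo "The locations.csv file maps location names to geo coordinates." > README
git add README
git commit -q -m "Add README"
git push -q origin master
cd ../mine

# Back in your repo: pull their change from the remote
git pull -q origin master
ls
```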

Exercise 3:
Set up a remote repo

Create a new repo on http://github.com ('+' sign, upper right) and set the repository name to 'lodworkshop-project'
Copy repo HTTPS URL (under the 'Quick setup' header) and use it to add a new remote

$ git remote add origin <URL>
										
Verify the remote setup

$ git remote -v
										
Push your local repo content

$ git push origin master
										

Visit your repo on GitHub, verify your file is there

Open, transparent & inclusive

Documenting your project on GitHub

README

The README file is the entry point for your project's documentation

It enables readers to quickly orient themselves

GitHub displays the README's content on your repo's entry page

Exercise 4-a:
simple README file

Create a new file:

The locations.csv file maps 
location names to geo coordinates 
formatted as "latitude,longitude".
										
Save as 'README', add it

$ git add README
										
Verify the status

$ git status
										
Commit and push it to GitHub

$ git commit
$ git log
$ git push origin master
										

Visit your repo on GitHub to see it displayed

Markdown

Markdown is a lightweight markup language

Plain, simple, readable text files

"a radically simplified and far more human-readable form of HTML" (Jeff Atwood)

Supported everywhere on GitHub: wiki, issues, README, etc.
(and on many other sites)
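A small README.md for the workshop project might look like this. The sketch writes it with a shell here-document (the quoted `<<'EOF'` form keeps the text exactly as typed); the content is only an example. `#` and `##` are headers, `-` starts list items, `*...*` and `**...**` give emphasis and strong emphasis, and backticks mark code.

```shell
#!/bin/sh
# Sketch: write an example README.md using common Markdown elements
set -e
cd "$(mktemp -d)"
cat > README.md <<'EOF'
# lodworkshop-project

Maps location names to geo coordinates.

## Data format

The file `locations.csv` has one location per line:

- the location name, e.g. *bonn*
- the coordinates, formatted as **"latitude,longitude"**

Example: `bonn,"50.733992,7.099814"`
EOF
cat README.md
```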

Exercise 4-b:
README formatting

Rename the README file to README.md

$ git mv README README.md
										
Add headers, styling, and details to your README content (see GitHub's formatting docs)
Verify the status

$ git status
										
Commit and push it to GitHub

$ git commit
$ git log
$ git push origin master
										

Visit your repo on GitHub to see it rendered

Wiki

A wiki can provide additional, long-form documentation

GitHub provides a wiki for each repo,
e.g. https://github.com/d3/d3/wiki

The wiki is actually a Git repo with Markdown files,
e.g. https://github.com/d3/d3.wiki.git

This means you can get a full local copy of the wiki,
including its history

GitHub pages

GitHub also provides hosting of static HTML for users and repos

Pages are available at
http://<user>.github.io/<repo>

We use it for this presentation:
http://hbz.github.io/swib16-workshop

Exercise 5:
Set up a GitHub page

Go to your repo's settings and click 'Launch automatic page generator'
Load initial content from README.md, select a theme
The pages are actually files in your repo, see details:
https://help.github.com/categories/github-pages-basics/

Visit your project's page at
http://<user>.github.io/<repo>

Break

II. Collaborating on GitHub projects

Team exercises

Mixed teams, different roles: developers, librarians, editors, etc.

Different tasks for each role

Form teams

Project:
an interactive map

Libraries in Bonn

 
 
 

Forks, clones, repos

Fork: your own copy of a repo on GitHub

Clone: a local copy of a repo

Original project, fork, and clone are all repos

Exercise 6:
fork and clone the repo

Fork the project repo at https://github.com/hbz/swib16-project ('Fork', upper right)
Copy the URL (green button 'clone or download') and clone your fork

$ cd ~
$ cd git
$ git clone <URL>
$ cd swib16-project
										
Look around a bit in the new repo

$ git status
$ git log
$ git remote -v
										

You now have a local clone of your own fork of the original project

Libraries in Bonn:
implementation

HTML & JavaScript for the map

GitHub pages for hosting the site

Uses the Lobid API for organisation data

Using an API?

What does that even mean?

For Web APIs:

opening URLs!

Just like opening a web site

but we want structured data

 
 
 

 
 
 

We can edit and re-open the URL in the browser bar

We're using the API

Same URL, other usage

Call URL from command line

Call URL from code, e.g. JavaScript

We want to get Libraries in Bonn from the API

 
 
 

Display that data on a map

 
 
 

Call URL from JavaScript code

 
 
 

 
 
 

 
 
 

 
 
 

Libraries in Bonn:
implementation

Look at a sample organisation: http://beta.lobid.org/organisations/DE-Bo133?format=json
Find the fields to query: type AND location.address.addressLocality
Create the query: http://beta.lobid.org/organisations/search?format=json&q=type:Library+AND+location.address.addressLocality:Bonn
Use the location from the response: location.geo.lat and location.geo.lon
Open the HTML file in your browser (see the site and its source)

Look around a bit, get familiar with the implementation
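The query URL in the list above only differs in the city name, which is handy later for other cities. A minimal sketch of building it on the command line; `libraries_in` is a hypothetical helper name, it uses the 2016 beta URL from this slide (which may have changed since), and it only replaces spaces with `+`: city names with umlauts would additionally need percent-encoding, which the sketch does not do.

```shell
#!/bin/sh
# Sketch: build the lobid organisations query URL for libraries in a city
# (libraries_in is a hypothetical helper; URL pattern taken from the slide)
libraries_in() {
  # '+' stands for a space in the query string, so spaces in names become '+'
  city=$(printf '%s' "$1" | tr ' ' '+')
  printf '%s\n' "http://beta.lobid.org/organisations/search?format=json&q=type:Library+AND+location.address.addressLocality:${city}"
}

libraries_in Bonn
# To fetch the data (needs network access):
# curl "$(libraries_in Bonn)"
```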

GitHub issues

GitHub contains an integrated issue tracker

Issues are organized with colored labels and support Markdown

Easily link to code, commits, users, other issues, specific comments

 
 
 

References to users and issues

Mention users like on Twitter: @username

Reference issues with a hash: 'See #123'

If you reference an issue in your commit message, the commit automatically appears in the issue

https://guides.github.com/features/issues/#notifications

 
 
 

Linked open development

GitHub graphs

It's all connected

Users

Organisations

Code

Issues

etc.

Break

https://vimeo.com/109505574

Exercise 7:
fix a bug

Exercise 7-a:
Implement your fix

Find a bug in the current project implementation (visit the project site, view the source)
Open a new GitHub issue for the bug
De-duplicate bug reports by commenting on duplicates:

Duplicate of #123
										
Fix the bug locally

Parallel work

Our scenario: different teams worked in parallel on the same project

Common task: we want to work independently and integrate later

Branching

Git's internal data structure is a graph

Each commit is a node that points to its direct predecessor(s): usually one, none for the very first commit, two or more for a merge commit

Branches happen when you have two different commits
with the same direct predecessor

Merging

Branches are separate, parallel lines of activity on the files

They allow parallel, independent work by different people

Parallel changes need to be merged together in the end
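The graph described above can be reproduced in a throwaway repo. In this minimal sketch (branch names, file names, and identity are placeholders), two commits on different branches share the same direct predecessor, and merging them creates a commit with two predecessors.

```shell
#!/bin/sh
# Sketch: two branches from the same commit, then a merge
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Jane Librarian"
git config user.email "jane@example.org"

echo "map" > index.html
git add index.html
git commit -q -m "Initial map"

# Team A works on its own branch ...
git checkout -q -b team-a
echo "legend" > legend.txt
git add legend.txt
git commit -q -m "Add legend"

# ... while team B commits on master, starting from the same predecessor
git checkout -q master
echo "links" > links.txt
git add links.txt
git commit -q -m "Add links"

# Integrate team A's work: the merge commit has two predecessors
git merge -q --no-edit team-a
git log --graph --oneline --all
```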

GitHub flow

Used by GitHub to develop GitHub

Git makes branching and merging easy

Implement new features and bugfixes in separate branches

 
 
 

Exercise 7-b:
Push to a branch

Create a new local branch

$ git checkout -b bugfix-issue-1
										
Verify the status and what you're changing

$ git status
$ git diff
										
Commit your changes to the branch

$ git commit -a
$ git log
										
Push your branch to the remote

$ git push origin bugfix-issue-1
										

Remember to reference the issue in your commit

Visit your repo on GitHub, notice the yellow box
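The branch workflow of this exercise can be tried offline with a local bare repo standing in for your fork on GitHub. In this minimal sketch, issue number 1, the file contents, and the identity are placeholders. On GitHub, "#1" in the commit message links the commit to issue 1; a closing keyword such as "Fixes #1" would also close the issue once the change is merged into the default branch.

```shell
#!/bin/sh
# Sketch: commit a fix on a branch and push the branch to 'origin'
set -e
cd "$(mktemp -d)"
git init -q --bare fork.git
git init -q work
cd work
git config user.name "Jane Developer"
git config user.email "jane@example.org"
echo "buggy" > index.html
git add index.html
git commit -q -m "Initial map"
git remote add origin ../fork.git

# Work on the fix in its own branch
git checkout -q -b bugfix-issue-1
echo "fixed" > index.html
git diff --stat
# Reference the (placeholder) issue in the commit message
git commit -q -a -m "Fix marker popups, see #1"
git push -q origin bugfix-issue-1

# The branch now exists on the remote
git ls-remote --heads origin bugfix-issue-1
```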

Pull requests

Request that your branch is pulled into the master branch

Opening a pull request initiates a code review of suggested changes

Changes are visualized in the pull request (what's removed/added?), and can be commented on at specific lines

Can contain automatic checks like running tests

Basically like an issue with extra functionality

 
 
 

 
 
 

 
 
 

 
 
 

Our review process

Functional review: review actual functionality, associated with the issue, done by non-developer

Code review: review code changes, associated with the pull request, done by developer

https://hbz.github.io/#dev-process

Exercise 7-c:
Review your fix

Developer opens a pull request for the bugfix branch, includes reference to the fixed issue
Developer adds instructions on how to review the fix in the issue
Librarian reviews the functional side of the fix (functional review), comments in the issue, says +1 when done
Developer reviews the implementation of the fix (code review), comments in the pull request, says +1 when done
Both reviews are positive: maintainer merges the pull request

With GitHub pages created from the master branch, we're done

Integration

To continue working on the project, we want the current state

We need to get the bugfixes from other teams into our local repo

They have been merged into the original repo

The original repo is often called the upstream repo

Exercise 7-d:
get updates from upstream

Check your current branch

$ git status

Go back to your local master branch

$ git checkout master

Add the original repo as a new remote

$ git remote add upstream https://github.com/hbz/swib16-project.git

Get the latest additions to the upstream master

$ git remote -v
$ git pull upstream master

We now have the bugfixes from every team in our local repo
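The upstream, fork, and clone setup can also be tried offline. In this minimal sketch, local bare repos stand in for hbz/swib16-project on GitHub (upstream) and for your fork, and a separate 'seed' repo plays the other teams whose fixes get merged upstream. All names, file contents, and the identity are placeholders.

```shell
#!/bin/sh
# Sketch: upstream -> fork -> clone with local repos in a temp directory
set -e
cd "$(mktemp -d)"

# 'seed' creates the original project; its bare copy is the upstream repo
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Upstream Maintainer"
git config user.email "maintainer@example.org"
echo "<h1>Libraries in Bonn</h1>" > index.html
git add index.html
git commit -q -m "Initial map"
cd ..
git clone -q --bare seed upstream.git

# Your fork (a copy on the server) and your local clone of the fork
git clone -q --bare upstream.git fork.git
git clone -q fork.git work

# Meanwhile, another team's bugfix is merged upstream
cd seed
echo "<p>Fixed marker popups</p>" >> index.html
git commit -q -a -m "Fix marker popups, see #1"
git push -q ../upstream.git master
cd ..

# In your clone: add the original repo as 'upstream' and pull its master
cd work
git remote add upstream ../upstream.git
git pull -q upstream master
git log --oneline
```

Your fork on the server does not get the upstream changes by itself; pushing your updated local master to `origin` brings it up to date.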

Exercise 8:
add a feature

"Libraries in X"

Exercise 8-a:
add a new map

Libraries in X, each team picks a city in Germany

Open an issue for creating the new map
Implement the feature locally, propose, and review it

Remember to reference the issue in commits and pull requests

But where are the other maps?

Exercise 8-b:
integrate with other maps

Open an issue for the integration:
add links to the other maps below your own map
Implement the links locally, propose, and review them

Remember to reference the issue in commits and pull requests

Process visualization

To get a unified view of a team's issues from different repos we use a Kanban board

A Kanban board visualizes the development workflow and the current status of each issue

 
 
 







Source: https://commons.wikimedia.org/wiki/File:Simple-kanban-board-.jpg
(License: Creative Commons Attribution-Share Alike 3.0 Unported)

Waffle

Waffle is a Kanban board with GitHub integration:
every issue is a card, columns are associated with labels

(GitHub now has an integrated board, but for now it's single-repo)

 
 
 

Process

In general: left → right

Backlog: new, unlabelled issues
Ready: requirements and dependencies are clear enough to start working
Working: issues being actively worked on
Review: issues under review
Deploy: can be deployed to production
Done: deployed and running in production

We move high priority issues to the top of each column

Improving openness & inclusivity

help.github.com/articles/helping-people-contribute-to-your-project

LICENSE

An important instrument that legally permits others to use your work

Basically: just a file called LICENSE in your repo

GitHub detects it and shows the license at the top of the repo page

Software licenses

Strong copyleft: GPL, AGPL

Weak copyleft: LGPL, EPL

Permissive: MIT, BSD

GitHub maintains a catalog of licenses and a tool to choose one at http://choosealicense.com/

"Choosealicense.com is intended to demystify license choices, not present or catalog all of them."

Data, specs, docs, etc.

For non-software, other licenses might make more sense

http://choosealicense.com/non-software/

It lists all open Creative Commons licenses:
CC0, CC-BY-4.0 and CC-BY-SA-4.0

Exercise 9:
add a license

Option 1: local file, add, commit, push
Option 2: use GitHub web UI, add file, use template
https://help.github.com/articles/adding-a-license-to-a-repository

CONTRIBUTING

Contributor guidelines

How to set up development environment, coding conventions, etc.

Basically: just a file called CONTRIBUTING.md in your repo

Linked when creating new issues and pull requests

 
 
 

Code of conduct

“define community standards, signal a welcoming and inclusive project, and outline procedures for handling abuse”

Basically: just a file called CODE_OF_CONDUCT.md in your repo

help.github.com/articles/adding-a-code-of-conduct-to-your-project

 
 
 

Issue templates

Template for new issues

Things you expect in an issue

Expected behavior, actual behavior, steps to reproduce, environment (browser, operating system, etc.)

Exercise 11:
issue templates

Create, add, commit, push file: ISSUE_TEMPLATE.md
(Template for: expected behavior, actual behavior, steps to reproduce, environment like browser, operating system, etc.)
Create a new issue to see the template
https://help.github.com/articles/creating-an-issue-template-for-your-repository/
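A possible ISSUE_TEMPLATE.md with the sections listed above, written with a shell here-document as a minimal sketch; the exact headings are an assumption, and you would then add, commit, and push the file as in the earlier exercises.

```shell
#!/bin/sh
# Sketch: create an ISSUE_TEMPLATE.md with the sections from Exercise 11
set -e
cd "$(mktemp -d)"
cat > ISSUE_TEMPLATE.md <<'EOF'
### Expected behavior

### Actual behavior

### Steps to reproduce
1.
2.

### Environment
Browser:
Operating system:
EOF
cat ISSUE_TEMPLATE.md
```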

Pull request templates

Template for new pull request

Things you expect in a pull request

Which issue is fixed? How to reproduce the fix?

Exercise 12:
pull request templates

Create, add, commit, push file: PULL_REQUEST_TEMPLATE.md
(Template for: which issue is fixed? How to reproduce the fix?)
Open a pull request to see the template
https://help.github.com/articles/creating-a-pull-request-template-for-your-repository
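A possible PULL_REQUEST_TEMPLATE.md, again as a minimal sketch with assumed wording. Starting the description with "Fixes #" prompts contributors to name the issue; on GitHub, a closing keyword like "Fixes #123" in a pull request description closes that issue when the pull request is merged into the default branch.

```shell
#!/bin/sh
# Sketch: create a PULL_REQUEST_TEMPLATE.md with the questions from Exercise 12
set -e
cd "$(mktemp -d)"
cat > PULL_REQUEST_TEMPLATE.md <<'EOF'
Fixes #

### How to review
Steps to reproduce the fix:
1.
2.
EOF
cat PULL_REQUEST_TEMPLATE.md
```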

Feedback welcome!
(with issue template)

https://github.com/hbz/swib16-workshop/issues/new

Owning your content

GitHub is proprietary software

Almost all the things we saw are part of the Git repo
(actual files, readme, contributor guide, templates)

Except: issues and pull requests

GitHub API

https://developer.github.com/

Repos

https://api.github.com/repos/hbz/swib16-workshop

 
 
 

 
 
 

Issues
(& pull requests)

https://api.github.com/repos/hbz/swib16-workshop/issues?state=all

 
 
 

 
 
 

 
 
 

Comments

https://api.github.com/repos/hbz/swib16-workshop/issues/19/comments
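The API URLs above follow a pattern: /repos/<owner>/<repo>/issues and /repos/<owner>/<repo>/issues/<number>/comments. Results are paginated (30 items per page by default, at most 100 via `per_page`, further pages via `page`). A minimal sketch of building these URLs in the shell; `gh_issues_url` and `gh_comments_url` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: build GitHub API URLs for a repo's issues and an issue's comments
# (gh_issues_url and gh_comments_url are hypothetical helpers)
API=https://api.github.com

gh_issues_url() {
  # $1: owner/repo, $2: page number; state=all includes closed issues
  printf '%s\n' "$API/repos/$1/issues?state=all&per_page=100&page=$2"
}

gh_comments_url() {
  # $1: owner/repo, $2: issue number
  printf '%s\n' "$API/repos/$1/issues/$2/comments"
}

gh_issues_url hbz/swib16-workshop 1
gh_comments_url hbz/swib16-workshop 19
# To fetch (needs network access):
# curl -s "$(gh_comments_url hbz/swib16-workshop 19)"
```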

 
 
 

 
 
 

cURL

"Command line tool and library for transferring data with URLs"

https://curl.haxx.se/download.html

Exercise 13:
cURL


# Get your own profile
$ curl https://api.github.com/users/<user>

# Same, but authenticated; -i also prints the HTTP response headers:
$ curl -i -u <user> https://api.github.com/user

# Get data for a repo
$ curl https://api.github.com/repos/hbz/swib16-workshop
					

jq

The last cURL output was pretty long; often we only want specific values

jq: "A lightweight and flexible command-line JSON processor"

https://stedolan.github.io/jq/

https://jqplay.org/

Pick out specific data fields

Different sample: commits, the history

Pick out only author name and message

 
 
 

 
 
 

Exercise 14:
jq basics


# We start with cURL again:
$ curl 'https://api.github.com/repos/hbz/swib16-workshop/commits'

# We pipe the JSON to jq and pick out just the first element
$ curl [...] | jq '.[0]'

# Construct new JSON with the fields we care about
$ curl [...] | jq '.[0] | {message: .commit.message, author: .commit.author.name}'

# Looks good, now let's feed each commit into that format
$ curl [...] | jq '.[] | {message: .commit.message, author: .commit.author.name}'

# You can redirect any output to a file
$ curl [...] | jq '.[] | .commit.author.name' > output.txt

Exercise 15:
backup issues

# We start with cURL again:
curl 'https://api.github.com/repos/hbz/swib16-workshop/issues?state=all'

# The task: get all issues, with all their comments, store in files

# jq supports compact output, "JSON lines" (one JSON value per line)
jq -c

# How to delete quotes from a URL:
echo '"https://api.github.com/repos/hbz/swib16-workshop"' | tr -d '"'

# How to iterate over a list of URLs in a file:
while read -r URL
do
  echo "$URL"
done < urls.txt
						

Exercise 15:
sample solution

# First, let's collect all issues with their numbers, bodies, and comments URLs:
curl -s 'https://api.github.com/repos/hbz/swib16-workshop/issues?state=all' |
  jq -c '.[] | {id: .number, body: .body, comments: .comments_url}' \
  > issues.jsonl

# For each line, we then write the body and append all comments to a file.
# read -r keeps backslash escapes in the JSON intact;
# jq -r outputs strings without quotes, so no tr is needed.
while read -r ISSUE
do
	ID=$(printf '%s\n' "$ISSUE" | jq -r '.id')
	printf '%s\n' "$ISSUE" | jq -r '.body' > "$ID.txt"
	URL=$(printf '%s\n' "$ISSUE" | jq -r '.comments')
	curl -s "$URL" | jq -r '.[] | .body' >> "$ID.txt"
done < issues.jsonl
						

Missing: error handling, pagination (the API returns 30 items per page by default), additional info on comment authors, etc.

Automation

Run a script like that automatically

As a reaction to specific activity

Like creation of new issues or comments

Webhooks & events

You can register a webhook for your GitHub repo

Makes GitHub send events to a specified URL
e.g. when commits are pushed, issues opened, or comments added

Server running at given URL can handle these events

Webhook sample

We use GitHub Webhooks for the Metafacture project

Back up issues in the metafacture-documentation repo

Tweet project activity

Further reading

https://octoverse.github.com

https://data-lessons.github.io/library-git

https://github.com/jlord/git-it-electron

https://git-scm.com/book/en/v2

Time for questions, discussions, feedback

https://hbz.github.io/swib16-workshop/

https://github.com/hbz/swib16-workshop/