Fabian Steeg /
@fsteeg &
Adrian Pohl /
@acka47
Linked Open Data, Hochschulbibliothekszentrum NRW (hbz)
Bonn, 2016-11-28
This presentation:
http://hbz.github.io/swib16-workshop
Individuals, in-house teams, cross-organization projects
In practice, many people are always involved in development
How do we get together?
Open: make your data and code accessible under an open license
Transparent: Let others see what you're doing; make your roadmap, issue tracking, discussions, etc. public
Inclusive: Let others participate without excluding anybody; encourage participation, give up absolute power
Something like a social network, to
"learn, share, and work together to build software"
Lets you share your own projects
Lets you collaborate on projects you or other people have shared
Git (released 2005) ≠ GitHub (founded 2008)
Git is the technical core of GitHub
It's a version control system
Version control is like a time machine for your files
You commit files and edits to the version control system
The version control system tracks all files and changes by all people
You describe the commit with a comment
For each commit, Git stores which files changed and how, your description of the changes, and other metadata like the time and author of the changes
You have a history where you can look up how, why, and who changed the files to be the way they are right now
And a safety net that allows you to roll back to any previous state
Version control and collaboration make sense for many things
Data, specifications, configuration, documentation, publication, etc.
https://github.com/schemaorg/schemaorg
https://github.com/hbz/swib16-workshop
Setting up and working on local projects with Git | Git basics: init, add, commit, log | 13:00-13:40 |
Sharing and working on projects on GitHub | Git remotes, GitHub web UI, Git push, pull | 13:40-14:00 |
Documenting your project on GitHub | readme, Markdown, GitHub pages | 14:00-15:00 |
Contributing to existing GitHub projects | GitHub flow: forking, cloning; issues; branching, pull requests | 15:30-17:00 |
Improving openness of our own projects | license, contributing, issue_template, pull_request_template | 17:15-18:00 |
GitHub API | using the GitHub issues API (with cURL & jq) | 18:00-18:45 |
Librarians and developers, web and library systems,
with and without Git & GitHub experience
We hope there's something interesting for everybody
For exercises you're familiar with, maybe you'd like to help
For things you do differently, let's learn about that
Prerequisite: Git installed
Create a location for all your Git repos |
|
Create an empty Git repo |
|
Verify project setup |
|
('$' means you should type this in your terminal application or Git shell.)
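The three steps above might look like this in a terminal. This is a minimal sketch: the repo name 'lodworkshop-project' comes from the exercise, while the location is an assumption (a throwaway directory here; something like '~/git' would work just as well).

```shell
# Create a location for all your Git repos (assumption: a temporary directory)
REPOS="$(mktemp -d)"
cd "$REPOS"

# Create an empty Git repo; git creates the directory if it does not exist
git init lodworkshop-project

# Verify project setup: git status reports a repo without any commits yet
cd lodworkshop-project
git status
```

Without the '$' prompts, the block can be pasted into a terminal as a whole.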
With your text editor, create a new file with content: |
|
Save it as 'lodworkshop-project/locations.csv' and add it |
|
Commit the file (triggers setup instructions, opens an editor) |
|
Verify addition |
|
You created your first commit
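A possible run of this exercise, as a sketch under assumptions: the file content and the name and email are placeholders, and the commit message is given inline with -m instead of in an editor.

```shell
# Assumption: a fresh throwaway repo stands in for lodworkshop-project
cd "$(mktemp -d)" && git init lodworkshop-project && cd lodworkshop-project

# One-time setup that git asks for on the first commit (placeholder identity)
git config user.name "Workshop Participant"
git config user.email "participant@example.org"

# Create the file (placeholder content) and add it to the staging area
printf 'name,lat,lon\nExample Library,50.73,7.1\n' > locations.csv
git add locations.csv

# Commit with an inline message (-m) instead of opening an editor
git commit -m "Add locations.csv"

# Verify addition: the log shows one commit
git log --oneline
```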
Prerequisite: GitHub account
Remote repo: a copy of your local Git repo on a server, e.g. on GitHub's servers
We push our local changes to remote repos, and pull changes by others from remote repos
Create new repo ('+' sign, upper right), set the repository name to 'lodworkshop-project' | http://github.com |
Copy repo HTTPS URL (under the 'Quick setup' header) and use it to add a new remote |
|
Verify the remote setup |
|
Push your local repo content |
|
Visit your repo on GitHub, verify your file is there
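The remote steps can be tried offline as well. In this sketch a local bare repo stands in for the empty repo on GitHub (an assumption made so the example runs without an account); in the exercise, REMOTE_URL would be the HTTPS URL copied from 'Quick setup'.

```shell
# Assumption: throwaway local repo with one commit, as in the previous exercise
WORK="$(mktemp -d)" && cd "$WORK"
git init lodworkshop-project && cd lodworkshop-project
git config user.name "Workshop Participant"
git config user.email "participant@example.org"
echo "name,lat,lon" > locations.csv
git add locations.csv && git commit -m "Add locations.csv"

# Assumption: a local bare repo plays the role of the GitHub repo
git init --bare "$WORK/remote.git"
REMOTE_URL="$WORK/remote.git"

# Add the remote under the conventional name 'origin' and verify the setup
git remote add origin "$REMOTE_URL"
git remote -v

# Push the current branch and remember origin as its default remote (-u)
git push -u origin HEAD
```

Pushing HEAD avoids hard-coding the branch name, which is 'master' or 'main' depending on the Git configuration.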
The README file is the entry point for your project's documentation
It enables readers to quickly orient themselves
GitHub displays the README's content on your repo's entry page
Create a new file: |
|
Save as 'README', add it |
|
Verify the status |
|
Commit and push it to GitHub |
|
Visit your repo on GitHub to see it used
Markdown is a lightweight markup language
Plain, simple, readable text files
"a radically simplified and far more human-readable form of HTML" (Jeff Atwood)
Supported everywhere on GitHub: wiki, issues, README, etc.
(and on many other sites)
Rename the README file to README.md |
|
Add headers, styling, and details to your README content | See GitHub formatting docs |
Verify the status |
|
Commit and push it to GitHub |
|
Visit your repo on GitHub to see it used
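The rename and edit could look like this. A sketch with placeholder README content; the push is left out because the throwaway repo has no remote.

```shell
# Assumption: throwaway repo with a committed plain-text README
cd "$(mktemp -d)" && git init lodworkshop-project && cd lodworkshop-project
git config user.name "Workshop Participant"
git config user.email "participant@example.org"
echo "Locations of libraries for the LOD workshop" > README
git add README && git commit -m "Add README"

# Rename with git mv so the rename is staged in one step
git mv README README.md

# Add a header, styling, and a list (placeholder content)
cat > README.md <<'EOF'
# lodworkshop-project

Locations of libraries for the **LOD workshop**.

- `locations.csv`: the location data
EOF

# Verify the status, then stage the edit and commit
git status
git add README.md
git commit -m "Convert README to Markdown"
```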
A wiki can provide additional, long-form documentation
GitHub provides a wiki for each repo,
e.g. https://github.com/d3/d3/wiki
The Wiki is actually a Git repo with Markdown files,
e.g. https://github.com/d3/d3.wiki.git
This means you can get a full local copy of the Wiki,
including its history
GitHub also provides hosting of static HTML for users and repos
Pages are available at http://<user>.github.io/<repo>
We use it for this presentation:
http://hbz.github.io/swib16-workshop
Go to your repo's settings, 'Launch automatic page generator' |
Load initial content from README.md, select a theme |
The pages are actually files in your repo, see details: https://help.github.com/categories/github-pages-basics/ |
Visit your project's page at http://<user>.github.io/<repo>
Mixed teams, different roles: developers, librarians, editors, etc.
Different tasks for each role
Form teams
Fork: your own copy of a repo on GitHub
Clone: a local copy of a repo
Original project, fork, and clone are all repos
Fork the project repo ('Fork', upper right) | https://github.com/hbz/swib16-project |
Copy the URL (green button 'clone or download') and clone your fork |
|
Look around a bit in the new repo |
|
You now have a local clone of your own fork of the original project
HTML & JavaScript for the map
GitHub pages for hosting the site
Uses the Lobid API for organisation data
What does that even mean?
For Web APIs:
We can edit and re-open the URL in the browser bar
Call URL from command line
Call URL from code, e.g. JavaScript
We want to get Libraries in Bonn from the API
Look at a sample organisation | http://beta.lobid.org/organisations/DE-Bo133?format=json |
Find field to query | type AND location.address.addressLocality |
Create query | http://beta.lobid.org/organisations/search?format=json&q=type:Library+AND+location.address.addressLocality:Bonn |
Use location from response | location.geo.lat and location.geo.lon |
Open HTML file in browser | site, source |
Look around a bit, get familiar with the implementation
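To see how the location fields are picked out of a response, here is a small command-line sketch with jq. The record below is a simplified, made-up example (an assumption, not actual Lobid data); only the field names are taken from the exercise.

```shell
# Assumption: a simplified, made-up organisation record instead of a live response
ORG='{"name":"Example Library","type":"Library","location":{"address":{"addressLocality":"Bonn"},"geo":{"lat":50.73,"lon":7.1}}}'

# The query from the exercise (not run here, it needs network access):
#   curl 'http://beta.lobid.org/organisations/search?format=json&q=type:Library+AND+location.address.addressLocality:Bonn'

# Check the fields we query on
printf '%s\n' "$ORG" | jq -r '.type, .location.address.addressLocality'

# Use the location from the response: build a small object with lat and lon
COORDS=$(printf '%s\n' "$ORG" | jq -c '{lat: .location.geo.lat, lon: .location.geo.lon}')
echo "$COORDS"
```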
GitHub contains an integrated issue tracker
Issues are organized with colored labels and support Markdown
Easily link to code, commits, users, other issues, specific comments
Mention users like on Twitter: @username
Reference issues with a hash: 'See #123'
If you reference an issue in your commit message, the commit automatically appears in the issue
Users
Organisations
Code
Issues
etc.
Find a bug in the current project implementation | visit project site, view source |
Open an issue for the bug | open new GitHub issue |
De-duplicate bug reports |
|
Fix the bug locally |
Our scenario: different teams worked in parallel on the same project
Common task: we want to work independently and integrate later
Git's internal data structure is a graph
Each commit is a node that points to its direct predecessor (a merge commit has more than one)
Branches happen when you have two different commits
with the same direct predecessor
Branches are separate, parallel lines of activity on the files
They allow parallel, independent work by different people
Parallel changes need to be merged together in the end
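The graph can be made visible with a few commands. A minimal sketch, assuming a throwaway repo and made-up branch and file names: two commits share the same predecessor, then get merged.

```shell
# Assumption: throwaway repo with placeholder identity and file names
cd "$(mktemp -d)" && git init graph-demo && cd graph-demo
git config user.name "Workshop Participant"
git config user.email "participant@example.org"
echo "base" > base.txt && git add base.txt && git commit -m "Common predecessor"
MAIN="$(git rev-parse --abbrev-ref HEAD)"   # 'master' or 'main'

# Two commits with the same direct predecessor: one on a new branch ...
git checkout -b team-a
echo "a" > a.txt && git add a.txt && git commit -m "Work by team A"
# ... and one on the original branch
git checkout "$MAIN"
echo "b" > b.txt && git add b.txt && git commit -m "Work by team B"

# Merge the parallel changes; the resulting merge commit has two predecessors
git merge --no-edit team-a

# Show the commit graph
git log --graph --oneline --all
```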
Used by GitHub to develop GitHub
Git makes branching and merging easy
Implement new features and bugfixes in separate branches
Create a new local branch |
|
Verify the status and what you're changing |
|
Commit your changes to the branch |
|
Push your branch to the remote |
|
Remember to reference the issue in your commit
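A sketch of the branch exercise. The branch name 'fix-title', the file content, and issue number 1 are hypothetical; the repo is a throwaway stand-in for your clone, so the push is only shown as a comment.

```shell
# Assumption: throwaway repo with one commit stands in for your clone
cd "$(mktemp -d)" && git init swib16-project && cd swib16-project
git config user.name "Workshop Participant"
git config user.email "participant@example.org"
echo "<h1>Libraries</h1>" > index.html
git add index.html && git commit -m "Add map page"

# Create a new local branch and switch to it (hypothetical branch name)
git checkout -b fix-title

# Fix the bug (placeholder edit), then verify the status and what you're changing
echo "<h1>Libraries in Bonn</h1>" > index.html
git status
git diff

# Commit to the branch, referencing the issue (hypothetical issue number)
git commit -am "Fix page title, see #1"

# In the exercise you would now push the branch to your fork:
#   git push origin fix-title
```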
Visit your repo on GitHub, notice the yellow box
Request that your branch be pulled into the master branch
Opening a pull request initiates a code review of suggested changes
Changes are visualized in the pull request (what's removed/added?), and can be commented on at specific lines
Can contain automatic checks like running tests
Basically like an issue with extra functionality
Functional review: review actual functionality, associated with the issue, done by non-developer
Code review: review code changes, associated with the pull request, done by developer
Developer opens a pull request for the bugfix branch, includes reference to the fixed issue |
Developer adds instructions on how to review the fix in the issue |
Librarian reviews the functional side of the fix (functional review), comments in the issue, says +1 when done |
Developer reviews the implementation of the fix (code review), comments in the pull request, says +1 when done |
Both reviews are positive: maintainer merges the pull request |
With GitHub pages created from the master branch, we're done
To continue working on the project, we want the current state
We need to get the bugfixes from other teams into our local repo
They have been merged into the original repo
The original repo is often called the upstream repo
Check your current branch |
|
Go back to your local master branch |
|
Add the original repo as a new remote |
|
Get the latest additions to the upstream master |
|
We now have the bugfixes from every team in our local repo
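The upstream workflow can be rehearsed locally. In this sketch two throwaway repos stand in for the original project and for the clone of your fork (an assumption made so it runs offline); in the exercise, the upstream URL would be that of https://github.com/hbz/swib16-project.

```shell
# Assumption: local stand-ins for the original project and your clone
WORK="$(mktemp -d)" && cd "$WORK"
git init upstream && cd upstream
git config user.name "Workshop Participant"
git config user.email "participant@example.org"
echo "v1" > index.html && git add index.html && git commit -m "Initial page"
MAIN="$(git rev-parse --abbrev-ref HEAD)"   # 'master' in the workshop
cd "$WORK" && git clone upstream clone

# Another team's bugfix gets merged into the original repo
cd "$WORK/upstream"
echo "v2" > index.html && git commit -am "Fix bug, see #2"

# In your clone: check the current branch and go back to the main branch
cd "$WORK/clone"
git branch
git checkout "$MAIN"

# Add the original repo as a new remote named 'upstream'
git remote add upstream "$WORK/upstream"

# Get the latest additions to the upstream main branch
git pull upstream "$MAIN"
```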
Libraries in X, each team picks a city in Germany
Open an issue for creating the new map |
Implement the feature locally, propose, and review it |
Remember to reference the issue in commits and pull requests
Open an issue for the integration: add links to the other maps below your own map |
Implement the links locally, propose, and review them |
Remember to reference the issue in commits and pull requests
To get a unified view of a team's issues from different repos we use a Kanban board
A Kanban board visualizes the development workflow and the current status of each issue
Source: https://commons.wikimedia.org/wiki/File:Simple-kanban-board-.jpg
(License: Creative Commons Attribution-Share Alike 3.0 Unported)
Waffle is a Kanban board with GitHub integration:
every issue is a card, columns are associated with labels
(GitHub now has an integrated board, but for now it's single-repo)
In general: left → right
Backlog | Ready | Working | Review | Deploy | Done |
---|---|---|---|---|---|
New, unlabelled issues | Requirements and dependencies are clear enough to start working | Actively worked on issues | Issues under review | Can be deployed to production | Deployed and running in production |
We move high priority issues to the top of each column
help.github.com/articles/helping-people-contribute-to-your-project
An important instrument that legally allows usage
Basically: just a file called LICENSE in your repo
Detected by GitHub to show license on top of repo page
Strong copyleft: GPL, AGPL
Weak copyleft: LGPL, EPL
Permissive: MIT, BSD
GitHub maintains a catalog of licenses and a tool to choose one at http://choosealicense.com/
"Choosealicense.com is intended to demystify license choices, not present or catalog all of them."
For non-software, other licenses might make more sense
http://choosealicense.com/non-software/
It lists all open Creative Commons licenses:
CC0, CC-BY-4.0 and CC-BY-SA-4.0
Option 1: local file, add, commit, push |
Option 2: use GitHub web UI, add file, use template |
https://help.github.com/articles/adding-a-license-to-a-repository |
Contributor guidelines
How to set up development environment, coding conventions, etc.
Basically: just a file called CONTRIBUTING.md in your repo
Linked when creating new issues and pull requests
“define community standards, signal a welcoming and inclusive project, and outline procedures for handling abuse”
Basically: just a file called CODE_OF_CONDUCT.md in your repo
help.github.com/articles/adding-a-code-of-conduct-to-your-project
Template for new issues
Things you expect in an issue
Expected behavior, actual behavior, steps to reproduce, environment (browser, operating system, etc.)
Create, add, commit, push file: ISSUE_TEMPLATE.md (Template for: expected behavior, actual behavior, steps to reproduce, environment like browser, operating system, etc.) |
Create a new issue to see the template |
https://help.github.com/articles/creating-an-issue-template-for-your-repository/ |
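Creating the template could look like this. A sketch: the headings are one possible wording of the points listed above, and the throwaway repo has no remote, so the push is only shown as a comment.

```shell
# Assumption: throwaway repo stands in for your project
cd "$(mktemp -d)" && git init lodworkshop-project && cd lodworkshop-project
git config user.name "Workshop Participant"
git config user.email "participant@example.org"

# Create the template with the things you expect in an issue
cat > ISSUE_TEMPLATE.md <<'EOF'
### Expected behavior

### Actual behavior

### Steps to reproduce

### Environment (browser, operating system, etc.)
EOF

# Add and commit; afterwards: git push
git add ISSUE_TEMPLATE.md
git commit -m "Add issue template"
```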
Template for new pull request
Things you expect in a pull request
Which issue is fixed? How to reproduce the fix?
Create, add, commit, push file: PULL_REQUEST_TEMPLATE.md (Template for: which issue is fixed? How to reproduce the fix?) |
Open a pull request to see the template |
https://help.github.com/articles/creating-a-pull-request-template-for-your-repository |
GitHub is proprietary software
Almost all the things we saw are part of the Git repo
(actual files, readme, contributor guide, templates)
Except: issues and pull requests
https://api.github.com/repos/hbz/swib16-workshop/issues?state=all
https://api.github.com/repos/hbz/swib16-workshop/issues/19/comments
"Command line tool and library for transferring data with URLs"
# Get your own profile
$ curl https://api.github.com/users/<user>
# Same, but authenticated (-u), and include the response headers (-i):
$ curl -i -u <user> https://api.github.com/user
# Get data for a repo
$ curl https://api.github.com/repos/hbz/swib16-workshop
Last cURL output was pretty long, we often want specific values
jq: "A lightweight and flexible command-line JSON processor"
Different sample: commits, the history
Pick out only author name and message
# We start with cURL again:
$ curl 'https://api.github.com/repos/hbz/swib16-workshop/commits'
# We pipe the JSON to jq and pick out just the first element
$ curl [...] | jq '.[0]'
# Construct new JSON with the fields we care about
$ curl [...] | jq '.[0] | {message: .commit.message, author: .commit.author.name}'
# Looks good, now let's feed each commit into that format
$ curl [...] | jq '.[] | {message: .commit.message, author: .commit.author.name}'
# You can redirect any output to a file
$ curl [...] | jq '.[] | .commit.author.name' > output.txt
# We start with cURL again (URL quoted so the shell leaves the '?' alone):
curl 'https://api.github.com/repos/hbz/swib16-workshop/issues?state=all'
# The task: get all issues, with all their comments, store in files
# jq supports compact output, "JSON lines"
jq -c
# How to delete quotes from a URL:
echo '"https://api.github.com/repos/hbz/swib16-workshop"' | tr -d '"'
# How to iterate over a list of URLs in a file:
while read -r URL
do
  echo "$URL"
done < urls.txt
# First let's collect all issues with IDs and their comments URLs:
curl 'https://api.github.com/repos/hbz/swib16-workshop/issues?state=all' \
  | jq -c '.[] | {id: .number, body: .body, comments: .comments_url}' \
  > issues.jsonl
# For each line, we then write the body and append all comments to a file
# (read -r keeps backslashes in the JSON intact):
while read -r ISSUE
do
  ID=$(echo "$ISSUE" | jq '.id')
  echo "$ISSUE" | jq '.body' > "$ID.txt"
  URL=$(echo "$ISSUE" | jq '.comments' | tr -d '"')
  curl "$URL" | jq '.[] | .body' >> "$ID.txt"
done < issues.jsonl
Missing: error handling, additional info on comment author etc.
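A slightly more defensive variant of the loop, as a sketch. The two issues in issues.jsonl are made up (an assumption, so the example runs without network access), and the comment download is only shown as a comment. jq's -r flag prints raw strings, which replaces the tr step.

```shell
# Assumption: two made-up issues stand in for the API response in issues.jsonl
cd "$(mktemp -d)"
cat > issues.jsonl <<'EOF'
{"id":1,"body":"First line\nSecond line","comments":"https://api.github.com/repos/hbz/swib16-workshop/issues/1/comments"}
{"id":2,"body":"Another issue","comments":"https://api.github.com/repos/hbz/swib16-workshop/issues/2/comments"}
EOF

# read -r keeps backslashes such as \n in the JSON intact
while read -r ISSUE
do
  # printf instead of echo: some shells let echo interpret backslashes
  ID=$(printf '%s\n' "$ISSUE" | jq '.id')
  # -r prints raw strings: no surrounding quotes, \n becomes a real line break
  printf '%s\n' "$ISSUE" | jq -r '.body' > "$ID.txt"
  URL=$(printf '%s\n' "$ISSUE" | jq -r '.comments')
  # With network access, -f would make curl fail on HTTP errors:
  #   curl -sf "$URL" | jq -r '.[] | .body' >> "$ID.txt"
  echo "would fetch comments from $URL"
done < issues.jsonl
```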
Run a script like that automatically
As a reaction to specific activity
Like creation of new issues or comments
You can register a webhook for your GitHub repo
Makes GitHub send events to a specified URL
e.g. when commits are pushed, issues opened, or comments added
A server running at the given URL can handle these events
We use GitHub Webhooks for the Metafacture project
Back up issues in the metafacture-documentation repo | details
Tweet project activity | details
https://data-lessons.github.io/library-git