What is GitHub? More than Git version control in the cloud
- 02 April, 2018 20:00
GitHub is at heart a Git repository hosting service, i.e. a cloud-based source code management or version control system, but that’s just the beginning. In addition, GitHub implements features for code review (pull requests, diffs, and review requests), project management (including issue tracking and assignment), integrations with other developer tools, team management, documentation, and “social coding.”
Something like a social networking site for programmers, GitHub is an open environment where programmers can freely share and collaborate (even ad hoc) on open source code. GitHub makes it easy to find useful code, copy repositories for your own use, and submit changes to others’ projects. As a result, GitHub has become home to virtually every open source project of any importance.
Whenever I want to explore an open source project, I start by searching for the project name. Once I find the project website, I look for its code repository link, and nine times out of 10 I wind up on GitHub.
Git version control
Before we can understand what GitHub does and how GitHub works, we need to understand Git. Git is a distributed version control system, originally written by Linus Torvalds in 2005 for and with help from the Linux kernel community. I’m not here to sell you on Git, so I’ll spare you the spiel about how fast and small and flexible and popular it is, but you should know that when you clone a Git repository (“repo,” for short) you get the entire version history on your own computer, not just a snapshot from one branch at one time.
Git started as a command-line tool, befitting its origin in the Linux kernel community. You can still use the Git command line, if you like, but you don’t have to. Instead of or in addition to the command line, you can use the free GitHub client on Windows or Mac, or any of a number of other GUIs for Git, or a code editor that integrates with Git. All of these options are initially easier to use than the command line. The Git command line comes pre-installed on most Mac and Linux systems and supports all operations; the GUIs typically support a frequently used subset of Git operations.
Git is different from older version control systems such as Subversion in that it is distributed rather than centralized. It’s also quite fast, especially since most operations happen on your local repository. Nevertheless, using Git adds a level of complexity: committing code to your local repository and pushing your commits to a remote repository are separate steps. When teams forget this (or were never taught it), different developers can end up working with code bases that have diverged.
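To make the commit/push distinction concrete, here is a toy Python model. It is only a sketch: the `ToyRepo` and `Developer` classes are hypothetical, a repository is reduced to a list of commit messages, and none of this reflects how Git actually stores data.

```python
"""Toy model of the commit/push split; not how Git stores data internally."""


class ToyRepo:
    """A repository reduced to an ordered list of commit messages."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])


class Developer:
    def __init__(self, name, remote):
        self.name = name
        self.remote = remote
        # Cloning copies the remote's full history into a local repository.
        self.local = ToyRepo(remote.commits)

    def commit(self, message):
        # A commit touches only the local repository.
        self.local.commits.append(message)

    def unpushed(self):
        # Local commits that the remote has not received yet.
        return [c for c in self.local.commits if c not in self.remote.commits]

    def push(self):
        # Pushing is the separate step that publishes local commits.
        self.remote.commits.extend(self.unpushed())


remote = ToyRepo(["initial commit"])
alice = Developer("alice", remote)
bob = Developer("bob", remote)

alice.commit("fix login bug")
print(alice.unpushed())  # alice's work exists only on her machine
print(remote.commits)    # the remote still holds only the original history

alice.push()
print(remote.commits)    # the remote now includes alice's commit
```

Until `push()` runs, nothing alice committed is visible to anyone working from the remote, which is exactly the window in which code bases drift apart.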
A remote Git repository can be on a server, or it can be on another developer’s machine. That enables many possible workflows for teams. One common workflow involves using a server repository as the “blessed” repository, to which only reviewed, well-tested code is committed, often through a pull request issued from a developer’s repository.
I’ve already noted that GitHub is a cloud-based Git server for code hosting and social coding, and that it implements features for code review (pull requests, diffs, and review requests), project management (including issue tracking and assignment), integrations with other developer tools, team management, and documentation.
The latest innovation in social coding from GitHub is commit co-authors, which you credit by adding one or more “Co-authored-by” trailers to the end of a commit message. This mechanism doesn’t affect the repo core per se, and doesn’t change how the repo looks in plain Git, but on GitHub the chrome will show multiple authors in the commit list, and give each co-author credit in his or her contribution graph.
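As a rough illustration of what such a message looks like, the Python sketch below appends the trailers to an ordinary commit message. The helper name `with_co_authors` and the names and email addresses are made up for the example; the only fixed part is the `Co-authored-by: Name <email>` line format, placed after a blank line at the end of the message.

```python
def with_co_authors(message, co_authors):
    """Append one Co-authored-by trailer per (name, email) pair."""
    trailers = [f"Co-authored-by: {name} <{email}>" for name, email in co_authors]
    if not trailers:
        return message
    # Trailers go at the end of the message, after a blank line.
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"


# Hypothetical authors, for illustration only.
msg = with_co_authors(
    "Refactor session handling",
    [("Jane Example", "jane@example.com"), ("Sam Example", "sam@example.com")],
)
print(msg)
```

The resulting text can be used as the commit message in whatever Git client you prefer.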
If you wish, you can extend GitHub using the GitHub GraphQL API. This is a significant improvement over GitHub’s older API, which is based on REST calls and remains available.
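For a sense of what a GraphQL call involves, here is a minimal Python sketch that builds, but does not send, a request asking for a repository’s description and star count. The `build_request` helper is hypothetical, and the owner, repository name, and token are placeholders; sending the request for real would require a personal access token of your own.

```python
import json
import urllib.request

GRAPHQL_ENDPOINT = "https://api.github.com/graphql"

# One query names exactly the fields wanted, instead of several REST calls.
QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    description
    stargazers { totalCount }
  }
}
"""


def build_request(owner, name, token):
    """Build (but do not send) a POST request carrying the query above."""
    payload = {"query": QUERY, "variables": {"owner": owner, "name": name}}
    return urllib.request.Request(
        GRAPHQL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute a real repository and token before sending.
req = build_request("octocat", "Hello-World", "YOUR_TOKEN_HERE")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call; the sketch stops short of that so it can run without credentials.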
GitHub.com is a cloud hosting service that can handle a range of account types: free (public repos only) and paid ($7 per month) developer accounts, teams ($9 per user per month), and businesses ($21 per user per month). Should you wish to run GitHub Enterprise on-premises or in your own cloud instance on AWS, Microsoft Azure, Google Cloud Platform, or IBM Cloud, you can do so for the same $21 per user per month price as a hosted business account. GitHub Enterprise adds a few useful features, such as in-app messaging to users and access provisioning integrated with LDAP directories, but gives up GitHub.com’s 99.95 percent uptime SLA for hosted business accounts.
GitHub vs. Bitbucket
GitHub isn’t the only hosted enhanced Git service, and GitHub Enterprise isn’t the only on-premises product for companies. Atlassian Bitbucket competes with both of them, with slightly lower pricing and with a free five-member team level that includes unlimited private repos and the use of Bitbucket Pipelines for continuous integration. GitHub is a more popular site for open source projects and it has a much larger pool of open source developers. Bitbucket’s pricing is more favorable for small startups.
GitHub vs. GitLab
GitLab competes with both GitHub and Bitbucket, both hosted and on-premises. On the surface, GitLab appears to have more lifecycle functionality than the others, but the difference from Atlassian mostly disappears if you include Jira when you evaluate Bitbucket. GitLab offers Gold-plan cloud features to open source projects for free, but that additional functionality doesn’t really compensate for the larger open-source developer community on GitHub.
GitHub Desktop, shown below, makes it easy to manage your GitHub.com and GitHub Enterprise repositories. While it doesn’t implement all the features of the Git command line and the GitHub web GUI, it does implement all the operations you’ll do on a daily basis from your desktop while contributing to projects. Typically, you will clone repos from GitHub to GitHub Desktop, sync them as needed, create branches for your work, commit your work, and occasionally revert one or more commits.
To work with repos for which you lack commit and collaborate privileges, you typically start by forking the repo on GitHub and cloning the fork to your desktop. Then you add any branches you need in GitHub Desktop, commit any changes you wish, test your work, push the commits back to your remote forked repo, and finally generate a pull request to the parent project.
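If you prefer the command line to GitHub Desktop, the same fork-based flow maps onto a handful of Git commands. The Python sketch below only assembles and prints them; the helper name `fork_workflow_commands`, the fork URL, the branch name, and the commit message are all illustrative assumptions, and the forking and pull request steps themselves still happen on GitHub.

```python
import shlex


def fork_workflow_commands(fork_url, branch, commit_message):
    """Return the Git commands for one pass through the fork-based flow."""
    return [
        ["git", "clone", fork_url],                # clone your fork, not the parent
        ["git", "checkout", "-b", branch],         # create a topic branch
        ["git", "add", "--all"],                   # stage the edits you made
        ["git", "commit", "-m", commit_message],   # commit to the local repo
        ["git", "push", "--set-upstream", "origin", branch],  # publish to the fork
    ]


# Hypothetical fork URL and branch name.
commands = fork_workflow_commands(
    "https://github.com/your-user/project.git", "fix-typo", "Fix typo in README"
)
for argv in commands:
    print(shlex.join(argv))
```

Every command after the clone is meant to run inside the cloned directory, and the edits and testing happen between the branch and staging steps.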
You can see the Pull Request button at the upper right of the GitHub Desktop interface. You can also see many commits in the Neo4j project that were merges of branches or pull requests. That’s typical of open source projects with few committers and many contributors.
You can use any programming editor you like to edit code, including GitHub’s free, open source, hackable Atom editor (shown below), which integrates well with GitHub and GitHub Desktop. You can use Atom on MacOS, Windows, or Linux. You can open Atom from GitHub Desktop by right-clicking on the repository you wish to browse or edit.
Open source software projects often need ways to enforce quality control while still accepting contributions from outside the core team of committers. The need for contributors is huge, but bringing new contributors into the project while maintaining the integrity of the codebase is a difficult and potentially dangerous undertaking. At the same time, the need for feedback from users of the project is also huge.
GitHub has a number of mechanisms that can help grease the wheels of open source projects. For example, users can add issues to the project on GitHub to report bugs or request features. Some other systems call these tickets. Project managers working with issues can generate task lists, assign issues to specific contributors, mention other interested contributors so that they are notified of changes, add labels, and add milestones.
To contribute to a project, you start from a topic (head) branch that contains the committed changes you want added to the project’s base branch, and open a pull request from that head branch, as shown below. You can then push additional commits to the head branch to add them to the pull request. Other contributors can review your proposed changes, add review comments, contribute to the pull request discussion, and add their own commits to the pull request.
Once everyone involved is happy with the proposed changes, a committer can merge the pull request. The merge can preserve all the commits, squash all changes into a single commit, or rebase the commits from the head branch into the base branch. If the merge generates conflicts, you can resolve them on GitHub or using the command line.
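The three merge options differ in what ends up in the base branch’s history. The toy Python sketch below treats a branch as a plain list of commit labels, which is a deliberate simplification (a real merge commit has two parents and history is a graph, not a list); the function names and labels are hypothetical.

```python
def merge_commit(base, head):
    # Keep every head commit and record the merge itself as one more commit.
    return base + head + ["Merge pull request"]


def squash(base, head):
    # Collapse all head commits into a single new commit on the base branch.
    return base + ["Squashed: " + "; ".join(head)]


def rebase(base, head):
    # Replay each head commit on top of base; the replayed copies are new
    # commits, marked here with a trailing apostrophe.
    return base + [commit + "'" for commit in head]


base = ["A", "B"]   # commits already on the base branch
head = ["C", "D"]   # commits proposed in the pull request

print(merge_commit(base, head))
print(squash(base, head))
print(rebase(base, head))
```

Squashing gives the tidiest base history at the cost of the individual commits, while the other two options keep each change visible.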
Code reviews on GitHub allow a distributed team to collaborate asynchronously. Useful GitHub tools for reviewers include diffs (the lower half of the screenshot below), history (the upper half), and blame view (a way to see, line by line, which commit last changed a file). Code discussions on GitHub go into comments that are presented inline with your code changes. If the built-in tools don’t suffice for your project, you can add code review and continuous integration tools from the GitHub marketplace. Marketplace add-ons are often free for open source projects.
Gists are special GitHub repositories for sharing your work (public) or for saving work for later reuse (secret). They can contain single files, parts of files, or full applications. You can download gists, clone them, fork them, and embed them.
Public gists are discoverable and show up in searches. You can use keywords to narrow down what you find, including prefixes to restrict the results to gists from specific users, gists with at least N stars, gists with specific filenames, and so on.
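As a sketch of how those prefixes combine, the Python helper below assembles a search string and the corresponding gist search URL. The function name `gist_search_query` and the example values are assumptions; the `user:`, `stars:`, and `filename:` prefixes follow GitHub’s usual search qualifier syntax.

```python
from urllib.parse import urlencode


def gist_search_query(keywords, user=None, min_stars=None, filename=None):
    """Combine free-text keywords with optional search prefixes."""
    parts = list(keywords)
    if user:
        parts.append(f"user:{user}")            # gists from one user
    if min_stars is not None:
        parts.append(f"stars:>={min_stars}")    # at least N stars
    if filename:
        parts.append(f"filename:{filename}")    # gists containing this file
    return " ".join(parts)


# Example values are placeholders.
query = gist_search_query(["dotfiles"], user="octocat", min_stars=10, filename=".bashrc")
url = "https://gist.github.com/search?" + urlencode({"q": query})
print(query)
print(url)
```

You can also type the same query string directly into the gist search box.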
Secret gists are not searchable, but anyone with the URL can see them. If you really want your code to be protected, use a private repository.
As we’ve seen, GitHub provides Git repositories as a service, along with features for code review, project management, integrations with other developer tools, team management, social coding, and documentation. While GitHub is not the only product in its category, it is the dominant repository for open source software development.