GitHub
This page contains the setup guide and reference information for the GitHub source connector.
Prerequisites
- List of GitHub Repositories (and access for them in case they are private)
For Calabi Connect:
- OAuth
- Personal Access Token (see Permissions and scopes)
Setup guide
Step 1: Set up GitHub
Create a GitHub Account.
Calabi Connect additional setup steps
Log into GitHub and then generate a personal access token. To load balance your API quota consumption across multiple API tokens, enter multiple tokens separated by commas (,).
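The comma-separated token input described above can be illustrated with a minimal sketch. How the connector actually distributes requests across tokens is not stated in this guide, so the round-robin rotation below is an assumption, and `parse_tokens`, `token_rotation`, and the token values are hypothetical names used only for illustration.

```python
from itertools import cycle


def parse_tokens(raw):
    """Split a comma-separated token string, dropping blanks and whitespace."""
    return [token.strip() for token in raw.split(",") if token.strip()]


def token_rotation(raw):
    """Return an endless round-robin iterator over the parsed tokens.

    Round-robin is an assumed strategy, not the connector's documented one.
    """
    tokens = parse_tokens(raw)
    if not tokens:
        raise ValueError("at least one token is required")
    return cycle(tokens)


# Placeholder token values; real tokens come from your GitHub account.
rotation = token_rotation("ghp_exampleA, ghp_exampleB")
first_three = [next(rotation) for _ in range(3)]
```

Each call to `next(rotation)` hands out the next token, so consecutive requests would draw on different quotas.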
Step 2: Set up the GitHub connector in Calabi Connect
For Calabi Connect:
- Log into your Calabi Connect account.
- Click Sources and then click + New source.
- On the Set up the source page, select GitHub from the Source type dropdown.
- Enter a name for the GitHub connector.
- To authenticate:
  - For Calabi Connect: Authenticate your GitHub account to authorize access. Calabi Connect will authenticate the GitHub account you are already logged in to, so make sure you are logged into the right account.
  - For Calabi Connect: Authenticate with Personal Access Token. Log into GitHub, generate a personal access token, and then enter it here. To load balance your API quota consumption across multiple API tokens, enter multiple tokens separated by commas (,).
- GitHub Repositories - Enter a list of GitHub organizations/repositories, e.g. airbytehq/airbyte for a single repository, or airbytehq/airbyte airbytehq/another-repo for multiple repositories. To receive data from all repositories of an organization, specify it as in the following example: airbytehq/*.
Repositories that do not exist or that have the wrong name or name format will be skipped with a WARN message in the logs.
- Start date (Optional) - The date from which you'd like to replicate data for streams. For streams which support this configuration, only data generated on or after the start date will be replicated.
  - These streams will only sync records generated on or after the Start Date: comments, commit_comment_reactions, commit_comments, commits, deployments, events, issue_comment_reactions, issue_events, issue_milestones, issue_reactions, issues, project_cards, project_columns, projects, pull_request_comment_reactions, pull_requests, pull_request_stats, releases, review_comments, reviews, stargazers, workflow_runs, workflows.
  - The Start Date does not apply to the streams below, and all data will be synced for these streams: assignees, branches, collaborators, issue_labels, organizations, pull_request_commits, repositories, tags, teams, users.
- Branch (Optional) - List of GitHub repository branches to pull commits from, e.g. airbytehq/airbyte/master, or airbytehq/airbyte/master airbytehq/airbyte/my-branch for multiple branches. If no branches are specified for a repository, the default branch will be pulled.
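The input formats for the GitHub Repositories and Branch fields above can be sketched as a small parser. This is only an illustration under stated assumptions: the regular expression is a guess at what counts as a well-formed name, not the connector's actual validation rule, and `parse_repositories` and `parse_branches` are hypothetical helper names.

```python
import re

# Assumed pattern: "owner/repo" or "owner/*". The connector's real rule may differ.
_REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/(\*|[A-Za-z0-9_.-]+)$")


def parse_repositories(value):
    """Split a space-separated repository string into (valid, skipped) lists."""
    valid, skipped = [], []
    for entry in value.split():
        if _REPO_PATTERN.match(entry):
            valid.append(entry)
        else:
            skipped.append(entry)  # the guide says such entries are skipped with a WARN
    return valid, skipped


def parse_branches(value):
    """Group "owner/repo/branch" entries by repository.

    Only the first two slashes are split, so branch names containing "/" survive.
    """
    by_repo = {}
    for entry in value.split():
        parts = entry.split("/", 2)
        if len(parts) != 3 or not all(parts):
            continue  # malformed entry, ignored in this sketch
        owner, repo, branch = parts
        by_repo.setdefault(f"{owner}/{repo}", []).append(branch)
    return by_repo
```

For example, `parse_repositories("airbytehq/airbyte airbytehq/* bad-name")` separates the two well-formed entries from `bad-name`, and `parse_branches("airbytehq/airbyte/master airbytehq/airbyte/my-branch")` groups both branches under `airbytehq/airbyte`.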
For Calabi Connect:
- Navigate to the Calabi Connect dashboard.
- Click Sources and then click + New source.
- On the Set up the source page, select GitHub from the Source type dropdown.
- Enter a name for the GitHub connector.
Supported sync modes
The GitHub source connector supports the following sync modes:
- Full Refresh - Overwrite
- Full Refresh - Append
- Incremental Sync - Append
- Incremental Sync - Append + Deduped
Supported Streams
This connector outputs the following full refresh streams:
- Assignees
- Branches
- Contributor Activity
- Collaborators
- Issue labels
- Organizations
- Pull request commits
- Tags
- TeamMembers
- TeamMemberships
- Teams
- Users
- Issue timeline events
This connector outputs the following incremental streams:
- Comments
- Commit comment reactions
- Commit comments
- Commits
- Deployments
- Events
- Issue comment reactions
- Issue events
- Issue milestones
- Issue reactions
- Issues
- Project (Classic) cards
- Project (Classic) columns
- Projects (Classic)
- ProjectsV2
- Pull request comment reactions
- Pull request stats
- Pull requests
- Releases
- Repositories
- Review comments
- Reviews
- Stargazers
- WorkflowJobs
- WorkflowRuns
- Workflows
Entity-Relationship Diagram (ERD)
Notes
- Only 4 streams (comments, commits, issues and review comments) from the streams listed above are pure incremental, meaning that they:
  - read only new records;
  - output only new records.
- Streams workflow_runs and workflow_jobs are almost pure incremental.
- The other 19 incremental streams are also incremental, but with one difference. They:
  - read all records;
  - output only new records.
  Consider this behaviour when using those 19 incremental streams, because it may affect your API call limits.
- For large streams, specifying a start_date far in the past may sometimes cause GitHub to keep returning errors instead of records (a corresponding WARN message is written to the logs). In this case, specifying a more recent start_date may help.
The "Start date" configuration option does not apply to the streams below, because the GitHub API does not include dates which can be used for filtering: assignees, branches, collaborators, issue_labels, organizations, pull_request_commits, repositories, tags, teams, users.
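The "read all records, output only new records" behaviour described in the notes above can be sketched as a client-side filter. This is a simplified illustration, not the connector's implementation: the `updated_at` cursor field, the timestamp format, and the function names are assumptions. The point it shows is that every record is still fetched, which is why API usage does not shrink for these streams.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a UTC timestamp such as 2021-03-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def emit_new_records(records, cursor, cursor_field="updated_at"):
    """Iterate over every record but yield only those newer than the cursor.

    `cursor_field` is a hypothetical name; each stream may use a different field.
    """
    threshold = parse_ts(cursor)
    for record in records:  # all records are read, costing API calls upstream
        if parse_ts(record[cursor_field]) > threshold:
            yield record  # only new records are output


records = [
    {"id": 1, "updated_at": "2021-01-10T00:00:00Z"},
    {"id": 2, "updated_at": "2021-04-05T12:30:00Z"},
]
new_ids = [r["id"] for r in emit_new_records(records, "2021-03-01T00:00:00Z")]
```

A pure incremental stream would instead ask the API for new records only, so the older record would never be transferred at all.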
Limitations & Troubleshooting
Connector limitations
Rate limiting
You can use a personal access token to make API requests. Additionally, you can authorize a GitHub App or OAuth app, which can then make API requests on your behalf. All of these requests count towards your personal rate limit of 5,000 requests per hour (15,000 requests per hour if the app is owned by a GitHub Enterprise Cloud organization).
REST API and GraphQL API rate limits are counted separately. The REST API uses a request-based limit, while the GraphQL API uses a point-based limit where each query costs a calculated number of points. Streams that use the GraphQL API include pull_request_stats, reviews, pull_request_comment_reactions, issue_reactions, releases, and projects_v2.
In the event that limits are reached before all streams have been read, it is recommended to take the following actions:
- Utilize Incremental sync mode.
- Set a higher sync interval.
- Divide the sync into separate connections with a smaller number of streams.
For more details, refer to the GitHub article Rate limits for the REST API.
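When debugging rate-limit problems outside the connector, the remaining quota can be read from the `x-ratelimit-remaining` and `x-ratelimit-reset` headers that the GitHub REST API returns. The sketch below is not part of the connector; `seconds_until_reset` is a hypothetical helper, and it assumes the headers arrive in a plain dictionary with lowercase keys.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before retrying, or 0 if quota remains.

    Assumes lowercase header keys; adapt if your HTTP client preserves case.
    """
    remaining = int(headers.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0
    # x-ratelimit-reset is the reset time in UTC epoch seconds.
    reset_at = int(headers.get("x-ratelimit-reset", "0"))
    current = time.time() if now is None else now
    return max(0, int(reset_at - current))
```

Passing `now` explicitly keeps the function deterministic, which is convenient when testing it against recorded responses.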
Releases stream asset limit
The Releases stream uses the GitHub GraphQL API and fetches up to 100 assets per release. Releases with more than 100 assets will only include the first 100. Sub-pagination for release assets is not currently supported.
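Because of this limit, a synced release with exactly 100 assets may be incomplete. A rough heuristic for flagging such records is sketched below. The `assets` and `tag_name` field names and the record shape are assumptions made for illustration, so check them against the records your sync actually produces.

```python
ASSET_LIMIT = 100  # the per-release asset limit stated above


def possibly_truncated(releases, assets_field="assets"):
    """Return the releases whose asset list has reached the limit.

    Heuristic only: a release with exactly 100 assets is also flagged,
    because the record alone cannot show whether more assets exist.
    """
    return [r for r in releases if len(r.get(assets_field, [])) >= ASSET_LIMIT]


releases = [
    {"tag_name": "v1.0", "assets": [{"name": f"file-{i}"} for i in range(3)]},
    {"tag_name": "v2.0", "assets": [{"name": f"file-{i}"} for i in range(100)]},
]
flagged = [r["tag_name"] for r in possibly_truncated(releases)]
```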
Permissions and scopes
If you use the OAuth authentication method, the OAuth 2.0 application requests the following scopes: repo, read:org, read:repo_hook, read:user, read:discussion, read:project, workflow. For a personal access token, you need to select the required scopes manually.
Your token should have at least the repo scope. Depending on which streams you want to sync, the user generating the token needs more permissions:
- To sync Collaborators, the user who generates the personal access token must be a collaborator. To become a collaborator, they must be invited by an owner. If there are no collaborators, no records will be synced. Read more about access permissions here.
- Syncing Teams is only available to authenticated members of a team's organization. Personal user accounts and repositories belonging to them don't have access to Teams features. In this case no records will be synced.
- To sync the Projects stream, the repository must have the Projects feature enabled.
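To check which scopes a classic personal access token carries, you can compare the `X-OAuth-Scopes` response header that GitHub returns against the scope list above. The sketch below is a minimal illustration and `missing_scopes` is a hypothetical helper. It does a literal comparison only: it does not account for broader scopes that imply narrower ones, and it assumes a token type that reports scopes in this header.

```python
# Scope list taken from the OAuth scopes named in this section.
REQUIRED_SCOPES = frozenset({
    "repo", "read:org", "read:repo_hook", "read:user",
    "read:discussion", "read:project", "workflow",
})


def missing_scopes(scopes_header, required=REQUIRED_SCOPES):
    """Return the required scopes absent from a comma-separated header value."""
    granted = {scope.strip() for scope in scopes_header.split(",") if scope.strip()}
    return sorted(set(required) - granted)
```

For example, `missing_scopes("repo, read:org, workflow")` lists the four remaining scopes that would still need to be selected on the token.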
Troubleshooting
- Check out common troubleshooting issues for the GitHub source connector on the Calabi Connect Forum.