Exercise 6.2 - Gremlin Queries for GitHub

Objectives

The objective of this exercise is to help new users of Syndeia graph analysis formulate Gremlin queries to analyze their graph with the Syndeia Web Dashboard. GitHub is a configuration-managed software code repository and DevOps tool developed by GitHub, Inc. The specific learning objectives of this exercise are to create lists of

  • GitHub artifacts of a specific artifact type

  • GitHub artifacts in a specific Container (Git Repository)

  • GitHub artifacts connected as part of a specific Syndeia Project

Preparation

This exercise assumes the student has

  • Syndeia Cloud 3.3 or 3.4 installed with a valid user account, and

  • An existing Syndeia graph containing GitHub objects connected to elements in other repositories.

Because the content of your Syndeia graph will be different, treat the specific examples in the following instructions as a guide rather than as exact values to copy. It is generally advisable to carry out these exercises in a non-production repository, a "sandbox", set up for training and practice purposes.
See the tutorials under Syndeia Cloud Web-Dashboard/Part 19 – Syndeia Cloud Graph Analysis for an overview of this feature.

Background – Syndeia Cloud Data Model

Figure 1 provides a simplified schema for elements in the Syndeia Cloud graph. All graph nodes are either Repositories, Containers, or Artifacts, where each Artifact is owned by a Container and each Container is owned by a Repository. Each Container and Artifact has a Type; the sets of ContainerTypes and ArtifactTypes are owned by the Repository. How the GitHub data model maps to the Syndeia Cloud data model is discussed in the next section.

Figure 1 Syndeia Cloud Schema (simplified)
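
As a rough aid to reading Figure 1, the ownership and typing rules described above can be written down as a small edge list. This is only a sketch in Python, not part of Syndeia. The edge labels ownedBy and hasType are the ones that appear in the Gremlin queries later in this exercise; the Container-to-ContainerType edge is an assumption made by analogy with Artifacts, and the names SCHEMA_EDGES and owner_chain are hypothetical.

```python
# Simplified schema as (source label, edge label, target label) triples.
# 'ownedBy' and 'hasType' are the edge labels used by the queries in this exercise.
SCHEMA_EDGES = [
    ("Artifact", "ownedBy", "Container"),
    ("Container", "ownedBy", "Repository"),
    ("Artifact", "hasType", "ArtifactType"),
    ("Container", "hasType", "ContainerType"),  # assumed by analogy with Artifact
    ("ArtifactType", "ownedBy", "Repository"),
    ("ContainerType", "ownedBy", "Repository"),
]


def owner_chain(label):
    """Follow 'ownedBy' edges from a node label up to the Repository."""
    owners = {src: dst for src, edge, dst in SCHEMA_EDGES if edge == "ownedBy"}
    chain = [label]
    while chain[-1] in owners:
        chain.append(owners[chain[-1]])
    return chain


print(owner_chain("Artifact"))      # ['Artifact', 'Container', 'Repository']
print(owner_chain("ArtifactType"))  # ['ArtifactType', 'Repository']
```

The chain for Artifact shows why a query can reach a Container from an Artifact in one ownedBy step, while ArtifactTypes are reached from the Repository directly.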

Background – GitHub

As of Syndeia release 3.4, the Syndeia Web Dashboard can extract and display some model information from a GitHub repository. Figure 2 shows a tree view of this information, with labels identifying the GitHub element types. Note the different icons. The label color coding indicates how the GitHub element type is mapped to the Syndeia Cloud element types: Repository (green), Container (red), and Artifact (blue). The Syndeia GitHub integration supports a large number of standard and custom GitHub artifact types, including Branch, Commit, Issue, Tag, and File. A more complete diagram of the GitHub data model as it is understood by Syndeia is available through the web dashboard help menu on the left.

It is also important to understand the limitations of graph queries with respect to the GitHub repositories. As of Syndeia 3.4, graph queries cannot extract the internal structure of a GitHub repository, i.e. they cannot be used to obtain the full structure of the GitHub repository or internal (intra-model) relations between GitHub artifacts. Graph queries are most useful in viewing inter-model connections from GitHub elements to other repositories.

Figure 2 Tree view of GitHub repository

Exercise

  1. Log on to the Syndeia Cloud Web Dashboard (see Video 1.9) and click on the Graph Queries icon on the left border.

  2. The first task is to compile a list of GitHub Artifacts of a specific type. Per Figure 1, ArtifactTypes are owned by (specific to) a Repository. We typically want to begin by creating a list of Artifact types available in such a Repository.

    1. If we use Query Builder (Figure 3), we select ArtifactType from the pull-down menu under Label.

    2. To restrict the list of ArtifactTypes to our current GitHub repository, we click Filters. We will filter by the name of our Repository, so we select Repository from the pull-down menu at the top marked Property of. Under Property Key, we select the Name property and under Property Value, we enter GitHub @ Intercax. We then click the Plus button to add the filter in the bottom list and the window should look similar to Figure 4. Click Close.

  3. Back on the Graph Queries page, click Run. The results, a list of all ArtifactTypes in GitHub @ Intercax, may be displayed in table form as in Figure 5. The important ArtifactType properties in the table are Name and Key, because we will use these in the next search. Click the Exports icon to export the list as a CSV file for future reference, if desired.

  4. Note at the top of Figure 5, the Query Builder utility has created a Gremlin query. We could have performed the same search with the same results by going to the Raw Query mode and entering this query directly.
    g.V().has('sLabel','ArtifactType').where(outE().has('sLabel','ownedBy').inV().has('name','GitHub @ Intercax'))
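
Because your repository name will differ from GitHub @ Intercax, it can help to generate the Raw Query text from a template. The following is a minimal sketch in Python that only assembles the query string shown in step 4; it does not connect to Syndeia Cloud. The function name owned_by_repository is hypothetical.

```python
def owned_by_repository(label: str, repository_name: str) -> str:
    """Build a Raw Query for vertices of `label` owned by the named Repository."""
    return (
        f"g.V().has('sLabel','{label}')"
        ".where(outE().has('sLabel','ownedBy')"
        f".inV().has('name','{repository_name}'))"
    )


# Reproduces the query from step 4; paste the printed text into Raw Query mode.
query = owned_by_repository("ArtifactType", "GitHub @ Intercax")
print(query)
```

Passing a different label, such as Container, should produce the matching query for that vertex label, assuming the same ownedBy edge applies.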

  5. The final part of the first task is to generate a list of all Artifacts of type File within the GitHub @ Intercax Repository. Note that Syndeia will return only those GitHub Files that are connected within the Syndeia Cloud graph or that own elements connected within the Syndeia Cloud graph, not all files in the repository.

    1. We can search by ArtifactType Name ("GitHub File") or Key (ART-TYPE420), which we got from the table in Figure 5. Generally, it is better to search by Key, which is unique within the Syndeia Cloud database, rather than Name, which is not unique.

    2. If we use Query Builder, we select Artifact from the pull-down menu under Label.

    3. To restrict the list of Artifacts to the GitHub File type, we click Filters. We will filter by the ArtifactType Key, so we select ArtifactType from the pull-down menu at the top marked Property of. Under Property Key, we select the sKey property and under Property Value, we enter ART-TYPE420, which we took from the table in Figure 5. After we click the Plus icon, the Filters window should look like Figure 7. Click Close.

  6. Back on the Graph Queries page, click Run. The results, a list of all Artifacts of type ART-TYPE420, which is owned by the repository GitHub @ Intercax, may be displayed in table form as in Figure 8. Click the Exports icon to export the list as a CSV file for future reference, if desired.

  7. Note at the top of Figure 8, the Query Builder utility has created a Gremlin query. We could have performed the same search with the same results by going to the Raw Query mode and entering this query directly.
    g.V().has('sLabel','Artifact').where(outE().has('sLabel','hasType').inV().has('sKey','ART-TYPE420'))
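
To see what the where(...) filter in this query does, the sketch below emulates it in plain Python over a toy graph. This is not how Syndeia evaluates the query; it only mirrors the logic: start from Artifact vertices and keep those with an outgoing hasType edge whose target has the requested key. All data is made up for illustration: the artifact keys A1 to A3 and the type key ART-TYPE999 are hypothetical.

```python
# Toy graph: vertex key -> properties. Keys A1..A3 and ART-TYPE999 are hypothetical.
vertices = {
    "ART-TYPE420": {"sLabel": "ArtifactType"},
    "ART-TYPE999": {"sLabel": "ArtifactType"},
    "A1": {"sLabel": "Artifact"},
    "A2": {"sLabel": "Artifact"},
    "A3": {"sLabel": "Artifact"},
}
# Directed edges as (source key, edge sLabel, target key).
edges = [
    ("A1", "hasType", "ART-TYPE420"),
    ("A2", "hasType", "ART-TYPE999"),
    ("A3", "hasType", "ART-TYPE420"),
]


def artifacts_of_type(type_key):
    """Keep Artifact vertices that have an outgoing hasType edge to type_key."""
    matches = []
    for key, props in vertices.items():
        if props["sLabel"] != "Artifact":
            continue  # has('sLabel','Artifact')
        # where(outE().has('sLabel','hasType').inV().has('sKey', type_key))
        if any(src == key and label == "hasType" and dst == type_key
               for src, label, dst in edges):
            matches.append(key)
    return matches


print(artifacts_of_type("ART-TYPE420"))  # ['A1', 'A3']
```

The filter does not move the traversal to the ArtifactType vertex; it only tests each Artifact, which is why the result is a list of Artifacts.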

  8. The second task is to compile a list of GitHub Artifacts in a specific GitHub Git Repository. Per Figure 2, Git Repositories in GitHub are Containers. We will begin by creating a list of Containers available in a GitHub Repository.

    1. If we use Query Builder (Figure 9), we select Container from the pull-down menu under Label.

       

    2. To restrict the list of Containers to our current GitHub repository, we click Filters. We will filter by the name of our Repository, so we select Repository from the pull-down menu at the top marked Property of. Under Property Key, we select the Name property and under Property Value, we enter GitHub @ Intercax. We then click the Plus button to add the filter in the bottom list and the window should look similar to Figure 10. Click Close.

       

  9. Back on the Graph Queries page, click Run. The results, a list of all Containers in GitHub @ Intercax, may be displayed in table form as in Figure 11. The important Container properties in the table are Name and Key, because we will use these in the next search. Click the Exports icon to export the list as a CSV file for future reference, if desired.
    Caution: Containers in GitHub include both Organizations and Git Repositories. Because the Syndeia data model in Figure 1 does not map perfectly to the GitHub data model in Figure 2, Gremlin queries related to Organizations behave inconsistently, so we will work only with Git Repositories as Containers. The list also does not include all Containers in the GitHub @ Intercax repository. Only those Git Repositories that own Artifacts connected to other models (or that are connected directly themselves) appear on the list. Other GitHub Git Repositories that do not involve connections to other repositories are not part of the Syndeia Cloud graph and do not appear in Gremlin graph query results.

  10. Note at the top of Figure 11, the Query Builder utility has created a Gremlin query. We could have performed the same search with the same results by going to the Raw Query mode and entering this query directly.
    g.V().has('sLabel','Container').where(outE().has('sLabel','ownedBy').inV().has('name','GitHub @ Intercax'))

  11. The final part of the second task is to generate a list of all Artifacts in a specific Container within the GitHub @ Intercax Repository. Note that Syndeia will return only those GitHub Artifacts that are connected within the Syndeia Cloud graph, not all Artifacts in the container or repository.

    1. We can search by Container Name ("Autonomous_Vehicle_Systems") or Key (CONT1057), which we got from the table in Figure 11. Generally, it is better to search by Key, which is unique within the Syndeia Cloud database, rather than Name, which is not unique.

    2. If we use Query Builder, we select Artifact from the pull-down menu under Label, as in Figure 12.

    3. To restrict the list of Artifacts to the GitHub Git Repository Autonomous_Vehicle_Systems, we click Filters. We will filter by the Container Key, so we select Container from the pull-down menu at the top marked Property of. Under Property Key, we select the sKey property and under Property Value, we enter CONT1057, which we took from the table in Figure 11. After we click the Plus icon, the Filters window should look like Figure 13. Click Close.

  12. Back on the Graph Queries page, click Run. The results, a list of all Artifacts in Container CONT1057, which is owned by the repository GitHub @ Intercax, may be displayed in table form as in Figure 14. Note that only GitHub elements that are part of the Syndeia Cloud graph appear; there may be other GitHub elements in this Git Repository that have no connections to other repositories and therefore do not appear.

  13. Note at the top of Figure 14, the Query Builder utility has created a Gremlin query. We could have performed the same search with the same results by going to the Raw Query mode and entering this query directly.

    g.V().has('sLabel','Artifact').where(outE().has('sLabel','ownedBy').inV().has('sKey','CONT1057'))
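
Since the Container key in your graph will not be CONT1057, the query text can be generated from the key. The sketch below is a small Python helper that only builds the string from step 13; the function name artifacts_in_container is hypothetical, and the quote check is a precaution we have added, not something the dashboard requires.

```python
def artifacts_in_container(container_key: str) -> str:
    """Build the step 13 Raw Query for one Container key, e.g. CONT1057."""
    if "'" in container_key:
        # A single quote would end the Gremlin string literal early, so refuse it.
        raise ValueError("container key must not contain a single quote")
    return (
        "g.V().has('sLabel','Artifact')"
        ".where(outE().has('sLabel','ownedBy')"
        f".inV().has('sKey','{container_key}'))"
    )


container_query = artifacts_in_container("CONT1057")
print(container_query)
```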

  14. The third task is to compile a list of GitHub Artifacts that are connected as part of a specific Syndeia Project. Syndeia Projects are partitions within the Syndeia Cloud graph database that separate different projects or system models. Syndeia Projects are Containers owned by the Syndeia Repository. Unlike GitHub Git Repositories, Syndeia Projects contain only relations, the inter-model relations that define the "macrostructure" of the Digital Thread for that system or project. In this case, we are looking not for the GitHub elements directly; we are looking for inter-model connections where one end is a GitHub element.

  15. We will begin by creating a list of Containers available in the Syndeia Repository.

    1. If we use Query Builder (Figure 15), we select Container from the pull-down menu under Label.

    2. To restrict the list of Containers to the Syndeia repository, we click Filters. We will filter by the name of our Repository, so we select Repository from the pull-down menu at the top marked Property of. Under Property Key, we select the Name property and under Property Value, we enter Syndeia Repository. We then click the Plus button to add the filter in the bottom list and the window should look similar to Figure 16. Click Close.

  16. Back on the Graph Queries page, click Run. The results, a list of all Containers in the Syndeia Repository, may be displayed in table form as in Figure 17. The important Container properties in the table are Name and Key, because we will use these in the next search. Click the Exports icon to export the list as a CSV file for future reference, if desired.

  17. Note at the top of Figure 17, the Query Builder utility has created a Gremlin query. We could have performed the same search with the same results by going to the Raw Query mode and entering this query directly.
    g.V().has('sLabel','Container').where(outE().has('sLabel','ownedBy').inV().has('name','Syndeia Repository'))

  18. The next part of the third task is to generate a list of all Relations within a specific Syndeia Project.

    1. We can search by Container Name ("Dirk Sandbox 16") or Key (DZSB16), which we got from the table in Figure 17. Generally, it is better to search by Key, which is unique within the Syndeia Cloud database, rather than Name, which is not unique.

    2. If we use Query Builder, we select Relation from the pull-down menu under Label, as in Figure 18. Remember, the Syndeia Projects contain relations, not artifacts.

    3. To restrict the list of Relations to a specific Syndeia Project, we click Filters. We will filter by the Container Key, so we select Container from the pull-down menu at the top marked Property of. Under Property Key, we select the sKey property and under Property Value, we enter DZSB16, which we took from the table in Figure 17. After we click the Plus icon, the Filters window should look like Figure 19. Click Close.

  19. Back on the Graph Queries page, click Run. The results, a list of all Relations in Container DZSB16, which is owned by the Syndeia Repository, may be displayed in table form as in Figure 20. Note that all relations within the project appear, not just those with a GitHub artifact at one end.

  20. The final step is to identify the GitHub File elements that participate in these relations, but this cannot be done in Query Builder alone. Note at the top of Figure 20, the Query Builder utility has created a Gremlin query.
    g.E().has('sLabel','Relation').has('container','DZSB16')

    We will use the Gremlin query language to append an additional condition. First, we add a traversal step to go to the vertices at the ends of the relations. Since we don't know whether the GitHub File will be at the incoming or the outgoing end of a relation in the Syndeia Project, we use the bothV() step to cover both ends.
    g.E().has('sLabel','Relation').has('container','DZSB16').bothV()
    Next, we check all vertices for ArtifactType. Going back to the table in Figure 5, we choose GitHub File, ART-TYPE420.
    g.E().has('sLabel','Relation').has('container','DZSB16').bothV().has('type','ART-TYPE420')
    If we select Raw Query and enter this in the Gremlin Query field, we generate the table in Figure 21, showing all GitHub elements of ART-TYPE420 used in the Syndeia Project DZSB16.
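
The query in step 20 is built up in three stages: the relation filter, the bothV() step, and the type filter. The sketch below assembles the same text stage by stage in Python so that the Project key and ArtifactType key can be swapped for your own. The function names are hypothetical. The optional dedup flag appends the standard Gremlin dedup() step, on the assumption that a vertex taking part in several relations would otherwise be listed more than once; this has not been checked against the Syndeia dashboard.

```python
def relations_in_project(project_key: str) -> str:
    """Stage 1: Relation edges in the given Syndeia Project (a Container key)."""
    return f"g.E().has('sLabel','Relation').has('container','{project_key}')"


def endpoints_of_type(project_key: str, artifact_type_key: str,
                      dedup: bool = False) -> str:
    """Stages 2 and 3: vertices at either end, restricted to one ArtifactType key."""
    query = (
        relations_in_project(project_key)
        + ".bothV()"                              # both ends of each relation
        + f".has('type','{artifact_type_key}')"   # keep only the chosen type
    )
    if dedup:
        query += ".dedup()"  # assumption: removes repeated vertices
    return query


print(relations_in_project("DZSB16"))
print(endpoints_of_type("DZSB16", "ART-TYPE420"))
```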

  21. There are alternative ways to approach the problem. If we wanted to search for GitHub elements in a specific GitHub Git Repository (CONT1057) that were used in a Syndeia Project (DZSB16), we could reformulate the query using the first part from Step 20 and the second part from Step 13.

    g.E().has('sLabel','Relation').has('container','DZSB16').bothV().where(outE().has('sLabel','ownedBy').inV().has('sKey','CONT1057'))
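
The combination described in step 21 can also be generated from the two keys. The following Python sketch joins the relation part of the step 20 query with the ownedBy filter from step 13 and prints the resulting text for Raw Query mode. The function name project_endpoints_in_container is hypothetical, and the helper only builds a string.

```python
def project_endpoints_in_container(project_key: str, container_key: str) -> str:
    """Relation endpoints in a Syndeia Project that are owned by one Container."""
    relation_part = (
        f"g.E().has('sLabel','Relation').has('container','{project_key}')"
        ".bothV()"
    )
    ownership_filter = (
        ".where(outE().has('sLabel','ownedBy')"
        f".inV().has('sKey','{container_key}'))"
    )
    return relation_part + ownership_filter


combined_query = project_endpoints_in_container("DZSB16", "CONT1057")
print(combined_query)
```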
