GNU GCC compiler and JSON data or JSON graph data

The GNU GCC compiler (https://gcc.gnu.org/) can generate some JSON data, and it might also be able to generate JSON JGF graph data. See json.cc in the GCC sources.

This is a summary of the graph-related dump options of the gcc-11.1 compiler:

-fdump-rtl-all-graph
-fdump-tree-all-graph
-fdump-ipa-all-graph
-fcallgraph-info
-fdump-analyzer-callgraph
-fdump-analyzer-exploded-graph
-fdump-analyzer-supergraph
-fdump-analyzer-state-purge
-fdump-analyzer-feasibility
-fdump-analyzer-json

The analyzer JSON data can be viewed in the Firefox browser, but I do not know the exact format of the data or which tools can use it.
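According to the GCC manual, -fdump-analyzer-json writes a gzip-compressed file named SRCFILE.analyzer.json.gz, and its precise format is subject to change. Because the format is undocumented, a practical first step is to decompress the file and print its shape (keys and value types) instead of its contents. This is a minimal sketch that assumes only "gzip-compressed JSON"; the function names and the depth limit are my own choices.

```python
import gzip
import json
from typing import Any


def parse_json_gz_bytes(data: bytes) -> Any:
    """Decompress gzip data and parse the result as JSON."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


def load_json_gz(path: str) -> Any:
    """Read a .json.gz file, for example test.c.analyzer.json.gz."""
    with open(path, "rb") as f:
        return parse_json_gz_bytes(f.read())


def outline(value: Any, depth: int = 0, max_depth: int = 2) -> list:
    """List the keys and value types of a JSON document, not its contents.

    For arrays only the first element is inspected, which assumes that
    all elements of an array share one shape.
    """
    pad = "  " * depth
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f"{pad}{key}: {type(child).__name__}")
            if depth + 1 < max_depth:
                lines.extend(outline(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        lines.append(f"{pad}[{len(value)} items of {type(value[0]).__name__}]")
        if depth + 1 < max_depth:
            lines.extend(outline(value[0], depth + 1, max_depth))
    return lines


if __name__ == "__main__":
    import sys

    # Usage (the file name is only an example): python outline.py test.c.analyzer.json.gz
    for line in outline(load_json_gz(sys.argv[1]), max_depth=3):
        print(line)
```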

I created a patch that makes GCC generate GML graph data for the gml4gtk viewer; it is at https://notabug.org/mooigraph/gcc-10.1-gml
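To show what GML graph data looks like, here is a small writer for the common GML subset: one graph block containing node entries (id, label) and edge entries (source, target). It is not the code from the patch; the function names and the input shape (integer ids, label strings) are assumptions for illustration. Double quotes and ampersands inside labels are written as the entities &quot; and &amp;, since GML strings cannot contain a raw double quote.

```python
def gml_escape(text: str) -> str:
    """Escape characters that cannot appear inside a GML double-quoted string."""
    return text.replace("&", "&amp;").replace('"', "&quot;")


def write_gml(nodes, edges, directed: bool = True) -> str:
    """Serialize a graph as GML text.

    nodes: list of (integer id, label) pairs
    edges: list of (source id, target id) pairs
    """
    out = ["graph [", f"  directed {1 if directed else 0}"]
    for node_id, label in nodes:
        out.append(f'  node [ id {node_id} label "{gml_escape(label)}" ]')
    for source, target in edges:
        out.append(f"  edge [ source {source} target {target} ]")
    out.append("]")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # A two-node call graph: main calls foo.
    print(write_gml([(0, "main"), (1, "foo")], [(0, 1)]), end="")
```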

One conclusion is that basic GraphML support can be important. Another is that describing the data and writing it as JSON JGF graph data can be a flexible solution, because a reliable JSON parser exists for practically every programming language.

The JSON JGF graph format is defined at https://jsongraphformat.info/ and is supported by gml4gtk at https://sourceforge.net/projects/gml4gtk/
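A minimal sketch of producing a JGF-style document from plain node and edge lists follows. The layout used here (a top-level "graph" object, "nodes" as an object keyed by node id, "edges" as a list of source/target objects) is my reading of JGF version 2; version 1 stored nodes as a list, so check the schema on jsongraphformat.info before relying on it. The function name to_jgf is hypothetical.

```python
import json


def to_jgf(graph_id: str, nodes: dict, edges: list, directed: bool = True) -> dict:
    """Build a JGF-style document.

    nodes: mapping of node id -> label
    edges: list of (source id, target id) pairs
    Assumed layout: JGF version 2 (nodes keyed by id, edges as a list).
    """
    return {
        "graph": {
            "id": graph_id,
            "directed": directed,
            "nodes": {node_id: {"label": label} for node_id, label in nodes.items()},
            "edges": [{"source": s, "target": t} for s, t in edges],
        }
    }


if __name__ == "__main__":
    doc = to_jgf("callgraph", {"main": "main", "foo": "foo"}, [("main", "foo")])
    print(json.dumps(doc, indent=2))
```

Because the result is an ordinary dictionary, any JSON library can write it, and any language with a JSON parser can read it back.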

With the JSON JGF graph format it is easy to embed Graphviz-specific data: simply add data items such as "dot_label": { ... }. A program parsing the JSON data can then decide for itself what to do with this data, for example use it or ignore it.
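The sketch below shows one way a consumer could handle such optional tool-specific data. Assumptions (not part of any standard): the "dot_label" item sits in a node's "metadata" object, and its value is a dictionary of Graphviz attribute names and values. A Graphviz-aware reader merges those attributes into the DOT output; a reader that does not know the key simply never looks at it. The function names are hypothetical.

```python
def dot_quote(value) -> str:
    """Quote a value as a DOT string, escaping embedded double quotes."""
    return '"' + str(value).replace('"', '\\"') + '"'


def jgf_to_dot(doc: dict) -> str:
    """Convert a JGF-style graph (nodes keyed by id) to Graphviz DOT text."""
    graph = doc["graph"]
    directed = graph.get("directed", True)
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} {dot_quote(graph.get('id', 'G'))} {{"]
    for node_id, node in graph.get("nodes", {}).items():
        attrs = {"label": node.get("label", node_id)}
        # Hypothetical convention: Graphviz-only attributes live in metadata.dot_label.
        attrs.update(node.get("metadata", {}).get("dot_label", {}))
        attr_text = ", ".join(f"{k}={dot_quote(v)}" for k, v in attrs.items())
        lines.append(f"  {dot_quote(node_id)} [{attr_text}];")
    for edge in graph.get("edges", []):
        lines.append(f"  {dot_quote(edge['source'])} {arrow} {dot_quote(edge['target'])};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {"graph": {"id": "g", "directed": True, "nodes": {
        "a": {"label": "main", "metadata": {"dot_label": {"shape": "box"}}},
        "b": {"label": "foo"}},
        "edges": [{"source": "a", "target": "b"}]}}
    print(jgf_to_dot(example), end="")
```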

Current GCC can write its diagnostics as JSON with the option below; I do not know of any tools that use this feature. See https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Diagnostic-Message-Formatting-Options.html



-fdiagnostics-format=FORMAT

    Select a different format for printing diagnostics. FORMAT is ‘text’ or ‘json’. The default is ‘text’.

    The ‘json’ format consists of a top-level JSON array containing JSON objects representing the diagnostics.

    The JSON is emitted as one line, without formatting; the examples below have been formatted for clarity.

    Diagnostics can have child diagnostics. For example, this error and note:

    misleading-indentation.c:15:3: warning: this 'if' clause does not
      guard... [-Wmisleading-indentation]
       15 |   if (flag)
          |   ^~
    misleading-indentation.c:17:5: note: ...this statement, but the latter
      is misleadingly indented as if it were guarded by the 'if'
       17 |     y = 2;
          |     ^

    might be printed in JSON form (after formatting) like this:

    [
        {
            "kind": "warning",
            "locations": [
                {
                    "caret": {
                        "column": 3,
                        "file": "misleading-indentation.c",
                        "line": 15
                    },
                    "finish": {
                        "column": 4,
                        "file": "misleading-indentation.c",
                        "line": 15
                    }
                }
            ],
            "message": "this \u2018if\u2019 clause does not guard...",
            "option": "-Wmisleading-indentation",
            "children": [
                {
                    "kind": "note",
                    "locations": [
                        {
                            "caret": {
                                "column": 5,
                                "file": "misleading-indentation.c",
                                "line": 17
                            }
                        }
                    ],
                    "message": "...this statement, but the latter is …"
                }
            ]
        },
        …
    ]

    where the note is a child of the warning.
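Because child diagnostics nest inside their parent, a reader of this JSON has to walk it recursively. This is a minimal sketch that turns the array back into text lines, indenting children under their parent. Assumptions: the first entry of "locations" is the primary location, and (as far as I know) GCC prints the JSON on stderr, hence the redirect in the usage comment. The function name is hypothetical.

```python
import json
import sys


def format_diagnostics(diags: list, depth: int = 0) -> list:
    """Flatten -fdiagnostics-format=json output into one text line per diagnostic.

    Child diagnostics (such as notes) are indented under their parent.
    Assumption: the first entry in "locations" is the primary location.
    """
    out = []
    for diag in diags:
        where = ""
        locations = diag.get("locations", [])
        if locations and "caret" in locations[0]:
            caret = locations[0]["caret"]
            where = f"{caret['file']}:{caret['line']}:{caret['column']}: "
        # Only warnings carry an "option" key.
        option = f" [{diag['option']}]" if "option" in diag else ""
        out.append("  " * depth + f"{where}{diag['kind']}: {diag['message']}{option}")
        out.extend(format_diagnostics(diag.get("children", []), depth + 1))
    return out


if __name__ == "__main__":
    # Usage: gcc -fdiagnostics-format=json -c foo.c 2> diags.json
    #        python show_diags.py < diags.json
    for line in format_diagnostics(json.load(sys.stdin)):
        print(line)
```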

    A diagnostic has a kind. If this is warning, then there is an option key describing the command-line option controlling the warning.

    A diagnostic can contain zero or more locations. Each location has up to three positions within it: a caret position and optional start and finish positions. A location can also have an optional label string. For example, this error:

    bad-binary-ops.c:64:23: error: invalid operands to binary + (have 'S' {aka
       'struct s'} and 'T' {aka 'struct t'})
       64 |   return callee_4a () + callee_4b ();
          |          ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
          |          |              |
          |          |              T {aka struct t}
          |          S {aka struct s}

    has three locations. Its primary location is at the “+” token at column 23. It has two secondary locations, describing the left and right-hand sides of the expression, which have labels. It might be printed in JSON form as:

        {
            "children": [],
            "kind": "error",
            "locations": [
                {
                    "caret": {
                        "column": 23, "file": "bad-binary-ops.c", "line": 64
                    }
                },
                {
                    "caret": {
                        "column": 10, "file": "bad-binary-ops.c", "line": 64
                    },
                    "finish": {
                        "column": 21, "file": "bad-binary-ops.c", "line": 64
                    },
                    "label": "S {aka struct s}"
                },
                {
                    "caret": {
                        "column": 25, "file": "bad-binary-ops.c", "line": 64
                    },
                    "finish": {
                        "column": 36, "file": "bad-binary-ops.c", "line": 64
                    },
                    "label": "T {aka struct t}"
                }
            ],
            "message": "invalid operands to binary + …"
        }
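The labeled secondary locations can be mapped back to the source text they cover. This is a minimal sketch under stated assumptions: ASCII source, 1-based columns, single-line ranges, "finish" treated as inclusive (as in the example above, where columns 10 to 21 cover "callee_4a ()"), and a missing "start" meaning the range begins at the caret. The function names are hypothetical.

```python
def location_text(source_line: str, location: dict) -> str:
    """Return the text of source_line covered by one JSON location.

    Without "finish", only the caret character is returned.
    """
    first = location.get("start", location["caret"])["column"]
    last = location.get("finish", location["caret"])["column"]
    # Columns are 1-based and "finish" is inclusive, hence [first - 1:last].
    return source_line[first - 1:last]


def labeled_spans(source_line: str, diag: dict) -> list:
    """Return (label, covered text) for every labeled location of a diagnostic."""
    return [(loc["label"], location_text(source_line, loc))
            for loc in diag.get("locations", []) if "label" in loc]


if __name__ == "__main__":
    line = "  return callee_4a () + callee_4b ();"
    diag = {"locations": [
        {"caret": {"column": 23}},
        {"caret": {"column": 10}, "finish": {"column": 21}, "label": "S {aka struct s}"},
        {"caret": {"column": 25}, "finish": {"column": 36}, "label": "T {aka struct t}"},
    ]}
    for label, text in labeled_spans(line, diag):
        print(f"{text!r}: {label}")
```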

    If a diagnostic contains fix-it hints, it has a fixits array, consisting of half-open intervals, similar to the output of -fdiagnostics-parseable-fixits. For example, this diagnostic with a replacement fix-it hint:

    demo.c:8:15: error: 'struct s' has no member named 'colour'; did you
      mean 'color'?
        8 |   return ptr->colour;
          |               ^~~~~~
          |               color

    might be printed in JSON form as:

        {
            "children": [],
            "fixits": [
                {
                    "next": {
                        "column": 21,
                        "file": "demo.c",
                        "line": 8
                    },
                    "start": {
                        "column": 15,
                        "file": "demo.c",
                        "line": 8
                    },
                    "string": "color"
                }
            ],
            "kind": "error",
            "locations": [
                {
                    "caret": {
                        "column": 15,
                        "file": "demo.c",
                        "line": 8
                    },
                    "finish": {
                        "column": 20,
                        "file": "demo.c",
                        "line": 8
                    }
                }
            ],
            "message": "\u2018struct s\u2019 has no member named …"
        }

    where the fix-it hint suggests replacing the text from start up to but not including next with string’s value. Deletions are expressed via an empty value for string, insertions by having start equal next.
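The half-open [start, next) rule makes fix-its easy to apply mechanically. This is a minimal sketch, assuming ASCII source, 1-based columns, non-overlapping hints, and start and next on the same line (multi-line hints are rejected rather than guessed at). Hints are applied from right to left so that one replacement does not shift the columns of the others. The function name is hypothetical.

```python
def apply_fixits(lines: list, fixits: list) -> list:
    """Apply GCC JSON fix-it hints to a file given as a list of lines.

    Each hint replaces the half-open range [start, next) with "string":
    an empty string is a deletion, start == next is an insertion.
    """
    result = list(lines)
    # Right-to-left order keeps earlier columns valid after each edit.
    ordered = sorted(fixits, key=lambda f: (f["start"]["line"], f["start"]["column"]),
                     reverse=True)
    for fix in ordered:
        start, nxt = fix["start"], fix["next"]
        if start["line"] != nxt["line"]:
            raise ValueError("multi-line fix-its are not handled in this sketch")
        i = start["line"] - 1
        text = result[i]
        result[i] = text[:start["column"] - 1] + fix["string"] + text[nxt["column"] - 1:]
    return result


if __name__ == "__main__":
    source = ["  return ptr->colour;"]
    hint = {"start": {"column": 15, "file": "demo.c", "line": 1},
            "next": {"column": 21, "file": "demo.c", "line": 1},
            "string": "color"}
    print(apply_fixits(source, [hint])[0])
```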



gcov also has JSON support; I do not know of tools that use this data. See https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Invoking-Gcov.html


-i
--json-format

    Output gcov file in an easy-to-parse JSON intermediate format which does not require source code for generation. The JSON file is compressed with gzip compression algorithm and the files have .gcov.json.gz extension.

    The structure of the JSON is as follows:

    {
      "current_working_directory": current_working_directory,
      "data_file": data_file,
      "format_version": format_version,
      "gcc_version": gcc_version,
      "files": [file]
    }

    Fields of the root element have the following semantics:

        current_working_directory: working directory where a compilation unit was compiled
        data_file: name of the data file (GCDA)
        format_version: semantic version of the format
        gcc_version: version of the GCC compiler 
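Since the report is gzip-compressed JSON with this root layout, reading it takes only the standard library. This is a minimal sketch; the function names are hypothetical, and the usage comment assumes the program was built with --coverage and run once so that gcov has a data file to read.

```python
import gzip
import json


def read_gcov_json(data: bytes) -> dict:
    """Parse the bytes of a .gcov.json.gz file (gzip-compressed JSON)."""
    return json.loads(gzip.decompress(data))


def source_files(report: dict) -> list:
    """Names of the source files described by one gcov JSON report."""
    return [entry["file"] for entry in report.get("files", [])]


if __name__ == "__main__":
    import sys

    # Usage: gcov --json-format foo.c      (writes a .gcov.json.gz file)
    #        python gcov_files.py foo.gcov.json.gz
    with open(sys.argv[1], "rb") as f:
        report = read_gcov_json(f.read())
    print("gcc", report["gcc_version"], "format", report["format_version"])
    for name in source_files(report):
        print(name)
```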

    Each file has the following form:

    {
      "file": file_name,
      "functions": [function],
      "lines": [line]
    }

    Fields of the file element have the following semantics:

        file_name: name of the source file 

    Each function has the following form:

    {
      "blocks": blocks,
      "blocks_executed": blocks_executed,
      "demangled_name": demangled_name,
      "end_column": end_column,
      "end_line": end_line,
      "execution_count": execution_count,
      "name": name,
      "start_column": start_column,
      "start_line": start_line
    }

    Fields of the function element have the following semantics:

        blocks: number of blocks that are in the function
        blocks_executed: number of executed blocks of the function
        demangled_name: demangled name of the function
        end_column: column in the source file where the function ends
        end_line: line in the source file where the function ends
        execution_count: number of executions of the function
        name: name of the function
        start_column: column in the source file where the function begins
        start_line: line in the source file where the function begins 

    Note that line and column numbers start from 1. In the current implementation, start_line and start_column do not include any template parameters or the leading return type, but this is likely to be fixed in the future.
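The blocks and blocks_executed fields are enough for a per-function coverage summary. This is a minimal sketch that works on one element of the "files" array; the function names are hypothetical, and the example entry in the usage code is hand-written, not real gcov output.

```python
def function_coverage(file_entry: dict) -> list:
    """Return (demangled_name, execution_count, percent of blocks executed) per function."""
    rows = []
    for fn in file_entry.get("functions", []):
        blocks = fn["blocks"]
        # Guard against a function reported with zero blocks.
        percent = 100.0 * fn["blocks_executed"] / blocks if blocks else 0.0
        rows.append((fn["demangled_name"], fn["execution_count"], percent))
    return rows


def never_called(file_entry: dict) -> list:
    """Names of functions whose execution_count is zero."""
    return [fn["name"] for fn in file_entry.get("functions", [])
            if fn["execution_count"] == 0]


if __name__ == "__main__":
    # Hand-written example entry, not real gcov output.
    example = {"file": "foo.c", "lines": [], "functions": [
        {"name": "main", "demangled_name": "main", "blocks": 4,
         "blocks_executed": 3, "execution_count": 1}]}
    for name, count, percent in function_coverage(example):
        print(f"{name}: called {count} times, {percent:.0f}% of blocks executed")
```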

    Each line has the following form:

    {
      "branches": [branch],
      "count": count,
      "line_number": line_number,
      "unexecuted_block": unexecuted_block,
      "function_name": function_name
    }

    Branches are present only with the -b option. Fields of the line element have the following semantics:

        count: number of executions of the line
        line_number: line number
        unexecuted_block: flag whether the line contains an unexecuted block (not all statements on the line are executed)
        function_name: name of the function this line belongs to (may be unset for a line with inlined statements)
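From the line elements, line coverage follows directly, and unexecuted_block points at lines that ran but contain code that did not. This is a minimal sketch; it assumes that every entry in "lines" is an executable line (so their count is the denominator), and the function name is hypothetical.

```python
def line_coverage(file_entry: dict) -> tuple:
    """Return (executed lines, listed lines, partially executed line numbers).

    Assumption: every entry in "lines" is an executable line, so the
    second value serves as the denominator for line coverage.
    """
    lines = file_entry.get("lines", [])
    executed = sum(1 for ln in lines if ln["count"] > 0)
    # Lines that ran but still contain an unexecuted block.
    partial = [ln["line_number"] for ln in lines
               if ln["count"] > 0 and ln.get("unexecuted_block", False)]
    return executed, len(lines), partial


if __name__ == "__main__":
    # Hand-written example entry, not real gcov output.
    example = {"file": "foo.c", "lines": [
        {"line_number": 3, "count": 1, "unexecuted_block": False},
        {"line_number": 4, "count": 0, "unexecuted_block": True}]}
    done, total, partial = line_coverage(example)
    print(f"{done}/{total} lines executed, partially executed: {partial}")
```

The percentage is then 100 * executed / total whenever total is non-zero.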

    Each branch has the following form:

    {
      "count": count,
      "fallthrough": fallthrough,
      "throw": throw
    }

    Fields of the branch element have the following semantics:

        count: number of executions of the branch
        fallthrough: true when the branch is a fall through branch
        throw: true when the branch is an exceptional branch 
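Branch elements only appear when gcov runs with -b, so a reader should treat a missing "branches" key as "no data" rather than an error. This is a minimal sketch of a taken/total branch count per file; leaving out exceptional branches (throw is true) is my own choice, not something the gcov documentation prescribes, and the function name is hypothetical.

```python
def branch_coverage(file_entry: dict) -> tuple:
    """Return (taken, total) over all non-exceptional branches of one file."""
    taken = total = 0
    for ln in file_entry.get("lines", []):
        # "branches" is present only when gcov was run with -b.
        for br in ln.get("branches", []):
            if br.get("throw", False):
                continue  # design choice: skip exceptional branches
            total += 1
            if br["count"] > 0:
                taken += 1
    return taken, total


if __name__ == "__main__":
    # Hand-written example entry, not real gcov output.
    example = {"lines": [{"line_number": 5, "count": 3, "branches": [
        {"count": 3, "fallthrough": True, "throw": False},
        {"count": 0, "fallthrough": False, "throw": False}]}]}
    taken, total = branch_coverage(example)
    print(f"{taken} of {total} branches taken")
```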


This idea of using JSON JGF graph data could be tried with a small GCC patch to see where problems arise. It also needs a closer look at how it could be applied to data from the LLVM clang compiler.
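Before patching GCC, the idea can be tried on JSON that GCC already emits. As an experiment, this sketch converts the -fdiagnostics-format=json output into a JGF-style graph: one node per diagnostic and one edge from each diagnostic to each of its children. Assumptions: the JGF version 2 layout (nodes keyed by id), the generated ids "d0", "d1", ..., the graph id "gcc-diagnostics" and the "child" relation name are all my own choices, not anything GCC or the JGF specification defines.

```python
import itertools
import json
import sys


def diagnostics_to_jgf(diags: list) -> dict:
    """Turn GCC JSON diagnostics into a JGF-style graph of parent/child links."""
    nodes, edges = {}, []
    counter = itertools.count()

    def visit(diag: dict, parent_id=None) -> None:
        node_id = f"d{next(counter)}"  # hypothetical id scheme
        nodes[node_id] = {"label": f"{diag['kind']}: {diag['message']}",
                          "metadata": {"kind": diag["kind"]}}
        if parent_id is not None:
            edges.append({"source": parent_id, "target": node_id, "relation": "child"})
        for child in diag.get("children", []):
            visit(child, node_id)

    for diag in diags:
        visit(diag)
    return {"graph": {"id": "gcc-diagnostics", "directed": True,
                      "nodes": nodes, "edges": edges}}


if __name__ == "__main__":
    # Usage: gcc -fdiagnostics-format=json -c foo.c 2> diags.json
    #        python diags_to_jgf.py < diags.json > diags.jgf.json
    print(json.dumps(diagnostics_to_jgf(json.load(sys.stdin)), indent=2))
```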

GCC also has a plugin feature that could be used for a small tool generating JSON JGF graph data, and the maintained version works using Python.