# FlowQuery **Repository Path**: mirrors_microsoft/FlowQuery ## Basic Information - **Project Name**: FlowQuery - **Description**: A declarative query language for data processing pipelines. - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-11-16 - **Last Updated**: 2026-06-27 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # FlowQuery **Author:** Niclas Kjäll-Ohlsson A declarative OpenCypher-based query language for virtual graphs and data processing pipelines. FlowQuery is a declarative query language aiming to fully support OpenCypher, extended with capabilities such as **virtual graphs**, HTTP data loading, f-strings, and **custom function extensibility**. Virtual nodes and relationships are backed by sub-queries that can fetch data dynamically (e.g., from REST APIs), and FlowQuery's graph engine supports pattern matching, variable-length traversals, optional matches, relationship direction, and filter pass-down, enabling you to model and explore complex data relationships without a traditional graph database. Beyond graphs, FlowQuery provides a full data processing pipeline language with features like `LOAD JSON FROM` for HTTP calls (GET/POST with headers), f-strings, list comprehensions, inline predicate aggregation, temporal functions, and a rich library of scalar and aggregate functions. ### Graph RAG with FlowQuery The combination of graph querying and pipeline processing makes FlowQuery ideal for the retrieval stage of Retrieval Augmented Generation (RAG). A typical graph RAG flow works as follows: 1. **User query** - The user asks a question in natural language. 2. **Schema retrieval** - The application retrieves the virtual graph schema via `CALL schema()` and injects it into the system instructions of the query-generation LLM, so it knows which node labels, relationship types, and properties are available. 3. **Query generation** - The LLM, grounded in the schema, generates a precise OpenCypher query to retrieve the data needed to answer the question. 4. **Query execution** - The FlowQuery engine executes the generated OpenCypher query against the virtual graph and returns the results as grounding data. 5. **Response formulation** - The LLM formulates a final response informed by the grounding data. ``` ┌───────────────────┐ │ Graph Schema │ │ (via schema()) │ └────────┬──────────┘ │ system instructions v ┌──────────┐ ┌───────────────┐ ┌─────────────────┐ ┌───────────────┐ │ User │────>│ LLM │────>│ FlowQuery │────>│ LLM │ │ Question │ │ Generate Query│ │ Execute Query │ │ Formulate │ │ │ │ (OpenCypher) │ │ (Virtual Graph) │ │ Response │ └──────────┘ └───────────────┘ └─────────────────┘ └───────┬───────┘ │ v ┌──────────┐ │ Answer │ └──────────┘ ``` The schema is retrieved using FlowQuery's built-in `schema()` function, which returns the structure of all registered virtual nodes and relationships - including labels, types, endpoint labels, property names, and sample values. This schema is then included in the LLM's system instructions so it can generate correct queries grounded in the actual graph model: ```cypher CALL schema() YIELD kind, label, type, from_label, to_label, properties, sample RETURN kind, label, type, from_label, to_label, properties ``` For the [Virtual Org Chart](#virtual-org-chart) example, this returns: | kind | label | type | from_label | to_label | properties | | ------------ | -------- | ---------- | ---------- | -------- | ------------------------------------------- | | Node | Employee | N/A | N/A | N/A | [name, jobTitle, department, phone, skills] | | Relationship | N/A | REPORTS_TO | Employee | Employee | N/A | Node rows carry `label` and `properties`; relationship rows carry `type`, `from_label`, `to_label`, and `properties`. Fields not applicable to a row are `null`. See the [Language Reference](#language-reference) and [Quick Cheat Sheet](#quick-cheat-sheet) for full syntax documentation. For a complete worked example, see [Virtual Org Chart](#virtual-org-chart). FlowQuery is written in TypeScript and runs both in the browser and in Node.js as a self-contained single-file JavaScript library. A pure Python implementation of FlowQuery with full functional fidelity is also available in the [flowquery-py](./flowquery-py) sub-folder (`pip install flowquery`). - Test live at https://microsoft.github.io/FlowQuery/. - Try as a VSCode plugin from https://marketplace.visualstudio.com/items?itemName=FlowQuery.flowquery-vscode. ## Howto - Dev: `npm start` - This will start a FlowQuery command line where you can run statements. - Test: `npm test` - This will run all unit tests. - Build: `npm run build` (builds for both Node and web) ## Installation & Usage ### Node.js Install FlowQuery from npm: ```bash npm install flowquery ``` Then use it in your code: ```javascript const FlowQuery = require("flowquery").default; // Or with ES modules: // import FlowQuery from 'flowquery'; async function main() { const query = new FlowQuery("WITH 1 AS x RETURN x + 1"); await query.run(); console.log(query.results); // [ { expr0: 2 } ] } main(); ``` ### Browser Include the minified bundle in your HTML: ```html ``` Or import from the browser-specific entry point: ```javascript import FlowQuery from "flowquery/browser"; const query = new FlowQuery('WITH "Hello" AS greeting RETURN greeting'); await query.run(); console.log(query.results); ``` ### Python Install FlowQuery from PyPI: ```bash pip install flowquery ``` Then use it in your code: ```python import asyncio from flowquery import Runner runner = Runner("WITH 1 AS x RETURN x + 1 AS result") asyncio.run(runner.run()) print(runner.results) # [{'result': 2}] ``` Or start the interactive REPL: ```bash flowquery ``` See [flowquery-py](./flowquery-py) for more details, including custom function extensibility in Python. ## Language Reference ### Clauses #### RETURN Returns results. Expressions can be aliased with `AS`. ```cypher RETURN 1 + 2 AS sum, 3 + 4 AS sum2 // [{ sum: 3, sum2: 7 }] ``` #### WITH Introduces variables into scope. Works like `RETURN` but continues the pipeline. ```cypher WITH 1 AS a RETURN a // [{ a: 1 }] ``` #### UNWIND Expands a list into individual rows. ```cypher UNWIND [1, 2, 3] AS num RETURN num // [{ num: 1 }, { num: 2 }, { num: 3 }] ``` Unwinding `null` produces zero rows. #### LOAD JSON FROM Fetches JSON data from a URL. Supports GET (default) and POST with headers. ```cypher LOAD JSON FROM "https://api.example.com/data" AS data RETURN data // With POST body and custom headers LOAD JSON FROM 'https://api.example.com/endpoint' HEADERS { `Content-Type`: 'application/json', Authorization: f'Bearer {token}' } POST { key: 'value' } AS response RETURN response ``` #### LIMIT Restricts the number of rows. Can appear mid-pipeline or after `RETURN`. ```cypher UNWIND range(1, 100) AS i RETURN i LIMIT 5 ``` #### CALL ... YIELD Invokes an async function and yields named fields into scope. ```cypher CALL myAsyncFunction() YIELD result RETURN result // If last operation, YIELD is optional CALL myAsyncFunction() ``` #### UNION / UNION ALL Combines results from multiple queries. `UNION` removes duplicates; `UNION ALL` keeps them. Column names must match. ```cypher WITH 1 AS x RETURN x UNION WITH 2 AS x RETURN x // [{ x: 1 }, { x: 2 }] WITH 1 AS x RETURN x UNION ALL WITH 1 AS x RETURN x // [{ x: 1 }, { x: 1 }] ``` #### Multi-Statement Queries Multiple statements can be separated by semicolons. Only declaration statements - `CREATE VIRTUAL`, `DELETE VIRTUAL` (alias: `DROP VIRTUAL`), `REFRESH VIRTUAL`, `LET`, `UPDATE`, and `MERGE INTO` - may appear before the last statement. The last statement can be any valid query. ```cypher CREATE VIRTUAL (:Person) AS { UNWIND [{id: 1, name: 'Alice'}, {id: 2, name: 'Bob'}] AS r RETURN r.id AS id, r.name AS name }; CREATE VIRTUAL (:Person)-[:KNOWS]-(:Person) AS { UNWIND [{left_id: 1, right_id: 2}] AS r RETURN r.left_id AS left_id, r.right_id AS right_id }; MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS from, b.name AS to ``` The `Runner` also exposes a `metadata` property with counts of virtual nodes/relationships created and deleted: ```javascript const runner = new FlowQuery("CREATE VIRTUAL (:X) AS { RETURN 1 AS id }; MATCH (n:X) RETURN n"); await runner.run(); console.log(runner.metadata); // { virtual_nodes_created: 1, virtual_relationships_created: 0, // virtual_nodes_deleted: 0, virtual_relationships_deleted: 0, // info: { node_labels: ["X"], relationship_types: [], sources: [], ... } } ``` #### Caching Virtual Entities: `STATIC` and `REFRESH` By default, every `MATCH` against a virtual node or relationship re-executes its backing sub-query. For expensive sources (HTTP endpoints, large CSV files) you can opt in to persistent caching with the `STATIC` keyword: ```cypher CREATE STATIC VIRTUAL (:Country) AS { LOAD JSON FROM 'https://restcountries.com/v3.1/all?fields=name,cca2,population' AS c RETURN c.cca2 AS code, c.name.common AS name }; ``` The sub-query runs once on first access and the result is reused for every subsequent query in the same process - across `Runner` instances. STATIC virtual entities are protected: re-running `CREATE STATIC VIRTUAL (:Country)` without first dropping the existing entry raises an error. Use `DROP VIRTUAL (:Country)` (an alias for `DELETE VIRTUAL`) to remove it. To refresh on a schedule, add a `REFRESH EVERY ` clause. Supported units are `SECOND[S]`, `MINUTE[S]`, `HOUR[S]`, and `DAY[S]`: ```cypher CREATE STATIC VIRTUAL (:Country) AS { LOAD JSON FROM 'https://restcountries.com/v3.1/all?fields=name,cca2,population' AS c RETURN c.cca2 AS code, c.name.common AS name } REFRESH EVERY 1 HOUR; ``` Refresh is lazy: the cache is re-populated on the first access after the TTL elapses; no background timers are scheduled. `REFRESH EVERY` requires `STATIC` (caching must be enabled to refresh). To force an immediate refresh from anywhere in a query, use `REFRESH VIRTUAL (...)`: ```cypher REFRESH VIRTUAL (:Country); MATCH (c:Country) RETURN c.name ``` `REFRESH VIRTUAL` works on both nodes and relationships and clears the cache so that the next access re-executes the backing sub-query. #### Refreshable `LET` Bindings: `REFRESH EVERY`, `REFRESH BINDING`, `DROP BINDING` `LET` bindings live for the lifetime of the process, just like virtual nodes and relationships, and the same caching primitives apply. A plain `LET name = { ... }` evaluates the sub-query once, when the `LET` statement executes, and stores the result in the global binding store; all subsequent reads return that cached value without re-running the sub-query. To opt into TTL-based re-evaluation, add a trailing `REFRESH EVERY ` clause: ```cypher LET users = { LOAD JSON FROM 'https://example.com/users.json' AS u RETURN u.id AS id, u.name AS name } REFRESH EVERY 5 MINUTES; LOAD JSON FROM users AS u RETURN u.id AS id, u.name AS name ``` The sub-query still runs eagerly at `LET` time and the result is cached just like a plain binding; the `REFRESH EVERY` clause additionally arranges for the next read after the TTL has elapsed to re-execute the sub-query. Refreshable bindings (those with a `REFRESH EVERY` clause) cannot be silently overwritten: re-running `LET name = { ... } REFRESH EVERY ...` without first dropping the existing binding raises an error, and so does `UPDATE name = ...` or `MERGE INTO name ...` against the same name. `LET ... REFRESH EVERY` requires a sub-query right-hand side (an expression like `42 REFRESH EVERY 1 MINUTE` is rejected). To force an immediate refresh outside the TTL schedule, use `REFRESH BINDING`: ```cypher REFRESH BINDING users; LOAD JSON FROM users AS u RETURN u.id AS id ``` To remove any binding (plain or refreshable), use `DROP BINDING`: ```cypher DROP BINDING users; ``` `UPDATE` and `MERGE INTO` against a refreshable binding are blocked because the mutation would be invisibly overwritten by the next refresh. Use `REFRESH BINDING name` to re-evaluate the source, or `DROP BINDING name` and redefine the binding plainly. ### WHERE Clause Filters rows based on conditions. Supports the following operators: | Operator | Example | | -------------------- | -------------------------------------------------------------- | | Comparison | `=`, `<>`, `>`, `>=`, `<`, `<=` | | Logical | `AND`, `OR`, `NOT` | | Null checks | `IS NULL`, `IS NOT NULL` | | List membership | `IN [...]`, `NOT IN [...]` | | String matching | `CONTAINS`, `NOT CONTAINS` | | String prefix/suffix | `STARTS WITH`, `NOT STARTS WITH`, `ENDS WITH`, `NOT ENDS WITH` | ```cypher UNWIND range(1,100) AS n WITH n WHERE n >= 20 AND n <= 30 RETURN n UNWIND ['apple', 'banana', 'grape'] AS fruit WITH fruit WHERE fruit CONTAINS 'ap' RETURN fruit // [{ fruit: 'apple' }, { fruit: 'grape' }] UNWIND ['apple', 'apricot', 'banana'] AS fruit WITH fruit WHERE fruit STARTS WITH 'ap' RETURN fruit WITH fruit WHERE fruit IN ['banana', 'date'] RETURN fruit WHERE age IS NOT NULL ``` ### ORDER BY Sorts results. Supports `ASC` (default) and `DESC`. Can use aliases, property access, function expressions, or arithmetic. ```cypher UNWIND [3, 1, 2] AS x RETURN x ORDER BY x DESC // [{ x: 3 }, { x: 2 }, { x: 1 }] // Multiple sort keys RETURN person.name AS name, person.age AS age ORDER BY name ASC, age DESC // Sort by expression (expression values are not leaked into results) UNWIND ['BANANA', 'apple', 'Cherry'] AS fruit RETURN fruit ORDER BY toLower(fruit) // Sort by arithmetic expression RETURN item.a AS a, item.b AS b ORDER BY item.a + item.b ASC ``` #### DISTINCT Removes duplicate rows from `RETURN` or `WITH`. ```cypher UNWIND [1, 1, 2, 2] AS i RETURN DISTINCT i // [{ i: 1 }, { i: 2 }] ``` ### Bindings (`LET` / `UPDATE` / `MERGE INTO`) A **binding** is a named, mutable value that persists across statements in a multi-statement query. Bindings live in a flat per-query namespace, are introduced with `LET`, wholesale-replaced with `UPDATE`, row-filtered with `UPDATE … AS x DELETE WHERE …`, and per-row upserted/merged with `MERGE INTO … USING … ON … WHEN …`. Once bound, the value can be referenced anywhere an expression is allowed (e.g. as the source of `LOAD JSON FROM`, `UNWIND`, `MERGE INTO`'s `USING`, or directly inside an expression). #### LET `LET name = ` introduces a new binding. The right-hand side can be any expression or a braced sub-query whose final `RETURN` provides the value. ```cypher LET data = [{id: 1, name: 'Alice'}, {id: 2, name: 'Bob'}]; LET threshold = 10; LET fresh = { UNWIND [1, 2, 3] AS n RETURN n AS n }; LOAD JSON FROM data AS d WITH d WHERE d.id >= threshold OR d.id <= 2 RETURN d.id AS id, d.name AS name ``` `LET` fails if the binding already exists - use `UPDATE` to overwrite. #### UPDATE `UPDATE name = ` replaces the value of an existing binding wholesale. Works for any value (scalars, maps, arrays). ```cypher LET counter = 0; UPDATE counter = counter + 1; RETURN counter AS counter // [{ counter: 1 }] ``` `UPDATE` fails if the binding doesn't exist - use `LET` first. #### UPDATE ... AS alias DELETE WHERE ... Filters rows out of an array binding in place. The alias names each row during predicate evaluation. ```cypher LET users = [ {id: 1, name: 'Alice', expired: false}, {id: 2, name: 'Bob', expired: true} ]; UPDATE users AS u DELETE WHERE u.expired; LOAD JSON FROM users AS u RETURN u.id AS id, u.name AS name // [{ id: 1, name: 'Alice' }] ``` #### MERGE INTO ... USING ... ON ... WHEN ... SQL-style keyed merge. For each row of the source, find matching rows in the target (by key or predicate) and apply per-row branches: ``` MERGE INTO target [AS t] USING [AS s] ON | [WHEN MATCHED THEN UPDATE SET ] [WHEN MATCHED THEN DELETE] [WHEN NOT MATCHED THEN INSERT []] ``` ```cypher // Upsert by key: replace listed fields on matches; append unmatched source rows LET users = [{id: 1, name: 'Alice'}, {id: 2, name: 'Bob'}]; MERGE INTO users USING [{id: 2, name: 'Bobby'}, {id: 3, name: 'Charlie'}] ON id WHEN MATCHED THEN UPDATE SET .id, .name WHEN NOT MATCHED THEN INSERT; // users → [{id:1,name:'Alice'}, {id:2,name:'Bobby'}, {id:3,name:'Charlie'}] // Composite key MERGE INTO rows USING incoming ON (tenant, id) WHEN MATCHED THEN UPDATE SET .v WHEN NOT MATCHED THEN INSERT; // Per-row expressions across target (u) and source (s) aliases MERGE INTO users AS u USING incoming AS s ON id WHEN MATCHED THEN UPDATE SET .name = s.name + ' (' + u.name + ')' WHEN NOT MATCHED THEN INSERT {id: s.id, name: 'New: ' + s.name}; // Predicate-based join MERGE INTO users AS u USING incoming AS s ON u.tenant = s.tenant AND u.email = s.email WHEN MATCHED THEN UPDATE SET .v = s.v WHEN NOT MATCHED THEN INSERT; // Tombstone delete: rows in target that also appear in source are removed MERGE INTO users USING [{id: 2}, {id: 3}] ON id WHEN MATCHED THEN DELETE; ``` Notes: - The source may be any expression (array literal, binding name) or a braced sub-query. When the source is a bare binding name, give it an alias: `USING incoming AS s`. - `ON id` is shorthand for the equality predicate `t.id = s.id`; `ON (a, b)` requires equality on every listed key. Anything else is treated as a Boolean predicate evaluated per `(target, source)` pair. - `WHEN MATCHED THEN UPDATE SET .field` overwrites only the listed fields, preserving the rest from the existing row. `SET .field = expr` evaluates `expr` per matched pair, with target and source aliases in scope. - `WHEN NOT MATCHED THEN INSERT` (no row expression) appends the source row as-is. `INSERT { … }` appends an explicit row expression instead. - A `MERGE INTO` must declare at least one `WHEN` clause. Branches are independent - omit `WHEN NOT MATCHED` to skip insertion, omit `WHEN MATCHED` to skip updates/deletes. ### Expressions #### Arithmetic `+`, `-`, `*`, `/`, `^` (power), `%` (modulo). Standard precedence applies; use parentheses to override. ```cypher RETURN 2 + 3 * 4 AS result // 14 RETURN (2 + 3) * 4 AS result // 20 ``` #### String Concatenation The `+` operator concatenates strings. ```cypher RETURN "hello" + " world" AS result // "hello world" ``` #### List Concatenation The `+` operator concatenates lists. ```cypher RETURN [1, 2] + [3, 4] AS result // [1, 2, 3, 4] ``` #### Negative Numbers ```cypher RETURN -1 AS num // -1 ``` #### Associative Arrays (Maps) Create inline maps. Keys can be reserved keywords. ```cypher RETURN {name: "Alice", age: 30} AS person RETURN {return: 1}.return AS aa // 1 ``` #### Property Access Dot notation or bracket notation for nested lookups. Bracket notation supports range slicing. ```cypher person.name person["name"] numbers[0:3] // first 3 elements numbers[:-2] // all but last 2 numbers[2:-2] // slice from index 2, excluding last 2 numbers[:] // full copy ``` #### F-Strings Python-style formatted strings with embedded expressions. ```cypher WITH "world" AS w RETURN f"hello {w}" AS greeting // Escape braces with double braces: {{ and }} RETURN f"literal {{braces}}" AS result // "literal {braces}" ``` #### CASE Expression ```cypher RETURN CASE WHEN num > 1 THEN num ELSE null END AS ret ``` #### Equality as Expression `=` and `<>` return `1` (true) or `0` (false) when used in RETURN. ```cypher RETURN i=5 AS isEqual, i<>5 AS isNotEqual ``` ### List Comprehensions Filter and/or transform lists inline. ```cypher // Map: [variable IN list | expression] RETURN [n IN [1, 2, 3] | n * 2] AS doubled // [2, 4, 6] // Filter: [variable IN list WHERE condition] RETURN [n IN [1, 2, 3, 4, 5] WHERE n > 2] AS filtered // [3, 4, 5] // Filter + Map: [variable IN list WHERE condition | expression] RETURN [n IN [1, 2, 3, 4] WHERE n > 1 | n ^ 2] AS result // [4, 9, 16] // Identity (copy): [variable IN list] RETURN [n IN [10, 20, 30]] AS result // [10, 20, 30] ``` ### Predicate Functions (Inline Aggregation) Aggregate over a list expression with optional filtering. ```cypher // sum(variable IN list | expression WHERE condition) RETURN sum(n IN [1, 2, 3] | n WHERE n > 1) AS sum // 5 RETURN sum(n IN [1, 2, 3] | n) AS sum // 6 RETURN sum(n IN [1+2+3, 2, 3] | n^2) AS sum // 49 ``` ### Boolean Predicate Functions Test list elements against a condition. Follow standard Cypher syntax. ```cypher // any - true if at least one element matches RETURN any(n IN [1, 2, 3] WHERE n > 2) // true // all - true if every element matches RETURN all(n IN [2, 4, 6] WHERE n > 0) // true // none - true if no element matches RETURN none(n IN [1, 2, 3] WHERE n > 5) // true // single - true if exactly one element matches RETURN single(n IN [1, 2, 3] WHERE n > 2) // true // In a WHERE clause UNWIND [[1,2,3], [4,5,6]] AS nums WITH nums WHERE any(n IN nums WHERE n > 4) RETURN nums // [4, 5, 6] ``` ### Aggregate Functions Used in `RETURN` or `WITH` to group and reduce rows. Non-aggregated expressions define grouping keys. Aggregate functions cannot be nested. | Function | Description | | ------------------------ | ------------------------------------------------------------------- | | `sum(expr)` | Sum of values. Returns `0` for empty input, `null` for null input. | | `avg(expr)` | Average. Returns `null` for null input. | | `count(expr)` | Count of rows. | | `count(DISTINCT expr)` | Count of unique values. | | `min(expr)` | Minimum value (numbers or strings). | | `max(expr)` | Maximum value (numbers or strings). | | `collect(expr)` | Collects values into a list. | | `collect(DISTINCT expr)` | Collects unique values. Works with primitives, arrays, and objects. | ```cypher UNWIND [1, 1, 2, 2] AS i UNWIND [1, 2, 3, 4] AS j RETURN i, sum(j) AS sum, avg(j) AS avg // [{ i: 1, sum: 20, avg: 2.5 }, { i: 2, sum: 20, avg: 2.5 }] UNWIND ["a", "b", "a", "c"] AS s RETURN count(DISTINCT s) AS cnt // 3 ``` ### Scalar Functions | Function | Description | Example | | ------------------------------ | -------------------------------------- | -------------------------------------- | | `size(list)` | Length of list or string | `size([1,2,3])` → `3` | | `range(start, end)` | Inclusive integer range | `range(1,3)` → `[1,2,3]` | | `round(n)` | Round to nearest integer | `round(3.7)` → `4` | | `rand()` | Random float 0–1 | `round(rand()*10)` | | `log(n)` | Natural logarithm (base e) | `log(10)` → `2.302...` | | `log10(n)` | Base-10 logarithm | `log10(1000)` → `3` | | `pow(base, exp)` | Base raised to the power of exponent | `pow(2, 3)` → `8` | | `split(str, delim)` | Split string into list | `split("a,b",",")` → `["a","b"]` | | `join(list, delim)` | Join list into string | `join(["a","b"],",")` → `"a,b"` | | `replace(str, from, to)` | Replace all occurrences | `replace("hello","l","x")` → `"hexxo"` | | `toLower(str)` | Lowercase | `toLower("Hello")` → `"hello"` | | `trim(str)` | Strip whitespace | `trim(" hi ")` → `"hi"` | | `substring(str, start[, len])` | Extract substring | `substring("hello",1,3)` → `"ell"` | | `toString(val)` | Convert to string | `toString(42)` → `"42"` | | `toInteger(val)` | Convert to integer | `toInteger("42")` → `42` | | `toFloat(val)` | Convert to float | `toFloat("3.14")` → `3.14` | | `tojson(str)` | Parse JSON string to object | `tojson('{"a":1}')` → `{a: 1}` | | `stringify(obj)` | Pretty-print object as JSON | `stringify({a:1})` | | `string_distance(a, b)` | Normalized Levenshtein distance (0–1) | `string_distance("kitten","sitting")` | | `keys(obj)` | Keys of a map | `keys({a:1,b:2})` → `["a","b"]` | | `properties(node_or_map)` | Properties of a node or map | `properties(n)` | | `type(val)` | Type name string | `type(123)` → `"number"` | | `coalesce(val, ...)` | First non-null argument | `coalesce(null, 42)` → `42` | | `head(list)` | First element | `head([1,2,3])` → `1` | | `tail(list)` | All but first element | `tail([1,2,3])` → `[2,3]` | | `last(list)` | Last element | `last([1,2,3])` → `3` | | `id(node_or_rel)` | ID of a node or type of a relationship | `id(n)` | | `elementId(node)` | String ID of a node | `elementId(n)` → `"1"` | | `labels(node)` | Labels of a node as an array | `labels(n)` → `["Person"]` | All scalar functions propagate `null`: if the primary input is `null`, the result is `null`. ### Temporal Functions | Function | Description | | ------------------------------------------------- | ------------------------------------------------------------------ | | `datetime()` | Current UTC datetime object | | `datetime(str)` | Parse ISO 8601 string (e.g. `'2025-06-15T12:30:45.123Z'`) | | `datetime({year, month, day, hour, minute, ...})` | Construct from map | | `date()` / `date(str)` / `date({...})` | Date only (no time fields) | | `time()` | Current UTC time | | `localtime()` | Current local time | | `localdatetime()` / `localdatetime(str)` | Current or parsed local datetime | | `timestamp()` | Current epoch milliseconds (number) | | `duration(str)` | Parse ISO 8601 duration (`'P1Y2M3DT4H5M6S'`, `'P2W'`, `'PT2H30M'`) | | `duration({days, hours, ...})` | Construct duration from map | **Datetime properties:** `year`, `month`, `day`, `hour`, `minute`, `second`, `millisecond`, `epochMillis`, `epochSeconds`, `dayOfWeek` (1=Mon, 7=Sun), `dayOfYear`, `quarter`, `formatted`. **Date properties:** `year`, `month`, `day`, `epochMillis`, `dayOfWeek`, `dayOfYear`, `quarter`, `formatted`. **Duration properties:** `years`, `months`, `weeks`, `days`, `hours`, `minutes`, `seconds`, `totalMonths`, `totalDays`, `totalSeconds`, `formatted`. ```cypher WITH datetime() AS now RETURN now.year AS year, now.quarter AS q RETURN date('2025-06-15').dayOfWeek AS dow // 7 (Sunday) RETURN duration('P2W').days AS d // 14 ``` ### Graph Operations #### CREATE VIRTUAL (Nodes) Defines a virtual node label backed by a sub-query. ```cypher CREATE VIRTUAL (:Person) AS { UNWIND [{id: 1, name: 'Alice'}, {id: 2, name: 'Bob'}] AS record RETURN record.id AS id, record.name AS name } ``` #### CREATE VIRTUAL (Relationships) Defines a virtual relationship type between two node labels. Must return `left_id` and `right_id`. ```cypher CREATE VIRTUAL (:Person)-[:KNOWS]-(:Person) AS { UNWIND [{left_id: 1, right_id: 2}] AS record RETURN record.left_id AS left_id, record.right_id AS right_id } ``` #### DELETE VIRTUAL Removes a virtual node or relationship definition. ```cypher DELETE VIRTUAL (:Person) DELETE VIRTUAL (:Person)-[:KNOWS]-(:Person) ``` #### MATCH Queries virtual graph data. Supports property constraints, `WHERE` clauses, and relationship traversal. ```cypher MATCH (n:Person) RETURN n.name AS name MATCH (n:Person {name: 'Alice'}) RETURN n MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name ``` **Unlabeled node matching:** Omit the label to match all nodes in the graph. ```cypher MATCH (n) RETURN n // all nodes MATCH (n {name: 'Alice'}) RETURN n // all nodes with name='Alice' ``` **ORed node labels:** Match nodes with any of the specified labels. ```cypher MATCH (n:Person|Animal) RETURN n, labels(n) AS lbls ``` **Leftward direction:** `<-[:TYPE]-` reverses traversal direction. ```cypher MATCH (m:Person)<-[:REPORTS_TO]-(e:Person) RETURN m.name AS manager, e.name AS employee ``` **Variable-length relationships:** `*`, `*0..3`, `*1..`, `*2..` ```cypher MATCH (a:Person)-[:KNOWS*]->(b:Person) RETURN a.name, b.name // 0+ hops MATCH (a:Person)-[:KNOWS*1..]->(b:Person) RETURN a.name, b.name // 1+ hops MATCH (a:Person)-[:KNOWS*0..3]->(b:Person) RETURN a.name, b.name // 0–3 hops ``` **ORed relationship types:** ```cypher MATCH (a)-[:KNOWS|FOLLOWS]->(b) RETURN a.name, b.name ``` **Pattern variable:** Capture the full path as a variable. ```cypher MATCH p=(:Person)-[:KNOWS]-(:Person) RETURN p AS pattern ``` **Pattern in WHERE:** Check existence of a relationship in a WHERE clause. ```cypher MATCH (a:Person), (b:Person) WHERE (a)-[:KNOWS]->(b) RETURN a.name, b.name MATCH (a:Person) WHERE NOT (a)-[:KNOWS]->(:Person) RETURN a.name ``` **Subquery Expressions:** `EXISTS`, `COUNT`, and `COLLECT` evaluate a full subquery as an expression. The subquery can reference outer-scope variables and supports the complete FlowQuery pipeline (MATCH, WITH, WHERE, UNWIND, LOAD, etc.). ```cypher // EXISTS - returns true if the subquery produces any rows MATCH (p:Person) WHERE EXISTS { MATCH (p)-[:KNOWS]->(friend:Person) WHERE friend.age > 30 } RETURN p.name // NOT EXISTS - negate with NOT MATCH (p:Person) WHERE NOT EXISTS { MATCH (p)-[:KNOWS]->(:Person) } RETURN p.name // COUNT - returns the number of rows the subquery produces MATCH (p:Person) WHERE COUNT { MATCH (p)-[:KNOWS]->(:Person) } > 2 RETURN p.name // COUNT in RETURN MATCH (p:Person) RETURN p.name, COUNT { MATCH (p)-[:KNOWS]->(:Person) } AS friendCount // COLLECT - returns a list of single-column values from the subquery MATCH (p:Person) RETURN COLLECT { MATCH (p)-[:KNOWS]->(friend:Person) RETURN friend.name } AS friends // COLLECT with IN MATCH (p:Person) WHERE 'Alice' IN COLLECT { MATCH (p)-[:KNOWS]->(f:Person) RETURN f.name } RETURN p.name ``` **Node reference reuse across MATCH clauses:** ```cypher MATCH (a:Person)-[:KNOWS]-(b:Person) MATCH (b)-[:KNOWS]-(c:Person) RETURN a.name, b.name, c.name ``` #### OPTIONAL MATCH Like `MATCH` but returns `null` for unmatched nodes instead of dropping the row. Property access on `null` nodes returns `null`. ```cypher MATCH (a:Person) OPTIONAL MATCH (a)-[:KNOWS]->(b:Person) RETURN a.name AS name, b.name AS friend // Persons without KNOWS relationships get friend=null ``` Chained optional matches propagate `null`: ```cypher OPTIONAL MATCH (u)-[:REPORTS_TO]->(m1:Employee) OPTIONAL MATCH (m1)-[:REPORTS_TO]->(m2:Employee) // If m1 is null, m2 is also null ``` #### Graph Utility Functions | Function | Description | | ------------------------- | -------------------------------------------------------------------------------- | | `nodes(path)` | List of nodes in a path | | `relationships(path)` | List of relationships in a path | | `properties(node_or_rel)` | Properties map (excludes `id` for nodes, `left_id`/`right_id` for relationships) | | `schema()` | Introspect registered virtual node labels and relationship types | ```cypher MATCH p=(:City)-[:CONNECTED_TO]-(:City) RETURN nodes(p) AS cities, relationships(p) AS rels CALL schema() YIELD kind, label, type, from_label, to_label, properties, sample RETURN kind, label, properties ``` #### Filter Pass-Down (Parameter References) Virtual node/relationship definitions can reference `$paramName` or `$args.paramName` to receive filter values from `MATCH` constraints and `WHERE` equality predicates. This enables dynamic data loading (e.g., API calls parameterized by match constraints). ```cypher CREATE VIRTUAL (:Todo) AS { LOAD JSON FROM f"https://api.example.com/todos/{coalesce($id, 1)}" AS todo RETURN todo.id AS id, todo.title AS title } // $id receives the value 3 from the constraint MATCH (t:Todo {id: 3}) RETURN t.title // Also extracted from WHERE equality MATCH (t:Todo) WHERE t.id = 3 RETURN t.title ``` `$`-prefixed identifiers are **only** allowed inside virtual definitions. Non-equality operators in `WHERE` (`>`, `<`, `CONTAINS`, etc.) are **not** extracted as pass-down parameters. `OR` predicates are also not extracted. ### Reserved Keywords as Identifiers Reserved words (`return`, `with`, `from`, `to`, etc.) can be used as: - Variable aliases: `WITH 1 AS return RETURN return` - Property keys: `data.from`, `data.to` - Map keys: `{return: 1}` - Node labels and relationship types: `(:Return)-[:With]->()` ### Introspection Discover all registered functions (built-in and custom): ```cypher WITH functions() AS funcs UNWIND funcs AS f RETURN f.name, f.description, f.category ``` --- ## Quick Cheat Sheet ``` ┌─────────────────────────────────────────────────────────────┐ │ CLAUSE SYNTAX │ ├─────────────────────────────────────────────────────────────┤ │ RETURN expr [AS alias], ... [WHERE cond] │ │ │ [ORDER BY expr [ASC|DESC], ...] [LIMIT n] │ │ WITH expr [AS alias], ... [WHERE cond] │ │ UNWIND list AS var │ │ LOAD JSON FROM url [HEADERS {...}] [POST {...}] AS alias │ │ CALL func() [YIELD field, ...] │ │ query1 UNION [ALL] query2 │ │ stmt1; stmt2; ... stmtN -- multi-statement │ │ LIMIT n │ ├─────────────────────────────────────────────────────────────┤ │ BINDINGS │ ├─────────────────────────────────────────────────────────────┤ │ LET name = expr | { subquery } -- new binding │ │ UPDATE name = expr | { subquery } -- replace existing │ │ UPDATE name AS x DELETE WHERE cond -- row-filter binding │ │ MERGE INTO target [AS t] │ │ USING [AS s] │ │ ON key | (k1,k2,...) | predicate │ │ [WHEN MATCHED THEN UPDATE SET .f [= expr], ...] │ │ [WHEN MATCHED THEN DELETE] │ │ [WHEN NOT MATCHED THEN INSERT [ {row-expr} ]] │ ├─────────────────────────────────────────────────────────────┤ │ GRAPH OPERATIONS │ ├─────────────────────────────────────────────────────────────┤ │ CREATE VIRTUAL (:Label) AS { subquery } │ │ CREATE VIRTUAL (:L1)-[:TYPE]-(:L2) AS { subquery } │ │ DELETE VIRTUAL (:Label) │ │ DELETE VIRTUAL (:L1)-[:TYPE]-(:L2) │ │ MATCH (n:Label {prop: val}), ... [WHERE cond] │ │ MATCH (n) ... -- unlabeled (all) │ │ MATCH (n:L1|L2) ... -- ORed node labels │ │ MATCH (a)-[:TYPE]->(b) -- rightward │ │ MATCH (a)<-[:TYPE]-(b) -- leftward │ │ MATCH (a)-[:TYPE*0..3]->(b) -- variable length │ │ MATCH (a)-[:T1|T2]->(b) -- ORed types │ │ MATCH p=(a)-[:TYPE]->(b) -- pattern variable │ │ OPTIONAL MATCH (a)-[:TYPE]->(b) -- null if no match │ ├─────────────────────────────────────────────────────────────┤ │ WHERE OPERATORS │ ├─────────────────────────────────────────────────────────────┤ │ = <> > >= < <= │ │ AND OR NOT │ │ IS NULL · IS NOT NULL │ │ IN [...] · NOT IN [...] │ │ CONTAINS · NOT CONTAINS │ │ STARTS WITH · NOT STARTS WITH │ │ ENDS WITH · NOT ENDS WITH │ │ EXISTS { subquery } · NOT EXISTS { subquery } │ │ COUNT { subquery } · COLLECT { subquery } │ ├─────────────────────────────────────────────────────────────┤ │ EXPRESSIONS │ ├─────────────────────────────────────────────────────────────┤ │ + - * / ^ % -- arithmetic │ │ "str" + "str" -- string concat │ │ [1,2] + [3,4] -- list concat │ │ f"hello {expr}" -- f-string │ │ {key: val, ...} -- map literal │ │ obj.key · obj["key"] -- property access │ │ list[0:3] · list[:-2] -- slicing │ │ CASE WHEN cond THEN v ELSE v END -- conditional │ │ DISTINCT -- deduplicate │ ├─────────────────────────────────────────────────────────────┤ │ LIST COMPREHENSIONS │ ├─────────────────────────────────────────────────────────────┤ │ [x IN list | expr] -- map │ │ [x IN list WHERE cond] -- filter │ │ [x IN list WHERE cond | expr] -- filter + map │ ├─────────────────────────────────────────────────────────────┤ │ AGGREGATE FUNCTIONS │ ├─────────────────────────────────────────────────────────────┤ │ sum(x) avg(x) count(x) min(x) max(x) collect(x) │ │ count(DISTINCT x) · collect(DISTINCT x) │ │ sum(v IN list | expr [WHERE cond]) -- inline predicate │ │ any(v IN list WHERE cond) -- true if any match │ │ all(v IN list WHERE cond) -- true if all match │ │ none(v IN list WHERE cond) -- true if none match │ │ single(v IN list WHERE cond) -- true if one match │ ├─────────────────────────────────────────────────────────────┤ │ SCALAR FUNCTIONS │ ├─────────────────────────────────────────────────────────────┤ │ size range round rand split join replace │ │ log log10 pow │ │ toLower trim substring toString toInteger toFloat │ │ tojson stringify string_distance keys properties │ │ type coalesce head tail last id elementId labels │ ├─────────────────────────────────────────────────────────────┤ │ TEMPORAL FUNCTIONS │ ├─────────────────────────────────────────────────────────────┤ │ datetime() date() time() localtime() localdatetime() │ │ timestamp() duration() │ │ Properties: .year .month .day .hour .minute .second │ │ .millisecond .epochMillis .dayOfWeek .quarter .formatted │ ├─────────────────────────────────────────────────────────────┤ │ GRAPH FUNCTIONS │ ├─────────────────────────────────────────────────────────────┤ │ nodes(path) relationships(path) properties(node) │ │ schema() functions() │ ├─────────────────────────────────────────────────────────────┤ │ PARAMETER PASS-DOWN (inside virtual definitions only) │ ├─────────────────────────────────────────────────────────────┤ │ $paramName · $args.paramName │ │ coalesce($id, defaultValue) -- with fallback │ └─────────────────────────────────────────────────────────────┘ ``` --- ## Extending FlowQuery with Custom Functions FlowQuery supports extending its functionality with custom functions using the `@FunctionDef` decorator. You can create scalar functions, aggregate functions, predicate functions, and async data providers. ### Creating a Custom Scalar Function Scalar functions operate on individual values and return a result: ```typescript import { Function, FunctionDef } from "flowquery/extensibility"; @FunctionDef({ description: "Doubles a number", category: "scalar", parameters: [{ name: "value", description: "Number to double", type: "number" }], output: { description: "Doubled value", type: "number" }, }) class Double extends Function { constructor() { super("double"); this._expectedParameterCount = 1; } public value(): number { return this.getChildren()[0].value() * 2; } } ``` Once defined, use it in your queries: ```cypher WITH 5 AS num RETURN double(num) AS result // Returns: [{ result: 10 }] ``` ### Creating a Custom String Function ```typescript import { Function, FunctionDef } from "flowquery/extensibility"; @FunctionDef({ description: "Reverses a string", category: "scalar", parameters: [{ name: "text", description: "String to reverse", type: "string" }], output: { description: "Reversed string", type: "string" }, }) class StrReverse extends Function { constructor() { super("strreverse"); this._expectedParameterCount = 1; } public value(): string { const input = String(this.getChildren()[0].value()); return input.split("").reverse().join(""); } } ``` Usage: ```cypher WITH 'hello' AS s RETURN strreverse(s) AS reversed // Returns: [{ reversed: 'olleh' }] ``` ### Creating a Custom Aggregate Function Aggregate functions process multiple values and return a single result. They require a `ReducerElement` to track state: ```typescript import { AggregateFunction, FunctionDef, ReducerElement } from "flowquery/extensibility"; class ProductElement extends ReducerElement { private _value: number = 1; public get value(): number { return this._value; } public set value(v: number) { this._value *= v; } } @FunctionDef({ description: "Calculates the product of values", category: "aggregate", parameters: [{ name: "value", description: "Number to multiply", type: "number" }], output: { description: "Product of all values", type: "number" }, }) class Product extends AggregateFunction { constructor() { super("product"); this._expectedParameterCount = 1; } public reduce(element: ReducerElement): void { element.value = this.firstChild().value(); } public element(): ReducerElement { return new ProductElement(); } } ``` Usage: ```cypher UNWIND [2, 3, 4] AS num RETURN product(num) AS result // Returns: [{ result: 24 }] ``` ### Creating a Custom Async Data Provider Async providers allow you to create custom data sources that can be used with `LOAD JSON FROM`: ```typescript import { AsyncFunction, FunctionDef } from "flowquery/extensibility"; @FunctionDef({ description: "Provides example data for testing", category: "async", parameters: [], output: { description: "Example data object", type: "object" }, }) class GetExampleData extends AsyncFunction { async *generate(): AsyncGenerator { yield { id: 1, name: "Alice" }; yield { id: 2, name: "Bob" }; } } ``` Usage: ```cypher LOAD JSON FROM getExampleData() AS data RETURN data.id AS id, data.name AS name // Returns: [{ id: 1, name: "Alice" }, { id: 2, name: "Bob" }] ``` ### Using Custom Functions with Expressions Custom functions integrate seamlessly with FlowQuery expressions and can be combined with other functions: ```cypher // Using custom function with expressions WITH 5 * 3 AS num RETURN addhundred(num) + 1 AS result // Using multiple custom functions together WITH 2 AS num RETURN triple(num) AS tripled, square(num) AS squared ``` ### Introspecting Registered Functions You can use the built-in `functions()` function to discover registered functions including your custom ones: ```cypher WITH functions() AS funcs UNWIND funcs AS f WITH f WHERE f.name = 'double' RETURN f.name AS name, f.description AS description, f.category AS category ``` ## Examples ### Virtual Org Chart This single multi-statement query creates a virtual graph for a fictitious company - complete with employees, skills, phone numbers, and a management chain - then queries it to produce an org chart. [Try live!](https://microsoft.github.io/FlowQuery/?rZXPbtNAEMbFhUMOReXAeW4OwkH521ZBQgolrVQlSogDQlRVtdjTeKm9a603Bavqw_AAPEVfDO16s_EmEVElfPHuzGf7N9-M7UN4BfX-MM0SXiC-hkEA9zUAgJdwqc_quKdRH1o-MJJiH7yACAKnMTLP1-kf_PucykSlTocTE6weEWZEyBSZ7IM3_IXhUtI71MIs5kxd-KbV6PV6jWar2fR8yG9pkuR9uPQCKYjEReH54I2QRCjymGZqd0YZYSF6Vw--C9q2oGMiwmUOM3qHgnh-FfTLFPgNDNmCMkRB2ULTuKBOchO03Ww5oAMRxlRiKJdCVeadJnwZqcUYmeT6JlukHUs6FbQgMCUSEw2yRToVPFqG0prrkLrJTdLOBumMkyglWVbW5Q0YSQpJw1xtPn_dpuxayguSYg4fBOe3Sl2lDJBRLqyhBuVJfrYdynmRYRAKmqmyvGkhY67mzTsXJIs_jbYxexZzRBl8i4np6f_G7LhmLnMNGBS5xFR7-BHvJlm-DXhkAQepeoMueMxyXVQF0LQSxoSRxW7Afd12fQxCsUw1FpEEdLNzqjkDSW4x5kmEAsaLVG4TH1viOU8ff-dwTkT4-Ids9p7fyJ9EoGvrk0ztuqYi0fVtDMEcc7nzPTqxnGdE0pTAIGkEpMDI9XYX55Moew7leibHI2vwlGaYUIaq_xrySn1SBYZcRHp_YDZvaaQyNPJXAVWCCqmzDa7wVWK1XlcPsNKt61DK9c7eSFejcnphw2U5Kl6uag_vaofuP6Fx2Z8Np5PZPLieT64a-_8WCd7Ia_MlFnQRl5tWtWtW0tkv6VYl7Z2S3n7JUVXS2Sk5rkq6OyUnruTfHTZXqbRZWttXN9GXmrWy_gXUce1v7RnUt_x_X08XoqI50A_FcnzWxyAANBrfKOwsWYVcjxNWR8ikK2NkNOUUVR9SjpNJm2mqpMtImU8XwqXUvmVcyHzOa88dBN9U9Bc) ```cypher CREATE VIRTUAL (:Employee) AS { UNWIND [ {id: 1, name: 'Sara Chen', jobTitle: 'CEO', department: 'Executive', phone: '+1-555-0100', skills: ['Strategy', 'Leadership', 'Finance']}, {id: 2, name: 'Marcus Rivera', jobTitle: 'VP of Engineering', department: 'Engineering', phone: '+1-555-0201', skills: ['Architecture', 'Cloud', 'Mentoring']}, {id: 3, name: 'Priya Patel', jobTitle: 'VP of Product', department: 'Product', phone: '+1-555-0301', skills: ['Roadmapping', 'Analytics', 'UX']}, {id: 4, name: 'James Brooks', jobTitle: 'Senior Engineer', department: 'Engineering', phone: '+1-555-0202', skills: ['TypeScript', 'Python', 'GraphQL']}, {id: 5, name: 'Lin Zhang', jobTitle: 'Senior Engineer', department: 'Engineering', phone: '+1-555-0203', skills: ['Rust', 'Systems', 'DevOps']}, {id: 6, name: 'Amara Johnson', jobTitle: 'Product Manager', department: 'Product', phone: '+1-555-0302', skills: ['Scrum', 'Data Analysis', 'Stakeholder Mgmt']}, {id: 7, name: 'Tomás García', jobTitle: 'Software Engineer', department: 'Engineering', phone: '+1-555-0204', skills: ['React', 'TypeScript', 'Testing']}, {id: 8, name: 'Fatima Al-Sayed', jobTitle: 'Software Engineer', department: 'Engineering', phone: '+1-555-0205', skills: ['Python', 'ML', 'Data Pipelines']} ] AS record RETURN record.id AS id, record.name AS name, record.jobTitle AS jobTitle, record.department AS department, record.phone AS phone, record.skills AS skills }; CREATE VIRTUAL (:Employee)-[:REPORTS_TO]-(:Employee) AS { UNWIND [ {left_id: 2, right_id: 1}, {left_id: 3, right_id: 1}, {left_id: 4, right_id: 2}, {left_id: 5, right_id: 2}, {left_id: 6, right_id: 3}, {left_id: 7, right_id: 4}, {left_id: 8, right_id: 4} ] AS record RETURN record.left_id AS left_id, record.right_id AS right_id }; MATCH (e:Employee) OPTIONAL MATCH (e)-[:REPORTS_TO]->(mgr:Employee) RETURN e.name AS employee, e.jobTitle AS title, e.department AS department, e.phone AS phone, e.skills AS skills, mgr.name AS reportsTo ORDER BY e.department, e.name ``` Output: | employee | title | department | phone | skills | reportsTo | | --------------- | ----------------- | ----------- | ----------- | ---------------------------------------- | ------------- | | Fatima Al-Sayed | Software Engineer | Engineering | +1-555-0205 | [Python, ML, Data Pipelines] | James Brooks | | James Brooks | Senior Engineer | Engineering | +1-555-0202 | [TypeScript, Python, GraphQL] | Marcus Rivera | | Lin Zhang | Senior Engineer | Engineering | +1-555-0203 | [Rust, Systems, DevOps] | Marcus Rivera | | Marcus Rivera | VP of Engineering | Engineering | +1-555-0201 | [Architecture, Cloud, Mentoring] | Sara Chen | | Tomás García | Software Engineer | Engineering | +1-555-0204 | [React, TypeScript, Testing] | James Brooks | | Sara Chen | CEO | Executive | +1-555-0100 | [Strategy, Leadership, Finance] | null | | Amara Johnson | Product Manager | Product | +1-555-0302 | [Scrum, Data Analysis, Stakeholder Mgmt] | Priya Patel | | Priya Patel | VP of Product | Product | +1-555-0301 | [Roadmapping, Analytics, UX] | Sara Chen | You can further explore the graph - for example, find the full management chain from any employee up to the CEO: ```cypher MATCH (e:Employee)-[:REPORTS_TO*1..]->(mgr:Employee) WHERE e.name = 'Tomás García' RETURN e.name AS employee, collect(mgr.name) AS managementChain // [{ employee: "Tomás García", managementChain: ["James Brooks", "Marcus Rivera", "Sara Chen"] }] ``` Or find each manager's direct reports: ```cypher MATCH (dr:Employee)-[:REPORTS_TO]->(mgr:Employee) RETURN mgr.name AS manager, collect(dr.name) AS directReports ORDER BY manager ``` Or find all employees who share a skill: ```cypher MATCH (a:Employee), (b:Employee) WHERE a.id < b.id WITH a, b, [s IN a.skills WHERE s IN b.skills] AS shared WHERE size(shared) > 0 RETURN a.name AS employee1, b.name AS employee2, shared AS sharedSkills ORDER BY size(shared) DESC ``` ### Virtual Country Borders Graph This example pulls live data from the public [REST Countries](https://restcountries.com/) API and projects it into a `(:Country)-[:BORDERS]-(:Country)` virtual graph in a single semicolon-chained query, then ranks European countries by how many direct neighbors they have. [Try live!](https://microsoft.github.io/FlowQuery/?nZBRa4MwFIXf8yvumx1YXelbRxlWM-qwCtGujDGKxrQVrJFoB2Xsv49Eq3Ure1iekuvxnu8cD0dA-amoRcYqmMMnAgDwAsuB5zDw4YkEK9AOdV1WM9MUrKo7tUH50fyYGhMzzvPHXcbytJoX8ZHplMZTXbB9xgu95OUpj2t5TbhImag0sEKgyofgaE18oIb8A9pjhZClOoL-UEOulX5HXsjvymWoaOy6Da37UNOzKM0V2lDXgl52tU_09YCQTbAVYXhxSbS2PBjNbFXH-U4Kb5bXt_s7dpa2aduIXbY-UJ9lGGDA_zfa-G22CIiDSfg-_h_w2t-4vnPVjGzlVpSc7eqtzJMo8Gx_kE-Ft7Iiewmj-DZW0o3RZokJhvhSwBw0fBK8ZBpqzeKuqwb1rDeXUaJCFSzbHxIutmqIlAUsXn_MwcGhjTx35UYwuf8G) A single shared `LET` binding fetches the data once. Both virtuals (the `(:Country)` nodes and the `(:Country)-[:BORDERS]-(:Country)` relationships) are projected from the same binding, so there is exactly one HTTP round-trip per Runner invocation. Add `REFRESH EVERY 1 HOUR` to the `LET` if you want the cache to auto-refresh on a schedule. ```cypher LET countries = { LOAD JSON FROM 'https://restcountries.com/v3.1/all?fields=name,cca3,region,population,borders' AS c RETURN c.cca3 AS id, c.name.common AS name, c.region AS region, c.population AS population, c.borders AS borders }; CREATE VIRTUAL (:Country) AS { LOAD JSON FROM countries AS c RETURN c.id AS id, c.name AS name, c.region AS region, c.population AS population }; CREATE VIRTUAL (:Country)-[:BORDERS]-(:Country) AS { LOAD JSON FROM countries AS c UNWIND c.borders AS b RETURN c.id AS left_id, b AS right_id }; MATCH (a:Country)-[:BORDERS]-(b:Country) WHERE a.region = 'Europe' RETURN a.name AS country, count(b) AS neighbor_count ORDER BY neighbor_count DESC LIMIT 10 ``` ## Lineage and Provenance FlowQuery exposes two complementary forms of lineage: - **Structural lineage** (`metadata.info`) - what labels, types, properties, and sources the parsed query touches. Available without running the query and with zero runtime overhead. - **Row-level provenance** (`runner.provenance`) - opt-in via `{ provenance: true }`; for every emitted result row, the concrete node ids and relationship hops bound to it, their matched property values, and (for virtual-backed records) the inner sub-query lineage that produced them. Combined, they let you trace every cell of a result back to the source record, the source virtual, and ultimately the URL or call-site that backs it. ### Statement Info: Labels, Properties, and Source Lineage `metadata.info` carries a `StatementInfo` describing the _structure_ the query touches - independent of execution. It captures: - The node labels and relationship types referenced. - The data sources backing the underlying virtual definitions. - The node/relationship properties **consumed** by the query - `alias.prop` accesses anywhere in `MATCH`, `WHERE`, `WITH`, `RETURN`, `ORDER BY`, or function arguments, plus inline pattern properties like `(u:User {id: 'rick.o'})`. - The properties **declared** by each virtual's `RETURN` clause via `info.declared`, so you can validate that a query references only declared properties. - Literal values supplied for properties at the call site via `info.nodes[Label].literal_values` - collected from inline pattern properties and from equality / `IN` predicates such as `WHERE u.id = 'rick.o'` or `WHERE u.id IN ['a', 'b']`. This is useful for governance, lineage UIs, query-cost estimation, schema validation, or routing decisions before the query runs. The same `StatementInfoCrawler` can also be used directly on any parsed AST without going through a `Runner`: ```javascript import { StatementInfoCrawler } from "flowquery"; const crawler = new StatementInfoCrawler(); const info = crawler.crawl(parsedAst); ``` For end-to-end lineage from a property to its data source, use the per-entity `nodes` and `relationships` maps: ```javascript const runner = new FlowQuery(` CREATE VIRTUAL (:City) AS { LOAD JSON FROM "https://example.com/cities" AS c RETURN c.id AS id, c.name AS name, c.country AS country }; CREATE VIRTUAL (:City)-[:FLIGHT]-(:City) AS { LOAD JSON FROM "https://example.com/flights" AS f RETURN f.left_id AS left_id, f.right_id AS right_id, f.airline AS airline }; MATCH (a:City {name: 'NYC'})-[r:FLIGHT]->(b:City) WHERE b.country IN ['US', 'CA'] RETURN a.name AS origin, b.name AS destination, r.airline AS airline `); const { info } = runner.metadata; console.log(info.nodes); // { // City: { // properties: ["country", "name"], // sources: ["https://example.com/cities"], // literal_values: { country: ["US", "CA"], name: ["NYC"] } // } // } console.log(info.relationships); // { // FLIGHT: { // properties: ["airline"], // sources: ["https://example.com/flights"], // literal_values: {} // } // } console.log(info.declared.nodes.City); // { properties: ["country", "id", "name"], sources: ["https://example.com/cities"] } console.log(info.sources); // ["https://example.com/cities", "https://example.com/flights"] ``` `StatementInfo` resolves sources and declared schemas for **any** virtual the query touches - both inline `CREATE VIRTUAL` clauses and previously-registered virtuals reached via `MATCH` or `DELETE`. The flat `node_labels`, `relationship_types`, `sources`, `node_properties`, and `relationship_properties` fields stay in sync with the per-entity `nodes` / `relationships` maps and are convenient for quick aggregate checks. Only purely literal AST subtrees end up in `literal_values` - values that depend on parameters, references, f-strings, or subqueries are skipped. ### Row-level Provenance: Node and Relationship IDs Behind Each Result `StatementInfo` describes the _structural_ lineage of a query - which labels, types, and sources back it. To get the _row-level_ lineage - which concrete node ids and relationship `(left_id, right_id, type)` hops actually flowed into each result row - pass `{ provenance: true }` when constructing the runner and read `runner.provenance`: ```javascript const fq = new FlowQuery( ` MATCH (a:City {name: 'NYC'})-[r:FLIGHT]->(b:City) RETURN a.name AS origin, b.name AS destination `, null, null, { provenance: true } ); await fq.run(); fq.results; // [{ origin: 'NYC', destination: 'LAX' }, { origin: 'NYC', destination: 'YYZ' }] fq.provenance; // [ // { // nodes: [ // { alias: 'a', label: 'City', id: 'nyc' }, // { alias: 'b', label: 'City', id: 'lax' } // ], // relationships: [ // { alias: 'r', type: 'FLIGHT', // hops: [{ left_id: 'nyc', right_id: 'lax', type: 'FLIGHT' }], // path: ['nyc', 'lax'] } // ], // rows: [/* per-input-row segments, see below */] // }, // { nodes: [...], relationships: [{ alias: 'r', type: 'FLIGHT', // hops: [{ left_id: 'nyc', right_id: 'yyz', type: 'FLIGHT' }], // path: ['nyc', 'yyz'] }], rows: [...] } // ] ``` Semantics: - `runner.provenance` is aligned **by index** with `runner.results`. - Each `NodeBinding.id` preserves the original scalar type of the underlying record (a numeric id stays a number). - Anonymous nodes / relationships from the pattern are included with `alias: null`. - Variable-length matches (`[:T*m..n]`) populate `hops` with every traversed edge in path order. - Every `RelationshipBinding` also carries a `path` field listing every visited node id in order: `[hops[0].left_id, hops[0].right_id, hops[1].right_id, …]`. For single-hop matches `path` has exactly two entries; for variable-length matches `path.length === hops.length + 1`. - `OPTIONAL MATCH` misses surface as `id: null` for the unmatched node and an empty `hops: []` for the unmatched relationship. - `ORDER BY` and `LIMIT` permute and truncate `provenance` in lockstep with `results`. - Aggregate `RETURN` (e.g. `count`, `collect`, `sum`) unions all contributing bindings into the output group's provenance, deduplicated per `(alias, id)` for nodes and per `(alias, hops)` for relationships. - `UNION ALL` concatenates branch provenance; `UNION` keeps the first branch's lineage for deduplicated rows. - Aggregating `WITH` clauses carry lineage forward. Inside the group, the contributing bindings (the upstream `MATCH` nodes and relationships) are deduplicated and frozen; any subsequent `MATCH` adds its own live bindings on top, so the final `RETURN` row's provenance shows both the pre-aggregation sources and the post-aggregation bindings. Chained aggregating `WITH` clauses compose transitively - the original ids and hops survive every aggregation hop. When the option is omitted or set to `false`, the runner has zero provenance overhead and `runner.provenance` returns an empty array. #### Per-Input-Row Segments: Aligning `collect()` with its Sources Each `RowProvenance` also carries a `rows` array: one **segment** per input row that contributed to the result. A segment is just the `{ nodes, relationships }` slice for that single contributing row. For non-aggregate rows `rows` always has length 1 and mirrors the top-level `nodes`/`relationships`. For aggregate rows the array positionally aligns with array-valued aggregates such as `collect`: ```cypher MATCH (a:City)-[:FLIGHT]->(b:City) RETURN a.country AS country, collect(b.name) AS destinations ``` ```javascript fq.results[0]; // { country: 'US', destinations: ['LAX', 'YYZ'] } fq.provenance[0].rows.length; // 2 fq.provenance[0].rows[0].nodes; // contributed LAX: includes b = lax fq.provenance[0].rows[1].nodes; // contributed YYZ: includes b = yyz ``` This lets you map each element of a `collect`/`sum`/`avg` result back to the exact node / relationship ids that produced it. #### Property-Level Lineage Each `NodeBinding` and `RelationshipHop` produced under `{ provenance: true }` also carries the matched **property values** alongside the ids: ```javascript const fq = new FlowQuery(query, null, null, { provenance: true }); await fq.run(); fq.provenance[0].nodes[0]; // { // alias: 'a', label: 'City', id: 'nyc', // properties: { name: 'New York', country: 'US' } // } fq.provenance[0].relationships[0].hops[0]; // { // left_id: 'nyc', right_id: 'lax', type: 'FLIGHT', // properties: { airline: 'AA' } // } ``` - `NodeBinding.properties` is a shallow copy of the matched record with `id` and `_label` stripped. `RelationshipHop.properties` is a shallow copy of the matched relationship's user-visible properties. #### Threading Lineage Through Virtual Sub-Queries A `CREATE VIRTUAL (:X) AS { ... }` block wraps an inner FlowQuery that produces the synthesised records exposed under the `:X` label. By default, a downstream `MATCH (x:X)` only sees the synthesised row's `id` - the upstream query that produced it is opaque. When `{ provenance: true }` is set, the inner runner's `RowProvenance` is threaded onto every binding whose record came from a virtual. Each `NodeBinding` and each `RelationshipHop` gains an optional `source: RowProvenance` field carrying the inner row's full lineage - recursively, when a virtual matches another virtual: ```typescript import { Runner } from "flowquery"; // Virtual graph: derived city = US-only subset of SrcCity. await new Runner(` CREATE VIRTUAL (:SrcCity) AS { UNWIND [ { id: 'nyc', country: 'US' }, { id: 'lhr', country: 'UK' } ] AS c RETURN c.id AS id, c.country AS country } `).run(); await new Runner(` CREATE VIRTUAL (:DerivedCity) AS { MATCH (s:SrcCity) WHERE s.country = 'US' RETURN s.id AS id } `).run(); const fq = new Runner(`MATCH (d:DerivedCity) RETURN d.id AS id`, null, null, { provenance: true }); await fq.run(); fq.provenance[0].nodes[0]; // { // alias: 'd', label: 'DerivedCity', id: 'nyc', // source: { // nodes: [{ alias: 's', label: 'SrcCity', id: 'nyc' }], // relationships: [] // } // } ``` Semantics: - The `source` field is **omitted** when the binding's record did not come from a virtual sub-query (e.g. records from `UNWIND … RETURN` inside the virtual produce a `source` with empty `nodes` and `relationships`, signalling "lineage was threaded but no graph slots were bound at this level"). - Sub-query lineage is **recursive**: a virtual that matches another virtual carries nested `source` chains all the way down. - Provenance mode bypasses the static-virtual cache because each invocation must produce fresh records to back the lineage weak-map. Static caching continues to apply when `provenance` is off. #### Data Sources and LET Chaining `LOAD` operations contribute row-level `data_sources` entries on each emitted `RowSegment`: one `{ source, source_provenance? }` per loaded record. `source` is the URL, file URI, async-function name, or `let://` reference; `source_provenance` is the inner `RowProvenance` of the source row when the `LOAD` was reading from a `LET`-bound dataset whose right-hand side itself produced lineage. The same chain surfaces structurally. `info.sources` and the per-label `info.nodes[Label].sources` arrays follow `LOAD FROM ` references and emit a `let://` entry, plus the underlying sources the `LET` sub-query touched (when both definitions are visible to the crawler in the same query). ```cypher LET cities = { LOAD JSON FROM 'file:///data/cities.json' AS c RETURN c.id AS id, c.name AS name }; CREATE VIRTUAL (:City) AS { LOAD JSON FROM cities AS c RETURN c.id AS id, c.name AS name }; MATCH (c:City) RETURN c.name AS name ``` ```javascript runner.metadata.info.nodes.City.sources; // ['file:///data/cities.json', 'let://cities'] runner.provenance[0].nodes[0].source.data_sources[0]; // { // source: 'let://cities', // source_provenance: { // nodes: [], relationships: [], // data_sources: [{ source: 'file:///data/cities.json' }] // } // } ``` #### Column-Level Lineage: Tracing Each Result Cell to Its Source `runner.info.returns` (added to `StatementInfo`) maps every output column to the `alias.property` accesses that compose it. Combined with row provenance it gives you per-cell traceability - value → source binding → node id → source URL - without any runtime AST inspection. For a query ```cypher MATCH (c:City)-[f:FLIGHT]->(d:City) WHERE c.country = 'US' RETURN c.name AS origin, d.name AS destination, f.airline AS airline ``` `runner.info.returns` is: ```javascript { origin: { references: [{ alias: 'c', kind: 'node', labels: ['City'], property: 'name' }], kind: 'property' }, destination: { references: [{ alias: 'd', kind: 'node', labels: ['City'], property: 'name' }], kind: 'property' }, airline: { references: [{ alias: 'f', kind: 'relationship', labels: ['FLIGHT'], property: 'airline' }], kind: 'property' } } ``` `kind` summarises how the column was built: | `kind` | Meaning | | -------------- | ------------------------------------------------------------------------- | | `'literal'` | Pure literal expression, no bindings (e.g. `42 AS answer`). | | `'property'` | Direct `alias.property` projection (or pass-through). | | `'expression'` | Computed from one or more `alias.property` accesses. | | `'aggregate'` | Aggregate function (`count`, `sum`, `collect`, …); see `aggregate` field. | Notes: - The map is keyed by the column's output alias (the part after `AS`), falling back to `expr0`, `expr1`, … for unnamed columns. - `count(c)` and similar aggregates over a bare binding produce `kind: 'aggregate'`, `aggregate: 'count'`, and an empty `references` list (no specific property is read). Use `count(c.id)` to surface a property reference. - Multi-label intersection matches (`MATCH (n:A:B)`) populate every label in `references[i].labels`. - For queries composed with `UNION` / `UNION ALL`, each output column's lineage is the **union** of the references contributed by every branch (deduplicated by `alias`, `kind`, `property`, and `labels`). The column `kind` is `aggregate` if any branch aggregates, the shared kind when all branches agree, and `expression` otherwise. Node labels, relationship types, and accessed properties are likewise collected from every branch. #### Combining Lineage and Provenance: `traceRow()` and `lineage()` `info.returns` (structural) and `runner.provenance` (runtime) are kept as separate streams so each is useful on its own. When you want both joined per cell, the `Runner` exposes two convenience methods. `runner.traceRow(rowIndex)` returns one `CellTrace` per output column for a single row, pairing the column's structural lineage with the node / relationship bindings whose alias matches it. The matched property value is extracted for you (including the built-ins `id`, `left_id`, `right_id`, `type`): ```javascript const runner = new FlowQuery( ` MATCH (c:City)-[f:FLIGHT]->(d:City) WHERE c.country = 'US' RETURN c.name AS origin, d.name AS destination, f.airline AS airline `, null, null, { provenance: true } ); await runner.run(); const trace = runner.traceRow(0); // trace.origin = { // column: 'origin', // value: 'New York', // lineage: { references: [{ alias: 'c', kind: 'node', labels: ['City'], property: 'name' }], kind: 'property' }, // bindings: [{ // reference: { alias: 'c', kind: 'node', labels: ['City'], property: 'name' }, // value: 'New York', // node: { alias: 'c', label: 'City', id: 'nyc', properties: { name: 'New York', country: 'US' }, source: {...} } // }] // } // trace.airline.bindings[0].relationship.hops[0].properties.airline === 'AA' ``` `runner.lineage()` is the one-shot equivalent over the entire result set: ```javascript const report = runner.lineage(); // report.columns: same shape as info.returns (structural per-column). // report.rows[i]: same shape as traceRow(i) (per-cell trace). ``` `bindings` is empty for literal columns, when the runner was constructed without `{ provenance: true }`, or when a row's bindings don't intersect the column's references. Aggregate columns like `collect(c.id)` surface one binding per contributing input row, so `trace.ids.bindings.map(b => b.value)` aligns with the collected array. `info`, `provenance`, and `metadata` remain available unchanged - `traceRow` / `lineage` are purely additive helpers. ## Contributing This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [Contributor License Agreements](https://cla.opensource.microsoft.com). When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA. This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. ## Trademarks This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.