
[Security Issue] pem_private_key not redacted in Spark Logical Plan UI #525

Open
Loudegaste opened this issue Sep 8, 2023 · 4 comments
Loudegaste commented Sep 8, 2023

Hi,
we are using the Snowflake Spark connector to push data from Foundry to Snowflake. We noticed that the pem_private_key is not redacted from the query plan and is therefore leaking.

We expect pem_private_key to be redacted, just like sfURL in the screenshot.

We first raised the issue with the Foundry team. After review, they concluded that the issue comes from the Spark connector itself and should therefore be handled here.

Python version: 3.8.*
Pyspark version: 3.2.1

Here is the code used with the Spark connector:

connection_parameters = {
    "sfURL": config["snowflake_account"],
    "sfUser": "...",
    "pem_private_key": key,
    "role": "...",
    "sfWarehouse": config["warehouse"],
    "sfDatabase": config["database"],
    "sfSchema": config["schema"],
}

inp.dataframe().write.format(SNOWFLAKE_SOURCE_NAME).options(
    **connection_parameters
).option("dbtable", f'"{raw_table_name}"').mode("overwrite").save()


@Loudegaste
Author

Hi,
following up on this: further experimentation on our side has revealed that this is a non-deterministic issue. Across multiple runs of exactly the same pipeline, the pem_private_key is sometimes redacted and sometimes not. So far we haven't found any factor that predicts the behaviour.


rshkv commented Sep 29, 2023

@Loudegaste, neither Snowflake's connector nor Foundry seem to do anything additional about redacting the pem_private_key. They just rely on Spark's built-in redaction mechanism.

Spark, when rendering the query plan, just goes through SQLConf.redact which redacts based on the config values for spark.sql.redaction.options.regex and spark.redaction.regex. The former defaults to (?i)url and the latter is overridden in Foundry to include additional keywords.

I wonder if the non-determinism you see is explained by the fact that Spark, when redacting, looks for sensitive keywords not just in the config key but also in the config value. If the pem_private_key differs between runs, you may sometimes see it redacted because it happens to contain the string url in that run.
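The behaviour described above can be sketched in plain Python (a hypothetical re-implementation for illustration, not Spark's actual Scala code), assuming the default spark.sql.redaction.options.regex of (?i)url and that the regex is tested against both the option key and the option value:

```python
import re

# Sketch of Spark's option redaction (assumption: mirrors the behaviour of
# SQLConf.redactOptions, not its actual implementation). The regex --
# default "(?i)url" for spark.sql.redaction.options.regex -- is tested
# against BOTH the option key and the option value.
REDACTION_REGEX = re.compile(r"(?i)url")
REDACTED = "*********(redacted)"

def redact_options(options):
    return {
        k: REDACTED
        if REDACTION_REGEX.search(k) or REDACTION_REGEX.search(v)
        else v
        for k, v in options.items()
    }

# "sfURL" always matches on its key; "pem_private_key" only matches when
# the base64-encoded key material happens to contain "url", which would
# explain the apparent non-determinism between runs.
print(redact_options({
    "sfURL": "myaccount.snowflakecomputing.com",
    "pem_private_key": "MIIEvQIBADANBgkqhkiG9w0BAQEFAASC",
}))
```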

@Loudegaste
Author

Hi @rshkv,
thanks for the reply. Does that mean the issue needs to be raised with Spark directly?
By the way, do you think spark.sql.redaction.string.regex could provide a workaround in the meantime?
We've actually tried changing spark.redaction.regex to include 'pem', but this didn't solve the issue.
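For illustration, string-level redaction can be mimicked without Spark (a sketch of an assumption about how spark.sql.redaction.string.regex behaves: replacing matching substrings in text Spark emits, such as plan output; the PEM-shaped pattern here is hypothetical):

```python
import re

# Sketch (assumption): when spark.sql.redaction.string.regex is set, every
# match in strings Spark produces (e.g. plan text) is replaced with a
# redaction marker. A PEM-shaped pattern is used as a hypothetical example.
STRING_REDACTION_REGEX = re.compile(
    r"-----BEGIN PRIVATE KEY-----[\s\S]*?-----END PRIVATE KEY-----"
)

def redact_string(text):
    return STRING_REDACTION_REGEX.sub("*********(redacted)", text)

plan_text = (
    "pem_private_key=-----BEGIN PRIVATE KEY-----\n"
    "MIIEvQIBADANBgkqhkiG9w0BAQEFAASC\n"
    "-----END PRIVATE KEY----- sfURL=..."
)
print(redact_string(plan_text))
```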

@Loudegaste
Author

As @rshkv suggested, keys do indeed get redacted when they contain "url" as a substring. This gives us an ugly workaround: appending "url" to the end of the key being used.
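The workaround above can be sketched as a check against the default regex (a hypothetical illustration, assuming Spark redacts an option whenever (?i)url matches its key; whether the connector accepts the renamed key is what the comment above reports):

```python
import re

# Hypothetical check against Spark's default
# spark.sql.redaction.options.regex value.
DEFAULT_OPTIONS_REGEX = re.compile(r"(?i)url")

def key_always_redacted(key):
    # If the regex matches the key itself, redaction no longer depends on
    # whether the key *value* happens to contain "url".
    return bool(DEFAULT_OPTIONS_REGEX.search(key))

print(key_always_redacted("pem_private_key"))      # depends on the value
print(key_always_redacted("pem_private_key_url"))  # key itself matches
```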
