Glue 4.0 native delta jar is not being loaded by default #185

Open
royraposonjr opened this issue Jun 5, 2023 · 1 comment

@royraposonjr

I have a problem using native Delta table support. Even after adding DATALAKE_FORMATS: delta to my environment, which adds the jars for delta-core, the tests still can't import the delta Python module.

A workaround I found is to add --py-files /home/glue_user/aws-glue-libs/datalake-connectors/delta-2.1.0/delta-core_2.12-2.1.0.jar to my job arguments, and in pytest I added spark.sparkContext.addPyFile("/home/glue_user/aws-glue-libs/datalake-connectors/delta-2.1.0/delta-core_2.12-2.1.0.jar").
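For reference, a minimal conftest.py sketch of that workaround might look like the following (names are illustrative; the jar path is the one above, and DATALAKE_FORMATS=delta is assumed to have already put the jar on the Spark classpath):

import pytest
from pyspark.sql import SparkSession

# Path to the Delta connector jar shipped with the Glue 4.0 local image.
DELTA_JAR = (
    "/home/glue_user/aws-glue-libs/datalake-connectors/"
    "delta-2.1.0/delta-core_2.12-2.1.0.jar"
)


@pytest.fixture(scope="session")
def spark():
    session = SparkSession.builder.getOrCreate()
    # addPyFile makes the Python sources bundled inside delta-core importable,
    # so "from delta.tables import DeltaTable" works on the driver and executors.
    session.sparkContext.addPyFile(DELTA_JAR)
    yield session
    session.stop()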

Is there a way to automatically load the jars or am I missing something?

@mo2menelzeiny

mo2menelzeiny commented Aug 3, 2023

Well, it was a bit tricky. I tried including the core and storage jars directly, but it didn't register the Delta classes for some reason. I ended up pulling them in through spark.jars.packages instead, which did work for me.

Here is an example of my pytest fixture that initializes the Spark session for the tests:

from awsglue.context import GlueContext
from pyspark.sql import SparkSession
import pytest


@pytest.fixture(scope="session", autouse=True)
def glue_context():
    spark_session = (
        SparkSession
        .builder
        # Pull the Delta Lake artifact from Maven rather than relying on the
        # jars shipped with the Glue image.
        .config("spark.jars.packages", "io.delta:delta-core_2.12:2.1.0")
        # Register the Delta SQL extension and catalog so the "delta" format
        # and Delta SQL commands work in this session.
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    glue_context = GlueContext(spark_session.sparkContext)
    yield glue_context

    spark_session.stop()
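
As a quick sanity check, a hypothetical smoke test like the one below (not part of the original comment; names are illustrative) can confirm that the Delta format is actually registered. Note that spark.jars.packages resolves the artifact from Maven at session startup, so the test environment needs network access or a pre-populated Ivy cache.

# test_delta_smoke.py -- hypothetical smoke test, not from the original comment.
def test_delta_round_trip(glue_context, tmp_path):
    spark = glue_context.spark_session
    path = str(tmp_path / "delta_table")

    # Writing and reading the "delta" format only works if the Delta jar,
    # SQL extension, and catalog were registered by the fixture above.
    spark.range(5).write.format("delta").save(path)
    assert spark.read.format("delta").load(path).count() == 5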
