Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1651234: Fix create_dataframe throwing an exception for structured dtypes #2240

Merged
merged 9 commits into from
Sep 20, 2024

Conversation

sfc-gh-jrose
Copy link
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1651234

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

    This change modifies the logic for session.create_dataframe to add support for StructType object as well as removes it's dependency on to_object and to_array. Those calls create semi-structured columns which are incompatible with iceberg. By switching to a cast full type information is retained instead.

Copy link

github-actions bot commented Sep 5, 2024

Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing

@sfc-gh-jrose sfc-gh-jrose requested a review from a team September 5, 2024 16:41
@sfc-gh-jrose sfc-gh-jrose changed the title SNOW-1651234: Fix create_dataframe throwing an exepction for structured dtypes SNOW-1651234: Fix create_dataframe throwing an exception for structured dtypes Sep 5, 2024
@sfc-gh-jrose sfc-gh-jrose marked this pull request as ready for review September 5, 2024 17:26
@sfc-gh-jrose sfc-gh-jrose requested a review from a team as a code owner September 5, 2024 17:26
pytest.skip("Test requires iceberg support and structured type support.")

_, __, expected_schema = STRUCTURED_TYPES_EXAMPLES[True]
table_name = f"snowpark_structured_dtypes_{uuid.uuid4().hex[:5]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, any reason why we are using this naming format instead os standard function Utils.random_table_name()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason. Other tests in this file did it this way so I was mostly just copying style.

@sfc-gh-jrose
Copy link
Contributor Author

Defering this change until structured type support is rolled out completely.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 10, 2024
@sfc-gh-yixie
Copy link
Collaborator

We can continue to merge this PR since it doesn't hurt anything.

CHANGELOG.md Outdated
@@ -49,6 +49,7 @@
- Fixed a bug where calling `DataFrame.to_snowpark_pandas_dataframe` without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the `explode` function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the `outer` parameter.
- Fixed a bug where using `to_pandas_batches` with async jobs caused an error due to improper handling of waiting for asynchronous query completion.
- Fixed a bug in `session.create_dataframe` that caused a sql error when creating an iceberg table with structured datatypes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better remove this one because there are other problems with create_dataframe.

@sfc-gh-jrose sfc-gh-jrose reopened this Sep 11, 2024
@sfc-gh-jrose sfc-gh-jrose added the NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md label Sep 11, 2024
@sfc-gh-jrose sfc-gh-jrose merged commit 5514105 into main Sep 20, 2024
35 checks passed
@sfc-gh-jrose sfc-gh-jrose deleted the jrose_snow_1651234_structured_create_dataframe branch September 20, 2024 22:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants