[SNOW-1649172]: Fix loc set when setting DataFrame row with Series value #2213

Open
wants to merge 14 commits into
base: main

Conversation

sfc-gh-rdurrani
Contributor

@sfc-gh-rdurrani sfc-gh-rdurrani commented Sep 3, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1649172

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

When doing df.loc[x] = series, an error occurs because the Series does not have the same number of columns as the DataFrame being set. Instead, the Series should be transposed and set as a row, regardless of whether its length equals the number of columns in the DataFrame.
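For reference, here is a minimal sketch of the behavior in native pandas (not Snowpark pandas) that the description appears to be targeting. The frame and values are made up for illustration, loosely modeled on the docstring example touched by this PR. The Series index is deliberately listed in a different order than the columns, to show that pandas matches on labels when writing the row.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

# The Series index holds the column labels, listed in reverse order on purpose.
row = pd.Series([50, 10], index=["shield", "max_speed"])

# Native pandas aligns the Series index with the frame's columns,
# so the Series is written across the row rather than down a column.
df.loc["viper"] = row

print(df.loc["viper"].to_dict())  # {'max_speed': 10, 'shield': 50}
```

The other rows are left untouched; only the "viper" row is overwritten.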

@sfc-gh-rdurrani sfc-gh-rdurrani requested a review from a team as a code owner September 3, 2024 19:02
@sfc-gh-rdurrani sfc-gh-rdurrani added the NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs label Sep 3, 2024
@sfc-gh-rdurrani sfc-gh-rdurrani enabled auto-merge (squash) September 3, 2024 19:30
# Conflicts:
#	CHANGELOG.md
#	src/snowflake/snowpark/modin/pandas/series.py
#	tests/integ/modin/frame/test_loc.py
@@ -1832,6 +1832,15 @@ def loc():
viper 0 0
sidewinder 0 0

Setting the values with a Series item.
Contributor

@sfc-gh-helmeleegy this is the example I added

@sfc-gh-azhan
Collaborator

sfc-gh-azhan commented Sep 19, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1649172

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code

      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages

    • I am adding a new telemetry message

    • I am adding new credentials

    • I am adding a new dependency

    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.

  3. Please describe how your code solves the related issue.
    Please write a short description of how your code change solves the related issue.

Please describe what the problem is.

@sfc-gh-rdurrani sfc-gh-rdurrani enabled auto-merge (squash) September 19, 2024 21:41
Collaborator

@sfc-gh-azhan sfc-gh-azhan left a comment

Please describe what the issue was.



@sql_count_checker(query_count=1, join_count=3)
def test_df_iloc_full_set_row_from_series_int_and_string_indexes():
Collaborator

you can combine this one into the previous one.

if isinstance(df, pd.DataFrame):
df.loc[:] = series
else:
if index == [0, 1, 2]:
Collaborator

It would be better to just compare the result with the expected result. The steps here can be confusing.

@@ -1039,6 +1039,9 @@ def __setitem__(
)
if item_is_2d_array:
item = pd.DataFrame(item)
frame_is_df_and_item_is_series = isinstance(item, pd.Series) and isinstance(
Collaborator

What happens when item is an Index or a list? Do those cases match pandas? Can you verify that too?
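As a point of comparison for the list case, here is a small sketch of what native pandas does. It is an illustration with made-up data, not a statement about the Snowpark pandas code path under review. A plain list carries no labels, so pandas writes it across the row by position.

```python
import pandas as pd

# Hypothetical two-column frame, for illustration only.
df = pd.DataFrame({"a": [0, 0], "b": [0, 0]}, index=["x", "y"])

# A list has no index, so the values are assigned positionally:
# first value to column "a", second value to column "b".
df.loc["x"] = [1, 2]

print(df.loc["x"].tolist())  # [1, 2]
```

The Index case is not shown here and would still need to be checked against pandas separately.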

original_index = index
# If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
# across columns rather than rows.
if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1): # type: ignore[arg-type]
Collaborator

Can you wrap it into a function and use the function name to summarize what this check does?

Collaborator

what does this mean (columns == slice(None) or len(columns) > 1)?

Collaborator

This type: ignore[arg-type] actually indicates that something is wrong. Not all type cases are considered.
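One possible shape for such a helper, as a sketch only. The function name is hypothetical and does not exist in the codebase. The handling of partial slices and scalar labels is an assumption about the intended meaning of the original condition. Branching on the type of the column key first means len() is only called on sized objects, which would also remove the need for the type: ignore.

```python
from typing import Any


def item_spans_multiple_columns(columns: Any) -> bool:
    """Hypothetical helper: True if the column key selects all columns or more than one."""
    if isinstance(columns, slice):
        # Assumption: only the full slice `:` counts as "all columns",
        # mirroring the `columns == slice(None)` part of the original check.
        return columns == slice(None)
    if isinstance(columns, (str, bytes)) or not hasattr(columns, "__len__"):
        # A scalar label selects exactly one column.
        return False
    # List-like key: more than one label means more than one column.
    return len(columns) > 1
```

Note that a tuple key is treated as list-like here, which may not be right for MultiIndex column labels.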

item, col_len, move_index_to_cols=True
)

if is_scalar(index):
Collaborator

what happens if index is not scalar?

original_index = index
# If `item` is from a Series (rather than a Dataframe), flip the series item values to apply them
# across columns rather than rows.
if frame_is_df_and_item_is_series and (columns == slice(None) or len(columns) > 1): # type: ignore[arg-type]
Collaborator

This should be done in _set_2d_labels_helper_for_frame_item

Labels
NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
4 participants