Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept Dictionary to Allow Updating Properties in trino__create_table_as #430

Open
1 task done
james-long-mx opened this issue Aug 27, 2024 · 3 comments · May be fixed by #431
Open
1 task done

Accept Dictionary to Allow Updating Properties in trino__create_table_as #430

james-long-mx opened this issue Aug 27, 2024 · 3 comments · May be fixed by #431
Labels
enhancement New feature or request

Comments

@james-long-mx
Copy link

Describe the feature

In some cases, it is not desirable for the properties defined in a model to be used when creating tables in a situation where the config context variable is defined by a given model. An example would be the case where a model specifies a location in its properties dictionary, but the running of the dbt model necessitates a temporary table to be created. As currently implemented, this temporary table is written to the same location as specified in the model. This is problematic when a location can only support one table being written there as is the case in Trino running with an Iceberg connector, for example. My proposed solution would allow for the trino__create_table_as macro to accept an optional argument named "properties_update_dict" that is defaulted to None where a dictionary could be supplied to modify what values are saved to the _properties variable through a call of the update method. Allowing this optional argument would allow a user to override the trino__create_table_as macro by defining a macro of the same name in his or her project that can serve as a wrapper around dbt.trino__create_table_as and supply the properties_update_dict parameter if desired. This would allow the overriden macro to stay in line with the built-in macro as updates are made while allowing the flexibility to adjust the behavior, which is presently not possible without copying the built-in macro and sidestepping the revision control of the built-in macro.

Describe alternatives you've considered

The present design does not allow for alternatives aside from overriding the macro in one's own project.

Who will benefit?

This will certainly benefit Trino/Iceberg users who are presently unable to use some incremental materializations and snapshots on account of writing multiple tables to a shared location not being allowed. As a general practice, others will benefit by not unnecessarily writing temporary files to the same location as their persisted table. The change is very simple and self-contained. I will create a PR.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@james-long-mx james-long-mx added the enhancement New feature or request label Aug 27, 2024
@damian3031
Copy link
Member

@james-long-mx Thank you for raising this issue and for your proposed solution.

We are aware of the issue with location and are researching the best possible way to address it.

However, the solution proposed in the PR doesn't seem to expose any mechanism for the end user to utilize it. Since macro would need to be overridden anyway, I'm unsure of the added value this change brings.

If I’ve misunderstood any part, could you provide more details and a specific use case scenario?

@james-long-mx
Copy link
Author

How I envision it being used is templated below.

{% macro trino__create_table_as(temporary, relation, sql, replace=False) -%} {%- set properties_update_dict = {} -%} {{ dbt.trino__create_table_as(temporary, relation, sql, replace, properties_update_dict) }} {% endmacro %}

The overridden macro is mostly just a wrapper function so the user can put logic to set what key-value pairs are stored in properties_update_dict. They then subsequently call the built-in macro while passing in properties_update_dict. In my use case, I would pull in the location property from the config and use a regular expression-based substitution to modify it when temporary is equal to True. I just left my proposed solution for the code to be flexible. Given that the trino__create_table_as macro does other things, I thought it best to limit the amount of overriding of the built-in macro. By doing this, changes to the built-in macro are pulled into a user's project even if they override the macro. This prevents versioning headaches.

@james-long-mx
Copy link
Author

In my current code, I have copied the built-in trino__create_table_as macro and added the following lines to the top:

{%- set _properties = config.get('properties') -%} {%- if temporary and 'location' in _properties.keys() -%} {%- set new_location = modules.re.sub(pattern='(.+?)(/?\')', string=_properties['location'], repl='\\1__dbt_tmp\\2') -%} {%- do _properties.update({'location': new_location}) -%} {%- endif -%}

My PR gives developers the freedom to make whatever changes they like to the properties rather than limiting it to what I've done in my code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
@damian3031 @james-long-mx and others