feat: Ibis has no way to convert UDFs to substrait plan #6425

Closed
1 task done
Anindyadeep opened this issue Jun 12, 2023 · 1 comment
Labels
feature Features or general enhancements

Comments

Anindyadeep commented Jun 12, 2023

Ibis is doing some incredible work by integrating Substrait to generate Substrait plans from users' queries, which enables cross-database operations in Python.

Suppose we have a table like this:

┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ cust_id ┃ income1 ┃ income2 ┃ income3  ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ int64   │ float64 │ float64 │ float64  │
├─────────┼─────────┼─────────┼──────────┤
│       1 │ 20000.0 │ 3560.57 │      nan │
│       2 │ 34546.9 │ 6000.66 │   1000.0 │
│       3 │ 75430.2 │ 8111.01 │      nan │
│       4 │ 55430.2 │ 8111.01 │   1200.0 │
│       5 │     nan │ 8111.01 │      nan │
│       6 │     nan │     nan │ 100000.0 │
└─────────┴─────────┴─────────┴──────────┘

Right now, we define UDFs in Ibis like this:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

# Analytic (vectorized) UDF: takes two double columns and returns a double column
@udf.analytic(input_type=[dt.double, dt.double], output_type=dt.double)
def function(c1, c2):
    return c1 + c2

We can then apply this function to columns of our table like this:

function(table.income1, table.income2)

Applying this function returns:

┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ AnalyticVectorizedUDF() ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                 │
├─────────────────────────┤
│                23560.57 │
│                40547.56 │
│                83541.21 │
│                63541.21 │
│                     nan │
│                     nan │
└─────────────────────────┘

We can even mutate our existing table to add a new column with this function:

mutate_expression = table.mutate(
    added=function(table.income1, table.income2)
)
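To make the expected result concrete, here is a plain-pandas approximation of what the UDF and the mutate compute on the pandas backend. It assumes (as a simplification) that the vectorized UDF receives each input column as a pandas Series; ordinary Series addition then propagates NaN, which is why rows 5 and 6 come out as nan in the output above.

```python
import numpy as np
import pandas as pd

# The example table from this issue, rebuilt as a pandas DataFrame.
df = pd.DataFrame({
    "cust_id": [1, 2, 3, 4, 5, 6],
    "income1": [20000.0, 34546.9, 75430.2, 55430.2, np.nan, np.nan],
    "income2": [3560.57, 6000.66, 8111.01, 8111.01, 8111.01, np.nan],
    "income3": [np.nan, 1000.0, np.nan, 1200.0, np.nan, 100000.0],
})


def function(c1, c2):
    # Same body as the UDF: element-wise Series addition, NaN propagates.
    return c1 + c2


# Rough pandas equivalent of table.mutate(added=function(...)).
result = df.assign(added=function(df["income1"], df["income2"]))
print(result[["cust_id", "added"]])
```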

Before getting to the main problem, consider a simple expression like this:

expression = table.income1 + table.income2

I can generate the Substrait plan for this expression with the following code:

from ibis_substrait.compiler.core import SubstraitCompiler

compiler = SubstraitCompiler()
expression = table.income1 + table.income2
substrait_plan = compiler.compile(table.mutate(expression))

This gives me the Substrait plan. However, compiling an expression that uses the user-defined function fails:

udf_expression = table.mutate(
    added=function(table.income1, table.income2)
)

# udf_expression is already a table expression, so compile it directly
substrait_plan_udf = compiler.compile(udf_expression)

Doing this raises KeyError: 'AnalyticVectorizedUDF'.
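The shape of the error hints at the cause, though this is a guess that has not been checked against the ibis-substrait source: the compiler appears to look up each Ibis operation by its class name in a table of supported operations, and AnalyticVectorizedUDF has no entry. The toy sketch below reproduces that failure mode; every name in it (OP_TO_SUBSTRAIT, translate, the stand-in op classes) is hypothetical and is not ibis-substrait's API.

```python
# Hypothetical stand-ins for Ibis operation classes (illustrative only).
class Add:
    pass


class AnalyticVectorizedUDF:
    pass


# Hypothetical mapping from op class names to Substrait function names.
OP_TO_SUBSTRAIT = {"Add": "add"}


def translate(op):
    # A plain dict lookup raises KeyError for any op without an entry.
    return OP_TO_SUBSTRAIT[type(op).__name__]


print(translate(Add()))  # supported op: found in the mapping

try:
    translate(AnalyticVectorizedUDF())
except KeyError as exc:
    print("KeyError:", exc)  # unsupported op: same error shape as above
```

Under this assumption, supporting UDFs would mean adding a translation path for the UDF op rather than just another dictionary entry, since a Python UDF body has no built-in Substrait function to map to.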

At first I thought Substrait itself might not support this yet. However, it seems that Substrait does support:

  • UserDefined type
  • ParameterizedUserDefined type
  • UserDefined relation

But not user-defined relations.

This suggests that Ibis does not yet support generating Substrait plans for user-defined functions, but it would be awesome to have it.

P.S. I have already posted this to the ibis-substrait GitHub repo, though it didn't have a template for feature requests. Let me know if I should delete this issue here.

Thanks

Describe the solution you'd like

I would like to be able to generate Substrait plans for expressions that use user-defined functions, just as we already can for regular expressions.
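As a rough illustration of what such a plan could contain, the sketch below declares the UDF as a Substrait extension function and builds a scalar-function call on columns 1 and 2 (income1 and income2, counting cust_id as column 0), using Substrait's protobuf JSON field names. Several assumptions apply: the helper udf_plan_fragment and the URI https://example.com/udfs.yaml are hypothetical, the UDF is treated as a scalar function with fp64 arguments even though it was registered as analytic, and the expression is grouped into the same dict only for readability (in a real plan it would sit inside a relation such as a ProjectRel).

```python
import json


def udf_plan_fragment(udf_name, arg_fields, uri):
    """Hypothetical helper: declare a UDF as a Substrait extension function
    and build a scalar-function expression that calls it on input columns."""
    # Compound function name: the base name plus its argument type signature.
    signature = f"{udf_name}:" + "_".join("fp64" for _ in arg_fields)
    return {
        "extensionUris": [{"extensionUriAnchor": 1, "uri": uri}],
        "extensions": [{
            "extensionFunction": {
                "extensionUriReference": 1,
                "functionAnchor": 1,
                "name": signature,
            }
        }],
        # Illustrative only: not a top-level Plan field.
        "expression": {
            "scalarFunction": {
                "functionReference": 1,
                "arguments": [
                    {"value": {"selection": {
                        "directReference": {"structField": {"field": i}},
                        "rootReference": {},
                    }}}
                    for i in arg_fields
                ],
                "outputType": {"fp64": {"nullability": "NULLABILITY_NULLABLE"}},
            }
        },
    }


fragment = udf_plan_fragment("function", [1, 2], "https://example.com/udfs.yaml")
print(json.dumps(fragment, indent=2))
```

The hard part this sketch glosses over is that a Python UDF body cannot be described by a YAML extension file alone, so a consumer would still need some agreed way to locate and run the function.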

What version of ibis are you running?

5.1.0

What backend(s) are you using, if any?

The backends I am using right now are:

  • DuckDB
  • Pandas
  • SQL

Code of Conduct

  • I agree to follow this project's Code of Conduct
Anindyadeep added the feature (Features or general enhancements) label Jun 12, 2023
gforsyth (Member) commented:

Thanks for raising this, @Anindyadeep! I've responded over in ibis-project/ibis-substrait#644, which I think is the more relevant place to continue this discussion, so I'm going to close out this issue here.
