Support other stateful ML artifacts like transformers #179

Closed
gbolmier opened this issue Jul 14, 2021 · 4 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@gbolmier
Contributor

/kind feature

What happened:

ML models often require stateful transformers to process data for them (e.g. standard scaler). Unfortunately, this kind of artifact isn't supported as of now.

Also, some ML frameworks aren't supported yet, especially those that don't use a specific serialization format but rely on, e.g., the pickle protocol.

I'm not familiar with OCI or the internals of registries. What is the process, and how much effort does it take, to add support for new frameworks or new serialization formats?

What you expected to happen:

Extended support to broader kinds of ML artifacts.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

@caicloud-bot caicloud-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 14, 2021
@gaocegege
Member

Hi @gbolmier

> ML models often require stateful transformers to process data for them (e.g. standard scaler). Unfortunately, this kind of artifact isn't supported as of now.

Do you mean the model with transformer structure, or some transformation functions to process the data?

> Also, some ML frameworks aren't supported yet, especially those that don't use a specific serialization format but rely on, e.g., the pickle protocol.

The formats are defined here: https://github.com/kleveross/ormb/blob/master/pkg/model/format.go. You can add a new format for pickle.

Contributions are welcome!

@gbolmier
Contributor Author

Hi @gaocegege, thanks a lot for the prompt answer.

> Do you mean the model with transformer structure, or some transformation functions to process the data?

I'm referring to the second (e.g. standard scaler, PCA, TF-IDF vectorizer). These transformers are closely tied to the model: they often have hyperparameters that impact the model's performance and, like models, a state that is updated while processing the training data. The model's performance on unseen data depends on the transformers used during the training phase. That's why stateful transformers are persisted: so that unseen data is processed in the same way as the training data.
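To make "stateful" concrete, here is a minimal sketch using only the Python standard library. `SimpleStandardScaler`, `dump_scaler` and `load_scaler` are hypothetical names made up for illustration; this is not scikit-learn's `StandardScaler`, and it is not how ormb stores anything. In practice the whole fitted object is usually pickled (or dumped with joblib); the sketch pickles only the fitted state so that it does not depend on where the class is defined.

```python
import math
import pickle


class SimpleStandardScaler:
    """Hypothetical minimal scaler for illustration (not scikit-learn's)."""

    def __init__(self):
        # State learned from the training data; None until fit() is called.
        self.mean_ = None
        self.scale_ = None

    def fit(self, values):
        n = len(values)
        self.mean_ = sum(values) / n
        variance = sum((v - self.mean_) ** 2 for v in values) / n
        # Fall back to 1.0 for constant input to avoid dividing by zero.
        self.scale_ = math.sqrt(variance) or 1.0
        return self

    def transform(self, values):
        if self.mean_ is None:
            raise ValueError("fit must be called before transform")
        return [(v - self.mean_) / self.scale_ for v in values]


def dump_scaler(scaler):
    """Serialize the fitted state with the pickle protocol."""
    return pickle.dumps(vars(scaler))


def load_scaler(payload):
    """Rebuild a scaler from a payload produced by dump_scaler."""
    scaler = SimpleStandardScaler()
    vars(scaler).update(pickle.loads(payload))
    return scaler


if __name__ == "__main__":
    train = [1.0, 2.0, 3.0, 4.0, 5.0]
    fitted = SimpleStandardScaler().fit(train)

    # Later, e.g. at serving time: restore the state and process unseen data.
    restored = load_scaler(dump_scaler(fitted))
    unseen = [6.0, 0.0]
    assert restored.transform(unseen) == fitted.transform(unseen)
```

The restored scaler reuses the mean and scale learned from the training data, which is exactly the state that would be lost if only the model were published.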

> The formats are defined here: https://github.com/kleveross/ormb/blob/master/pkg/model/format.go. You can add a new format for pickle.
>
> Contributions are welcome!

Thanks a lot for the pointer. Cool, this looks pretty straightforward.

Follow-up question: let's say I want to share and publish some transformers tied to my ML model. Do I have to create a similar tree structure for each transformer alongside the model's?

$ tree .
.
├── sklearn_model
│   ├── model
│   │   └── sklearn_model.joblib
│   └── ormbfile.yaml
├── sklearn_transformer_a
│   ├── model
│   │   └── transformer_a.joblib
│   └── ormbfile.yaml
└── sklearn_transformer_b
    ├── model
    │   └── transformer_b.joblib
    └── ormbfile.yaml

6 directories, 6 files

If that's the case, could we make it more convenient in practice?
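For reference, the layout above could be scaffolded with a few lines of Python. This is only a sketch under assumptions: `scaffold_artifacts` is a hypothetical helper, not part of ormb, and it writes a placeholder `ormbfile.yaml` that still has to be filled in by hand, since the config fields are not covered in this thread.

```python
import tempfile
from pathlib import Path


def scaffold_artifacts(root, names):
    """Create <root>/<name>/model/ and a placeholder ormbfile.yaml per artifact.

    Hypothetical helper for illustration; it only creates the directory
    layout and does not copy or serialize any artifact.
    """
    created = []
    for name in names:
        artifact_dir = Path(root) / name
        (artifact_dir / "model").mkdir(parents=True, exist_ok=True)
        config = artifact_dir / "ormbfile.yaml"
        if not config.exists():
            # Placeholder only: the real config must be written by hand.
            config.write_text("# placeholder: fill in the artifact config\n")
        created.append(artifact_dir)
    return created


if __name__ == "__main__":
    names = ["sklearn_model", "sklearn_transformer_a", "sklearn_transformer_b"]
    with tempfile.TemporaryDirectory() as tmp:
        for artifact_dir in scaffold_artifacts(tmp, names):
            print(artifact_dir.name, sorted(p.name for p in artifact_dir.iterdir()))
```

The serialized `.joblib` files would then be placed under each `model/` directory.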

@gaocegege
Member

> If that's the case, could we make it more convenient in practice?

What structure would you prefer? As you know, OCI supports layer-based storage, like Docker images, so maybe we could discuss it further.

@gbolmier
Contributor Author

Actually, it's not really the structure that is inconvenient; it's more about writing the ormbfile.yaml artifact config file. I opened a separate issue (#180) to discuss this further. I'm closing this one, as nothing prevents users from publishing other ML artifacts like transformers.

3 participants