Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fixes #1489] Add meta flag for 'unsafe' queries #1519

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mdconaway
Copy link

@mdconaway mdconaway commented Aug 26, 2017

Add meta flag for 'unsafe' queries

This closes issue #1489

This closes issue #1489
@sailsbot
Copy link

Hi @mdconaway! It looks like your pull request title doesn’t quite conform to our guidelines. Please edit the title so that it starts with [proposal], [patch], [fixes #], or [implements #]. Once you've fixed it, post a comment below (e.g. "ok, fixed!") and we'll take a look!

@mdconaway mdconaway changed the title Add meta flag for 'unsafe' queries Fixes #1489 Aug 26, 2017
@sailsbot
Copy link

Hi @mdconaway! It looks like your pull request title doesn’t quite conform to our guidelines. Please edit the title so that it starts with [proposal], [patch], [fixes #], or [implements #]. Once you've fixed it, post a comment below (e.g. "ok, fixed!") and we'll take a look!

@mdconaway mdconaway changed the title Fixes #1489 fixes #1489 Aug 26, 2017
@sailsbot
Copy link

Hi @mdconaway! It looks like your pull request title doesn’t quite conform to our guidelines. Please edit the title so that it starts with [proposal], [patch], [fixes #], or [implements #]. Once you've fixed it, post a comment below (e.g. "ok, fixed!") and we'll take a look!

@mdconaway mdconaway changed the title fixes #1489 [fixes #1489] Add meta flag for 'unsafe' queries Aug 26, 2017
@mdconaway
Copy link
Author

How 'bout meow?

@mdconaway
Copy link
Author

Hey guys, any input on this pull request?
@mikermcneil @sgress454

@sgress454
Copy link
Member

@mdconaway Sorry it's taken so long to circle back to this. I think we're all in agreement that it's worthwhile to have the ability to turn off normalization of attributes in criteria. @mikermcneil and I were hoping it was something that could be done at a slightly higher level than what you're proposing, because otherwise it's easy to miss things that don't touch your specific use-case. For example, in your PR you added the new meta argument to one of the normalizeCriteria calls in forge-stage-two-query, but missed the other one, probably because you weren't doing a "populate where" query.

Having said that, and looking at your PR, I'm not sure there is a way to do this at much of a higher level. What @mikermcneil proposed avoids having to alter normalize-constraint, but you still have to normalize the sort clause as you've done, plus there's all the files you have to touch just to pass meta down.

We'll put our heads together for a bit and get back to you...

@mikermcneil
Copy link
Member

Hey @mdconaway! So went over this a bit with Scott and meow I'm really seeing the big picture:

• skipping criteria normalization (skipConstraintNormalization)
• removing sort normalizations that enable sorting by nested key (separate thing)
• turning off attribute->column name mapping (separate thing)

I'm thinking we should split this up a bit first-- can we make this PR about skipCriteriaNormalization first? It'd be good to knock this out one piece at a time. (Then once we've got the more fine-grained / testable stuff together, we could always add some kind of "unsafe" flag for aggregating the options)

That said, I'm wondering: would it make sense for this use case to just use .native()?

@mdconaway
Copy link
Author

@sgress454 and @mikermcneil,

I believe it may help clarify things a bit if I explain our greater system architecture and expand on the behavior we are actually looking for based on this pull request.

My team and I work in a large (read: global) private network where we develop a great number of applications for the various departments in our organization. We have been working very hard to achieve a few specific things:

  1. Build a pure micro-service infrastructure
  2. Language unification (Writing all of our backend code in JavaScript to enable fluid transition of front-end and back-end devs throughout the organization)
  3. 12-factor server development patterns to enable automated cloud deployment of our APIs
  4. Well defined REST data interchange formats (we use Ember REST Adapter specifications)
  5. Maximize DRY, and minimize custom code in our API servers
  6. Build pure API servers, and well-designed SPA's to interact with the API servers

We currently run dozens of separate Sails.js server instances, and in each instance we handle all of our CRUD interaction using this library:
https://github.com/mdconaway/sails-ember-rest

Which is a Sails 1.0 compatible library I have developed and now currently maintain that enables the Ember REST Adapter to just work with Sails.js and Waterline.

What the sails-ember-rest library achieves for our team is that we don't typically "write" controllers. Many of our applications have dozens of data models, and those models can ride on Mongo, MySQL, or both in a single micro-service. No matter the model, or its underlying DB adapter, we handle all of our crud actions in our controller files with the following code:

import { controller } from 'sails-ember-rest';

export default new controller();

We have even added the ability to execute policy-like events after a CRUD action through the idea of "service interrupts" introduced by the sails-ember-rest library here.

As a result, 90% of the code in most of our micro-services is just policies and model schemas. This has a huge benefit to us, because it means that the CRUD actions for our services have very predictable behavior, and the only custom code we are typically writing per service is for special actions like file uploads, and policies that control who can acccess what data at what action end-point.

Any use of .native() is typically a non-starter for us, because it means we would have to discard the sails-ember-rest controller and build something that will continually progress towards divergent behavior. (Though we do use things like .native() in custom endpoints that do complex analytics using things like the Mongo aggregation pipeline)

Most of the time, matching left hand side query values to their corresponding schema values is totally acceptable in our services, but there are outliers. We do have some models that are extremely complicated, and as a result we need to do searches and sorts against nested key values. With that being said, we absolutely do not want server behavior when searching a nested key to diverge from the behavior when searching a root key and we do not want to make our front-end SPA's aware of the backend DB type as that would have an impact on their model hooks and a whole other cascade of things like our common paginator route in Ember. We need to keep all of the other features of the waterline language, such as the query forge expanding incoming queries like:

{
  a: ['b', 'c']
}

into:

{
  a: {
    in: ['b', 'c']
  }
}

We also do not want to lose the power of waterline which allows us to transition models to completely different databases overnight if we see performance gains using a different DB type without changing any application level code. Writing anything with .native() creates potentially large amounts of refactor work for a server in the event that we decide to move to a new database.

When we can land a merge into waterline that allows us to selectively disable left-hand-side key validation as a query traverses the forge, I will also be adding a way to toggle that feature to the sails-ember-rest library so that we can use that query feature without having to write custom controller actions.

The TLDR version:
-We need a way to disable left-hand-side key validation on query and sort objects while still retaining all of the other niceties of the waterline query forge.
-We specifically don't want anything in the query forge to be skipped

I hope that helps clarify things a bit! Please let me know if you have any questions about what I have explained above as I haven't finished my morning coffee yet.

@mdconaway
Copy link
Author

@sgress454 @mikermcneil ,

If you guys are ok with it, I can modify this pull request to also take care of "populate where."

I am willing to personally support (with code and unit test writing) other internal refactors of waterline later, but my team's near term need is primarily just allowing the loose left-hand keys while retaining everything else in waterline.

If you guys can accept the unsafe flag for now (or want me to change it to another name so that you can reserve it for another purpose) it would be highly beneficial if we could merge this sooner rather than later. I am more than willing to put together a war-plan with you guys over slack or some other service so that we can collectively refactor some of the other query forge internals over a longer timeline that isn't as mission critical for our team.

<- will work for merge

@mikermcneil
Copy link
Member

@mdconaway really appreciate you taking the time to detail the background here, and staying positive :)

we need to do searches and sorts against nested key values

I recognize you're going for a short-term solution here since you need this asap, and your fork seems to be an appropriate solution for that. So first off, don't worry-- it's totally fine for you to use that (I don't anticipate there being any more major upheavals in how the files related to the query forge are structured any time in the near to medium-term future).


As far as what we merge into core, I have to make sure whatever we do here also lines up with the plan going forward, which includes supporting this kind of usage.

I'm currently in the middle of a vaguely-related effort: changing the default behavior so that it avoids destructively mutating arguments from userland by default (but still allowing destructive mutation if configured to do so, via a meta key). Thus I'm trying my best not to conflict with what's in this PR, but it looks like it might be unavoidable... I just took another look through to get a sense of which of my changes might cause a problem, and the issue is that I also need to send through meta to some of the deeper-nested utility functions. So before I do that and mess anything up, I wanted to figure out what do to support your use case longer term.

I'm wondering if it'd maybe be cleaner/faster to just to go ahead and take the plunge into allowing dot notation (foo.bar.baz) instead of attribute names for constraints. Or something equivalent anyway. (And obviously this would only be usable for adapters that support it.)

But first, just to make 100% sure:
• is the nested foo.bar.baz-style syntax for eq constraints the only blocker right now?
• is it just for looking up stuff within type: 'json' and type: 'ref' attributes?
• how attached are you to dot notation as far as syntax? e.g. rather than dog.collar.color: 'red', another option would be something like dog: { whose: { collar: {whose: {color: 'red'}} }. I'm not saying that's better or worse per se, but I'm just trying to get a sense. As you probably noticed when exploring the stage 3 query stuff, there are a few edge cases (e.g. through tables) that make anything that relies on dots a little worrisome (and potentially bug-prone)

I'm heading out the door right now, but I'm planning to take a deeper dive into this tomorrow while I have some time, so we can hopefully have a resolution by the end of this weekend.

(refs https://github.com/balderdashy/waterline/issues/1489)

mikermcneil added a commit that referenced this pull request Oct 1, 2017
@mikermcneil
Copy link
Member

@mdunisch I should add: besides lookups inside type: ‘json’ and type: ‘ref’ attributes, I figure lookups inside out-of-schema attributes would fit in the same category.

The important thing to me is that we fail with a straightforward error message if someone tries to use this syntax to do a lookup within an association (whether singular or plural). ie I want to make sure what you’re suggesting is purely for embeds.

Doing this inside an association is a different story (although eerily inverse to this) and involves a WHERE sub query, which is only supported in databases with a concept of joins and foreign keys.

I think it could be pretty nice to use dot notation for both use cases— but we need to be able to offer up purposeful errors for when people inevitably try to do that in the mean time between when this works for embeds and when it works for associations.

We also have to consider embedded arrays, though it’s pretty easy just to mimic MongoDB there

Some things to consider, using a pseudo-real example:

await Facility.find({
  //for embedded dicts:
  ‘address.city’: ‘Austin’,
  // what about embedded arrays?
  //...
  //for singular associations, deep:
  // (which might contain nested embeds as well)
  ‘organization.createdBy.name’: ‘Lee’,
  // for plural associations?
})

@mdconaway
Copy link
Author

Hey @mikermcneil ,

Thanks for the input!

I would say that our most common use case for dot notation is searching and sorting against embedded objects within sails-mongo/MongoDB in production and sails-disk/NeDB for our local dev/test servers. That being said, if you are open to allowing waterline to accept dot notation queries/sorts against attributes with type 'json' or type 'ref' that will likely solve all of our outstanding issues related to this pull request. It also seems like that would actually be the more sane approach as well, since the type of json kind of implicitly means that you would need to send a dot notation query to use it for anything meaningful.

It could be nice to allow dot notation searches against values that are not within the schema, but we typically try to keep at least our root level keys defined so we could usually declare something like 'json' at the schema level. For other teams that have really loose schemas for a reason, I can see them arguing that they would want to use dot notation outside of the schema.

Regarding this syntax:

{dog: { whose: { collar: {whose: {color: 'red'}} }}}

We could probably make that work as well, but it seems more verbose and it would require a refactoring of some of our code in other places like Ember. I am not opposed to it, but I would need to have the deeper significance of this syntax explained to fully endorse it.

To this point I have never really used where objects against relationships (many<->many) but I think that would be a really powerful thing for us to take advantage of in our MySQL apps. Specifically reducing some parent record set by searching some child record set in one query. Is this currently supported in Waterline ~0.13, or is it something that is on the development timeline?

And now for something totally unrelated

sails-mysql:
I have noticed that the sails-mysql adapter doesn't support waterline 'or' syntax. Is this true or am I totally missing something?

sails-mongo:
I noticed the following pull request over in the sails-mongo repo and I think it is something that would be useful for the community.
balderdashy/sails-mongo#464

I don't believe it can be merged into sails-mongo as is, because it would force all 'contains' queries to be case insensitive. What plans do you guys have regarding case insensitive searches as part of the waterline query syntax for adapters like sails-mongo? Is there anything I can help with in order to make a better pull request for this feature? I don't mind contributing to this one if you point me in the right direction.

@mikermcneil
Copy link
Member

I have noticed that the sails-mysql adapter doesn't support waterline 'or' syntax. Is this true or am I totally missing something?

It should, and does, unless I'm missing something

What plans do you guys have regarding case insensitive searches as part of the waterline query syntax for adapters like sails-mongo?

Let's bring that discussion to the other PR. My focus as far as open source stuff this weekend is pretty saturated with the forge stuff, and then there are some priorities we need to look at from flagship, but Scott or I will try to take a look at that other PR in the upcoming weeks. In general, our plan is to make Waterline query syntax completely defer to the adapter as far as case-sensitivity (prior to 0.13, we made a decision on that in core, and it caused performance issues for folks working on large projects with databases like postgresql. That's why we moved to avoid any kind of core normalization related to case-sensitivity in the latest version)

@mikermcneil
Copy link
Member

mikermcneil commented Oct 1, 2017

So re: this kind of querying:

There's a lot of things supported by Mongo in this department. My instinct is to avoid most of them (that's what native queries are for).

So I took a look at the subset of what's also supported elsewhere these days (e.g. PostgreSQL, MySQL). And, like the last time I looked a couple of years ago, it's all over the map -_-

So to begin with, we've got to make a call on how this works at the adapter level. For now, I think we're best off supporting arbitrary deep targets for constraints, with the expectation that the RHS of the constraint syntax is standard Waterline query language. If the data queried at runtime happens to be an array, that's ok-- but it's up to the adapter as far as what that actually means-- and database-specific syntax (e.g. specifying $all in your userland code will not be allowed-- use a native query for that). Other than that, deep targets are fine as long as the deep target begins with an out-of-schema attribute or an attribute with type:'json' or type: 'ref'. For example the constraint 'foo.bar.baz': … is ok as long as the foo attribute fits that definition, and as long as the RHS is a valid RHS for a Waterline constraint (e.g. a primitive like 3, 'sheep', false, or null to indicate an equivalency constraint, or a dictionary like {contains: 'haggis'} to indicate a complex constraint)

@mdconaway would that do the trick?

mikermcneil added a commit that referenced this pull request Oct 1, 2017
…targets' (in preparation for 'deep targets', see #1519)
@mdconaway
Copy link
Author

That sounds good to me, provided this also supports 'in' queries like:

{
    'foo.bar.baz': ['a', 'b', 'c']
}

And other thing like date comparisons using >= and <= etc.

@mikermcneil
Copy link
Member

@mdconaway righto, it'd support anything you can currently do with a shallow target.

  'foo.bar.baz': ['a', 'b', 'c']

technically would be recognized as shorthand notation and expand into

'foo.bar.baz': {
  in: ['a', 'b', 'c']
}

but it would still work

@mikermcneil
Copy link
Member

mikermcneil commented Oct 1, 2017

@mdconaway something to be aware of though:

if you were to write 'foo.bar.baz': ['a', 'b', 'c'] intending it to be a Mongo array deep equality check, you'd be disappointed/surprised-- instead what you're actually getting is an "in" query. The same thing could be said for shallow targets though, because that's the way it works today if you did foo: ['a','b','c']. Since we don't know if foo is an array or not, we can't really do anything smarter-- and I think that's probably for the best

@mdconaway
Copy link
Author

No complaints here! I don't think I've ever found a good use case for array equality as of yet.

@mikermcneil
Copy link
Member

I believe 897b891 does the trick as far as the where clause.

I'm open to reusing the same meta key to allow this stuff in the sort clause, but I'd like to get this working w/ just where first, for clarity.

@mdconaway would you mind updating your PR so that it contains just the tests? (it's ok that any involving sort will fail at the moment). Also, if you wouldn't mind taking this (just the where clause bit) for a spin with your db, it'd be good to know it actually works once the code actually makes it past all of the checks and balances :)

@mdconaway
Copy link
Author

Removed all changes except tests. What is the near to mid-term requirement for dealing with dot notation sorting?

@mikermcneil
Copy link
Member

What is the near to mid-term requirement for dealing with dot notation sorting?

First off I just want to make sure what we have so far actually works in Mongo.

Assuming that's good, then I think all we have to do is basically make the changes you did in fs3q and then bring in the comparable stuff in normalizeSortClause()

@mikermcneil
Copy link
Member

mikermcneil commented Oct 1, 2017

@mdconaway well I went ahead and tried it out and a cursory test worked for me:

image

(if it's also working for you in a real world setting then I'll go ahead and finish up the sort stuff)

@mikermcneil
Copy link
Member

@mdconaway ok I just did some basic tests as of 7f43bee and this is now working for me for both the sort and where use cases. Would you take it for a spin when you have a moment?

@mikermcneil
Copy link
Member

mikermcneil commented Oct 2, 2017

@mdconaway actually nevermind-- I tested this with .populate() and noticed that currently, deep sort doesn't work when it's in the subcriteria. Any ideas?

e.g.

Province.find()
.populate('zones', {
  select: ['id', 'cachedWeather'],
  sort: 'cachedWeather.name DESC'
})
.meta({enableExperimentalDeepTargets:true})
.log()

Also, I haven't tried this yet, but we'll need to test sorting on multiple comparators-- e.g.:

Province.find()
.populate('zones', {
  select: ['id', 'x', 'y', 'cachedWeather'],
  sort: [
    { 'cachedWeather.main.temp': 'ASC' },
    { 'cachedWeather.name': 'DESC' },
    { x: 'ASC' },
    { y: 'ASC' }
  ]
})
.meta({enableExperimentalDeepTargets:true})
.log()

@mikermcneil
Copy link
Member

mikermcneil commented Oct 2, 2017

well looks like the s2q is fine:

Stage 2 query:  { method: 'find',
  using: 'province',
  criteria: 
   { where: {},
     limit: 9007199254740991,
     skip: 0,
     sort: [],
     select: [ '*' ],
     omit: [] },
  populates: 
   { zones: 
      { select: [ 'id', 'x', 'y' ],
        sort: [ { 'cachedWeather.name': 'ASC' } ],
        where: {},
        limit: 9007199254740991,
        skip: 0,
        omit: [] } },
  meta: { enableExperimentalDeepTargets: true } } 

...but the s3q is wrong:

image

(comes through as undefined)

@mikermcneil
Copy link
Member

@mdconaway ok that's all fixed I think

sails> debug: @mikermcneil is loafing around in the same zone as last time: ( 82, 59 )
sails> Province.find().omit(['createdAt']).populate('zones', {select: ['x','y', 'cachedWeather'], sort: 'cachedWeather.name ASC' }).meta({enableExperimentalDeepTargets:true}).log()Running with `.log()`...

undefined
sails> 
- - - - - - - - - - - - - - - - - - - - - - - -
Finished successfully.

Result:

[ { zones: 
     [ { id: '59d1bb303e160f68048de73b',
         x: 82,
         y: 53,
         cachedWeather: 
          { coord: { lon: -97.7, lat: 36.81 },
            weather: [ { id: 800, main: 'Clear', description: 'clear sky', icon: '01n' } ],
            base: 'stations',
            main: 
             { temp: 293.4,
               pressure: 1012,
               humidity: 60,
               temp_min: 292.15,
               temp_max: 294.15 },
            visibility: 16093,
            wind: { speed: 6.2, deg: 140, gust: 8.7 },
            clouds: { all: 1 },
            dt: 1506917700,
            sys: 
             { type: 1,
               id: 2220,
               message: 0.1672,
               country: 'US',
               sunrise: 1506947241,
               sunset: 1506989517 },
            id: 4542557,
            name: 'Medford',
            cod: 200 } },
       { id: '59d1bb303e160f68048de73e',
         x: 82,
         y: 56,
         cachedWeather: 
          { coord: { lon: -97.7, lat: 33.81 },
            weather: [ { id: 800, main: 'Clear', description: 'clear sky', icon: '01n' } ],
            base: 'stations',
            main: 
             { temp: 293.4,
               pressure: 1013,
               humidity: 68,
               temp_min: 292.15,
               temp_max: 295.15 },
            visibility: 16093,
            wind: { speed: 2.6, deg: 130 },
            clouds: { all: 1 },
            dt: 1506916500,
            sys: 
             { type: 1,
               id: 2565,
               message: 0.1682,
               country: 'US',
               sunrise: 1506947177,
               sunset: 1506989585 },
            id: 4714865,
            name: 'Nocona',
            cod: 200 } },
       { id: '59d1bb303e160f68048de741',
         x: 82,
         y: 59,
         cachedWeather: 
          { coord: { lon: -97.7, lat: 30.41 },
            weather: [ { id: 800, main: 'Clear', description: 'clear sky', icon: '01n' } ],
            base: 'stations',
            main: 
             { temp: 296.48,
               pressure: 1013,
               humidity: 78,
               temp_min: 296.15,
               temp_max: 297.15 },
            visibility: 16093,
            wind: { speed: 2.06, deg: 143 },
            clouds: { all: 1 },
            dt: 1506916800,
            sys: 
             { type: 1,
               id: 2557,
               message: 0.0038,
               country: 'US',
               sunrise: 1506947110,
               sunset: 1506989656 },
            id: 4740584,
            name: 'Wells Branch',
            cod: 200 } } ],
    updatedAt: 1506918788275,
    id: '59d1c0765ef90368e0886d88' } ]
- - - - - - - - - - - - - - - - - - - - - - - -

@mdconaway
Copy link
Author

Thanks for working on this! I'll try to test it out later today. If it's not in NPM and accessible through our NPM mirror I will have to run the test at home so I have access to GitHub.

Is this flag necessary to enable functionality?

{enableExperimentalDeepTargets:true}

@mdconaway
Copy link
Author

Hey @mikermcneil ,

Just got done running some tests here, and it looks like commit f0337a2 fits the bill for our typical use cases. Are you planning on leaving the

{
    enableExperimentalDeepTargets:true
}

meta flag around for a while? Or is it safe to assume the meta flag won't make it to main?

@mikermcneil
Copy link
Member

around for a while?

@mdconaway I imagine we'll turn the feature on by default eventually, I feel pretty good about what's there. Probably doesn't make sense to enable it by default until there is an adapter-level concept of whether an adapter supports this or not though (e.g. mysql should be able to say that it doesn't, and Postgresql and Mongo should be able to say that they do)

But yeah, while I'm not 100% comfortable documenting this and calling it "officially supported" yet, the only blockers IMO are the other missing accoutrements. (Also it'd be nice if this worked for select and omit too, but that could always be added later.)

@luislobo
Copy link
Contributor

Hi @mikermcneil I like this one, any info if it will stay as a meta option?

@adamjw3
Copy link

adamjw3 commented Nov 14, 2018

Get an error when trying to find in nested data in postman on an api created in sales

trying to do

{
"job.description": "u"
}

job is an object and description is a property of that object. How can this be done?

@vaghela943
Copy link

Deep Populate not work in Sails V1.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants