[Zope-DB] Dynamically generate sql-query in ZSQL Method (and more...)

Wed Apr 7 18:16:25 EDT 2004

Charlie Clark wrote:
> thank you for some very good, constructive criticisms. I think that, by 
> induction, you're wrong - if ZSQL was really that bad it would have been 
> replaced by something better. 

In some ways I think it has, in the form of the ZODB.  Of course, this 
is the intention of a lot of people behind Zope, who feel an object 
database is a more natural fit for Zope.  When it comes to persistence 
in Zope, ZSQL certainly isn't the dominant technique.  (I have no idea 
how relatively popular ZSQL is in Zope applications, but I'm thinking of 
dominance in other terms)

> In your first e-mail you hit the nail on the 
> head - the combination of ZSQL, DTML and in my opinion to a lesser extent 
> the ZMI can be really nasty. But then again not as nasty as some of the 
> alternatives out there.
> 
> First of all: three cheers for Zope and ZSQL! Referring back to last week's 
> ZODBCDA thread: there are a lot of people coming to Zope and particularly 
> Plone with little programming experience. Zope opens up a whole new world 
> of possibilities for them. Not being a born programmer myself I am very 
> happy with the way Zope and Python encourage modular development with 
> frequent refactoring so much so that it has become natural for me. This 
> fits in well with the demands placed on the incredible variety of 
> applications that are springing up around the world all the time. But there 
> are certainly pitfalls! I just wonder if they are as deep with Zope as they 
> with competing technologies. PHP is my pet hate for a very similar reason: 
> it allows you to make a mess easily and I've yet to find a real saving 
> grace for it.

Well, by induction PHP must have a saving grace, because it is quite 
popular :)  I would say that a PHP installation is very easy to 
maintain, and the environment is reasonably reliable even against the 
most aggregiously bad programming you can throw at it.  (Applications 
aren't necessarily reliable, but the system remains so)

It also has a low barrier to entry, it being so concept-poor there's 
really very little to learn to get started.

> Three reasons to really like ZSQLs
> 1) you can test them on their own
> 2) they are modular so they encourage reuse kind of by definition
> 3) as objects in the ZMI they are flexible and can be moved and changed as 
> required.
> 
> This doesn't mean that your criticisms aren't valid so let's go through 
> them.
> 
> On 2004-04-07 at 01:12:12 [+0200], Ian Bicking wrote:
> 
>>On Apr 6, 2004, at 3:23 PM, Jim Penny wrote:
>>
>>>Ian has told you how to do it.  Now, I will tell you, don't do it.
>>
>>I would disagree with you -- not for some principled reason, but for a
>>very practical reason: Z SQL Methods suck.  A lot.  A huge, huge
>>amount.  They are horrible.  Generating your own SQL inline sucks.  A
>>lot.  It is horrible.  *Sometimes* the inline method is less horrible.
>>These are the unfortunate choices foisted upon us by the environment.
> 
> This is rhetoric! What is your preferred alternative? Dymanic code is 
> *always* going to present a problem in that is only debuggable at runtime.

My prefered alternative would be SQLObject (http://sqlobject.org).  But 
then I wrote it, and parents must love their children -- though 
SQLObject does deal with most of the issues I brought up about ZSQL (or 
avoids the issues in other ways).  The experience with that also makes 
me feel pretty comfortable with generating SQL, which is how this all 
started...

At the moment, though, I'm doing work in Zope, where SQLObject won't 
work too well (at least Zope 2), and I end up maintaining legacy code, 
where I get to suffer through other people's choices (some of which 
aren't the fault of the original authors, since the code preceded ZPT 
and such things).

>>Note also that well designed SQL generation in Python isn't horrible.
>>A Python script like:
>>
>>## select_all
>>## parameters: table_name
>># re isn't available in the default Zope installation; I'd actually
>>usually
>># do this with an extension method...
>>import re
>>assert re.search(r'^[a-zA-Z_][a-zA-Z0-9_]*$', table_name), "Invalid
>>table name: %s" % table_name
>>return container.raw_sql(sql='SELECT * FROM %s' % table_name)
> 
>  
> Thanx for the code!
> 
> 
>>That isn't so bad.  In fact, I would assert it is much better than the
>>Z SQL analog.
> 
> 
> No need to assert. This can be checked AFAIK as this is all open source.

By the Z SQL analog, I mean 'SELECT * FROM <dtml-var table_name>', which 
is potentially subject to SQL injection attacks, and protecting it would 
be quite annoying (maybe you'd have to create an assert_matches_regex 
script, and use <dtml-call> to check it).

>>There's several ways to write a query, depending on how many of the Z
>>SQL features you use (e.g., dtml-and, dtml-sqltest, dtml-sqlgroup,
>>etc).  They all look quite different, and some will feel very foreign
>>to someone who knows SQL but not Z SQL.
> 
> 
> Yes, but this in my opinion is down to the Zope Book not being updated. The 
> only DTML I have in my own ZSQLs is <dtml-if> </dtml-if>, <dtml-sqlvar> and 
> <dtml-var>. The rest is completely unnecessary cruft and should be 
> deprecated and buried quickly and quietly although some might want to 
> preserve <dtml-sqldelimiter>. I would contend that <dtml-sqlvar> is a 
> life-saver for many out there and I would personally like to thank whoever 
> came up with it for making me aware of the need to check what is being 
> inserted.
> 
> However, things could be improved. The ZSQL interface is clumsy and 
> unfinished and if we're going to have type checking it should be when 
> variables are declared. A suggestion: parametes should no longer be entered 
> in a single text box but must be declared individually along with there 
> type. Type and other checking could then occur in the background improving 
> legibility and, hopefully, safety.
> 
> ie.
> ## name, string
> ## age, int
> ## dob, date
> 
> INSERT INTO person (?, ?, ?) VALUES (name, age, dob)
> 
> or something similar-
>  
> 
>>It uses DTML, or DTML-like constructs.  Most of the things that are bad
>>about DTML for templates are also bad for DTML generating SQL.  You
>>could write entire books about why DTML is bad.
> 
> 
>>It usually looks like normal SQL, but is far enough away from SQL to be
>>difficult to work with in source form, since it uses <dtml-sqlvar>
>>instead of like parameters like ? or even %s.  It's usually a long way
>>from Z SQL source to something you can feed to EXPLAIN.
> 
> 
> Indeed. But I wouldn't limit this to DTML. I'd throw in CFM, PHP, ASP, JSP, 
> AppleScript and SQL itself for that matter. Going beyond placeholders and 
> even the very simplest of control structures is going to cause problems in 
> the end. SQL is my opinion only good for declarative work and even then is 
> inconsistent enough to be the subject of many books, though I'd cite Fabian 
> Pascal as one of the more salient critics.

I would agree that ZSQL is a questionable place for control structures; 
but then, ZSQL isn't easy to use inside other non-ZSQL control 
structures (e.g., inside a Python Script).  But more on that later.

Also, I would prefer if ZSQL was more like Page Templates, where the 
form of the markup was in line with the form of the source language. 
(Of course, SQL markup is much different than HTML, so the result would 
look different)

>>It is difficult to debug.  There's one largely hidden feature in being
>>able to pass src__ in as a keyword argument to see what the SQL source
>>actually is.  That's about all the debugging you get.  AFAIK, there's
>>not good log of the SQL queries sent, how long they took, etc.
> 
> 
> This is usually down to how the stuff has been coded. If there are 
> <dtml-tests> and <dtml-groups> in there you're right, but otherwise I find 
> ZSQL eminiently maintainable.

How do I see what queries are being run from where?  How do I see how 
long they took?  What transaction those queries belonged to?  These are 
things I'm very interested in, but I don't know how to get at them from 
ZSQL.

>>Connection paths are pretty much hard-coded.  You can't tell a method
>>to use a different connection.  This would be useful if you wanted to
>>use database-level security, and so you need two separate connections
>>with separate users.  You might be able to do this with acquisition,
>>but whole books could be written on why acquisition is bad.  Actually,
>>it would probably be a chapter in the DTML book.
> 
> 
> I don't think you mean what you say here. I think you mean that you can't 
> change the connection at runtime; you certainly can change the connection 
> in a ZSQL. Acquisition isn't so much as bad by default but by use through 
> opacity. The resulting problems are one of the reasons why it's not in Zope 
> 3 if I understand things correctly.

Yes, can't change it at runtime, at least without using Acquisition. 
And whenever something requires Acquisition, somewhere inside I rewrite 
that to mean it's impossible ;)  (Mostly because of negative experiences 
when I have tried to use Acquisition).

>>It has crufty parameters.  Put "a, b, c" in , and you get parameters
>>with names like "a,", "b," and "c".  Default parsing seems wonky too.
>>
>>It doesn't signal errors properly -- if you forget to pass in a keyword
>>argument, it's pretty much ignored.  Or if you pass in extra arguments.
>>  Why have a parameter list at all?
> 
> 
> This is not my experience or at least not with INSERTs. I think you may be 
> referring to SELECT something FROM a RELATION WHERE a_critermium IS 'met'
> The default behaviour here is similar to that in SQL anyway.

Perhaps this is my misperception from using code that used defaults for 
parameters excessively.

>>It doesn't deal with failures very well.  Why not give me the SQL in
>>the exception when there's an error?  It doesn't deal with any of the
>>hard parts of failure in general, connection failures, concurrency
>>conflicts, etc.
> 
> Surely this is the DA that does this? Most of the ones that I've worked 
> with do? mxODBCZopeDA and Zpsycopg certainly do though some colour 
> highlighting might be nice.

Now that I think about it, yes, the SQL is in there.  I know I've often 
been frustrated by the error reports, but now I can't remember exactly 
what my problems have been.

>>You can't nest or factor your methods well.  This is largely DTML's
>>fault, but Z SQL makes it worse since all methods are concrete.  You
>>can't save values, go through control structures, etc., except using
>>plain DTML constructs which have their own issues.  For instance,
>>consider turning a list of items into SQL for use with IN, like
>>(<dtml-in items prefix=loop><dtml-unless
>>loop_start>,</dtml-unless><dtml-sqlvar loop_item type=int></dtml-in>).
>>That's lame, *and* you can't abstract out the lameness.  Well, maybe
>>with the use of src__ and other stuff, but that's it's own lameness.
>>(Hmm... I see now there's an option to dtml-sqlvar to handle this
>>particular case... so many options, but I can't use any of them from
>>the outside, nest dtml expressions, etc)
> 
> 
> sorry, can't see the real application of this. I think that ZSQL is 
> probably next on the list to be stripped down now that we have 
> PythonScripts and ZPT. But seeing as assuming you use it carefully you 
> shouldn't have too many problems. Using it carefully means do not use DTML 
> for control structures. Call a PythonScript to do the work.

You can't do it in a Python Script, because in this example you have a 
loop and a <dtml-sqlvar> inside it, and there's no way to get something 
like <dtml-sqlvar> from inside a Python script.  Well, you could define 
ZSQL methods like:

# method "sqlvar_string":
<dtml-sqlvar input type=string>
# method "sqlvar_int":
<dtml-sqlvar input type=int>

Then use them like:

lst = []
for x in list_of_stuff:
     lst.append(container.sqlvar_string(input=x, src__=1))
text = '(%s)' % ', '.join(lst)

Obviously doing such a thing would be a monstrosity.  But if you want to 
programmatically create SQL -- which is very valid desire! -- you'd have 
to do such things.  Or do it using DTML constructs.  (Some people would 
say it's an invalid desire, that this kind of SQL generation qualifies 
as self-modifying code, but obviously I disagree ;)

>>Oh, on the subject, every issue is dealt with using another dtml-sqlvar
>>option or another tag, there's not a good set of orthogonal functions.
>>
>>SQL Brains are cool, if a still little under powered.  But they don't
>>see updates unless you go twiddle all sorts of forms.
>>
>>You can't set any attributes on result rows.  Even if you know why it
>>might not be a good idea.  Even if you try really hard.  (Though you
>>can add a dummy column to your query, and then reassign the column)
>>This is annoying particularly when using brains.
> 
> 
> You have the results.names() call to any results set.

But then I lose my precious brains!  That's the only thing that makes me 
happy about ZSQL.

>>Caching is stupid.  You can cache, but you can't invalidate the cache.
>>There's a product out there that provides a cacheable SQL method, but
>>it's a whole separate product and item type, and you can't easily go
>>back and forth, or even upgrade.
> 
> 
> You can always make a call to the global cache. But you're right being able 
> to flush individuals caches would be useful and doable.

Are caches actually per-thread?  If they were, then it seems like cache 
invalidation would be a bit more challenging.

It would be great if you could indicate that the execution of particular 
methods would invalidate the caches of other methods (e.g., an update 
invalidates some related selects).

>>All methods return result objects with multiple rows.  It's common to
>>see:
>>
>>   <dtml-in "get_some_object(id=whatever)"><dtml-var
>>some_column>...</dtml-in>
>>
>>When in fact you only expect their to be one row in the result.  This
>>is where DTML's suckiness gets involved (dtml-in/dtml-with, all the
>>flattened namespaces, etc).
> 
> 
> DTML apart - please use ZPT or PythonScripts - all Python DB calls return a 
> list of tuples. What really annoys me is when it's an empty list in which 
> case
> results[0].attribute generates an index error. This has to be encapsulated 
> not in <tal:conditon here/myQuery tal:content="python 
> here.myQuery[0].attribute"/> but by checking the length of the list!!!

I consider the need to do result[0] to be a leaky abstraction.  I want 
function calls to return objects, but ZSQL methods return result sets. 
Which are objects, but they aren't the object I would naturally return 
-- so the *caller* must know that the target is a ZSQL method, and work 
around that (by doing [0] on the result).  Ditto keyword arguments -- 
positional arguments might be the intuitive way to pass a parameter, but 
if the target is a ZSQL method then you can't do that.  This doesn't 
just apply to DTML, the same problems occur when you call ZSQL from a 
Python script.

This particular example is why I wrap all ZSQL methods in Python 
scripts.  Then the script can catch the IndexError, and maybe return 
None, or raise a more useful error (like, "object with ID whatever 
wasn't found"), whichever makes more sense.

>>You start needing all sorts of methods for little things.  Like, say
>>you want to fetch a value from a sequence, like "SELECT
>>nextval('sequence_name') as seq".  Do you create a new method for this?
> 
> Of course: one method for one act. Call this through a (possibly generic) 
> script which does the necessary work. Avoid DMTL.
> 
> 
>>  What an annoying process!  (ZMI partially at fault)  And you get
>>lameness like <dtml-in get_sequence_name><dtml-var seq></dtml-in>  Oh,
>>it hurts me just to type this!  Or container.get_sequence_name()[0][0].
>>  This leads to code being highly coupled.  You want to put a Python
>>Script in the middle for some reason?  Now you have to return something
>>like [[id]].  Or [{'seq': id'}]?  Only one way to find out... oh, the
>>coupled pain!  (This is why I keep a firm Python Script barrier between
>>my Z SQL methods and my "good code" -- I don't want Z SQL cruftiness to
>>infect everything else with it's lameness)
> 
> 
> Again, the problem doesn't really seem to be ZSQL but rather DTML.
>  
> 
>>No positional parameters.  Seems like an okay idea, except for all the
>>methods that are fetch-row-with-this-id, where positional is better.
>>(Something else for the Python Script barrier)
> 
> 
> Explicit is always better but maybe you have a particular example to show 
> what you mean? Tuple unpacking perhaps?

The most common case is a ZSQL method where you are returning a single 
row based on the primary key.  It is more natural to pass the primary 
key in positionally, instead of with a keyword.  Certainly this is what 
I would expect when calling a Python script, and I don't want to have to 
know what type of object I'm calling.  That adds coupling to the code.

>>SQL methods are often tightly bound to specific scripts, templates,
>>etc.  But it's hard to express this, and they are just more cruft in
>>the ZMI.  If something is tightly coupled, I want to know it, I want to
>>know that changes effect only one place, or that I shouldn't use it
>>from the outside.  At least inline SQL gives you this.
> 
> 
> I agree with you on this. One of the drawbacks with it being so easy to 
> change things in Zope is that you often forget one method or another. 
> Adding dependency management to the ZMI would be a real boon, I think.

A more general Zope critique which also applies here: there's no 
distinction between *very* public (i.e., public web page), internally 
public (an "API"), and private.

>>dtml-sqlvar doesn't have any good types.  type=date please?  Some
>>attempt at making database more compatible would be nice.  Strings and
>>floats are easy.  Solving difficult problems is actually useful, just
>>solving easy problems is a programming cop-out.
> 
> 
> Is type checking in the programming language actually necessary, I wonder? 
> Surely the database will raise a TypeError if appropriate. In addition it 
> is the work of individual DAs do to type mapping so maybe type-checking 
> should be removed from ZSQL altogther and ZSQL / ZopeDAs should just go 
> about preventing SQL injections?

I'd like it to deal with whatever database compatibility that is 
necessary.  I've had problems in the past when passing in DateTime (or 
mxDateTime?) objects and turning them into strings, as they may not be 
formatted properly for the database.  Like strings, the DA should know 
the appropriate way to encode dates.

>>Doesn't do anything with database introspection.  Again, I know it's
>>hard, but hard is useful.  And it's not that hard.
> 
> What, in particular, are you looking for?

What databases exist, what tables exist, what columns are in those 
tables, things like that.  In a database-neutral way.  This would be an 
important component of a decent test pane.  The code to do this is 
pretty minimal too -- it's like ten or fifteen lines per database in 
SQLObject (but each database is very different).

>>The test pane is really lame.  I won't go into all of why it's lame.
>>It could be useful, but it's only distracting.  (I might *think* I can
>>just test a quick query in it, but I'm always wrong)
> 
> 
> I think you're in a minority here. It needs fixing.

You mean majority, yes?

>>I think that's it.  I'd say it felt good to vent, except I'm going to
>>have to work with Z SQL methods again tomorrow, and venting doesn't
>>make up for that.
> 
> 
> Oh, come on! Think positive: you might have to be fixing an even more 
> tightly coupled PHP system including migration from one RDBMS to another 
> with the prospect of literally hundreds of changes to be made and really 
> pathetic error reporting.
> 
> And now the call to arms: if ZSQL is stuck then we can fix it. That is 
> after all the beauty of Open Source, isn't it? It's possible that within 
> Zope Corporation they have enhanced ZSQLs that fix all their woes but they 
> haven't chosen to release them. I've already discussed some of these issues 
> with Marc-André in respect to the mxODBCZopeDA which would allow for 
> "enhanced" ZSQLs to allow quoting to be done by the driver. However, any 
> such product would out of necessary be tied to our DA, with the sincere 
> hope that others would follow suit.
> 
> As has been said several times the whole Zope/RDBMS part needs reworking 
> and we can do this if we want to. It's unlikely to happen otherwise.
> 
> I would like to tentatively suggest a BOF (birds of a feather) on this at 
> EuroPython assuming others interested are also intending to attend. I'm not 
> sure whether further, detailed discussion should occur on this list but I 
> suspect that there are not yet enough conspirators to warrant setting up 
> another list. Does anyone know what the state of Zope 3 is in this respect?

Well, the application Alan Runyan has been developing on Zope 3 (I don't 
actually know the name of it) is using SQLObject, which they are 
deploying soon (or perhaps they've partially deployed it already).  I 
think they've been pretty happy with SQLObject, and I'm open to 
continuing changes to make it work better in Zope 3.

I would actually be ecstatic if SQLObject could also work in Zope 2, but 
I'm not sure what that would involve.  There's the whole Python version 
issue -- SQLObject requires new-style classes, which was a problem when 
I was first developing SQLObject, but perhaps is improved now with Zope 
2.7.  I assume it would take significant work even in 2.7, to deal with 
the security issues and other Zope-isms, many of which aren't a problem 
when using SQLObject in Zope 3.  Really I don't know Zope well enough to 
even know where to start, or how feasible it is.

SQLObject is also significantly different from ZSQL, in that it's more 
of an object-relational mapper, where ZSQL is very relational.  Which I 
like, but which wouldn't suit everyone.

   Ian