[Modeling-users] Database Connections

Discussion:

Duncan McGreggor

2004-08-04 17:14:16 UTC

Hey Folks,

We're building an application that uses Modeling, and a couple of
questions came up in a meeting today. The biggest concern was about
sessions (what they called database pooling or connection pooling). I
did have a look at EditingContextSessioning, and let them know about it,
and it looks like it will suit our needs. However, I wanted to get
some explicit feedback, just to be sure ;-)

Application Activity: there's going to be lots of querying, searching,
and updating (the last to a lesser degree).
Application Framework: Apache + mod_python with the mod_python Session
object for session management

What they meant by pooling is not what Oracle developers usually mean;
they want to be sure that for every HTTP Session, and all of its HTTP
requests, there is a single connection to the database that is reused
for the entire length of the session lifetime.

Is that, in a nutshell, what we get when we use EditingContextSessioning?

Thanks!

Duncan

Sebastien Bigaret

2004-08-04 17:48:04 UTC

Hi Duncan,

Before answering, and since I've never used mod_python or any of
its session management tools, could you please tell me:

- which session management module you intend to use?
(http://www.modpython.org/FAQ/faqw.py?req=all#3.8)

- how does it work? I mean, from a process or thread perspective, can
you be more explicit?

In short, the answer depends on the answers to those questions; the
EditingContextSessioning component only provides an easy way to register
and retrieve the EC based on a session id. The number of connections to
the database is independent of this component, and is usually 1 (one)
per process (I really mean process here, not thread: a DB connection is
shared by threads within a single process). That's why I asked about the
way mod_python + the session management module you'll use handle
sessions wrt processes.
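
As a rough illustration of that separation (a sketch only: `SessionRegistry` and this stand-in `EditingContext` are hypothetical names, not the Modeling API), a session-keyed registry simply maps session ids to contexts and never touches a database connection:

```python
import threading


class EditingContext:
    """Hypothetical stand-in for a per-session unit of work (not Modeling's class)."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.pending_changes = []


class SessionRegistry:
    """Maps session ids to contexts; it knows nothing about DB connections."""

    def __init__(self):
        self._contexts = {}
        self._lock = threading.Lock()

    def get_context(self, session_id):
        # Create the context on first use, then return the same object each time.
        with self._lock:
            if session_id not in self._contexts:
                self._contexts[session_id] = EditingContext(session_id)
            return self._contexts[session_id]

    def drop_context(self, session_id):
        # To be called when the HTTP session expires.
        with self._lock:
            self._contexts.pop(session_id, None)


registry = SessionRegistry()
ec_a = registry.get_context("session-A")
ec_b = registry.get_context("session-B")
```

Because the registry is an ordinary in-memory object, it can only be shared by requests that are served from the same process, which is why the process/thread question matters.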

-- Sébastien.

Sebastien Bigaret

2004-08-04 20:55:32 UTC

And BTW, I suspect that if you've already given a try to the
ECSessioning module, or thought about using it, then all the sessions
probably lie in the same process --or it would have been useless.

So, now, I see the moment coming where you'll ask whether it's
possible to have a dedicated db-connection per EC; is that it?

Post by Duncan McGreggor
What they meant by pooling is not what Oracle developers usually mean;
they want to be sure that for every HTTP Session, and all of its HTTP
requests, there is a single connection to the database that is reused
for the entire length of the session lifetime.

In other words, does it mean that "for each session's EC there exists
one and only one connection to the db"??

If this is the case, then we'll need another ObjectStoreCoordinator,
assigning a dedicated DBContext to each EC... Hmmm, just thinking out loud
here, okay, I'll wait till you confirm the need for such a feature.

-- Sébastien.

Duncan McGreggor

2004-08-06 20:51:21 UTC

Post by Sebastien Bigaret
And BTW, I suspect that if you've already given a try to the
ECSessioning module, or thought about using it, then all the sessions
probably lie in the same process --or it would have been useless.
So, now, I see the moment coming where you'll ask whether it's
possible to have a dedicated db-connection per EC; is that it?

Yes, that was the direction I was heading with my questions. However,
this may not truly be the business need. More details:

* There are probably never (famous last words) going to be more than 20
simultaneous
* When a user logs in, a connection to the database will be opened for
them (via an editing context)
* This connection needs to be kept open for this user until their
session expires in order to minimize the overhead of stale open
database connections on the database server

Does/can the EC keep an open connection to the database?

Post by Sebastien Bigaret

In other words, does it mean that "for each session's EC there exists
one and only one connection to the db"??

Yes. That makes sense, though, right? I mean, if there is a web app
with which a user could potentially make hundreds of queries to the
database, we'd want to keep their connection open instead of pounding
the db with new connections, right?

Post by Sebastien Bigaret
If this is the case, then we'll need another ObjectStoreCoordinator,
assigning a dedicated DBContext to each EC... Hmmm, just thinking out loud
here, okay, I'll wait till you confirm the need for such a feature.

How much work is entailed in this? What would the other OSC do?

Duncan McGreggor

2004-08-06 21:05:02 UTC

Post by Duncan McGreggor
* There are probably never (famous last words) going to be more than
20 simultaneous

I meant 20 sim. users ;-)

Sebastien Bigaret

2004-08-07 13:35:02 UTC

Duncan McGreggor <***@adytumsolutions.com> wrote:
[snip]

Yes, that was the direction I was heading with my questions. However, this
may not truly be the business need. More details:
* There are probably never (famous last words) going to be more than 20
simultaneous
* When a user logs in, a connection to the database will be opened for them
(via an editing context)
* This connection needs to be kept open for this user until their session
expires in order to minimize the overhead of stale open database connections
on the database server
Does/can the EC keep an open connection to the database?

Fine. The framework keeps it open by default --see the documentation for
MDL_TRANSIENT_DB_CONNECTION at
http://modeling.sourceforge.net/UserGuide/env-vars-core.html
(the pdf is still wrong, I updated that section yesterday)

Post by Sebastien Bigaret

In other words, does it mean that "for each session's EC there exists
one and only one connection to the db"??

Yes. That makes sense, though, right? I mean, if there is a web app with
which a user could potentially make hundreds of queries to the database,
we'd want to keep their connection open instead of pounding the db with new
connections, right?

Yes --but the need to keep a database connection open and the need to
have an open database connection per user/session are quite different.

How much work is entailed in this? What would the other OSC do?

It would assign specific elements of the Database (DBContext, DBChannel)
and Adaptor (AdaptorContext, AdaptorChannel) layers per EC. At least,
that's what I initially thought. There may be another way to ensure
different fetches could run concurrently --but if you can accept that any
fetch waits for the running one to finish, it's probably not worth the
effort, or am I still missing something?
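
A minimal sketch of the "any fetch waits for the running one" behaviour, assuming a single channel guarded by a lock. `SharedChannel` is a hypothetical name and the lock is an assumption made for illustration; how the framework actually serializes fetches is not described in this thread.

```python
import threading
import time


class SharedChannel:
    """Hypothetical single channel shared by every context in the process."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.max_active = 0

    def fetch(self, label, duration=0.05):
        with self._lock:  # a second fetch blocks here until the first is done
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            time.sleep(duration)  # stands in for time spent in the database
            self.active -= 1
            return label


channel = SharedChannel()
threads = [
    threading.Thread(target=channel.fetch, args=(f"fetch-{i}",))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With one shared channel the four fetches run one after another; giving each context its own channel would let them overlap, at the cost of one connection per context.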

-- Sébastien.

Duncan McGreggor

2004-08-09 01:37:09 UTC

Post by Sebastien Bigaret

Post by Duncan McGreggor
Does/can the EC keep an open connection to the database?

Awesome.

Post by Sebastien Bigaret

Post by Duncan McGreggor

Post by Sebastien Bigaret
In other words, does it mean that "for each session's EC there exists
one and only one connection to the db"??

Yes. That makes sense, though, right? I mean, if there is a web app with
which a user could potentially make hundreds of queries to the database,
we'd want to keep their connection open instead of pounding the db with new
connections, right?

Yes --but the need to keep a database connection open and the need to
have an open database connection per user/session are quite different.

Since the basic requirements were to have minimal connects to the
database server, the default behaviour of keeping the connection open
is just fine.

However, what happens when I use EditingContextSessioning? It seems to
create an EC keyed off a session id. Will that keep one complete
editing context per user session (and thus one opened database
connection per session)? or am I missing something?

Post by Sebastien Bigaret

Post by Duncan McGreggor

How much work is entailed in this? What would the other OSC do?

It would assign specific elements of the Database (DBContext, DBChannel)
and Adaptor (AdaptorContext, AdaptorChannel) layers per EC. At least,
that's what I initially thought. There may be another way to ensure
different fetches could run concurrently --but if you can accept that any
fetch waits for the running one to finish, it's probably not worth the
effort, or am I still missing something?

There is no business need or user demand for this app that will require
super-speeds and concurrent queries per session, so I think it's fine
without going through this trouble... yet ;-)

Thanks for your help! I'm sure this will be a good one for the
archives, too... lots of session-driven apps out there that can benefit
from Modeling...

d

Sebastien Bigaret

2004-08-09 13:55:07 UTC

Since the basic requirements were to have minimal connects to the database
server, the default behaviour of keeping the connection open is just fine.
However, what happens when I use EditingContextSessioning? It seems to create
an EC keyed off a session id. Will that keep one complete editing context per
user session (and thus one opened database connection per session)? or am I
missing something?

By default ECSessioning creates an EC per session id, right --but then,
each EC has the same parent ObjectStore, an ObjectStoreCoordinator by
default, which in turn forwards the request (e.g. a fetch) to the
appropriate elements underneath it.

In a word: you get as many ECs as you want, but only one DB connection
is opened (and left open).
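
The shape of that arrangement can be sketched as follows. This is an illustration under stated assumptions, not Modeling code: `Coordinator` and `Context` are hypothetical names, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


class Coordinator:
    """Hypothetical parent store: owns the single connection for the process."""

    def __init__(self):
        self.connections_opened = 0
        self._conn = None

    def connection(self):
        # Opened lazily, once, and then left open.
        if self._conn is None:
            self._conn = sqlite3.connect(":memory:")
            self.connections_opened += 1
        return self._conn

    def fetch(self, sql, params=()):
        return self.connection().execute(sql, params).fetchall()


class Context:
    """Hypothetical per-session context; forwards every fetch to its parent."""

    def __init__(self, parent):
        self.parent = parent

    def fetch(self, sql, params=()):
        return self.parent.fetch(sql, params)


coordinator = Coordinator()
contexts = [Context(coordinator) for _ in range(20)]  # e.g. 20 user sessions
results = [ec.fetch("SELECT ?", (i,)) for i, ec in enumerate(contexts)]
```

However many contexts are created, `coordinator.connections_opened` stays at 1, because every context delegates to the same parent.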

Post by Sebastien Bigaret

Post by Duncan McGreggor

How much work is entailed in this? What would the other OSC do?

There is no business need or user demand for this app that will require
super-speeds and concurrent queries per session, so I think it's fine without
going through this trouble... yet ;-)

Fine :) but that's an idea we should keep in mind. That will probably
be requested in the future.

Oh and BTW, Soif hasn't made the remark yet ;) but there is a
potential problem with caching, since there is no way for now to
"forget" an object and its corresponding snapshot --except by deleting
all ECs that reference it. Keep that in mind, I'll get back on the
subject before the next release I guess.

-- Sébastien.

Duncan McGreggor

2004-08-04 23:49:02 UTC

Post by Sebastien Bigaret
Hi Duncan,
Before answering and since I've never used the mod_python and any of
- which session management module you intend to use?
(http://www.modpython.org/FAQ/faqw.py?req=all#3.8)

You bet, and sorry about leaving out that detail. We are using
mod_python 3.1, and using the built-in session management. Here are the
links:
http://www.modpython.org/live/current/doc-html/pyapi-sess.html
http://www.modpython.org/live/current/doc-html/pyapi-sess-classes.html

Post by Sebastien Bigaret
- how does it work? I mean, from a process or thread perspective, can
you be more explicit?

Sure; here's a description of how the python interpreter and something
called "subinterpreters" work in mod_python:
http://www.modpython.org/live/current/doc-html/pyapi-interps.html

I'm no expert on the internals of mod_python, but here's a quick
run-down of a typical series of steps apache + a simple mod_python
setup go through during the lifetime of a request:
http://www.modpython.org/live/current/doc-html/tut-what-it-do.html

We've been running a default mod_python apache configuration: 1 main
process, and multiple child processes are spawned as needed. I have
never programmed with threads and don't really understand them, so
you'll have to forgive my terminology, if incorrect...

I guess we could set up mod_python to run a single interpreter
(http://www.modpython.org/live/current/doc-html/dir-other-pi.html), but I
would want to do that only if we had to... I think we would
lose many of the performance gains our client sought by going with
mod_python in the first place...

Post by Sebastien Bigaret
In short, the answer depends on the answers to those questions; the
EditingContextSessioning component only provides an easy way to register
and retrieve the EC based on a session id.

Ah... this is good information.

Post by Sebastien Bigaret
The number of connections to
the database is independent of this component, and is usually 1 (one)
per process (I really mean process here, not thread: a DB connection is
shared by threads within a single process). That's why I asked about the
way mod_python + the session management module you'll use handle
sessions wrt processes.

I'm out of my depth here, but I can dig some more if the above info
doesn't answer these questions...

Thanks!

Duncan

s***@larsen-b.com

2004-08-06 15:24:01 UTC

Be careful about one thing:
Modeling caches objects heavily. So to be sure it works you have 2
options:
- use a single python interpreter (using the ad-hoc apache2 config)
- or disable Editing Context caches ..

In both approaches the main drawback is performance. I think the best
way to solve this is to use something that gives you finer-grained
control. I think quixote for example can fix that. I mean: you have 1
process that forks and re-uses processes. This way you can have Modeling
caches (perhaps needing an extra lock to avoid crashes) and don't have to
bother w/ the forking needed at every request. (import Modeling is
time-consuming ..)

Look at http://www.larsen-b.com/Article/130.html for a slightly deeper
explanation of the problem.

I haven't done this kind of stuff for a while.. but i think Modeling needs
something like what Durus does .. a server process.. :)

Bye Bye

Sebastien Bigaret

2004-08-06 17:45:10 UTC

Post by s***@larsen-b.com
Modeling caches objects heavily. So to be sure it works you have 2 options:
- use a single python interpreter (using the ad-hoc apache2 config)
- or disable Editing Context caches ..
In both approaches the main drawback is performance. I think the best
way to solve this is to use something that gives you finer-grained
control. I think quixote for example can fix that. I mean: you have 1
process that forks and re-uses processes. This way you can have Modeling
caches (perhaps needing an extra lock to avoid crashes) and don't have to
bother w/ the forking needed at every request. (import Modeling is
time-consuming ..)

I'm not sure this is accurate --I've read your post there, and I do not
agree w/ your analysis: I can be wrong of course since my experience w/
mod_python is really thin, but to my understanding there is a global
namespace available w/ mod_python, the global namespace that exists
within the subinterpreter dedicated to an application.

And since the sub-interpreter is created once and never destroyed
(unless apache is restarted :) time needed for imports should not be
a problem.

[Note: I have only experimented w/ mod_python 3.1.3 for Apache 2.0,
maybe it's different from v2.7.10 for Apache 1.x?]

Post by s***@larsen-b.com
Look at http://www.larsen-b.com/Article/130.html for a slightly deeper
explanation of the problem.
I haven't done this kind of stuff for a while.. but i think Modeling needs
something like what Durus does .. a server process.. :)

Interesting indeed, I'll have a look at that. Do you have experience
with this? How do they serialize objects that are transmitted between
the server and its client?

-- Sébastien.

s***@larsen-b.com

2004-08-09 22:49:33 UTC

On Friday 06 August 2004 21:44, Sebastien Bigaret wrote:

[snip]

First .. i'm very busy w/ computer off, right now so ..

Anyways

Post by Sebastien Bigaret
I'm not sure this is accurate --I've read your post there, and I do not
agree w/ your analysis: I can be wrong of course since my experience w/
mod_python is really thin, but to my understanding there is a global
namespace available w/ mod_python, the global namespace that exists
within the subinterpreter dedicated to an application.

First, i won't talk about apache1 but be aware that apache2 isn't really
stable. I use it in production right now but i disabled the thread support.
I got too much trouble w/ it. There are 3 kinds of process handling in
apache2: fork / thread / mixed (i don't recall exactly but i hope it's
true ..:)

I won't go deeper into the explanation but cut/paste the answer i got about
MPServlet (which uses mod_python + apache2) from Daniel Popowich:

"""
Your questions are related to each other and are frequently asked on
the mod_python discussion list. The problem with maintaining global
data with mod_python (and servlets doesn't solve this issue) is
tightly related to how apache maintains processes and threads. If you
are on linux or other unix-like OS, then apache uses the prefork MPM
by default. This is a multi-process MPM where each request is handled
by a separate process, so you end up with N python interpreters, where
N is the number of processes apache has forked. In such an
environment, the only way to have global data is through some external
storage: sessions, db, shared memory, etc.

For unix, there is the worker MPM which is a hybrid
multi-process/multi-threaded server. With worker you can control
apache such that you have ONE process with multiple threads. I know
one person who is using servlets in such a manner. (This is the
default on windows with the winnt MPM.) In such an environment you
can have global data shared across requests, but of course, you'll
have to wrap your data in mutexes of some sort to avoid threads
stepping on each other.

If the data you want shared across requests is tied to a particular
user, then sessions are an excellent solution. Light-weight and
pretty fast. Servlets has builtin support for sessions; see the api
doc or the tutorial.
"""

In short, if you're using the fork behaviour, you can't share
data. So in modeling => time-consuming imports .. and multiple
interpreters (with no way to reduce them without a restart) .. not to
mention the trouble you can get if you have multiple virtual hosts
etc etc ..

If you want to share data, you need to use the multi-thread mode, but
you have to play w/ python threads/mutexes. And everybody around
knows I don't want to play w/ threads. In modeling, i don't know if
you can do something like this. I mean, share a 'global' editing
context, and guarantee that it's thread safe. (without of course
ending up w/ a lock/unlock around each http request). I use SQLObject
for a couple of things right now, and I get exactly this kind of
trouble. For example the SQLite or MySQL DB APIs aren't thread safe.
(I get weird issues about lost connections in MySQL) ..

So that's the dilemma (the post in my blog is about all kinds
of web frameworks, but here is an example). If you don't want to
share objects, ok, but you will lose time because you can't use
any caching ..
If you want to share, it's ok, but you need to play w/ generators
(à la Twisted) or to play w/ mutexes. And i didn't find a way to
do it the nice way, because most python modules don't
scale very well in a threaded env.

From my point of view, quixote offers a good way to handle this,
because it uses the same prefork / reuse-process model that apache
does. So you don't have to deal w/ thread-buggy modules, and
you can really handle multiple connections without any other
artifact, and without losing time in big lock/unlock.

(I know Zope can handle multiple cnx without too much pain
by threading. But Zope's threads aren't really used to do
multiple tasks at the same time. It's just a way to accept http
requests without locking ..)

Post by Sebastien Bigaret

Post by s***@larsen-b.com
Look at http://www.larsen-b.com/Article/130.html for a slightly deeper
explanation of the problem.
I haven't done this kind of stuff for a while.. but i think Modeling needs
something like what Durus does .. a server process.. :)

Interesting indeed, I'll have a look at that. Do you have experience
with this? How do they serialize objects that are transmitted between
the server and its client?

I played w/ it in a recent project, it's simpler than ZODB + ZEO ..
but i don't know how it works.

While finishing this mail, i remembered that this is a modeling mailing
list, and i need to apologize to others who don't want to hear about http
handling

Hope this helps. / I know i need to go to an english course sooooonnnn :)

Sebastien Bigaret

2004-08-06 17:34:07 UTC

Duncan McGreggor <***@adytumsolutions.com> wrote:
[snipped]

We've been running a default mod_python apache configuration: 1 main process,
and multiple child processes are spawned as needed. I have never
programmed with threads and don't really understand them, so you'll have to
forgive my terminology, if incorrect...
I guess we could set up mod_python to run a single interpreter
(http://www.modpython.org/live/current/doc-html/dir-other-pi.html), but I would
want to do that only if we had to... I think we would lose many of the
performance gains our client sought by going with mod_python in the first
place...

I do not think so, because requests can be handled concurrently within a
single interpreter. And that's what I believe is done --quoted from
http://www.modpython.org/live/current/doc-html/pyapi-interps.html:

Default behaviour is to name interpreters using the Apache virtual
server name (ServerName directive). This means that all scripts in
the same virtual server execute in the same subinterpreter, but
scripts in different virtual servers execute in different
subinterpreters with completely separate namespaces

I've not played a lot with mod_python (just a few hours, in fact), but
AFAIK this means that a given application will always be served by the
same interpreter. And in such a configuration:

- the ECSessioning works,
- so each session will get its own EC,
- you'll basically get one connection to the database for all.

Now, this can be fine... or not, depending on the type of application
you're currently building (I mean, its characteristics, esp. if you can
expect heavy load).

For example, say that some fetches can take some time --can you wait
for a fetch to be finished before serving others? If your app. does not
fetch a lot, maybe it's not a problem, but if most of its requests end up
with one or more ec.fetch() then the default behaviour will probably not
be suitable.

And again, could you be more explicit whether in your first mail you
meant that "for each session's EC there exists one and only one
connection to the db"??

-- Sébastien.

Duncan McGreggor

2004-08-06 21:03:02 UTC

Post by Sebastien Bigaret
Now, this can be fine... or not, depending on the type of application
you're currently building (I mean, its characteristics, esp. if you can
expect heavy load).

We're expecting very light load.

Post by Sebastien Bigaret
For example, say that some fetches can take some time --can you wait
for a fetch to be finished before serving others?

We are expecting that there will be some delays perceived by users
during regular use as you describe here. This is okay :-)

Post by Sebastien Bigaret
If your app. does not
fetch a lot, maybe it's not a problem, but if most of its requests end up
with one or more ec.fetch() then the default behaviour will probably not
be suitable.
And again, could you be more explicit whether in your first mail you
meant that "for each session's EC there exists one and only one
connection to the db"??

Sorry about responding so late; did my last email make this more clear,
or should I send along more details/better explanation?