Keep old design from Skytools 2
-
Worker process connects to only 2 databases, there is no everybody-to-everybody communication going on.
-
Worker process only pulls data from queue.
-
No pushing with LISTEN/NOTIFY is used for data transport.
-
Administrative work happens in separate process.
-
Can go down anytime, without affecting anything else.
-
-
Relaxed attitude about tables
-
Tables can be added/removed any time.
-
Inital data sync happens table-by-table, no attempt is made to keep consistent picture between tables during initial copy.
-
New features in Skytools 3
-
Cascading is implemented as generic layer on top of PgQ - Cascaded PgQ.
-
Its goal is to keep identical copy of queue contents in several nodes.
-
Not replication-specific - can be used for any queue.
-
Advanced admin operations: switchover, failover, change-provider, pause/resume.
-
For terminology and technical details see here: set.notes.html.
-
-
New Londiste features:
-
Parallel copy - during inital sync several tables can be copied at the same time. In 2.x the copy already happened in separate process, making it parallel was just a matter of tuning launching/syncing logic.
-
EXECUTE command, to run random SQL script on all nodes. The script is executed in single TX on root, and insterted as event into queue in the same TX. The goal is to emulate DDL AFTER TRIGGER that way. Londiste itself does no locking and no coordination between nodes. The assumption is that the DDL commands itself do enough locking. If more locking is needed is can be added to script.
-
Automatic table or sequence creation by importing the structure from provider node. Activeted with --create switch for add-table, add-seq. By default everything is copied, including Londiste own triggers. The basic idea is that the triggers may be customized and that way we avoid the need to keep track of trigger customizations.
-
Ability to merge replication queues coming from partitioned database. The possibility was always there but now PgQ keeps also track of batch positions, allowing loss of the merge point.
-
Londiste now uses the intelligent log-triggers by default. The triggers were introduced in 2.1.x, but were not on by default. Now they are used by default.
-
-
New interactive admin console. Because long command lines are not very user-friendly, this is an experiment on interactive console with heavy emphasis on tab-completion.
-
New multi-database ticker. It is possible to set up one process that maintains all PgQ databases in one PostgreSQL instance. It will auto-detect both databases and whether they have PgQ installed. This also makes core PgQ usable without need for Python.
-
New cascaded dispatcher script. Previous 3 dispatcher scripts (bulk_loader, cube_dispatcher, table_dispatcher) shared quite a lot of logic for partitionaing, differing on few details. So instead of porting all 3 to cascaded consuming, I merged them.
Minor improvements
-
sql/pgq: ticks also store last sequence pos with them. This allowed also to move most of the ticker functionality into database. Ticker daemon now just needs to call SQL function periodically, it does not need to keep track of seq positions.
-
sql/pgq: Ability to enforce max number of events that one TX can insert. In addition to simply keeping queue healthy, it also gives a way to survive bad UPDATE/DELETE statements with buggy or missing WHERE clause.
-
sql/pgq: If Postgres has autovacuum turned on, internal vacuuming for fast-changing tables is disabled.
-
python/pgq: pgq.Consumer does not register consumer automatically, cmdline switches --register / --unregister need to be used for that.
-
londiste: sequences are now pushed into queue, instead pulled directly from database. This reduces load on root and also allows in-between nodes that do not have sequences.
-
psycopg1 is not supported anymore.
-
PgQ does not handle "failed events" anymore.
Open questions
-
New ticker
-
Name for final executable: pgqd, pgq-ticker, or something else?
-
Should it serialize retry and maint operations on different dbs?
-
Should it serialize ticking?
-
-
Python modules
-
Skytools 3 modules should be parallel installable with Skytools 2. we decided to solve it via loader module (like pygtk). The question is should we have Skytools-specific loader or more generic. And what should the name be?
import skytools_loader skytools_loader.require('3.0')
vs
import pkgloader pkgloader.require('skytools', '3.0')
-
-
PgQ Cascading
-
There are some periodic operations that should be done on root node. Should we let the ticker do them or do we require a Londiste daemon there? Letting ticker do it means initial setup is slightly easier. But requiring separate daemon means user has guarantee that switchover (downgrading root to branch) works without need to start the replication daemon.
-
-
Londiste EXECUTE command
-
Should the scripts have ability to inform Londiste of the tables they operate on, so that nodes that do not have such tables can ignore the script.
-
-
Newadm:
-
Good name for it.
-
-
Is there good reason not to drop following modules:
-
logtriga(), pgq.logtriga() - non-automatic triggers
-
cube_dispatcher, table_dispatcher, bulk_loader - they are merged into queue_loader
-
Further reading
-
Skytools 3 todo list: TODO.html
-
Newadm design and todo list: qadmin.html
-
Technical notes about cascading: set.notes.html
-
Notes for contributors: devnotes.html
-
Python API docs:
-
Database API docs:
-
Londiste Demo: