Gut feeling about priorities:
- High
-
Needer for wider deployment / 3.0-final.
- Medium
-
Good if done, but can be postponed.
- Low
-
Interesting idea, but OK if not done.
High Priority
-
qadmin: register/unregister subconsumer
-
add syntax
-
call db-functions.
-
-
londiste takeover: check if all tables exist and are in sync. Inform user. Should the takeover stop if problems? How can such state be checked on-the-fly? Perhaps londiste missing should show in-copy tables.
-
docs:
-
londiste manpage
-
qadmin manpage
-
pgqd manpage
-
-
cascade: merge leaf creation should check if target queue exists - pgq_node.create_node() should return error if not.
-
cascade takeover: wal failover queue sync. WAL-failover can be behind/ahead from regular replication with partial batch. Need to look up-batched events in wal-slave and full-batches on branches and sync them together. this should also make non-wal branch takeover with branch thats behind the others work - it needs to catch up with recent events.
-
Load top-ticks from branches
-
Load top-tick from new master, if ahead from branches all ok
-
Load non-batched events from queue (ev_txid not in tick_snapshot)
-
Load partial batch from branch
-
Replay events that do not exists
-
Replay rest of batches fully
-
Promote to root
-
-
tests: takeover testing
-
wal behind
-
wal ahead
-
branch behind
-
Medium Priority
-
Split package (pgq-modules-X.Y, pgqd, skytools) [dimitri]
-
tests for things that have not their own regtests or are not tested enough during other tests:
-
pgq.RemoteConsumer
-
pgq.CoopConsumer
-
skytools.DBStruct
-
londiste handlers
-
-
cascade watermark limit nodes. A way to avoid affecting root with lagging downstream nodes. Need to tag some nodes to coordinate watermark with each other and not send it upstream to root.
-
automatic sql upgrade mechanism - check version in db, apply .sql if contains newer version.
-
integrate skytools.checker. It is generic check/repair script that can also automatically apply fixes. Should rebase londiste check/repair on that.
-
londiste add-table: automatic serial handling, --noserial switch? Currently, --create-full does not create sequence on target, even if source table was created with serial column. It does associate column with sequence if that exists, but it requires that it was created previously.
-
pgqd: rip out compat code for pre-pgq.maint_operations() schemas. All the maintenance logic is in DB now.
-
qadmin: merge cascade commands (medium) - may need api redesign to avoid duplicating pgq.cascade code?
-
dbscript: configurable error timeout (currently it’s hardwired to 20s)
-
dbscript: exec_cmd() needs better name.
-
londiste replay: when buffering queries, check their size. Current buffering is by count - flushed if 200 events have been collected. That does not take account that some rows can be very large. So separate counter for len(ev_data) needs to be added, that flushes if buffer would go over some specified amount of memory.
-
cascade status: parallel info gathering. Sequential for-loop over nodes can be slow if there are many nodes or some of them are generally slow (GP, bad network, etc). Perhaps we can use psycopg2 async mode for that. Or threading.
-
developer docs for:
-
DBScript, pgq.Consumer, pgq.CascadedConsumer?
-
Update pgq/sql docs for current pgq.
-
Low Priority
-
dbscript: switch (-q) for silence for cron/init scripts. Dunno if we can override loggers loaded from skylog.ini. Simply redirecting fds 0,1,2 to /dev/null should be enough then.
-
londiste add/copy: merge copy without combined queue? If several partitions need to write data to single place, there are 2 variants:
-
Use --skip-truncate when adding table in partitons. Problem: it repeatedly drops/creates indexes.
-
Use merge-leaf→combined-root cascading logic, for which Londiste has optimized copy without repeated index dropping+creation. Problem: it expects target combined queue where to copy events and which also triggers the additional copy logic.
Should there also be 3) - optimized copy without combined queue? Problem: the additional logic needs to checks tables from all the other nodes, all the time, because it has no clear flag that merging needs to happen. Seems it's better to avoid that.
-
-
londiste: support creating slave from master by pg_dump / PITR. Take full dump from root or failover-branch and turn it into another branch.
-
Rename node
-
Check for correct epoch, fix if possible (only for pg_dump)
-
Sync batches (wal-failover should have it)
-
-
londiste copy: async conn-to-conn copy loop in Python/PythonC. Currently we simply pipe one copy_to() to another copy_from() in blocked manner with large buffer, but that likely halves the potential throughput.
-
qadmin: multi-line commands. The problem is whether we can use python’s readline in a psql-like way.
-
qadmin: recursive parser. Current non-recursive parser cannot express complex grammar (SQL). If we want SQL auto-completion, recursive grammar is needed. This would also simplify current grammar.
-
On rule reference, push state to stack
-
On rule end, pop state from stack. If empty then done.
-