
= Skytools ToDo list =

Gut feeling about priorities:

High::
  Needer for wider deployment / 3.0-final.
Medium::
  Good if done, but can be postponed.
Low::
  Interesting idea, but OK if not done.

== High Priority ==

* qadmin: register/unregister subconsumer
  . add syntax
  . call db-functions.

* londiste takeover: check if all tables exist and are in sync.
  Inform user.  Should the takeover stop if problems?
  How can such state be checked on-the-fly?
  Perhaps `londiste missing` should show in-copy tables.

* docs:
  - londiste manpage
  - qadmin manpage
  - pgqd manpage

* cascade: merge leaf creation should check if target queue exists -
  `pgq_node.create_node()` should return error if not.

* cascade takeover: wal failover queue sync.  WAL-failover can be
  behind/ahead from regular replication with partial batch.  Need
  to look up-batched events in wal-slave and full-batches on branches
  and sync them together.  this should also make non-wal branch takeover
  with branch thats behind the others work - it needs to catch up with
  recent events.
  . Load top-ticks from branches
  . Load top-tick from new master, if ahead from branches all ok
  . Load non-batched events from queue (ev_txid not in tick_snapshot)
  . Load partial batch from branch
  . Replay events that do not exists
  . Replay rest of batches fully
  . Promote to root

* tests: takeover testing
  - wal behind
  - wal ahead
  - branch behind

== Medium Priority ==

* Split package (`pgq-modules-X.Y`, `pgqd`, `skytools`) [dimitri]

* tests for things that have not their own regtests
  or are not tested enough during other tests:
  - pgq.RemoteConsumer
  - pgq.CoopConsumer
  - skytools.DBStruct
  - londiste handlers

* cascade watermark limit nodes.  A way to avoid affecting root
  with lagging downstream nodes.  Need to tag some nodes to coordinate
  watermark with each other and not send it upstream to root.

* automatic sql upgrade mechanism - check version in db, apply .sql
  if contains newer version.

* integrate skytools.checker.  It is generic 'check/repair' script
  that can also automatically apply fixes.  Should rebase londiste
  check/repair on that.

* londiste add-table: automatic serial handling, --noserial switch?  Currently,
  `--create-full` does not create sequence on target, even if source
  table was created with `serial` column.  It does associate column
  with sequence if that exists, but it requires that it was created
  previously.

* pgqd: rip out compat code for pre-pgq.maint_operations() schemas.
  All the maintenance logic is in DB now.

* qadmin: merge cascade commands (medium) - may need api redesign
  to avoid duplicating pgq.cascade code?

* dbscript: configurable error timeout (currently it's hardwired to 20s)

* dbscript: `exec_cmd()` needs better name.

* londiste replay: when buffering queries, check their size.  Current
  buffering is by count - flushed if 200 events have been collected.
  That does not take account that some rows can be very large.
  So separate counter for len(ev_data) needs to be added, that flushes
  if buffer would go over some specified amount of memory.

* cascade status: parallel info gathering.  Sequential for-loop over
  nodes can be slow if there are many nodes or some of them are
  generally slow  (GP, bad network, etc).  Perhaps we can use
  psycopg2 async mode for that.  Or threading.

* developer docs for:
  - DBScript, pgq.Consumer, pgq.CascadedConsumer?
  - Update pgq/sql docs for current pgq.

== Low Priority ==

* dbscript: switch (-q) for silence for cron/init scripts.
  Dunno if we can override loggers loaded from skylog.ini.
  Simply redirecting fds 0,1,2 to /dev/null should be enough then.

* londiste add/copy: merge copy without combined queue?  If several
  partitions need to write data to single place, there are 2 variants:
  1. Use `--skip-truncate` when adding table in partitons.
     Problem: it repeatedly drops/creates indexes.
  2. Use `merge-leaf`->`combined-root` cascading logic, for which
     Londiste has optimized copy without repeated index dropping+creation.
     Problem: it expects target combined queue where to copy
     events and which also triggers the additional copy logic.

  Should there also be 3) - optimized copy without combined queue?
  Problem: the additional logic needs to checks tables from all the other
  nodes, all the time, because it has no clear flag that merging
  needs to happen.  Seems it's better to avoid that.

* londiste: support creating slave from master by pg_dump / PITR.
  Take full dump from root or failover-branch and turn it into
  another branch.
  . Rename node
  . Check for correct epoch, fix if possible (only for pg_dump)
  . Sync batches (wal-failover should have it)

* londiste copy: async conn-to-conn copy loop in Python/PythonC.
  Currently we simply pipe one copy_to() to another copy_from()
  in blocked manner with large buffer,
  but that likely halves the potential throughput.

* qadmin: multi-line commands.  The problem is whether we can
  use python's readline in a psql-like way.

* qadmin: recursive parser.  Current non-recursive parser
  cannot express complex grammar (SQL).  If we want
  SQL auto-completion, recursive grammar is needed.
  This would also simplify current grammar.
  1. On rule reference, push state to stack
  2. On rule end, pop state from stack.  If empty then done.

