= londiste(1) =


== NAME ==

londiste - PostgreSQL replication engine written in python

== SYNOPSIS ==

  londiste.py [option] config.ini command [arguments]

== DESCRIPTION ==

Londiste is the PostgreSQL replication engine portion of the SkyTools suite, 
by Skype. This suite includes packages implementing specific replication 
tasks and/or solutions in layers, building upon each other.

PgQ is a generic queue implementation based on ideas from Slony-I's
snapshot based event batching. Londiste uses PgQ as its transport
mechanism to implement a robust and easy to use replication solution.

Londiste is an asynchronous master-slave(s) replication
system. Asynchronous means that a transaction commited on the master is
not guaranteed to have made it to any slave at the master's commit time; and
master-slave means that data changes on slaves are not reported back to
the master, it's the other way around only.

The replication is trigger based, and you choose a set of tables to
replicate from the provider to the subscriber(s). Any data changes
occuring on the provider (in a replicated table) will fire the
londiste trigger, which fills a queue of events for any subscriber(s) to
care about.

A replay process consumes the queue in batches, and applies all given
changes to any subscriber(s). The initial replication step involves using the
PostgreSQL's COPY command for efficient data loading.

== QUICK-START ==

Basic londiste setup and usage can be summarized by the following
steps:

 1. create the subscriber database, with tables to replicate

 2. install pgq on both databases and launch pgq daemon.

    $ edit pgqadm-master.ini
    $ edit pgqadm-slave.ini
    $ pgqadm pgqadm-master.ini install
    $ pgqadm pgqadm-slave.ini install
    $ pgqadm pgqadm-master.ini ticker -d
    $ pgqadm pgqadm-slave.ini ticker -d

 3. create londiste config file for both databases.

    $ edit londiste-master.ini
    $ edit londiste-slave.ini

 4. create londiste nodes.  this also installs londiste code.

    $ londiste londiste-master.ini create-root master
    $ londiste londiste-slave.ini  create-branch slave --provider="master connstr"

 5. launch londiste daemons for both databases.

    $ londiste londiste-master.ini replay -d
    $ londiste londiste-slave.ini replay -d

 6. add tables to master

    $ londiste londiste-master.ini add-table mytbl1 mytbl2

 7. add tables to slave

    $ londiste londiste-slave.ini add-table mytbl1 mytbl2

To replicate to more than one subscriber database just repeat each of the
described subscriber steps for each subscriber.

== COMMANDS ==

The londiste command is parsed globally, and has both options and
subcommands. Some options are reserved to a subset of the commands,
and others should be used without any command at all.

== GENERAL OPTIONS ==

This section presents options available to all and any londiste
command.

  -h, --help::
	show this help message and exit

  -q, --quiet::
	make program silent

  -v, --verbose::
	make program more verbose


== CASCADING COMMANDS ==

=== install ===

Installs code into provider and subscriber database and creates
queue. Equivalent to doing following by hand:

    CREATE LANGUAGE plpgsql;
    \i .../contrib/txid.sql
    \i .../contrib/pgq.sql
    \i .../contrib/pgq_node.sql
    \i .../contrib/londiste.sql

=== create-root ===

q

=== create-branch ===

q

=== create-leaf ===

q

=== add-table <table name> ... ===

Registers table(s) on the provider database and adds the londiste trigger to 
the table(s) which will send events to the queue.  Table names can be schema 
qualified with the schema name defaulting to public if not supplied.

  --all::
	Register all tables in provider database, except those that are
	under schemas 'pgq', 'londiste', 'information_schema' or 'pg_*'.

=== remove-table <table name> ... ===

Unregisters table(s) on the provider side and removes the londiste triggers 
from the table(s). The table removal event is also sent to the queue, so all 
subscribers unregister the table(s) on their end as well.  Table names can be 
schema qualified with the schema name defaulting to public if not supplied.

=== provider add-seq <sequence name> ... ===

Registers a sequence on provider.

=== provider remove-seq <sequence name> ... ===

Unregisters a sequence on provider.

=== provider tables ===

Shows registered tables on provider side.

=== provider seqs ===

Shows registered sequences on provider side.

== SUBSCRIBER COMMANDS ==

  londiste.py config.ini subscriber <command>

Where command is one of:

=== subscriber install ===

Installs code into subscriber database. Equivalent to doing following
by hand:

    CREATE LANGUAGE plpgsql;
    \i .../contrib/londiste.sql

This will be done under the Postgres Londiste user, if the tables should 
be owned by someone else, it needs to be done by hand.

=== subscriber add <table name> ... ===

Registers table(s) on subscriber side. Table names can be schema qualified 
with the schema name defaulting to `public` if not supplied.

Switches (optional):

  --all::
        Add all tables that are registered on provider to subscriber database
  --force::
        Ignore table structure differences.
  --excect-sync::
        Table is already synced by external means so initial COPY is unnecessary.
  --skip-truncate::
        When doing initial COPY, don't remove old data. 

=== subscriber remove <table name> ... ===

Unregisters table(s) from subscriber. No events will be applied to
the table anymore. Actual table will not be touched. Table names can be 
schema qualified with the schema name defaulting to public if not supplied.

=== subscriber add-seq <sequence name> ... ===

Registers a sequence on subscriber.

=== subscriber remove-seq <sequence name> ... ===

Unregisters a sequence on subscriber.

=== subscriber resync <table name> ... ===

Tags table(s) as "not synced". Later the replay process will notice this
and launch copy process(es) to sync the table(s) again.

=== subscriber tables ===

Shows registered tables on the subscriber side, and the current state of
each table. Possible state values are:

NEW::
  the table has not yet been considered by londiste.

in-copy::
  Full-table copy is in progress.

catching-up::
  Table is copied, missing events are replayed on to it.

wanna-sync:<tick-id>::
  The "copy" process catched up, wants to hand the table over to
  "replay".

do-sync:<tick_id>::
  "replay" process is ready to accept it.

ok::
  table is in sync.


== REPLICATION COMMANDS ==

=== replay ===

The actual replication process. Should be run as daemon with -d
switch, because it needs to be always running.

It's main task is to get batches of events from PgQ and apply
them to subscriber database.

Switches:

  -d, --daemon::
	go background

  -r, --reload::
	reload config (send SIGHUP)

  -s, --stop::
	stop program safely (send SIGINT)

  -k, --kill::
	kill program immidiately (send SIGTERM)

== UTILITY COMMAND ==

=== repair <table name> ... ===

Attempts to achieve a state where the table(s) is/are in sync, compares 
them, and writes out SQL statements that would fix differences.

Syncing happens by locking provider tables against updates and then
waiting until the replay process has applied all pending changes to 
subscriber database. As this is dangerous operation, it has a hardwired 
limit of 10 seconds for locking. If the replay process does not catch up 
in that time, the locks are released and the repair operation is cancelled.

Comparing happens by dumping out the table contents of both sides, 
sorting  them and then comparing line-by-line. As this is a CPU and 
memory-hungry operation, good practice is to run the repair command on a
third machine to avoid consuming resources on either the provider or the
subscriber.

=== compare <table name> ... ===

Syncs tables like repair, but just runs SELECT count(*) on both
sides to get a little bit cheaper, but also less precise, way of
checking if the tables are in sync.

== CONFIGURATION ==

Londiste and PgQ both use INI configuration files, your distribution of
skytools include examples. You often just have to edit the database
connection strings, namely db in PgQ ticker.ini and provider_db and
subscriber_db in londiste conf.ini as well as logfile and pidfile to adapt to
you system paths.

See `londiste(5)`.

== UPGRADING ==

As the skytools software contains code which is run directly from
inside the database server (+PostgreSQL+ functions), installing the
new package version at the OS level is not enough to perform the
upgrade.

It still is possible to upgrade londiste without stopping the service,
how to do this somewhat depends on the specifics version you're
upgrading from and to, please refer to the +upgrade.txt+
documentation, which can be found at this url too:
http://skytools.projects.postgresql.org/doc/upgrade.html[upgrade.html]

== CURRENT LIMITATIONS ==

+londiste+, as a trigger based solution, is not able to replicate
neither
http://www.postgresql.org/docs/current/interactive/ddl.html[DDL] nor
http://www.postgresql.org/docs/current/static/sql-truncate.html[TRUNCATE]
+SQL+ commands.

Please also note that the cascaded replication scenario is still a
+TODO+ item, which means +londiste+ is not yet able to properly handle
the case on its own.

=== DDL ===

If you edit a table definition on the provider, you have to manually
update the table definition on every subscriber replicating the table
data. When adding, renaming or removing columns, replication won't
work for the table until subscriber is updated too, but +londiste+
won't loose any item to replicate, and reapply them once schemas match.

=== TRUNCATE ===

For truncating a table +foo+ which is replicated by +londiste+, the
easier way is to remove +foo+ from subscriber(s), +TRUNCATE+ it on the
provider and add it again on subscriber(s):

    subscriber> londiste.py conf.ini subscriber remove foo
    provider>   psql provider_db -c "truncate foo;"
    subscriber> londiste.py conf.ini subscriber add foo 

Of course, you need to perform the subscriber steps on each of your
subscribers if you have more than one currently replicating the +foo+
table.

=== Cascaded replication ===

+londiste+ is not yet able to handle cascaded replication. What it
means is that if you setup the three servers A, B and C such as some
tables from A are replicated to B and the same table are replicated
from B to C, and if the replication from A to B is stopped, +londiste+
won't be able to have the replication to C ongoing by reconfiguring it
from A to C for you.

== SWITHOVER ==

While using +londiste+ to replicate data from a provider to a
subscriber, it is possible to have the subscriber become the provider.
This can be used, for example, to upgrade from one +PostgreSQL+
version to another (more recent) one, or from a physical setup to
another, for example.

The recommanded procedure to achieve a switchover is the following:

  1. stop all write access to db.
  2. let the londiste apply last changes
  3. set up new queue on slave as provider, add tables
  4. subscribe old master to new master, add tables with --expect-sync
  5. do some DDL things on new master (triggers, etc)
  6. allow write access to new master

== SEE ALSO ==

`londiste(5)`

https://developer.skype.com/SkypeGarage/DbProjects/SkyTools/[]

http://skytools.projects.postgresql.org/doc/londiste.ref.html[Reference guide]

