
Set a statement timeout, to deal with clients that disappear while their query is still running
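A minimal sketch — statement_timeout can be set per session (as here), per role, or globally in postgresql.conf; the 60s value is an arbitrary example:

```sql
-- Cancel any statement that runs longer than 60 seconds
SET statement_timeout = '60s';
```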


pg_stat_bgwriter: use it to tune the bgwriter; the defaults throttle it to roughly 4MB/sec (bgwriter_delay 200ms x bgwriter_lru_maxpages 100 x 8kB pages), which may need raising for big/active databases

memory tuning: shared memory (shared_buffers, max_connections)

private memory (per process/client): work_mem, maintenance_work_mem

shared_buffers: the old mnemonic of 0.25x RAM is obsolete; it's workload dependent now. Start with a small value and tune upwards using pg_buffercache
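A hedged example of inspecting buffer usage with the pg_buffercache extension (the query shape is standard; run it in the database you care about):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Which relations occupy the most of shared_buffers?
SELECT c.relname,
       count(*) AS buffers,                        -- 8kB pages cached
       pg_size_pretty(count(*) * 8192) AS cached
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```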

work_mem: the 4MB default is a good starting point (it's allocated per sort/hash operation, not per connection)
maintenance_work_mem: default 64MB; going beyond 1GB is likely not helpful. Small can be good — fits better in L2/L3 cache
effective_io_concurrency: defaults to 1; worth increasing for SSD/RAID systems
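An illustrative postgresql.conf fragment pulling the above together — the values are arbitrary starting points, not recommendations; tune for your workload:

```
shared_buffers = 2GB              # start modest, tune with pg_buffercache
work_mem = 4MB                    # per sort/hash operation
maintenance_work_mem = 256MB      # VACUUM, CREATE INDEX, etc.
effective_io_concurrency = 200    # for SSD storage
```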



log_line_prefix: add %t (timestamp)

log_min_messages: NOTICE
log_min_error_statement: ERROR
Where to log to (log_destination): your choice


log_min_duration_statement: 1000 (logs queries running longer than 1000ms)
log_duration: set it on (logs the duration of every completed statement)
log_checkpoints: logs stats for shared-memory checkpoints
log_connections/log_disconnections: maybe useful
log_lock_waits: a lock wait is logged if it lasts longer than deadlock_timeout (default 1s, which is how often the deadlock check runs)
log_temp_files: see behaviour for temp table/tablespace activity (the value is in kB, so 1024 logs temp files of 1MiB or larger)
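The logging settings above, as an illustrative postgresql.conf fragment (thresholds are example values):

```
log_line_prefix = '%t [%p] '        # timestamp and PID
log_min_messages = notice
log_min_error_statement = error
log_min_duration_statement = 1000   # ms; log slow queries
log_duration = on
log_checkpoints = on
log_lock_waits = on
log_temp_files = 1024               # kB; log temp files >= 1MiB
```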

analysis tools, good for top-N reports:
* pgfouine
* pgbadger
* PQA

Reports worth looking at:
* most frequently run queries
* highest cumulative run times
* longest individual response times

runtime stats:
* track_activities (default on)
* track_counts (default on)
* track_io_timing (default off, should turn ON)

pg_stat_activity: current queries
pg_stat_database: amount of activity per database
pg_stat_bgwriter: stats for the writer/background processes
pg_stat_user_tables: usage stats for tables — logical stuff like indexes, tuples, scans, etc.
pg_statio_user_tables: the same again, but at the I/O layer
pg_stat_user_indexes: usage stats for indexes; shows hot spots and cold spots (cold spots may be indexes you can drop, removing UPDATE overhead)
pg_statio_user_indexes: the same again, at the I/O layer
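As a sketch, cold indexes can be spotted from pg_stat_user_indexes — zero scans since stats were last reset. Check how long stats have accumulated, and any replicas, before dropping anything:

```sql
-- Indexes that have never been scanned: candidates for dropping
SELECT schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```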

track_functions: none|pl|all (doesn't track plain SQL functions, since they can be inlined)


Sometimes a problem is hard to diagnose because it goes away before you get a chance to look — poor plan choice, etc.
In these cases it can be useful to enable auto_explain, which logs query plans to the server log.
The module has to be loaded explicitly: via shared_preload_libraries (requires a restart), or with LOAD 'auto_explain' in a single session.

Under the auto_explain.* namespace you can set log_min_duration (only log plans for slow queries), log_analyze (run EXPLAIN ANALYZE to record actual timings), log_buffers (shared buffer hit info), and more.
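An illustrative auto_explain configuration (the 500ms threshold is an arbitrary example; log_buffers requires log_analyze to be on):

```
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'   # only log plans slower than this
auto_explain.log_analyze = on             # record actual row counts and timings
auto_explain.log_buffers = on             # include shared buffer hit/read stats
```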

pg_stat_statements
loaded at cluster start (shared_preload_libraries)
lots of stats available
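A hedged example of a top-N report from pg_stat_statements — note the timing column is total_exec_time on PostgreSQL 13+, total_time on older versions:

```sql
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by cumulative execution time
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round((total_exec_time / calls)::numeric, 1) AS avg_ms,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```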


other tools to consider (very heavyweight):
log_statement_stats
log_parser_stats
log_planner_stats
log_executor_stats
PL/pgSQL profiler
DTrace




pg_dump and pg_restore can run in parallel (-j); parallel dump requires the directory format

pg_dump to an archive format is really good

An archive dump can be converted to a script if you want pure SQL; pg_restore can read both the custom and directory formats

pg_restore is useful for juggling dumps at restore time — you can pick out just the objects you want
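A sketch of a parallel dump and selective restore — the database names and paths are made up, and the -j counts are arbitrary:

```
# Parallel dump requires the directory format (-Fd)
pg_dump -Fd -j 4 -f /backups/mydb.dir mydb

# Parallel restore from the same archive
pg_restore -j 4 -d mydb_copy /backups/mydb.dir

# Selective restore: list the archive, edit the list, restore from it
pg_restore -l /backups/mydb.dir > objects.list
pg_restore -L objects.list -d mydb_copy /backups/mydb.dir
```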

when rebuilding the old master into a new replica, you need to set the recovery target timeline in recovery.conf; otherwise it'll continue on its original timeline only, waiting forever for new WAL segments when really they're being produced on the new timeline
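An illustrative recovery.conf fragment for that case (pre-PostgreSQL 12 layout; the primary_conninfo values are placeholders):

```
standby_mode = 'on'
primary_conninfo = 'host=new-master port=5432 user=replicator'
recovery_target_timeline = 'latest'   # follow the promoted master's new timeline
```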


pglogical: available since 9.4, enables a near-zero-downtime major version upgrade

MeidokonWiki: furinkan/sysadmin/PostgresTuning (last edited 2019-11-20 14:00:39 by furinkan)