Thursday, October 24, 2019

How to import/export CSV data to/from PostgreSQL

In this blog, we are going to see -

How to export data from PostgreSQL tables to CSV files, and
How to import data from a CSV file into a PostgreSQL table.

I have created this table in PG -
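For illustration, assume a simple table along these lines (only the id column is referenced later in this post; the second column is just an assumption):

  CREATE TABLE my_table (
      id   int,
      name text
  );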

Contents of CSV data file -


We can import the CSV file into this 'my_table' table using the COPY command -
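A minimal sketch of that command, assuming the CSV file sits at /tmp/my_data.csv on the database server (the path is an assumption):

  COPY my_table FROM '/tmp/my_data.csv' WITH (FORMAT csv, HEADER, DELIMITER ',');

Note that a server-side COPY ... FROM needs superuser rights (or membership in pg_read_server_files); from a client machine, psql's \copy meta-command does the same job with a client-side file.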


The order of the table columns should match the order of the columns mentioned in the CSV file (or vice versa).
The HEADER keyword indicates that the CSV file contains a header, i.e. the column names.
DELIMITER indicates that the values are separated by ','.

Data has been imported into PG table -


Similarly, we can export PG table data into a CSV file using the COPY command -
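For example (the target path is an assumption; the file name matches the one mentioned below):

  COPY my_table TO '/tmp/my_export_data.csv' WITH (FORMAT csv, HEADER, DELIMITER ',');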


my_export_data.csv gets created

Instead of selecting all the columns, we can select any particular column as well; in this example, only the id column is exported -
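A sketch of a single-column export (same assumed path):

  COPY my_table (id) TO '/tmp/my_export_data.csv' WITH (FORMAT csv, HEADER);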


my_export_data.csv now shows only the data of column 'id'


We can also specify a condition if we are looking for particular data, for instance id=1 -
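Since COPY also accepts a query, a filtered export can be sketched like this (path assumed):

  COPY (SELECT * FROM my_table WHERE id = 1) TO '/tmp/my_export_data.csv' WITH (FORMAT csv, HEADER);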



More details can be found at - COPY command






Tuesday, October 8, 2019

pg_upgrade - Easiest way to upgrade PostgreSQL cluster

In this blog, I am going to provide detailed steps to upgrade a PostgreSQL cluster to a different major version (for instance, PG v11 to PG v12).
There are a few conditions which need to be met -
A) Both clusters should not be running (so a little downtime is required, but this is the most efficient way to upgrade PostgreSQL from major version X to major version Y).
B) A new database cluster must be created (using the new version's initdb) for the PostgreSQL version to which we want to migrate.

Below are the available options of the pg_upgrade utility -
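They can be printed with pg_upgrade --help; the ones used in this post are (abbreviated):

  -b, --old-bindir=BINDIR    old cluster executable directory
  -B, --new-bindir=BINDIR    new cluster executable directory
  -d, --old-datadir=DATADIR  old cluster data directory
  -D, --new-datadir=DATADIR  new cluster data directory
  -c, --check                check clusters only, don't change any data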



Let's assume PG v11 is the current server which contains all the data/objects,

and after the PG v12 release - which is the latest and greatest, containing new features and critical bug fixes - the user wants to migrate.

Note - It is always recommended to do testing on a test box before touching the live server.

Prerequisites -
First, make sure PG v12 is installed on the same machine where PG v11 is installed/running.
Shut down the PG v11 server.

Steps to perform pg_upgrade -

Perform initdb (the initdb utility can be found inside the bin/ directory of PG v12) - see the example commands after the list below.
When we run pg_upgrade, we should provide the following information:
 ->data directory for the old cluster  (-d old_data_directory_path)
 ->data directory for the new cluster  (-D new_data_directory_path)
 ->bin directory for the old version (-b old_bin_directory_path)
 ->bin directory for the new version (-B new_bin_directory_path)

For example:
  pg_upgrade -d <old_data_directory_path> -D <new_data_directory_path> -b  <old_bin_directory_path> -B <new_bin_directory_path>
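A concrete sequence might look like this, using the same example paths as the dry run shown further below (adjust them to your own layout):

  /home/user/pg12/bin/initdb -D data12/
  /home/user/pg12/bin/pg_upgrade -d data11/ -D data12/ -b /home/user/pg11/bin -B /home/user/pg12/bin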

Things to remember-
a) If pg_upgrade fails somehow and we want to try it again, it is better to perform initdb again (for PG v12). Don't reuse the existing PG v12 cluster; if we run pg_upgrade again without a freshly initialised PG v12 cluster, we can get a weird error like -
"
Setting frozenxid and minmxid counters in new cluster  connection to database failed: 
FATAL:  could not open relation with OID 2610
Failure, exiting
"
b) It is always recommended to do a dry run (option -c) first -
 pg_upgrade -d data11/  -D  data12/  -b  /home/user/pg11/bin  -B /home/user/pg12/bin -c

If everything goes well, we will get this message -
*Clusters are compatible*
"./pg_ctl" -w -D "data12" -o "" -m smart stop >> "pg_upgrade_server.log" 2>&1

c) Feel free to report any issue to pgsql-bugs@postgresql.org along with the log files.



Tuesday, September 10, 2019

How to perform PITR (point-in-time recovery) in PostgreSQL v12

How do we perform PITR in PostgreSQL, especially in v12 where the recovery.conf file is no longer valid? This blog is all about the setup and the recovery.

If some important data got deleted or some table was lost and we want to recover it, we can get it back by performing PITR.

Steps to perform PITR in PG v12
========================

In the PG conf file, the archive_mode and archive_command GUC variables should be set -
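A minimal sketch of the relevant settings, assuming the /tmp/archive_directory location used in this post (changing archive_mode needs a server restart):

  # postgresql.conf
  wal_level = replica                                   # the default in v12, sufficient for archiving
  archive_mode = on
  archive_command = 'cp %p /tmp/archive_directory/%f'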



Make sure the /tmp/archive_directory directory exists on the machine. Note that /tmp is not a good location to store the WAL files; it is used here only for demonstration.


Start the server and create a table with some data -
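For example (my_data is the data directory used later in this post; the table name and row count are hypothetical):

  ./pg_ctl -D my_data start
  psql -c "CREATE TABLE sample_tbl (id int);"
  psql -c "INSERT INTO sample_tbl SELECT generate_series(1, 100);"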




There are 2 important folders - pg_wal, which resides under the data directory, and archive_directory, which we have created explicitly (under /tmp).

The pg_wal folder contains the WAL records of all data changes (e.g. inserts/updates/deletes). These records are stored in WAL segment files, but after a certain limit the files start getting recycled and overwritten. To make sure we preserve all the WAL files before they are recycled, we archive them using archive_command = 'cp %p /tmp/<location>/%f'.
This process is known as continuous WAL archiving and can be thought of as an incremental backup.

Perform pg_basebackup (to take an online full/base backup) -
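For example (backup_data is the directory used during recovery later; the flags are one common combination):

  pg_basebackup -D backup_data -Fp -Xs -P
  # -Fp = plain format, -Xs = stream the WAL needed for the backup, -P = show progress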

The scenario is -
at X time - the user has dropped all the rows (count is 0 rows)
at Y time - the user has inserted 50 rows (count is 50 rows)
at Z time - the user has inserted 1 lakh (100,000) rows (count is 100,050 rows)

We want to perform recovery to Y time, where the count is 50 rows.


at X time -

at Y time -

at Z time - 

X time =2019-09-10 17:53:13
Y time =2019-09-10 17:55:48
Z time =2019-09-10 17:58:05


Now, it is time to perform recovery up to Y time.

Stop the server (./pg_ctl -D my_data stop -m i)
Go to the backup_data folder (the full backup which we took earlier)
Open the postgresql.conf file and add the GUC parameters below -
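In v12 the old recovery.conf settings live in postgresql.conf. A minimal sketch for this scenario (the target time is the Y time recorded above, the archive location is the one configured earlier, and recovery_target_action is optional - it is set here so that the server exits recovery on its own):

  restore_command = 'cp /tmp/archive_directory/%f %p'
  recovery_target_time = '2019-09-10 17:55:48'
  recovery_target_action = 'promote'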


Recovery in v12 is now initialised by a file called 'recovery.signal' - an empty file which needs to be created manually in the data directory.
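For example, assuming the backup_data directory from the previous step is used as the data directory:

  touch backup_data/recovery.signal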





Start the server -

The recovery.signal file has been automatically removed from the data folder once recovery finished.
Connect to the psql terminal and check the data -

The data has been recovered up to the Y point.

Friday, September 6, 2019

How to install PostgreSQL using YUM

Installing any software using YUM is one of the easiest ways, as it resolves all the dependencies. YUM is a command/utility for installing/updating/deleting RPM software packages on RHEL (and family, like CentOS/Fedora), including packages from third-party repositories, in one single command.

YUM (Yellowdog Updater Modified) vs RPM (RedHat Package Manager)
YUM is essentially another way to install RPMs: it resolves dependencies easily, connects to online repositories and can update software, whereas with plain RPMs this is not easily possible. The advantage of installing with RPMs is that we don't need an internet connection, but with YUM we cannot work without one.

Installation of PostgreSQL using YUM
============================
Go to the yum.postgresql.org webpage (https://yum.postgresql.org/repopackages.php).
We are on a CentOS 7 64-bit machine, so we need to look for the repository package for CentOS 7 - x86_64.
Click on "CentOS 7 - x86_64".

A file called pgdg-redhat-repo-latest.noarch.rpm will be downloaded.




Connect as the root user (the YUM command cannot be fired from a non-root user).

Install the RPM package using the rpm -ivh command -
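For example, using the file downloaded above:

  rpm -ivh pgdg-redhat-repo-latest.noarch.rpm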






Under /etc/yum.repos.d, a repo file will be created.







Open this file and enable whichever PostgreSQL server version you are looking for; in this case, I have enabled PG v11 and all the others are disabled.
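The idea is roughly this (the exact repo file name and section layout depend on the version of the repo package):

  # /etc/yum.repos.d/pgdg-redhat-all.repo
  [pgdg11]
  ...
  enabled=1        # PG v11 repository enabled

  [pgdg10]
  ...
  enabled=0        # every other version disabled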



Before installing PG, we need to make sure the epel-release package is installed on the CentOS 7.x machine.
On my system it is already installed; otherwise I would need to fire 'yum install epel-release'.



EPEL - Extra Packages for Enterprise Linux (the epel-release package) is an open source community repository project which provides add-on software required to install PG successfully.

Perform yum clean all and yum makecache -

yum clean all - cleans all the old cache
yum makecache - downloads and makes usable all the metadata for the currently enabled yum repos

Time to install PG v11

yum install postgresql11-server

Press 'Y'

PG v11 is installed successfully on the machine.

rpm -ql <package name> - shows the locations where the package's files have been copied.

Initialise the Cluster -

Start the Service - 

Check the status -
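With the PGDG v11 packages on CentOS 7, these three steps typically look like the following (the paths and service name assume the standard PGDG layout):

  /usr/pgsql-11/bin/postgresql-11-setup initdb
  systemctl start postgresql-11
  systemctl status postgresql-11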

Connect to psql terminal -


Wednesday, September 4, 2019

Is setting up logical replication easy in PostgreSQL?

Well, yes, but before setting up logical replication, the first things we need to understand are -
What is streaming replication (SR)?
What is the difference between physical replication and logical replication?

Streaming replication gives you the power to continuously send the WAL (write-ahead log) records to the slaves so that they stay in sync with the master, and if something unexpected happens to the master -
for instance, if it goes down, or the data centre where the master's hard disk is stored is hit by an earthquake - the slave can become the saviour and save the world. For more details, please refer to the PG Wiki.

Logical replication - a method of replicating data changes using a replication identity, whereas physical replication replicates data byte by byte. The advantage of logical replication is that there is no need to replicate ALL the tables; we can select only the specific tables we want to replicate. Also, with logical replication we do not need a master/slave setup where the slave is read-only until it gets promoted in the event of the master going down.

In logical replication we have a publication and subscription mechanism: the publication is defined on the side the data comes from, and the subscription is defined on the side the data is replicated to.

Step-by-step setup of logical replication on two database clusters -
================================================
.) PG v12 Beta 3 sources on CentOS 7
.) Perform initdb, change wal_level=logical in the postgresql.conf file, start the server, connect to the psql terminal and create one table (this is cluster1).



.) Perform another initdb, start the server, connect to the psql terminal and create the table 'test' (the same one we created in cluster1) - there is no need to change wal_level here. A combined sketch of both setups is below.
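A combined sketch of both cluster setups (directory names, ports and the table definition are assumptions):

  # cluster1 (publisher)
  initdb -D cluster1
  # set wal_level = logical in cluster1/postgresql.conf
  pg_ctl -D cluster1 -o "-p 5432" start
  psql -p 5432 -c "CREATE TABLE test (id int PRIMARY KEY);"

  # cluster2 (subscriber)
  initdb -D cluster2
  pg_ctl -D cluster2 -o "-p 5433" start
  psql -p 5433 -c "CREATE TABLE test (id int PRIMARY KEY);"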

Case 1-


.) Create a publication on cluster1 - here using the 'all tables' option, which means all the tables will be published.
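On cluster1 the command has this shape (the publication name pub1 is an assumption):

  CREATE PUBLICATION pub1 FOR ALL TABLES;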


.) Create a subscription on cluster2 -
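On cluster2, something along these lines (the connection string and the subscription name sub1 are assumptions):

  CREATE SUBSCRIPTION sub1
      CONNECTION 'host=localhost port=5432 dbname=postgres'
      PUBLICATION pub1;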

.) A logical replication slot is automatically created on cluster1 -
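This can be verified on cluster1 with:

  SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;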


The logical replication worker processes are started in the background -







Time to insert data on cluster1; the data will be replicated to cluster2 -
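For example (table and values as per the sketch above):

  -- on cluster1
  INSERT INTO test VALUES (1), (2), (3);
  -- on cluster2
  SELECT * FROM test;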

If we want to replicate UPDATE/DELETE operations from cluster1 and the table has no primary key, we need to set a replica identity using the ALTER TABLE command.
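For example, for a table without a suitable key (FULL is the simplest, although the most expensive, choice):

  ALTER TABLE test REPLICA IDENTITY FULL;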

Case 2 - When we want to publish only a few tables out of X tables

Let's assume we have 5 tables in cluster1 (e.g. Table1, Table2, Table3 ... Table5) and now we only want to publish the first 3 tables -
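On cluster1 the publication then lists only those tables (the publication name pub is the one referenced further below):

  CREATE PUBLICATION pub FOR TABLE table1, table2, table3;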

Connect to cluster2 and first create the tables which we want to subscribe to from cluster1, and then create the subscription. If a table doesn't exist, CREATE SUBSCRIPTION will fail, for example -








Create Table3, and now the subscription which is based on the publication 'pub' will be created successfully -
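A sketch of that subscription on cluster2 (the connection string and subscription name are assumptions; table1, table2 and table3 must already exist on cluster2):

  CREATE SUBSCRIPTION sub2
      CONNECTION 'host=localhost port=5432 dbname=postgres'
      PUBLICATION pub;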






Insert into table1/table2 on cluster1; the data will be replicated on the cluster2 side as well.


Thursday, August 29, 2019

How to run PostgreSQL on Docker?

Docker is an open source tool/platform where we can create/deploy/run applications using containers. Docker is similar to a VM, but with a VM one needs to create a virtual operating system, whereas Docker allows applications to use the kernel of the system it is running on.

The PostgreSQL community supports PG images (v9.4, v9.5, v9.6, v10, v11, etc.).
The difference between images and containers is -
An image is nothing but a template of instructions, and a running instance of an image is called a container.

So here is what is what -
================
Docker - a tool/platform.
Image - the application which we want to create/deploy; in this case we want to deploy the PG v11.5 image (the latest release as of 28 Aug 2019).
Container - a running instance of an image.

Step-by-step guide to deploy the PostgreSQL image on Docker
============================================
0. CentOS 7 64-bit machine
1. Go to https://hub.docker.com/
2. Create a login account (sign up)


After pressing Continue, it will ask for your first name/last name/profile and some other questions. Press Continue again, and it will ask you to check your inbox and click on "Confirm your email with Docker". Once you have done that -
Congrats - you are registered on the site and your Docker ID is created.

Sign up time -

First thing, we need to install Docker on the machine; right now it is not installed locally on my CentOS 7.










3. To install Docker Enterprise (EE), please follow this link -

Otherwise, just install Docker CE (Community Edition) using the yum command -
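One common route (this is an assumption about how it was done here) is to add Docker's CE repository and then install docker-ce:

  yum install -y yum-utils
  yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
  yum install -y docker-ce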










Open a command line terminal and start the docker daemon service -
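For example:

  systemctl start docker
  systemctl status docker      # verify the daemon is running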












4. Go to https://hub.docker.com and type 'postgres' in the search box.


All the official PG server images can be found there, and on the right-hand side there is a command to pull the image, i.e. "docker pull postgres" - copy it.












5. Open a command line terminal and paste the command - "docker pull postgres"












OR


If we want PG v11.5, then we need to specify the tag explicitly, i.e. -
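For example:

  docker pull postgres:11.5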








Currently the latest is 11.5, so both commands pull the same image.

6.Run "docker run" command which is used to launch container 






7.Run "docker ps" command which shows status of container and all required information 




8.Run "docker exec" command which is used to login to the container

Here we can see that we are not only able to log in to the container but also to connect to the PG v11.5 server -
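For example, using the container name from the run command above:

  docker exec -it my-postgres psql -U postgres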

9. Remove the container once you are done!
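For example:

  docker stop my-postgres
  docker rm my-postgres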



Hope this helps!

I would like to thank my colleague Mr. Jitendra Wadle for all his help and assistance.