
Plot tables and relationships from Postgresql tables

Is it possible to plot the tables in a PostgreSQL database and their relationships using R, like the diagram shown below?

[image: example diagram of tables and their relationships]

asked Oct 18 '13 09:10 by WAF




1 Answer

Yes, it is possible. The steps below outline one way to do it.

Steps

  1. Connect to the PostgreSQL database
  2. Get the schema information for the database
  3. Store the schema information in a data structure (rearrange the data frames as needed)
  4. Generate the diagram from the data structure

Step 1

There are several ways to connect R and PostgreSQL, including:

  1. RPostgreSQL (R to PostgreSQL, persistent connection)
  2. sqldf (R to PostgreSQL, temporary connection; does part of step 3 automatically). It uses RPostgreSQL for its PostgreSQL connections.
  3. PL/R (PostgreSQL to R)
  4. db.r (R to PostgreSQL; has basic database visualisation built in, i.e. parts of steps 2, 3 and 4)

An example of Step 1 in RPostgreSQL is below:

library(RPostgreSQL)

## loads the PostgreSQL driver
drv <- dbDriver("PostgreSQL")

## Open a connection
con <- dbConnect(drv, dbname="databasename")
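A slightly fuller sketch of Step 1 wraps the connection so that it is always closed again. The helper name with_pg_connection is hypothetical, and the sketch assumes the RPostgreSQL and DBI packages are installed; extra arguments such as host, port, user and password are simply passed through to dbConnect.

```r
# Hypothetical helper (not part of RPostgreSQL): open a connection,
# run action(con), and always disconnect afterwards.
with_pg_connection <- function(action, dbname, ...) {
  drv <- RPostgreSQL::PostgreSQL()
  con <- DBI::dbConnect(drv, dbname = dbname, ...)
  # on.exit guarantees the disconnect even if action() fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  action(con)
}

# Usage (needs a reachable database, so it is left commented out):
# tables <- with_pg_connection(DBI::dbListTables, dbname = "databasename")
```

The package namespaces are only touched when the helper is called, so defining it does not require a running database.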

Step 2

This can be done in several ways: directly in SQL, with RPostgreSQL's dbListTables and dbListFields, or with a combination of the two.

For example SQL that queries all tables in a database, all fields / columns in a table, or all constraints on a table, see the following StackOverflow answers:

  • PostgreSQL Describe Table
  • Show tables in PostgreSQL
  • List all tables in PostgreSQL information schema
  • How do I list all columns for a specified table (DBA StackExchange)
  • PostgreSQL to list foreign keys (remove or modify the constraint type in the where clause to get both foreign and primary keys)

In summary, you query information_schema.tables, information_schema.columns and information_schema.table_constraints for the information you need. If speed is an issue, you can use the PostgreSQL-specific system catalogs instead of the standard information_schema views (they are mentioned in the linked answers above), but the catalogs may change between PostgreSQL versions.

The steps here are

  1. Get the list of tables
  2. Iterate through the list of tables and get the columns per table (alternatively, query all columns at once with a query that includes the table name and column name in the result set)
  3. Iterate through the list of tables and get the constraints per table (alternatively, query all constraints at once with a query that includes the table name and constraint name in the result set)

An example of Step 2 in RPostgreSQL is below:

Adjust your SQL to suit.

Part 1

Getting the list of tables

Using built-in function

tables1 <- dbListTables(con)

Using SQL

tables2 <- dbGetQuery(con, "select table_name from information_schema.tables")
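The query above also returns PostgreSQL's own system tables and views. A minimal sketch of a narrower query follows; the function name list_user_tables and the run_query argument are hypothetical, the latter added only so the function can be tried without a live database.

```r
# Hypothetical helper: list ordinary tables outside the system schemas.
# run_query defaults to DBI::dbGetQuery; pass a stub to try it offline.
list_user_tables <- function(con, run_query = DBI::dbGetQuery) {
  sql <- paste(
    "SELECT table_schema, table_name",
    "FROM information_schema.tables",
    "WHERE table_type = 'BASE TABLE'",
    "AND table_schema NOT IN ('pg_catalog', 'information_schema')",
    "ORDER BY table_schema, table_name"
  )
  run_query(con, sql)
}

# Usage with the connection from Step 1:
# tables3 <- list_user_tables(con)
```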

Part 2

Using built-in function

Call dbListFields(con, "TableName") for each table name from Part 1 (the character vector returned by dbListTables, or the table_name column of the data frame returned by the SQL query), for example with lapply or apply. See "how to apply a function to every row of a matrix (or a data frame) in R" or "Apply a function to each row in a data frame in R and save the result to a variable".
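As a rough sketch of that loop, the function below collects the fields of every table into one data frame. The name list_fields_per_table and the list_fields argument are hypothetical; the argument defaults to DBI::dbListFields and exists only so a stub can stand in for the database.

```r
# Hypothetical helper: one row per (table, column) pair.
list_fields_per_table <- function(con, tables, list_fields = DBI::dbListFields) {
  per_table <- lapply(tables, function(tbl) {
    fields <- list_fields(con, tbl)
    data.frame(
      table_name = rep(tbl, length(fields)),
      column_name = fields,
      stringsAsFactors = FALSE
    )
  })
  # Stack the per-table data frames into a single data frame.
  do.call(rbind, per_table)
}

# Usage with the objects from Step 1 and Part 1:
# columns1 <- list_fields_per_table(con, tables1)
```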

Using SQL

columns2 <- dbGetQuery(con, "select table_name,column_name from information_schema.columns")

Part 3

Using SQL

constraints <- dbGetQuery(con, "select table_name,constraint_name, constraint_type from information_schema.table_constraints")
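Note that table_constraints only gives constraint names and types. To know which column points at which table, the foreign key constraints also need to be joined to information_schema.key_column_usage and information_schema.constraint_column_usage. A sketch of that query follows; the function name fetch_foreign_keys, the run_query argument and the from_/to_ column aliases are hypothetical. For multi-column foreign keys this simple join returns every pairing of source and target columns, so adjust it if your schema has those.

```r
# Hypothetical helper: one row per foreign key column, with the table it references.
fetch_foreign_keys <- function(con, run_query = DBI::dbGetQuery) {
  sql <- paste(
    "SELECT tc.constraint_name,",
    "       kcu.table_name  AS from_table,",
    "       kcu.column_name AS from_column,",
    "       ccu.table_name  AS to_table,",
    "       ccu.column_name AS to_column",
    "FROM information_schema.table_constraints tc",
    "JOIN information_schema.key_column_usage kcu",
    "  ON tc.constraint_name = kcu.constraint_name",
    " AND tc.table_schema = kcu.table_schema",
    "JOIN information_schema.constraint_column_usage ccu",
    "  ON ccu.constraint_name = tc.constraint_name",
    " AND ccu.constraint_schema = tc.constraint_schema",
    "WHERE tc.constraint_type = 'FOREIGN KEY'",
    sep = "\n"
  )
  run_query(con, sql)
}

# Usage with the connection from Step 1:
# foreign_keys <- fetch_foreign_keys(con)
```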

Step 3

From step 2, you should have a list of tables, a list of tables with their associated fields / columns, and a list of tables with their associated constraints.

You then need to produce one of the following: a csv file for CityPlot, a dot file for Graphviz, a graph in igraph's format, or a data frame or hash that your own functions can process to draw the tables and the connections between them using grid or diagram.

If you are combining them into a single data frame, subset and merge will be useful.
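A small illustration of merge and subset on made-up data is sketched here. The author/book tables are invented for the example, and the keys data frame assumes you already have a column_name for each constraint (which table_constraints alone does not provide).

```r
# Made-up schema information, shaped like the results of Step 2.
columns <- data.frame(
  table_name  = c("author", "author", "book", "book", "book"),
  column_name = c("id", "name", "id", "title", "author_id"),
  stringsAsFactors = FALSE
)
keys <- data.frame(
  table_name      = c("author", "book", "book"),
  column_name     = c("id", "id", "author_id"),
  constraint_type = c("PRIMARY KEY", "PRIMARY KEY", "FOREIGN KEY"),
  stringsAsFactors = FALSE
)

# Left join: keep every column, with NA where a column has no key constraint.
schema <- merge(columns, keys, by = c("table_name", "column_name"), all.x = TRUE)

# Keep only the rows that describe foreign keys.
foreign_key_columns <- subset(schema, constraint_type == "FOREIGN KEY")
```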

Step 4

This step can also be done in many different ways. These include but are not limited to

  1. grid package: low-level graphics primitives (see the PDF in the references for an example article that matches your requested use case)
  2. diagram package and shape package: slightly higher-level graphics primitives (see the PDF in the references for a usage example)
  3. Rgraphviz package (Graphviz; either generate a dot file in step 3, or see the PDF in the references for more information)
  4. igraph package (would only draw flattened circles for each column and table combination)
  5. CityPlot package (generates an entity-relationship diagram rather than a database table diagram, but may meet your needs; requires step 3 to produce a csv file from the data frames)

If using the diagram, shape or grid packages, you would iterate over the list of tables, or the hash or other data structure, and apply a draw function on each table, and then have a separate function that is applied for each constraint to draw the lines.
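The same pattern (one pass over the tables, one pass over the constraints) also works for the Graphviz route. The sketch below builds the lines of a dot file using base R only; the function name schema_to_dot and the from_table / to_table / from_column column names are assumptions made for this example, not anything the packages above define.

```r
# Hypothetical helper: turn schema data frames into the lines of a dot file.
schema_to_dot <- function(columns, foreign_keys) {
  tables <- unique(columns$table_name)
  # One record-shaped node per table: table name on top, columns listed below.
  nodes <- vapply(tables, function(tbl) {
    cols <- columns$column_name[columns$table_name == tbl]
    label <- paste0("{", tbl, "|", paste(cols, collapse = "\\l"), "\\l}")
    sprintf("  \"%s\" [label=\"%s\"];", tbl, label)
  }, character(1), USE.NAMES = FALSE)
  # One edge per foreign key, labelled with the referencing column.
  edges <- sprintf("  \"%s\" -> \"%s\" [label=\"%s\"];",
                   foreign_keys$from_table, foreign_keys$to_table,
                   foreign_keys$from_column)
  c("digraph schema {", "  node [shape=record];", nodes, edges, "}")
}

# Made-up example data.
example_columns <- data.frame(
  table_name  = c("author", "author", "book", "book"),
  column_name = c("id", "name", "id", "author_id"),
  stringsAsFactors = FALSE
)
example_fks <- data.frame(
  from_table = "book", from_column = "author_id", to_table = "author",
  stringsAsFactors = FALSE
)
dot <- schema_to_dot(example_columns, example_fks)
```

Writing the result with writeLines(dot, "schema.dot") and then running dot -Tpng schema.dot -o schema.png should render the diagram, assuming Graphviz is installed.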

References

  • R and PostgreSQL using RPostgreSQL and sqldf
  • YouTube video example: PostgreSQL Connection to R
  • Drawing Diagrams with R
  • R Package diagram: visualising simple graphs,flowcharts, and webs
  • Example RPostgreSQL usage
  • Dot Guide For Graphviz
  • How To Plot A Graph Using Rgraphviz
answered Oct 16 '22 16:10 by Appleman1234