Exploring a large database
Big organizations typically have their data stored in one or multiple database. Sometimes, analytics team find themselves buried in the jungle of tables/views with non consistent naming conventions, therefore hard to build a holistic view about where to find which data etc.
In this post we explore a few tricks to ease the initial phase of understanding a large database.
Having applications in mind
Very often in business, database are the backend of some frontend applications with GUI that one can interact with. Therefore, it is not a bad idea to use the frontend application first hand to see what's possible, the meaning of a column, how it relates to other columns etc.
This can be a back-and-forth process, while doing analytics work, check the GUI to see where are the column names appear in the application, gradually completing a good mental picture of the 'geography' of the tables.
Learn the data model from an architect
Data architect is responsible for creating data model: how the business is broken into concepts (stored as tables) and how they relate (via foreign keys). If talking to an architect is an option, it is the fastest way of understaning the data model.
Next we explore how to reverse engineer the data model.
Visualizations
Some DB has their native visualization tools, ER generators (e.g. Oracle SQL developer). Not a bad idea to take advantage of that.
Use metadata tables
Almost all database systems have a kind of information schema which contains tables that describe the metadata of the database such as the existing schemas, tables, views, columns etc.
In Oracle it's ALL_TABLES
ALL_VIEWS
ALL_TAB_COLUMNS
(not a schema per se, rather individual tables that start with ALL_
prefix).
In Postgres/DuckDB/Snowflake it's the information_schema
schema, with tables such as tables
, columns
in it.
Foreign keys
For obvious reasons, it is important to know these keys for the tables of interest. In Oracle, use ALL_CONSTRAINTS
table.
In Snowflake/DuckDB/Postgres, use information_schema.REFERENTIAL_CONSTRAINTS
.
Describe a table
Literally DESCRIBE _NAME_
to show the column names and types.