Forensic Files is a documentary-style, true crime series which reveals how forensic science is used to solve violent crimes, mysterious accidents, and even outbreaks of illness. Being a fan and most likely having watched all of the episodes in the series (more than once!) I was inspired to build a database to be used as a source of information about the show.
What I had in mind was to have the ability to assign words and/or phrases to each episode so that I can perform searches based in terms or phrases – such as “anthropology”, “dumpster”. “ravine” or even “cold case” and have my query return a list of all episodes that include content related to those topics.
The following outlines the solution I came up with.
Choosing the Technology
For this exercise I decided to use PostgreSQL, which is a powerful, open-source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. I’ve used PostgreSQL before, but I wanted to make sure to use the most recent version so the first thing I did was to run the following SQL command:
SELECT version();
Which yields the following result and yes, as of this writing, 15.2 is the latest version:
My other choice was which SQL management tool to use, and I opted for DBeaver (my favorite). I’ve written about both PostgreSQL and DBeaver in the past so I will not spend any time describing them here and just get down to work.
First Step
The first step (I already had a database to use) was to create a new table to hold the Forensic Files episode data:
- Season number (which season did this episode first aired)
- The number of the episode within the entire series (all seasons) – this will be the episodes unique ID)
- The number of the episode within the season
- The title of the episode
- The original air date of the episode
- “Data” – which I am planning to use to hold words and/or phrases I want to associate with the episode (more on this in a bit)
The following SQL statement was used to create my “episode” table:
DROP TABLE IF EXISTS public.Forensic_Files_Episodes;
CREATE TABLE public.Forensic_Files_Episodes (
season INT NOT NULL,
number_in_series INT NOT NULL,
number_in_season INT NOT NULL,
title VARCHAR ( 255 ) UNIQUE NOT NULL,
air_date DATE NOT NULL,
popularity numeric,
data jsonb
);
Once I had defined my table, I wanted to load it with data. In DBeaver, you can easily import CSV’s to a table that matches the columns that the comma-separated values or “CSV” file contains. To do this, you right click the table, and choose “Import Data“:
My file of Forensic Files Episode data contains all of the fields I am interested in (season, the number of episode within the series (all seasons), the number of the episode within the season, the episode title and the original air date) so I can easily import the data :
After the data import completes, I wanted to check the table’s content, so I used this SQL Statement:
select * from public.forensic_files_episodes;
Adding Popularity Ratings
I had a second data file which contained the viewer popularity rating for each episode and wanted to add that information to my table so I again used the DBeaver Import Data feature and this time, indicated CREATE for the target table so I ended up with a new table named “forensic_files_popularity”.
After this import (completes), I checked the new table with the following SL statement:
select * from forensic_files_popularity;
Now that I had loaded the episode popularity ratings, I updated the forensic_files_popularity.popularity column from the forensic_files_popularity table (matching on the “number_in_series” column) with the following SQL Statement:
UPDATE forensic_files_episodes
SET popularity = forensic_files_popularity.popularity
FROM forensic_files_popularity
WHERE forensic_files_episodes.number_in_series = forensic_files_popularity.number_in_series;
At that point I can query the list episodes by a variety of “filters” such as air date, season, number in series, the title or even popularity:
select number_in_series, title, popularity from public.Forensic_Files_Episodes
where public.Forensic_Files_Episodes.popularity > 5.0
order by popularity DESC;
Adding Tags
JSON is an open standard file format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and “arrays”. PostgreSQL introduced the JSONB data type in version 9.4 (a while back!) and warrants a closer look from the reader.
I initially defined a “data” column of this type as part of my forensic_files_episodes table:
So, this now allowed me to define and associate key phrases “tagging” each episode. Below are some sample SQL statements I used to assign an “alternate name” as well as associate multiple key phrases (or tags) for each episode:
UPDATE forensic_files_episodes set data =
(‘{“name”: “mothers diary”, “tags”: [“Haunted”, “disappearance of her mother”, “twenty years earlier”, “young woman”, “undertook an investigation”, “cold case”], “finished”: true}’)
where forensic_files_episodes.number_in_series = 198;
UPDATE forensic_files_episodes set data =
(‘{“name”: “Killigraphy”, “tags”: [“autopsy”,”suspicious”,”death”,”Georgia”,”mentally-unstable”,”husband”,”murder”,”medical examiner”], “finished”: true}’)
where forensic_files_episodes.number_in_series = 61;
UPDATE forensic_files_episodes set data =
(‘{“name”: “A Leg to Stand On”, “tags”: [“Severed leg”,”dumpster”,”investigators”,”identifying the victim”,”anthropology”,”toxicology”,”DNA”,”police”,”victim”,”follow the trail”,”evidence”], “finished”: true}’)
where forensic_files_episodes.number_in_series = 115;
Searching by Key Phrase
After adding the “tags” I was ready to do some searching! I can search by any phrase, such as “blood splatter” or “DNA” or how about a partial phrase like “invest”:
select t.title
from forensic_files_episodes t
where exists (select *
from jsonb_array_elements_text(t.data -> ‘tags’) as t(tagme)
where t.tagme like ‘%invest%’);
Conclusion
This exercise really didn’t take very long to complete and demonstrates a simple solution for using JSONb data types with PostgreSQL to allow searching by tags – and now I am ready for next time I want to binge watch ONLY those Forensic Files episodes that deal with square dancing doctors from New York!