DATA ANALYSIS

Datasets with several timezones

Marco Gabba
2 min readApr 23, 2021

How to deal with daylight saving time in Pandas

Photo by Sonja Langford on Unsplash

Questo articolo è disponibile anche in italiano, qui

Dates and times: your best friend and your worst enemy when it comes to handling a dataset.

In my everyday job, I often deal with timeseries analysis and historical data.
One critical aspect in the datetime format, especially if the timestep is hourly or lower: daylight saving time creates two discontinuities that can create errors in the data handling or interpretation if overlooked.

In this story, I’ll show you a practical example and the solution I adopted.

The data

Data was stored in a database; I was working on a subset stored in .csv for easy handling with Jupyter Notebook.

Here is an example:

datetime, variable1, variable2
2020–01–01, 00:00:00+01, 1, 2
2020–01–01, 00:15:00+01, 3, 5
2020–01–01, 00:30:00+01, 7, 9

This was the first time I had to handle data with the “+01” timezone bit in the timestamp, so I decided to just try and let Pandas handle it.

my_data = pd.read_csv(filepath, 
index_col = ‘datetime’,
parse_dates=True)
my_data.index.dtype
OUTPUT: dtype(‘O’)

Damn! Pandas wasn’t able to convert the datetime column to datetime format! A quick online search solved the mystery: datetimes were recognized with two different timezones.

Let’s see the data around daylight saving time start:

datetime, variable1, variable2
2020–03–29 01:30:00+01, 65, 7
2020–03–29 01:45:00+01, 39, 11
2020–03–29 03:00:00+02, 24, 4
2020–03–29 03:00:15+02, 12, 9

So, Pandas is not able to understand that the notation simply indicates daylight saving time. What should we do now?

UTC, the machine-friendly timezone

Our goal now is to choose a datetime format easy to handle with Pandas.
The library itself has a solution: the datetime parser has a parameter that allows converting each string in UTC datetime, avoiding the daylight saving time ambiguities.

my_data[‘Datetime’] = pd.to_datetime(
Medium.index,
utc=True,
infer_datetime_format=True)

EUREKA! Now the datetimes are in datetime64 format!
Last step: put the column as the index and I’m ready to roll!

my_data = my_data.set_index(‘datetime’)

Or so I believed…

Back to the future

Working with the data was now really impractical: my datetime filters were written in GMT+1 timezone and I wanted all my data visualization expressed in my timezone!

Luckily, Pandas got me covered again: converting the parsed dates into my timezone was easy as writing one line of code.

my_data.index = my_data.index.tz_convert(‘Europe/Rome’)

That’s all Folks!

This experience taught me how to streamline the datetime parsing with parsing in just three steps:

  1. First try: parse datetimes directly with pd.read_csv(); if this fails:
  2. Parse the dates converting them in UTC format, and then
  3. Convert dates to the desired timezone.

Final catch: we can do point 2 and 3 in the same instruction!

my_data.index = my_data.set_index(pd.to_datetime(
my_data.index,
utc=True,
infer_datetime_format=True).tz_convert(‘Europe/Rome’)

--

--

Marco Gabba
Marco Gabba

Written by Marco Gabba

Chemical Engineer with the passion for data and programming. I'm an avid reader, let's see if I can also write :D

No responses yet