DATA ANALYSIS

Datasets with several timezones

2 min readApr 23, 2021

How to deal with daylight saving time in Pandas

Questo articolo è disponibile anche in italiano, qui

Dates and times: your best friend and your worst enemy when it comes to handling a dataset.

In my everyday job, I often deal with timeseries analysis and historical data.
One critical aspect in the datetime format, especially if the timestep is hourly or lower: daylight saving time creates two discontinuities that can create errors in the data handling or interpretation if overlooked.

In this story, I’ll show you a practical example and the solution I adopted.

The data

Data was stored in a database; I was working on a subset stored in .csv for easy handling with Jupyter Notebook.

Here is an example:

datetime, variable1, variable2
2020–01–01, 00:00:00+01, 1, 2
2020–01–01, 00:15:00+01, 3, 5
2020–01–01, 00:30:00+01, 7, 9

This was the first time I had to handle data with the “+01” timezone bit in the timestamp, so I decided to just try and let Pandas handle it.

my_data = pd.read_csv(filepath, 
    index_col = ‘datetime’, 
    parse_dates=True)
my_data.index.dtypeOUTPUT: dtype(‘O’)

Damn! Pandas wasn’t able to convert the datetime column to datetime format! A quick online search solved the mystery: datetimes were recognized with two different timezones.

Let’s see the data around daylight saving time start:

datetime, variable1, variable2
2020–03–29 01:30:00+01, 65, 7
2020–03–29 01:45:00+01, 39, 11
2020–03–29 03:00:00+02, 24, 4
2020–03–29 03:00:15+02, 12, 9

So, Pandas is not able to understand that the notation simply indicates daylight saving time. What should we do now?

UTC, the machine-friendly timezone

Our goal now is to choose a datetime format easy to handle with Pandas.
The library itself has a solution: the datetime parser has a parameter that allows converting each string in UTC datetime, avoiding the daylight saving time ambiguities.

my_data[‘Datetime’] = pd.to_datetime(
    Medium.index, 
    utc=True, 
    infer_datetime_format=True)

EUREKA! Now the datetimes are in datetime64 format!
Last step: put the column as the index and I’m ready to roll!

my_data = my_data.set_index(‘datetime’)

Or so I believed…

Back to the future

Working with the data was now really impractical: my datetime filters were written in GMT+1 timezone and I wanted all my data visualization expressed in my timezone!

Luckily, Pandas got me covered again: converting the parsed dates into my timezone was easy as writing one line of code.

my_data.index = my_data.index.tz_convert(‘Europe/Rome’)

That’s all Folks!

This experience taught me how to streamline the datetime parsing with parsing in just three steps:

First try: parse datetimes directly with pd.read_csv(); if this fails:
Parse the dates converting them in UTC format, and then
Convert dates to the desired timezone.

Final catch: we can do point 2 and 3 in the same instruction!

my_data.index = my_data.set_index(pd.to_datetime(
   my_data.index, 
   utc=True, 
   infer_datetime_format=True).tz_convert(‘Europe/Rome’)

DATA ANALYSIS

Datasets with several timezones

The data

UTC, the machine-friendly timezone

Back to the future

That’s all Folks!

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Marco Gabba

No responses yet