NumPy Datetime: How to Work with Dates and Times in Python?

In this article, we’ll be learning about something that is literally everywhere. Whatever corner you turn, whatever street you run down, you can’t get away from it. It is as ubiquitous as the physical space around us. Yes, today we’re talking about… TIME. More specifically, we’re talking about NumPy’s functions that represent dates and times.

Exercise: Create a NumPy datetime object from your birthday! Then calculate the number of days that have passed since then.

When I first heard about NumPy’s datetime library, I didn’t think it was a big deal. Why do we need special functions to deal with dates? They’re pretty simple; can’t we just use strings like '2019/01/01' and be done with it?

Well, younger version of me, it turns out they are rather complicated and so we can’t ‘just use strings’…

If you’re not convinced, try answering any of the following questions using ‘just strings’:

  1. How many days are there in a month?
  2. How many seconds are there between 1st March 2019 at 1 pm and 4th March 2019 at 2 am exactly? 
  3. How many business days are there between 1st Jan 1970 (the Unix epoch) and 3rd December 2008 (the day Python 3 was released)?

Evidently, we need dates and times to have their own functions. And don’t worry, we’ll answer all those questions by the end of the article.

A Gentle Intro

Although we can’t ‘just use strings’ to represent dates and times, we will use strings as input to the main function we’ll be working with: np.datetime64().

Note: the 64 means each value is stored as a 64-bit integer.

NumPy uses the following (common sense) abbreviations for units of time. Note the capitalization where it occurs:

Abbreviation    Meaning
Y               Years
M               Months
W               Weeks
D               Days
h               Hours
m               Minutes
s               Seconds
ms              Milliseconds
us              Microseconds (since μs, using the Greek letter ‘mu’, is the way we’d hand write it and ‘u’ looks closest to this in English)

Table: NumPy abbreviations for units of time

The Input Format – ISO 8601

Different countries write their dates differently. In Europe, 06/10/18 is 6th October; in America, it’s June 10th. In person, this is merely annoying, but in business it can be critical (imagine your program running until 6th October when you wanted it to stop on 10th June)! So we need a standardized input for our functions. Thankfully, a global standard already exists: ISO 8601.

The basics are:

  • YYYY-MM-DD for dates, so 2018-10-06 is 6th October 2018
  • hh:mm:ss for times, so 13:21:40 is 13:21 (1.21 pm) and 40 seconds
  • For fractions of a second, use a full stop: 13:21:40.3 adds on 0.3 seconds (300 milliseconds).

This is easy to remember as the dates and times go left to right from big to small. This will be important later on when we do arithmetic with datetime objects.

To input this to NumPy, we can either

  • Put a space between the date and time '2019-10-23 13:21'
  • Or put a capital T between them '2019-10-21T13:21'

I prefer the first method as it’s easier to read. But note that if you print out any np.datetime64 object, NumPy always inserts a T.
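To see both input styles side by side, here is a minimal sketch. The variable names are my own, and I’m assuming NumPy picks millisecond precision when the fractional second has up to three digits, which is what I’d expect from current releases.

```python
import numpy as np

# The same instant written with a space and with a capital T
with_space = np.datetime64('2019-10-23 13:21')
with_t = np.datetime64('2019-10-23T13:21')
same_instant = with_space == with_t

# Converting to a string always uses the T separator
as_text = str(with_space)

# A fractional second: .3 means three tenths of a second
with_fraction = np.datetime64('2019-10-23 13:21:40.3')

print(same_instant)
print(as_text)
print(with_fraction, with_fraction.dtype)
```

Both spellings should compare equal, so which one you type is purely a matter of taste.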

Now it’s time for our first examples (I’ve tried to pick dates that are easy to read for learning purposes).

>>> import numpy as np

# 2000
>>> np.datetime64('2000')
numpy.datetime64('2000')

# November 2000
>>> np.datetime64('2000-11')
numpy.datetime64('2000-11')

# 22nd November 2000
>>> np.datetime64('2000-11-22')
numpy.datetime64('2000-11-22')

# 22nd November 2000 at 16:22:44 (twenty two minutes past four in the afternoon and forty four seconds)
>>> np.datetime64('2000-11-22 16:22:44')
numpy.datetime64('2000-11-22T16:22:44')

Rescaling

Given a specific datetime, you can change it to a specific time unit by entering it as a second parameter.

For example, let’s say we have a specific day but only want the month:

# 22nd November 2000
>>> day = np.datetime64('2000-11-22')

# Just keep the month
>>> month = np.datetime64(day, 'M')
>>> month
numpy.datetime64('2000-11')

# Change the scale of month to hours
>>> hour_from_month = np.datetime64(month, 'h')
>>> hour_from_month
numpy.datetime64('2000-11-01T00','h')

# Change the scale of day to hours
>>> hour_from_day = np.datetime64(day, 'h')
>>> hour_from_day
numpy.datetime64('2000-11-22T00','h')

Note that the DD parts of hour_from_month and hour_from_day are different: the former has 01 and the latter 22. This is because NumPy falls back to default values for anything that is not specified. These are, logically, 01 for the month and day, and 0 for the time components.

To check the time unit of an np.datetime64 object, we simply use the .dtype attribute:

>>> day.dtype
dtype('<M8[D]')
 
>>> month.dtype
dtype('<M8[M]')

Easy.
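Rescaling works on whole arrays too. The following is a small sketch using the standard .astype() method with example dates of my own choosing; it shows the day part being dropped and then defaulting back to 01.

```python
import numpy as np

# An array of day-precision dates
days = np.array(['2000-11-22', '2001-02-03'], dtype='datetime64[D]')

# Rescale to months: the day part is dropped
months = days.astype('datetime64[M]')

# Rescale back to days: the missing day defaults to 01
first_of_month = months.astype('datetime64[D]')

print(months)
print(first_of_month)
print(first_of_month.dtype)
```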

Datetime Arithmetic

Now we have the absolute basics down so it’s time to step it up a notch. Can you answer the following question?

What comes next after 12pm on 12th December 1999?

Depending on the time unit you are counting in, it could be anything:

  • Counting forwards in hours: 1pm on 12th December 1999
  • Counting backwards in days: 12pm on 11th December 1999
  • Counting forwards in years: 12pm on 12th December 2000

You get the idea.

As we have already seen, each np.datetime64 object has a time unit associated with it. So if we '+ 5' to our ‘day’ object, we will go forwards 5 days. And if we '- 5' from our month object, we’ll go backwards 5 months:

# 22nd November 2000 + 5 days
>>> day + 5
numpy.datetime64('2000-11-27')
 
# November 2000 - 5 months
>>> month - 5
numpy.datetime64('2000-06')

But what if we want to look at Christmas Day every year for the last 50 years? We can’t start with 2018-12-25 and '- 1' because that is of time unit 'D' and so gives us 2018-12-24.

NumPy solved this problem by introducing another function with a cool sci-fi name: np.timedelta64.

np.timedelta64

Any arithmetic that’s more complex than adding or subtracting integers involves np.timedelta64. The equation will either return an np.timedelta64 object, or you will need to use one to get a result.

If you’ve understood everything up to this point, you will easily understand this. It’s best explained through examples.

The number of days between 1st Jan 2012 and 1st Jan 2013 is 366, as 2012 was a leap year.

>>> np.datetime64('2013-01-01') - np.datetime64('2012-01-01')
numpy.timedelta64(366,'D')
 
# Add on 15 days to May 2000
>>> np.datetime64('2000-05') + np.timedelta64(15, 'D')
numpy.datetime64('2000-05-16')
 
# What is 5 hours after 1pm on 22nd Nov 2000?
>>> np.datetime64('2000-11-22 13:00') + np.timedelta64(5, 'h')
numpy.datetime64('2000-11-22T18:00')

Each np.timedelta64() object takes a single integer and a single time unit. Thus to add on 4 months and 3 days, you have two options.

The first is to use two instances of np.timedelta64:

# Add on 4 months and 3 days (some_date must have time unit 'M' or 'Y')
>>> some_date = np.datetime64('2000-01')
>>> some_date + np.timedelta64(4, 'M') + np.timedelta64(3, 'D')
numpy.datetime64('2000-05-04')

The second is to convert the separate np.timedelta64 objects into a single unit, using division (or modulo) to work out the conversion factor:

# 1 day is 24 hours
>>> np.timedelta64(1, 'D') / np.timedelta64(1, 'h')
24.0
 
# 1 week is 10,080 minutes
>>> np.timedelta64(1, 'W') / np.timedelta64(1, 'm')
10080.0
 
# 1 month is ??? days
>>> np.timedelta64(1, 'M') / np.timedelta64(1, 'D')
TypeError: Cannot get a common metadata divisor for NumPy datetime metadata [M] and [D] because they have incompatible nonlinear base time units

Note that neither months nor years have a fixed length, so you cannot convert them into other, smaller units.

Most people would agree that one month after 31st January is 28th February. But what is one month after 28th February? 28th March or 31st March?

So NumPy throws an error if you try to change the month/year of an np.datetime64 object with time unit 'D' or smaller.

# Both work because time unit is 'M'
 
# Add 1 month
>>> np.datetime64('2000-02') + np.timedelta64(1, 'M')
numpy.datetime64('2000-03')
 
# Add 1 year
>>> np.datetime64('2000-02') + np.timedelta64(1, 'Y')
numpy.datetime64('2001-02')

Once we get more precise, the values of month and year are non-constant, and NumPy throws an error.

# Neither works because time unit is 'D'
 
# Add 1 month
>>> np.datetime64('2000-02-01') + np.timedelta64(1, 'M')
TypeError: ...
 
# Add 1 year
>>> np.datetime64('2000-02-01') + np.timedelta64(1, 'Y')
TypeError: ...
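Although NumPy refuses to treat a month as a fixed number of days, you can still ask how long one particular month is. Here is a minimal sketch, where days_in_month is a hypothetical helper of my own rather than a NumPy function. It rescales the month and the following month to 'D' and subtracts them.

```python
import numpy as np

def days_in_month(month_str):
    """Hypothetical helper: count the days in a month given as 'YYYY-MM'."""
    month = np.datetime64(month_str, 'M')
    # First day of this month and of the following month, at day precision
    first_day = month.astype('datetime64[D]')
    first_day_next = (month + 1).astype('datetime64[D]')
    # The difference is a timedelta64 in days; divide to get a plain number
    return int((first_day_next - first_day) / np.timedelta64(1, 'D'))

print(days_in_month('2000-02'))
print(days_in_month('2019-02'))
print(days_in_month('2019-12'))
```

Because the subtraction happens between two concrete dates, leap years are handled without any special cases.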

We are now equipped to build a list of any dates and times our heart desires!

So let’s build one containing all the Christmas Days for the last 50 years.

We’ll start with the most recent December, 2018-12, and step backwards from it 50 times.

In the first iteration, we subtract 0 years (to stay in 2018); in the second, we subtract 1 year (to get 2017), and so on. Finally, in each iteration, we add on 24 days (since the default day is 01).

We’ll do all of this using one of Python’s most loved aspects: a list comprehension!

# Start with 2018-12
>>> all_christmas_days = [np.datetime64('2018-12') \
                          - np.timedelta64(i, 'Y') \
                          + np.timedelta64(24, 'D')
                          for i in range(50)]
 
# Contains 50 years
>>> len(all_christmas_days)
50
 
# First 3 years
>>> all_christmas_days[:3]
[numpy.datetime64('2018-12-25'),
 numpy.datetime64('2017-12-25'),
 numpy.datetime64('2016-12-25')]
 
# Last 3 years (we start counting at -1 and the end index is excluded)
>>> all_christmas_days[:-4:-1]
[numpy.datetime64('1969-12-25'),
 numpy.datetime64('1970-12-25'),
 numpy.datetime64('1971-12-25')]
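If you’d rather avoid the loop, the same dates can probably be built with array operations. This is only a sketch of one alternative, not the approach used above. It leans on np.arange(), which we meet properly later in the article, and the variable names are my own.

```python
import numpy as np

# Every year from 1969 to 2018 inclusive (the stop value is excluded)
years = np.arange('1969', '2019', dtype='datetime64[Y]')

# Move to December of each year, then to the 25th
decembers = years.astype('datetime64[M]') + np.timedelta64(11, 'M')
christmas_days = decembers.astype('datetime64[D]') + np.timedelta64(24, 'D')

print(len(christmas_days))
print(christmas_days[:3])
```

Note that this version runs from oldest to newest, the opposite order to the list comprehension.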

Now, we know the answers to questions 1 and 2 I asked at the start.

  1. How many days are there in a month?
    • Undefined. It can be 28, 29, 30, or 31… this will cause issues if your time units are 'D' or smaller.
  2. How many seconds are there between 1st March 2019 at 1 pm and 4th March 2019 at 2 am exactly? 
    • 219,600s
# Subtract both dates written with time unit 's'
>>> np.datetime64('2019-03-04 02:00:00') \
  - np.datetime64('2019-03-01 13:00:00')
 
numpy.timedelta64(219600,'s')
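As a sanity check on that number, the same gap can be expressed in other units using division, floor division, and modulo. This is a sketch that assumes a NumPy version recent enough to support // and % on timedelta64 values (I believe 1.16 or later).

```python
import numpy as np

start = np.datetime64('2019-03-01 13:00:00')
end = np.datetime64('2019-03-04 02:00:00')
gap = end - start  # timedelta64 with unit 's'

# Express the same gap in hours
gap_in_hours = gap / np.timedelta64(1, 'h')

# Split it into whole days plus the leftover hours
whole_days = gap // np.timedelta64(1, 'D')
leftover_hours = (gap % np.timedelta64(1, 'D')) / np.timedelta64(1, 'h')

print(gap_in_hours, whole_days, leftover_hours)
```

The gap works out to 2 days and 13 hours, i.e. 61 hours, which is 61 × 3,600 = 219,600 seconds.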

But what about question 3? That was about business days, so let’s learn about them now.

How Many Business Days Are There Between Two Days?

We can build any list of datetimes using the tools above. But there are a few things we can use to make our lives easier.

Businesses usually only care about what happens on their days of operation. So it would be nice to have a set of functions to deal with this.

There would be a lot of unnecessary data points if we had to include Saturday and Sunday in stock data analysis.

Thankfully, NumPy handles this using the concept of business days, with a family of np.busday functions.

There are a few of them, and explaining them all in detail would take another blog post. So I’ll focus on the most important ones to get us up and running: np.is_busday(), np.busday_count(), and np.busday_offset().

But first, we need to cover an overarching concept.

Weekmask

Central to these functions is the keyword argument ‘weekmask’. This argument specifies which days are considered business days.

The following weekmasks all set the business days to be Monday, Tuesday, Wednesday, Thursday, and Friday (the default behavior):

weekmask_names = 'Mon Tue Wed Thu Fri'
weekmask_string = '1111100'
weekmask_list = [1, 1, 1, 1, 1, 0, 0]
  • weekmask_names – write the capitalized, three-letter abbreviations (Mon, Tue, Wed, Thu, Fri, Sat, Sun). Whitespace is ignored.
  • weekmask_string – a string of length 7 where ‘1’ == business day and ‘0’ == non-business day (starting from Mon)
  • weekmask_list – a list of length 7 where 1 == business day and 0 == non-business day (starting from Mon)

I’ll use these interchangeably in the examples for learning purposes. I recommend you pick one style and stick to it in your own code to aid readability.

💡 Note: the np.busday functions work on dates with time unit 'D'. You can pass them np.datetime64 objects or ISO 8601 date strings, which NumPy converts for you.

np.is_busday()

Returns a Boolean: True if the day is a business day, False if it isn’t.

# Wednesday 23rd October 2019
>>> np.is_busday(np.datetime64('2019-10-23'))
True
 
# Saturday 19th October 2019
>>> np.is_busday(np.datetime64('2019-10-19'))
False
 
# Saturday 19th October 2019 with Saturday classed as a business
# day in addition to Mon-Fri
>>> np.is_busday(np.datetime64('2019-10-19'), weekmask='1111110')
True
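The three weekmask spellings should behave identically. Here is a quick sketch that checks this on the Saturday from the example above; the masks list and the other names are my own.

```python
import numpy as np

saturday = np.datetime64('2019-10-19')

# Three spellings of the same weekmask: Monday to Saturday are business days
masks = ['Mon Tue Wed Thu Fri Sat', '1111110', [1, 1, 1, 1, 1, 1, 0]]

results = [bool(np.is_busday(saturday, weekmask=mask)) for mask in masks]
print(results)
```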

💡 Note: you can use this functionality to create lists of any specific day(s). Just set weekmask to the days of interest and use np.is_busday() to select the appropriate ones.

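# list_of_dates must be a NumPy array of dates, e.g. every day in October 2019
>>> list_of_dates = np.arange('2019-10-01', '2019-11-01', dtype='datetime64[D]')

# Create Wed and Sat boolean array (True if day is Wed or Sat)
>>> wed_sat_mask = np.is_busday(list_of_dates, weekmask='Wed Sat')

# Filter array to return just Wed and Sat
>>> only_wed_sat = list_of_dates[wed_sat_mask]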

np.busday_count()

Returns the number of business days in between the two dates you provide.

Now we can answer question 3:

  3. How many business days are there between 1st Jan 1970 (the Unix epoch) and 3rd December 2008 (the day Python 3 was released)?
    • 10,154 days
>>> np.busday_count('1970-01-01', '2008-12-03')
10154
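Two details are worth seeing in action: the start date is counted but the end date is not, and the count changes with the weekmask and holidays arguments. In this sketch the ‘holiday’ on 23rd October is made up purely for illustration.

```python
import numpy as np

# Monday 21st to Monday 28th October 2019: the end date itself is not counted
mon_to_mon = np.busday_count('2019-10-21', '2019-10-28')

# Treat Saturday as a business day too
with_saturday = np.busday_count('2019-10-21', '2019-10-28', weekmask='1111110')

# Pretend Wednesday 23rd is a holiday (a made-up holiday for illustration)
with_holiday = np.busday_count('2019-10-21', '2019-10-28', holidays=['2019-10-23'])

print(mon_to_mon, with_saturday, with_holiday)
```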

np.busday_offset()

Returns a valid business day according to the roll rule and the number of days to offset.

We will be using this to find the closest business day for any given date we pass to this function.

For example, we want to find the value of the stock market on 18th May 2019. This is impossible since that is a Saturday and the stock market is closed. We’ll use this function to get the np.datetime64 object closest to it that is a business day.

The arguments are: an np.datetime64 object, the number of days to offset, and the ‘roll’. We will just use an offset of 0 (i.e. no offset) to find the closest business day. And we will set roll='forward'. Thus if we input a Saturday, it should return Monday. If we use roll='backward', we would get Friday (the day before Saturday).

For simplicity, we will ignore more complex use cases. Please refer to the documentation if you’d like more detailed explanations.

# Saturday 18th May 2019
>>> sat_may_18 = np.datetime64('2019-05-18')
 
# Closest business day forwards in time is Monday 20th May 2019
>>> np.busday_offset(sat_may_18, 0, roll='forward')
numpy.datetime64('2019-05-20')
 
# Closest business day back in time is Friday 17th May 2019
>>> np.busday_offset(sat_may_18, 0, roll='backward')
numpy.datetime64('2019-05-17')
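For a taste of the slightly more complex cases, here is a sketch with a non-zero offset and with the holidays argument. As I understand the documentation, the roll is applied first and the offset second. The variable names are my own.

```python
import numpy as np

# One business day after Friday 17th May 2019 is Monday 20th
next_after_friday = np.busday_offset('2019-05-17', 1)

# From Saturday 18th: roll forward to Monday 20th first, then step one more day
next_after_saturday = np.busday_offset('2019-05-18', 1, roll='forward')

# New Year's Day 2019 was a Tuesday; listing it as a holiday rolls to the 2nd
after_new_year = np.busday_offset('2019-01-01', 0, roll='forward',
                                  holidays=['2019-01-01'])

print(next_after_friday, next_after_saturday, after_new_year)
```

Without the holidays argument, the last call would simply return 1st January, since NumPy only knows about weekends by default.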

np.datetime64 with np.arange

The last thing we’ll look at to make date generation even easier is combining it with the np.arange() function.

Remember that the arguments for np.arange are almost identical to the built-in range() function. But there is an added 'dtype' keyword argument whose default is None. Like range, the stop in np.arange is exclusive and so not included in the calculation.

np.arange(start, stop, step, dtype=None)

When working with datetimes, you must include the start and stop date and set the dtype.

# All days in 2017
>>> days_2017 = np.arange('2017-01-01', '2018-01-01', 
                           dtype='datetime64[D]')
 
# Every other day in 2017
>>> days_step_2_2017 = np.arange('2017-01-01', '2018-01-01', 2,
                                  dtype='datetime64[D]')
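To check what these ranges contain, here is a short sketch that counts their elements and combines np.arange() with np.is_busday(). The months_2017 and weekdays_2017 names are my own additions.

```python
import numpy as np

days_2017 = np.arange('2017-01-01', '2018-01-01', dtype='datetime64[D]')
every_other_day = np.arange('2017-01-01', '2018-01-01', 2, dtype='datetime64[D]')

# The same idea with a coarser unit: one element per month
months_2017 = np.arange('2017-01', '2018-01', dtype='datetime64[M]')

# Keep only Monday to Friday using a boolean mask
weekdays_2017 = days_2017[np.is_busday(days_2017)]

print(len(days_2017), len(every_other_day), len(months_2017), len(weekdays_2017))
print(days_2017[-1])
```

The last element is 31st December 2017, confirming that the stop date is excluded.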

And that is everything you need to know about datetimes in NumPy! Phew, that was a lot.

To cap it off, let’s visualize the US stock market over the last decade using different time intervals.

How to Visualize The Stock Market with NumPy’s datetime?

For the following example, I’ve downloaded the last 10 years of stock market data for the S&P 500. You can download it for free from FRED [3].

I’ve done the preprocessing away from this article and ended up with two lists. The first, values, contains the value of the S&P 500 index at the close of every day from 2009-10-23 until 2019-10-22. The second, datetimes, contains np.datetime64 objects for each day.

>>> values[:5]
[1079.6, 1066.95, 1063.41, 1042.63, 1066.11]
 
>>> datetimes[:-6:-1]
[numpy.datetime64('2019-10-22'),
numpy.datetime64('2019-10-21'),
numpy.datetime64('2019-10-18'),
numpy.datetime64('2019-10-17'),
numpy.datetime64('2019-10-16')]

I zipped these lists together to create a dictionary where each key is a date and each value an S&P 500 value. We will use this to generate subsets of our data later on by only selecting keys that we want.

# Dictionary comprehensions are just as wonderful as list comprehensions
sp500 = {date: val for date, val in zip(datetimes, values)}

Plot 1 – All the Data

import matplotlib.pyplot as plt

plt.plot(datetimes, values)
plt.xlabel('Year')
plt.ylabel('SP500 Index')
plt.title('SP500 Index Every Day from 2010-2019')
plt.show()

There is a huge upward trend across the whole dataset. But the graph is quite noisy. It is possible to see other trends but because there are so many points, it’s not particularly nice to look at.

What if we re-sampled to see how the market performed year on year? To do this, we’ll look at 1st January every year.

Plot 2 – 1st Jan

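💡 Note: 1st Jan is a holiday every year, so the stock market is not open. We’ll use np.busday_offset() to roll each date forward to a valid business day. Bear in mind that, by default, np.busday_offset() only skips weekends; to skip public holidays as well, pass them via its holidays argument.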

First, create a list of every year using np.arange().

all_years = np.arange('2010', '2020', dtype='datetime64[Y]')

Since the values of datetimes are of time unit 'D', we must convert the dates in all_years to that too. This defaults each element to YYYY-01-01.

first_jan_dates = [np.datetime64(date, 'D') for date in all_years]

Finally, we apply the .get() method to return the elements we want from sp500. And we wrap them in np.busday_offset() to ensure we are selecting a business day.

first_jan_values = [sp500.get(np.busday_offset(date, 0, roll='forward'))
                    for date in first_jan_dates]

Now, we plot.

This is a much smoother plot and it is very easy to grasp which years had positive and negative growth. Such is the power of resampling!
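If you don’t have the dataset to hand, you can try the same lookup pattern on toy data. The sketch below uses made-up placeholder values (not real prices) for every weekday from June to September 2019, then samples the first business day of each month. All the names are my own.

```python
import numpy as np

# Made-up data: one placeholder value per weekday from June to September 2019
all_days = np.arange('2019-06-01', '2019-10-01', dtype='datetime64[D]')
toy_dates = all_days[np.is_busday(all_days)]
toy_values = [float(i) for i in range(len(toy_dates))]  # not real prices
toy_index = dict(zip(toy_dates, toy_values))

# First calendar day of each month, at day precision
month_starts = np.arange('2019-06', '2019-10',
                         dtype='datetime64[M]').astype('datetime64[D]')

# Roll weekend dates forward to the next weekday, then look each one up
sample_dates = np.busday_offset(month_starts, 0, roll='forward')
sample_values = [toy_index.get(date) for date in sample_dates]

print(sample_dates)
print(sample_values)
```

Here 1st June (a Saturday) and 1st September (a Sunday) get rolled forward to the following Monday, so every lookup finds a value.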

But is this graph too general? To get a nice middle ground, let’s look at the S&P 500’s value at the start of every quarter over the last 10 years. The process is almost identical to the one we followed above.

Plot 3 – Every Quarter

First, create a list of every quarter in the form of YYYY-MM. Remember quarters are 3 months long!

every_quarter = np.arange('2010-01', '2019-10', 3, dtype='datetime64[M]')

Re-scale our datetime objects to 'D' using a list comprehension.

quarter_start_dates = [np.datetime64(date, 'D') for date in every_quarter]

Finally, we apply the .get() method to return the elements we want. And we wrap them in np.busday_offset() to ensure we are selecting a business day.

quarter_start_values = [sp500.get(np.busday_offset(date, 0, roll='forward'))
                        for date in quarter_start_dates]

Now, we plot.

plt.plot(quarter_start_dates, quarter_start_values)
plt.show()

This gives us a lovely overview of the trends of the stock market. It is not overly noisy (like plot 1) or overly simplistic (like plot 2). The intra-year dips are clear to see yet the graph is still easily understandable.

And that is that! Everything you’ll ever need to know to use np.datetime64 and its associated functions, along with some real-world examples.

If you have any questions or suggestions, please subscribe and ask. We love to hear feedback and suggestions!

Attribution

This article is contributed by Finxter user Adam Murphy (data scientist):

I am a self-taught programmer with a First Class degree in Mathematics from Durham University and have been coding since June 2019.

I am well versed in the fundamentals of web scraping and data science and can get you a wide variety of information from the web very quickly.

I recently scraped information about all watches that Breitling and Rolex sell in just 48 hours and am confident I can deliver datasets of similar quality to you whatever your needs.

Being a native English speaker, my communication skills are excellent and I am available to answer any questions you have and will provide regular updates on the progress of my work.

References

[1] https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html

[2] https://www.iso.org/iso-8601-date-and-time-format.html

[3] https://fred.stlouisfed.org/series/SP500/ (S&P 500 data download)

[4] https://en.wikipedia.org/wiki/History_of_Python 

[5] https://docs.scipy.org/doc/numpy/reference/generated/numpy.busday_offset.html#numpy.busday_offset
