Andrew Cross
REFERENCE IMAGE Date/Time Transformation
  1. Local variables are used to combine date and times
  2. LOAD DATA INFILE is used

In my experience, datasets often require pre-processing when they include time-based data. This is one such example. As seen below, I came across a some .csv data that divided each timestamp into two columns: date (2014-06-01) and time (14:44:03). I knew I needed to place these data in a database, and I knew I’d need to calculate the time difference between any two arbitrary timestamps.

Introduce the .csv Data

date time GHI DNI DBT
6/19/2014 04:26 2.26829 -1.27256 22.02
6/19/2014 04:27 2.66065 -0.879413 22.03
6/19/2014 04:28 3.06231 -0.724224 22.07

These data could be added to the database as-is, but I prefer to minimize the number of columns in my tables when possible. I also find it easier to work with timestamps that conform to MySQL’s standard datetime format (2014-06-01 14:44:03). By combining a given date and time into one value, I eliminate the need to manage an extra database column. It allows me to quickly calculate durations that occur over multiple days.

ddatetime GHI DNI DBT
2014-06-19 04:26 2.26829 -1.27256 22.02
2014-06-19 04:27 2.66065 -0.879413 22.03
2014-06-19 04:28 3.06231 -0.724224 22.07

Forming the SQL

To write this, I used the LOAD DATA INFILE method I wrote about last week. My SQL statement was written as I described there. The statement needed some modifying, though.

LOAD DATA LOCAL INFILE 'C:/{redacted}/file.csv'
INTO TABLE `weather`
(@date, @time, GHI, DNI, DBT)
SET ddatetime = timestamp(str_to_date(@date,'%m/%e/%Y'), @time)

The third to last row, IGNORE 1 LINES, was added due to the fact my dataset has a header row. However, the last two lines are really the point of this post.

What these last two lines do is that they take advantage of local variables. You’ll notice that I give names to each of the columns corresponding to my input data (@date, @time, GHI, DNI, DBT). When a column name is preceded by an at-sign (@), it becomes a column that is represented by a local variable. The other column names are implied to match-up with databse columns, even if the database columns aren’t necessarily in the same order.

The last SET line instructs MySQL to fill the ddatetime column in a certain way. In this case, the ddatecolumn is built from the @date and @time local variables. Timestamp is a built-in MySQL function that will convert given dates and times to the datetime format. @date cannot be fed to this function directly, though. If you’ll notice, the input date is formatted ‘6/19/2014’, whereas it needs to be formatted ‘2014-06-19’. This change in formatting can be accomplished by utilizing the str_to_date function and letting the function know how the incoming data is formatted by using the proper directives.

This technique is referred to as a transformation, if you’re looking to do a more research.

Profile picture of Andrew standing at the Southern-most point in the United States.
Andrew Cross

Andrew is currently a mechanical R&D engineer for a medical imaging company. He enjoys good food, motivated people, and road biking. He has still not completely come to terms with the fact he will never play center field for the Kansas City Royals.