- Local variables are used to combine date and times
- LOAD DATA INFILE is used
In my experience, datasets often require pre-processing when they include time-based data. This is one such example. As seen below, I came across a some .csv data that divided each timestamp into two columns: date (2014-06-01) and time (14:44:03). I knew I needed to place these data in a database, and I knew I’d need to calculate the time difference between any two arbitrary timestamps.
Introduce the .csv Data
date | time | GHI | DNI | DBT |
---|---|---|---|---|
6/19/2014 | 04:26 | 2.26829 | -1.27256 | 22.02 |
6/19/2014 | 04:27 | 2.66065 | -0.879413 | 22.03 |
6/19/2014 | 04:28 | 3.06231 | -0.724224 | 22.07 |
These data could be added to the database as-is, but I prefer to minimize the number of columns in my tables when possible. I also find it easier to work with timestamps that conform to MySQL’s standard datetime format (2014-06-01 14:44:03). By combining a given date and time into one value, I eliminate the need to manage an extra database column. It allows me to quickly calculate durations that occur over multiple days.
ddatetime | GHI | DNI | DBT |
---|---|---|---|
2014-06-19 04:26 | 2.26829 | -1.27256 | 22.02 |
2014-06-19 04:27 | 2.66065 | -0.879413 | 22.03 |
2014-06-19 04:28 | 3.06231 | -0.724224 | 22.07 |
Forming the SQL
To write this, I used the LOAD DATA INFILE method I wrote about last week. My SQL statement was written as I described there. The statement needed some modifying, though.
LOAD DATA LOCAL INFILE 'C:/{redacted}/file.csv'
INTO TABLE `weather`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(@date, @time, GHI, DNI, DBT)
SET ddatetime = timestamp(str_to_date(@date,'%m/%e/%Y'), @time)
The third to last row, IGNORE 1 LINES, was added due to the fact my dataset has a header row. However, the last two lines are really the point of this post.
What these last two lines do is that they take advantage of local variables. You’ll notice that I give names to each of the columns corresponding to my input data (@date, @time, GHI, DNI, DBT). When a column name is preceded by an at-sign (@), it becomes a column that is represented by a local variable. The other column names are implied to match-up with databse columns, even if the database columns aren’t necessarily in the same order.
The last SET line instructs MySQL to fill the ddatetime column in a certain way. In this case, the ddatecolumn is built from the @date and @time local variables. Timestamp is a built-in MySQL function that will convert given dates and times to the datetime format. @date cannot be fed to this function directly, though. If you’ll notice, the input date is formatted ‘6/19/2014’, whereas it needs to be formatted ‘2014-06-19’. This change in formatting can be accomplished by utilizing the str_to_date function and letting the function know how the incoming data is formatted by using the proper directives.
This technique is referred to as a transformation, if you’re looking to do a more research.