- MySQL's Connector for Python is required.
- This script easily saves me 10 minutes of repetitive action every day.
I took the MySQL script I developed for uploading a local csv file to a database, and expanded it with Python. The script can now download a csv from a remote server, and upload its contents to my database.
Before I get to the details, I should establish my setup in case someone runs into issues with my method. I’d also point out that I’m not yet a Python expert, so there may be better methods out there. First of all, I’m working on a Windows box. I used Python(X,Y) 2.7.6.1 to install Python 2.7 and all the scientific packages that accompany it. In addition, WAMP 2.5 was also installed to give me local access to an Apache/MySQL/PHP stack.
Acquire the Dependent Modules
Coming from MATLAB, one thing I’ve discovered is that Python doesn’t offer nearly as much “out of the box”. Despite all the packages provided by Python(X,Y), two more are still required for this effort.
- Use pip to install the wget module.
- Download and install MySQL’s Connector/Python (make sure to get version 2.0).
With the requisite packages now in place, I can divide the post into two sections: downloading the .csv, and adding said .csv to the database.
Download a csv
The .csv in question here is a weather file accessed in an API-like fashion from the National Renewable Energy Laboratory’s (NREL) Measurement and Instrumentation Data Center (MIDC). This offering is free of charge to the public. Full disclosure: the equipment that records this data is actually maintained by my research group.
One neat thing about this site is that it offers several meteorological and solar measurements in resolutions as frequent as 1-minute. I could opt to download all these measurements at once, but the easier-to-manage method is to specify only those parameters I’m concerned with. Specifically:
- Global Horizontal Irradiance (W/m^2)
- Direct Normal Irradiance (W/m^2)
- Dry Bulb Temperature (°C)
Reminder: Don’t forget to click the “Entire Day” radio button, or the data will be restricted to those times when irradiance is above 0.
Once your selections are made, click the “submit” button. Your browser will then download a csv per your selections. Examine your download link for dynamic values, then bring the link into Python.
```python
import wget
import datetime
import mysql.connector

start = datetime.date(2014, 6, 19)
end = datetime.date(2014, 9, 15)

url = ("http://midcdmz.nrel.gov/apps/plot.pl?"
       "site=UNLV&start=20060318&"
       "edy=%s&emo=%s&eyr=%s&"
       "zenloc=18&amsloc=20&"
       "year=%s&month=%s&day=%s&"
       "endyear=%s&endmonth=%s&endday=%s&"
       "time=1&inst=3&inst=4&inst=9&type=data&"
       "first=3&math=0&second=-1&value=0.0&"
       "global=-1&direct=-1&diffuse=-1&user=0&axis=1"
       % (end.day, end.month, end.year,
          start.year, start.month, start.day,
          end.year, end.month, end.day))

fileloc = (r'C:/<redacted>/%s-%s.csv'
           % (str(start.year) + str(start.month) + str(start.day),
              str(end.year) + str(end.month) + str(end.day)))

file = wget.download(url, fileloc)
```
As always, the dependent modules are imported first. Next, the datetime package is used to establish date variables designating the time period I intend to query. A url string is then built using Python’s implicit line joining (adjacent string literals inside parentheses are concatenated into one string) and the % formatting operator. Now, when I change my data endpoints, the proper url will be generated to download just the sensor measurements I’d like over just the specified time period. The next step is to indicate exactly where on my computer I’d like this .csv downloaded, as well as the file name it will have once it gets there. Lastly, wget.download is used to actually download the file.
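As a quick illustration of this implicit-concatenation-plus-% pattern (using a made-up example URL and parameter names, not NREL’s actual ones), the pieces combine like so:

```python
import datetime

start = datetime.date(2014, 6, 19)
end = datetime.date(2014, 9, 15)

# Adjacent string literals inside the parentheses are joined into a single
# string, and the % operator then fills the placeholders in order.
url = ("http://example.com/data?"
       "start=%s-%02d-%02d&"
       "end=%s-%02d-%02d"
       % (start.year, start.month, start.day,
          end.year, end.month, end.day))

print(url)  # http://example.com/data?start=2014-06-19&end=2014-09-15
```

Note the %02d conversions zero-pad single-digit months and days, which keeps generated file names and query strings sorting correctly.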
Adding the .csv to a Database
Interfacing with a database is more involved than downloading a file, but the MySQL connector simplifies the process substantially. First, the user’s database configuration is entered as a dictionary object, then unpacked inside the mysql.connector to establish the database connection. Next, a cursor is created and a query is formed. I won’t go into the query here, but for a general idea of what I’m doing refer to my earlier article that discusses the LOAD INFILE process.
```python
config = {
    'user': 'root',
    'password': '',
    'host': 'localhost',
    'database': '<redacted>',
    'raise_on_warnings': True,
}

cnx = mysql.connector.connect(**config)
cursor = cnx.cursor()

query = ("DELETE FROM `weather`; "
         "LOAD DATA LOCAL INFILE '%s' " % fileloc +
         "INTO TABLE `weather` "
         "FIELDS TERMINATED BY ',' "
         "LINES TERMINATED BY '\\n' "
         "IGNORE 1 LINES "
         "(@date, @time, GHI, DNI, DBT) "
         "SET ddatetime = timestamp(str_to_date(@date,'%m/%d/%Y'), @time);"
         )
```
Lastly, the query must be executed, the changes committed, and the cursor/connection closed. The proper syntax definitely eluded me for a while. I expected the (commented-out) cursor.execute(query) call to run my query, but it simply would not work. The reason, as I discovered, was that my query was in reality a compound query composed of a DELETE statement as well as a LOAD INFILE statement. To run such a query, the optional multi=True parameter must be specified. This parameter causes execute to return an iterator, allowing the cursor to execute both statements in succession. The pass statement is included simply to show that the iterator is driven on the cursor and that no other operations occur for each result.
```python
for result in cursor.execute(query, multi=True):
    pass
# cursor.execute(query)
cnx.commit()
cursor.close()
cnx.close()
```
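One refinement worth mentioning (not something my script above does) is wrapping the execute/commit steps in try/finally so the cursor and connection are released even when the query fails partway. Here is a minimal sketch; load_weather_csv is a name I made up for illustration:

```python
def load_weather_csv(cnx, query):
    """Run a compound DELETE + LOAD DATA query on an open connection,
    committing only if every statement succeeds and always releasing
    the cursor and connection afterwards (a defensive variant of the
    script above, not the original verbatim)."""
    cursor = cnx.cursor()
    try:
        # multi=True yields an iterator; looping drives both statements.
        for result in cursor.execute(query, multi=True):
            pass
        cnx.commit()
    finally:
        # Runs whether or not execute/commit raised an exception.
        cursor.close()
        cnx.close()
```

With this shape, a malformed csv or a MySQL error can’t leave a dangling connection behind; the caller just sees the exception.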
I will probably reference this in the future when I need to download a csv, so here is the script in its entirety for those who would find it useful.