A basic look at APScheduler and PyMongo

I am currently working on getting a project up and running using APScheduler and PyMongo, a Python distribution for working with the NoSQL database MongoDB. Scheduling tasks in any language is something I have always been interested in. The ability to schedule a task and have it repeat at set intervals has a lot of potential uses, from making a Twitter bot to scraping information from a website on an hourly or daily basis. Advanced Python Scheduler (APScheduler) is easy to use and works perfectly well for my current project.

A Twitter bot is something I also want to make someday, but my immediate need is web scraping, so I need somewhere to store the data I pull down. I’ve been doing some MongoDB tutorials recently, and the information that I am pulling down will change in shape over time, which makes a schema-less NoSQL database a good fit. Below I’m going to show a toy example using APScheduler and PyMongo.

I have included the code below on my GitHub profile, which also includes links for downloading APScheduler, PyMongo and MongoDB if you don’t have them already. There is also a notebook covering PyMongo basics if you have never used PyMongo or MongoDB before.

For this example I’m going to pull data from the CoinDesk API on the current price of bitcoin and store it in MongoDB. APScheduler will run a function to do this every minute, and after I finish I will be able to load the data into pandas and plot it with matplotlib. This is a great advantage of using PyMongo: it allows me to use all the other tools and modules in Python with the data.

With a mongod instance running, make a connection to it:

from pymongo import MongoClient

client = MongoClient()
bitcoin = client.test_database.bitcoin

On the third line I reference a database called test_database and a collection called bitcoin. If these do not already exist, MongoDB creates them the first time a document is inserted.

import datetime

import requests
from apscheduler.schedulers.blocking import BlockingScheduler

def get_price():
    # Fetch the current price index from the CoinDesk API as JSON
    response = requests.get("http://api.coindesk.com/v1/bpi/currentprice.json")
    bitcoin_response = response.json()
    # Pull out the price in euro as a float
    price = bitcoin_response['bpi']['EUR']['rate_float']
    time = datetime.datetime.now()
    # insert_one() replaces the older insert(), which newer PyMongo versions no longer provide
    bitcoin.insert_one({"time": time, "price": price})

Above is the function that will be called by APScheduler. requests is imported to allow me to access the CoinDesk API and pull out the bitcoin price in euro. Each time the function runs, the current price and time are inserted into the bitcoin collection.
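To make the parsing step easier to follow without hitting the network, here is a small sketch that works on a hand-written payload. It only assumes the nesting the function above relies on (bpi, then the currency code, then rate_float); the numbers are made up, and extract_price and build_document are hypothetical helper names of mine.

```python
import datetime

# Illustrative payload with the same nesting get_price() reads.
# The prices are invented for the example, not real quotes.
SAMPLE_RESPONSE = {
    "bpi": {
        "USD": {"code": "USD", "rate_float": 260.5},
        "EUR": {"code": "EUR", "rate_float": 230.25},
    }
}


def extract_price(payload, currency="EUR"):
    """Return the float price for one currency from the decoded JSON."""
    return payload["bpi"][currency]["rate_float"]


def build_document(payload, currency="EUR", now=None):
    """Build the dict that would be inserted into the collection."""
    if now is None:
        now = datetime.datetime.now()
    return {"time": now, "price": extract_price(payload, currency)}


doc = build_document(SAMPLE_RESPONSE, now=datetime.datetime(2015, 1, 1, 12, 0))
print(doc["price"])  # 230.25
```

Splitting the parsing from the HTTP call and the insert like this also makes it possible to check the logic without a running mongod.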

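bitcoin.delete_many({})  # Added to empty out the collection the first time the code is run

scheduler = BlockingScheduler()
scheduler.add_job(get_price, 'interval', minutes=1)

try:
    scheduler.start()  # Blocks here and runs get_price every minute
except (KeyboardInterrupt, SystemExit):
    pass  # Stop cleanly on Ctrl+C or interpreter exit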

The APScheduler documentation has information on a number of different schedulers which can be used, but BlockingScheduler above is the easiest. A job is scheduled by calling add_job() and passing it the function we want called, the "interval" trigger and the interval that we want the function run at; the scheduler itself only begins running jobs once start() is called. Here the function get_price will run every minute. add_job() can take a number of other triggers too, such as cron.

I let this code run for a while and then stopped it. By importing pandas I was able to convert the collection into a DataFrame:

import pandas as pd

bitcoin_df = pd.DataFrame(list(bitcoin.find()))

and from here graph it in matplotlib.
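The conversion and plotting step might look something like the sketch below. To keep it runnable without a database, it uses a few made-up records in place of list(bitcoin.find()); the plot_prices helper, its labels and the output file name are assumptions of mine rather than the code from the original project.

```python
import datetime

import pandas as pd

# Hypothetical records shaped like the documents get_price() inserts;
# the prices are invented for illustration.
records = [
    {"time": datetime.datetime(2015, 1, 1, 12, 0), "price": 230.25},
    {"time": datetime.datetime(2015, 1, 1, 12, 1), "price": 230.40},
    {"time": datetime.datetime(2015, 1, 1, 12, 2), "price": 229.95},
]

bitcoin_df = pd.DataFrame(records)

# Index by time so the x-axis of the plot is the timestamp.
price_series = bitcoin_df.set_index("time")["price"].sort_index()


def plot_prices(series, path="bitcoin_price.png"):
    """Draw the series as a line chart and save it (needs matplotlib)."""
    ax = series.plot(title="Bitcoin price")
    ax.set_xlabel("Time")
    ax.set_ylabel("Price (EUR)")
    ax.figure.savefig(path)
    return path
```

With real data, bitcoin.find() also returns each document's _id, which shows up as an extra column in the DataFrame; selecting only the time and price columns, as above, leaves it out of the plot.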