Wednesday, April 20, 2016

http://fxgears.com/forum/index.php?topic=361.0


 

Using Python for Trading/Analysis - Development Journal

Offline Jack

  • *****
  • User post count 1,159
    • View Profile
    •  
    • FXGears


From time to time, I'm going to drop tips and tricks for what I've learned while using Python for trading and analysis.

This will not be a frequently updated journal. And heck, some of it will be just intermediate programming tips when it comes to using Python. But I'll mention the things that I've discovered to be useful as I go..

This will not be "programming 101", or any basic introduction to Python. That means if you don't already know the basics, this thread might be of little use to you.

I also won't be providing complete applications here... actually, maybe the odd simple program as they come up, but in general I'm more interested in talking about the tools used to build out applications than to provide and support apps for others.

Most of the applications I've written so far are based on equity markets. However, a lot of this can be applied to trading forex (or any other market for that matter.)

Since Oanda just released their REST API, I will try to give examples that utilize it when appropriate.

I'll try to provide examples that will work on Python 3.0, but my target is always going to be Python 2.7 given it has the most 3rd party libraries available and has the most example code floating around the net. Lastly, while I can't imagine this being a problem, most of my coding is targeted to run on both Linux and Windows, so sometimes there might be Windows specific elements to them, but with the basics and stuff I'll write about first this shouldn't even be apparent.

Also, I have another journal for algo trading on MT4.. which I admit is a bit stale. If anything that comes up could be applied to MT4, I'll post it there. This thread specifically is about Python and trading.

I'd like to mention that I'm not a professional programmer. I learned on my own over the years and expanded upon my scripting experience back in my IT days. I invite any seasoned programmer to correct, or otherwise show me more efficient ways of accomplishing stuff I write about. And on that note, I obviously do not provide any warranty to anything you find written here.

Back soon with the first installments. :)
« Last Edit: June 21, 2014, 11:12:11 am by Jack »

Offline Jack


So, what's a "REST API" anyway? 

A new trend in the finance world is to provide APIs a la REST. It's becoming more popular with brokers, institutions, and data providers.

As mentioned in the first post, Oanda just opened up their REST API to the public, but heck, my work even uses a REST hybrid API...

REST stands for "Representational state transfer"

I could quote straight out of the Wikipedia article, but the big takeaway here is that REST APIs use simple HTTP requests to send commands and retrieve data. When your API interface is just HTTP requests, anything with a URL library or the ability to access URLs can use it... Anything... not just your programming language of choice (and mine is Python,) but even your web browser, or Excel...

So with a REST API, you're not dependent on your API provider/broker to make an interface for the language/environment of your choice.

There are some drawbacks. Typically REST APIs are what we call "polling" based, meaning if I want to get a quote update, I have to ask for it via a URL request (poll it); I don't just get it streaming to me in real time. If I want to know the price every few milliseconds, I have to ask for it via my program every few milliseconds. This typically will not affect the average trader, but it is something to consider when structuring how your application interacts with the API... as some API providers/brokers will have a rate limit set on how often you can send requests.
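To make the polling point concrete, here is a minimal sketch (written for Python 3) of a poller that never asks more often than a chosen interval. `fetch_quote` is a hypothetical stand-in for whatever request your broker's API needs, and the interval is an assumption; check your provider's actual rate limit.

```python
import time

def poll(fetch_quote, min_interval, max_polls,
         clock=time.monotonic, sleep=time.sleep):
    """Call fetch_quote() max_polls times, at least min_interval seconds apart."""
    results = []
    next_allowed = clock()
    for _ in range(max_polls):
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)  # hold back so we stay under the rate limit
        results.append(fetch_quote())
        next_allowed = clock() + min_interval
    return results
```

The `clock` and `sleep` parameters only exist so the timing can be swapped out when testing; in normal use you would leave them at their defaults.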

Offline rod178


Python is a language that I would like to become acquainted with. The closest I've come is Ruby, and that was for telecommunication apps, not FX.

The sort of guidance I would appreciate is some recommendations on third party libraries.

Which third party libraries do you find the most applicable to FX, e.g. NumPy, SciPy, etc.?
Which IDE, e.g. Eric IDE, Spyder, PyDev (Eclipse plugin), or IPython?
Better a Goat than a Sheep. Better a Shark than a Fish

Offline Jack



I've just been using IDLE, which comes with the default install. It's not perfect, but it works, and it's decent for debugging: when the program errors out, the IDLE Python Shell lets you jump in and inspect variables and call functions in the state the program was in at that point.

I'm going to get into libraries I discovered and use as I contribute to this thread, but I will say that numpy and Pandas are awesome. (I'll post a few vids I found that were useful in diving into Pandas.)

Offline Jack


This is the first part of a 12 part series on Pandas:



While you might not want to do sentiment analysis (more of an investing thing than trading,) the walk through explains how to practically use Pandas on data sets.

Offline Jack


(General programming tip..)

When working with large log/text files, seek() and tell() are your friends!

Often I'm working with log files that get written to in near real time by a platform to tell me the level 1 quote, level 2 book, time and sales info, etc...

In Python, reading a file is a pretty trivial matter... open(filename) and "for i in openedFile:" usually get the job done. The problem is, with price data and trading, these files tend to get rather large rather quickly.

A tick level price (time and sales) file on an active stock on any major exchange can easily grow to 20-50+ MB in size.. and trying to find the last price update (line) in a file that big would take a few seconds if we keep reading line by line until we get to the end.

So, we can use seek() to jump ahead as far as we need in the file so we only waste time reading the last few lines.

In this example, we'll call the function "tail" since I was aiming to mimic the way the tail command works on *nix systems. We will jump ahead to the last 1024 bytes in the file, and read just these... which should consist of the last few lines.. if the individual lines in your given log/text file are hella long, you can just increase this 'buffer' (the block_size argument). We then look for a newline character ("\n") in that block and return anything after it (while ignoring the very last newline at the end of the file.) If no newline turns up, the function steps back another block and tries again.

(this code isn't fully my own, I found it online and adapted it to fit my needs.. I do not take full credit for it.)
Code: [Select]
import os

def tail(in_file, block_size=1024, ignore_ending_newline=True):
    """Return the last line of the file at path in_file without reading it all."""
    # Binary mode: Python 3 refuses end-relative seeks on text-mode files.
    with open(in_file, "rb") as f:
        suffix = b""
        f.seek(0, os.SEEK_END)
        in_file_length = f.tell()
        seek_offset = 0
        while -seek_offset < in_file_length:
            # Read from end.
            seek_offset -= block_size
            if -seek_offset > in_file_length:
                # Limit if we ran out of file (can't seek backward from start).
                block_size -= -seek_offset - in_file_length
                if block_size == 0:
                    break
                seek_offset = -in_file_length
            f.seek(seek_offset, os.SEEK_END)
            buf = f.read(block_size)
            # Search for line end.
            if ignore_ending_newline and seek_offset == -block_size and buf.endswith(b"\n"):
                buf = buf[:-1]
            pos = buf.rfind(b"\n")
            if pos != -1:
                # Found line end.
                return (buf[pos + 1:] + suffix).decode()
            suffix = buf + suffix
        # One-line file.
        return suffix.decode()

(Note that the function open()s the file itself, which doesn't read it, it just maps it to an object in memory. So you pass the file's path as the in_file argument, not an already opened file object.)

Now we're skipping over any size of text, be it 1 MB or 100 MB... and just grabbing the most recent info.

With FX, you might be working with large CSV formatted files, and this would work for them as well to grab the last line entry.

We can extend the use of seek() in another direction. Say we do want the latest lines in a file, but we also need to know the values between the last time we read the file and its new last line at the present (for instance, you need OHLC values minute by minute and all price data is stored in a log file being actively written to by your broker's software.. instead of reading the entire file each time you want to update the candlestick values you're storing in memory, let's skip over what we already know about and only read the new lines.)

A good example of where we'd want to do this is when reading textual data of L2 changes. Storing every single value of the L2 book each time one thing in that book changes is silly, so most vendors only update the data with whatever is new and let you assume anything that hasn't been overwritten is still unchanged on the book itself... so now we need to read the file, store a consolidated version of the data locally, and store the point in the file we read up to. Then, later, when we need an updated version of the book, we can pick up where we left off in the file, grab the consolidated book we stored, and continue reading/updating as we go for whatever's new and hasn't been read yet:

(this code I wrote entirely myself..)
Code: [Select]
L2Store = {'ZVZZT.NQ':[0,{}],}

def levelTwo(target, ECN="ALL", *args):
    
    # Get file path
    namefile = filepath(target, "L2")
    
    # Load the local memory store of L2 books..
    global L2Store
    
    # Extract the L2 dicts for use... will file the updated dict when done.
    L2 ={}
    for i in L2Store[target][1]:
        L2[i] = L2Store[target][1][i]
        
    # Seek to the last point in the file we previously processed
    offset = L2Store[target][0]
    f = open(namefile)
    f.seek(offset)
    
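    # Read file from where we left off or from 0 if first time reading..
    # readline() is used rather than "for i in f:" because iterating a file
    # directly reads ahead, which makes tell() unreliable (Python 2) or
    # raises an error (Python 3).
    for i in iter(f.readline, ""):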
        if i[-1:] != "\n":
            if extraVerbose: print("Warning: Caught L2 log file mid write or with incomplete line.")
            break
        offset = f.tell() # Update the last point of the file reference
        line = i.split(",")
        activeECN = line[3].replace("Mmid=", "")
        if activeECN == '':
            continue
        if activeECN not in L2:
            L2[activeECN] = {}
        # Parse each field once instead of repeating the replace() calls.
        side = line[4].replace("Side=", "")
        price = float(line[5].replace("Price=", ""))
        volume = line[6].replace("Volume=", "")
        seq = float(line[8].replace("SequenceNumber=", ""))
        book = L2[activeECN]
        if price in book and seq > book[price][2]:
            # Newer update for a price level we already hold.
            if volume == "0":
                del book[price]
            else:
                book[price] = [side, volume, seq]
        elif price not in book:
            book[price] = [side, volume, seq]
    f.close()

    # Store new offset reference number and depth of market dict for symbol in the L2Store
    L2Store[target] = [offset, L2]

    requestedBook = []
    
    if ECN == "ALL":
        for ATS in L2:
            for i in L2[ATS]:
                requestedBook.append(ATS + " " + L2[ATS][i][0]+ " "+ str(i) + " " + L2[ATS][i][1])
        return ",".join(requestedBook)        
    else:
        for i in L2[ECN]:
            requestedBook.append(ECN + " " + L2[ECN][i][0]+ " "+ str(i) + " " + L2[ECN][i][1])
        return ",".join(requestedBook)

Note: there's a lot this module is pulling from (other modules and variables, etc..) that isn't in the code block. It's just an example, it won't work standalone, but you can read through it to see what it's doing.

Also, this module ends up returning a string with comma separated values, which is then broken down by another bit of code for use in a strategy. It doesn't return a list or array of data directly because most of the time what it returns gets sent over a network or at least to another process/app running on the same machine. So I felt that a string was easier to deal with than diving into creating and reading binary blobs.

Anyway, in the module, we store a copy of the current depth of market, and with that, a seek point in the L2 log file (as a reference in bytes deep into the file) in a global variable called L2Store. We initialize L2Store with dummy data, which isn't quite needed, but I keep it there to remind me how it organizes data for when I edit this module from time to time.

Calling tell() on the file as you read it will give you the offset you're currently at within the file. Storing this value, then seeking to it on the next read, does the skipping ahead part we talked about. There's also a bit of code in there that will catch the line if it's incomplete (no trailing newline yet) and pretend like we didn't read it yet, so it gets picked up whole on the next pass.

This means the first time we have to read a 50+ MB text file to build out a depth of market might take 2 seconds.. but all subsequent reads and updates take just 1-5 ms, since we only have to read what's new and apply the difference to the consolidated book we have already stored in memory.
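Stripped of the L2 parsing, the "remember the offset, read only what's new" idea can be sketched on its own. This is a minimal illustration, not the code from my platform; the function name and return shape are made up for the example. It opens the file in binary mode so the offset is a plain byte count.

```python
def read_new_lines(path, offset=0):
    """Return (complete_lines, new_offset) for data written after `offset`."""
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            raw = f.readline()
            if not raw.endswith(b"\n"):
                # End of file, or a line still being written: leave the
                # offset where it is so the line is re-read whole next time.
                break
            offset = f.tell()
            lines.append(raw.decode().rstrip("\r\n"))
    return lines, offset
```

Typical use is to keep the returned offset around and pass it back in on the next call, e.g. `lines, pos = read_new_lines("quotes.log", pos)`, where "quotes.log" is just a placeholder file name.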
« Last Edit: June 22, 2014, 07:34:18 am by Jack »

Offline Jack


(General programming tips again.. but applicable to dealing with multiple things hitting a REST API at once, with examples.)

Messaging Libraries

At some point along the path of writing trading applications, you'll want to break out jobs and tasks to different applications to run concurrently. Or heck, even have different jobs run at the same time within the same application.

An example, which relates to REST APIs, is queuing API calls should the API provider limit how many calls can be made per second. If you only ever run a single trading app, then you can just rate limit the script on its own, but what if you're running the same app across 50+ stocks, or different apps on the same stock? Eventually, a conflict might arise, so you have to put some thought into how to get all scripts making API calls to play nice with each other.

Enter: inter-process messaging, and the client/server model.

To combat this problem I stumbled upon ZeroMQ. ZMQ is a messaging library, and that helps us communicate between processes and apps... the difference between older, more common, messaging services and ZMQ is that ZMQ sockets act and scale the same if they are inter-process (on the same machine) or even TCP based (over the network.)

With this, you can make one 'server' app that does the actual API calls (and can properly queue requests to the API to prevent any conflicts or breaching the rate limit set by the API provider.) While the other 'client' apps do as they please, completely unaware of the API complexities, just making simple calls directly to the server.
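Before bringing ZMQ into it, the "one gatekeeper owns the API" idea can be sketched with nothing but the standard library. To be clear, this is not ZeroMQ and it only works between threads inside a single process; the class name and the crude sleep-based spacing are assumptions made for illustration (Python 3).

```python
import queue
import threading
import time

class ApiGateway:
    """Runs every submitted call on one worker thread, spaced min_interval apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def call(self, func, *args):
        """Queue func(*args) and block until the worker has run it."""
        reply = queue.Queue(maxsize=1)
        self.requests.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value  # re-raise the worker's exception in the caller
        return value

    def _run(self):
        while True:
            func, args, reply = self.requests.get()
            try:
                reply.put((True, func(*args)))
            except Exception as exc:
                reply.put((False, exc))
            time.sleep(self.min_interval)  # crude spacing between API calls
```

Clients just do `gateway.call(some_api_function, "EUR_USD")` and never think about the rate limit. ZMQ gives you the same shape, except the clients can live in other processes or on other machines.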

To illustrate a simple client/server setup, let's take an example from the ZeroMQ guide (the server here is the guide's C version, the client is Python):



Code: [Select]
//  Hello World server

#include <zmq.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

int main (void)
{
    //  Socket to talk to clients
    void *context = zmq_ctx_new ();
    void *responder = zmq_socket (context, ZMQ_REP);
    int rc = zmq_bind (responder, "tcp://*:5555");
    assert (rc == 0);

    while (1) {
        char buffer [10];
        zmq_recv (responder, buffer, 10, 0);
        printf ("Received Hello\n");
        sleep (1);          //  Do some 'work'
        zmq_send (responder, "World", 5, 0);
    }
    return 0;
}

Code: [Select]
#
#   Hello World client in Python
#   Connects REQ socket to tcp://localhost:5555
#   Sends "Hello" to server, expects "World" back
#

import zmq

context = zmq.Context()

#  Socket to talk to server
print("Connecting to hello world server…")
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")

#  Do 10 requests, waiting each time for a response
for request in range(10):
    print("Sending request %s …" % request)
    socket.send(b"Hello")

    #  Get the reply.
    message = socket.recv()
    print("Received reply %s [ %s ]" % (request, message))

(That's right, you just saw a full TCP socket server and client written in a handful of lines. I've previously done socket connections with Python and let me tell you, it's so refreshing to see a library make it this easy.)

With ZMQ installed, you can run both scripts on their own and they'll talk to each other over TCP.

ZMQ itself was designed with financial applications in mind, and a lot of testing has been done in high volume (frequency of messages) environments like interacting with the entire NASDAQ data feed without skipping a beat.

If this interests you, take a moment to watch this video presentation about ZeroMQ:





So, just as with the example I gave of where we could use it, I wrote out a lot of my Python trading platform to use ZMQ request / reply (REQ/REP) sockets. One app runs on its own and handles all REST API requests, and other apps (clients) are either there to analyze data, or make trade decisions. The client apps are also able to talk to each other should they rely on info that one might provide...

By putting this effort in now, this means that when one of my apps starts to get too resource intensive for my trading system (say, something that's doing statistical arb analysis, or a lot of number crunching,) I can just move it to another dedicated computer with the only change in my code being the host address of other apps it relies on for info (telling the script that it's not working on localhost, and instead the server and clients it interacts with is on another IP.) That's it... blows my mind how easy writing out quick network apps like this becomes with ZMQ.

I had to go one step further, and define a messaging format between apps. But once defined, the server side just listens for incoming connections and messages, then breaks the message down, and uses a dispatch table to know which code block needs to be called to answer the message properly:

My dispatch table (dictionary) just looks like this:

Code: [Select]
dispatch = {
    #data
    "TOS":tos,
    "LAST":lastPrice,
    "L1":levelOne,
    "L2":levelTwo,
    "IMBAL":'not implimented yet..',
    "GETSTATE":getOrderState,
    "GETOSTAT":getOSTATState,
    "GETOID":getOID,
    "GETPOS":getOpenPos,
    #exec
    "BUY":executeBUYOrder,
    "SELL":executeSELLOrder,
    "CANCEL":cancelOrder,
    "FLAT":flatten,
    }

Where you'll see the 'key' is just the first word of a string message received by the server. My message format was simply ACTION SUBJECT ARGUMENTS/DATA. The value stored under each key is the function itself (not its name as a string), so I can process a command by doing something like this:

Code: [Select]
message = "L1 SPY"
dispatch[message.split()[0]](message.split()[1])

The above (if you look at the dispatch table) results in the exact same thing as directly calling: levelOne("SPY") #This function returns the level 1 price of a given stock.

Actually, as an offshoot of the client / server model and ZeroMQ, if you haven't used a dispatch table before this is something you should seriously consider. It makes processing 'what to do' a lot more elegant than a long line of "if elif else" statements testing for specific conditions. Maybe the next post will be about dispatch tables and how awesome they are. :P 
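Here is a small self-contained sketch of the same idea that you can run as-is. The two handlers are hypothetical stand-ins (the real ones would hit the data feed or the order API), and the error strings are made up for the example.

```python
def level_one(symbol):
    # Hypothetical stand-in: a real handler would look up the live quote.
    return "L1 for " + symbol

def buy(symbol, quantity):
    # Hypothetical stand-in for an order-entry handler.
    return "BUY %s x%d" % (symbol, int(quantity))

DISPATCH = {"L1": level_one, "BUY": buy}

def handle(message):
    """Route 'ACTION SUBJECT [ARGS...]' to the matching handler."""
    parts = message.split()
    if not parts or parts[0] not in DISPATCH:
        return "ERROR unknown command"
    action, args = parts[0], parts[1:]
    try:
        return DISPATCH[action](*args)
    except (TypeError, ValueError):
        # Wrong number of arguments, or an argument that won't convert.
        return "ERROR bad arguments for " + action
```

Adding a new command is then a one-line change to the dictionary. One caveat with this sketch: catching TypeError this broadly would also hide a TypeError raised inside a handler, so real code would probably validate arguments more carefully.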

Offline rod178


Slightly off topic - good comparison of Ruby and Python, which I found informative as to the Python approach.

https://ochronus.com/a-rubyists-confessions-on-python/ 

Offline Jack



Never got into Ruby before. My choice of Python here was related to the mass amounts of 3rd party libraries or wrappers to popular applications available for Python.

That said, I've seen some neat things built with Ruby and the Rails framework.

Offline Jack


urllib2 to access the REST API


As mentioned earlier, any programming language with a URL library can use a RESTful API. The most common one with Python 2 (and what comes built in) is urllib2. (On Python 3 the same functionality lives in urllib.request.)

To use it, first we need to build out (concatenate) the URL the REST API expects. Let's write a basic 'get the current price' module in Python for the Oanda REST API. (We will use the sandbox URLs.)

The objective will be that we can call this module to get the latest price instead of writing out the API request every time.

Now, there's one part I'm skipping over right now, and that's the JSON markup that Oanda uses with their API, but we'll get to that later.

Code: [Select]
#This version is nice and broken out so you can see each step the module is taking..
def Oanda_Latest(symbol):
    url_prep = "http://api-sandbox.oanda.com/v1/prices?instruments=" + symbol
    oapi = urllib2.urlopen(url_prep)
    oapi_json = json.loads(oapi.read())
    quote = (oapi_json['prices'][0]['bid'], oapi_json['prices'][0]['ask'])
    return quote

Here the module knows the string of text that makes up the part of the URL which doesn't change, and concatenates the variable that does (the symbol.)

Then it opens and reads the URL with urllib2.

(and, the part I'll skip for now, it parses the resulting data from the Oanda API with the JSON library.)

Lastly it returns a tuple of the bid and ask price.

We can use it in a script (after importing the right libraries at the beginning) like so:

Code: [Select]
import urllib2
import json

# This version is not broken out, but is the same as the earlier example.
def Oanda_Latest(symbol):
    oapi_json = json.loads(urllib2.urlopen("http://api-sandbox.oanda.com/v1/prices?instruments=" + symbol).read())
    return  (oapi_json['prices'][0]['bid'], oapi_json['prices'][0]['ask'])

latest = Oanda_Latest("EUR_USD")

print("EUR/USD Bid: " + str(latest[0]) + " Ask: " + str(latest[1]))

So now, moving forward, every time I want to check the latest bid or ask price, I just have to write that line "Oanda_Latest"...

This is clearly just the start... you'd want to write modules like this for other common API calls.. but this should set you in the right direction.
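For anyone on Python 3, where urllib2 no longer exists, a rough equivalent might look like the sketch below. The URL and the 'prices' / 'bid' / 'ask' keys are taken from the example above; the function names and the timeout are assumptions, and the sandbox endpoint may no longer respond, so treat this as a pattern rather than something guaranteed to return live data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://api-sandbox.oanda.com/v1/prices"  # sandbox URL from the example above

def build_url(symbol):
    """Build the price-request URL, letting urlencode handle any escaping."""
    return BASE_URL + "?" + urlencode({"instruments": symbol})

def parse_quote(payload):
    """Pull (bid, ask) for the first instrument out of the decoded JSON."""
    first = payload["prices"][0]
    return first["bid"], first["ask"]

def oanda_latest(symbol, timeout=5):
    """Fetch and parse the latest quote; raises on network or HTTP errors."""
    with urlopen(build_url(symbol), timeout=timeout) as response:
        return parse_quote(json.loads(response.read().decode("utf-8")))
```

Splitting the URL building and the parsing out of the network call means both can be checked without touching the network, and the timeout keeps a stalled request from hanging the whole script.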
« Last Edit: July 04, 2014, 03:48:27 am by Jack »

Offline Jack


MySQL

I've been doing a lot of work with MySQL as a backend for storing and recalling mass amounts of quote, L2, and other data between sessions.

I found these two resources helpful when getting started:



and

http://zetcode.com/db/mysqlpython/

That being said, It's annoying how much I can get done with Pandas' DataFrames only to run into a roadblock (like storage, retrieval, providing data to other machines and services, etc..) that employing MySQL instead starts to make more sense. Sure, there's various hacks and glue code between various libraries in Python that would let me stick to Pandas in this case, but at some point you gotta go with the best tool for the job.

Offline Jack


I haven't written much more about Oanda's REST API because I'm just not finding many practical reasons to automate on Oanda's platform. 99% of the stuff I do manually with Oanda wouldn't benefit from being automated just yet.

Most of the stuff I've been using Python for lately has been in equities. For example, to better manage an event specific trading setup with a HUGE time constraint on the trader which took place this past month, I spent a few weeks coding up a fully graphical program in Python that made the whole trade a relaxed, push-button, experience. Multiple thousands were made beyond what would have been possible had we all done the trade manually. I <3 Python.

I will continue with Oanda REST API examples eventually though, as I plan on porting over an EA I wrote for MT4 into a Python driven strategy for Oanda.


Offline Jack


The observer pattern, or how I switched from event loops to event based algos.

Some of my first algos were written using an 'event loop' style control method. That is to say, when the algo was waiting for an event, it would keep looping over code until a condition was true (the event) and would then break or drop into another loop waiting for the next event.

For example:
Code: [Select]
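import time

# execute_trade() and order_status() are not defined here; they stand for
# whatever order functions your platform provides.
order_id = execute_trade('EUR/USD', 'long', 100000, 'market')

while True:
    if order_status(order_id) == 'Filled':
        print('Trade filled!')
        #break into a sub-loop
        while True:
            #do other stuff, like test for a stop out condition, etc..
            pass  # placeholder so the loop body is valid Python
    else:
        time.sleep(0.01) #loop every 10 ms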

This was fine for basic strategies, but the constant condition checking and CPU cycles burned on looping over and over while waiting for a condition to be met created a scaling problem I didn't realize I'd have until much later.

Put simply: while this worked fine for a handful of strategies running at a time, things got hairy as soon as I scaled up beyond that. Issues like CPU usage came up, and as the strategies became more complex, or had some actual math being done with each loop, the number I could run at the same time became a limiting factor.

One option would be to toss more hardware at the problem. However, why spend money on hardware when the real issue was my novice programming skills?

Enter: The observer pattern!

The observer pattern is a way to cut out event loops entirely and only react, or check conditions, when something (like a price quote) changes. Strategies 'subscribe' to updates on specific variables, like orders being filled or time and sales feeds, and a daemon that's in charge of updating these variables lets all subscribed objects know there's been an update.

An added advantage to the observer pattern is the ability to reference the trade algo's object from many other points in your application or framework without worrying about how to interrupt some loop (not to mention this helps get around the blocking nature of loops when it comes to python's GIL and threading, but that's a technical talk for another time.)

To start off we're going to make 3 things: 1) An observable class to template the daemons we'll write. 2) At least 1 daemon to be in charge of a variable worth paying attention to. 3) A strategy / algo which will subscribe to events happening.

This won't be the absolute best way of implementing an observer pattern in Python, but it's the way I've used and currently have running in production code. As I've mentioned before, I'm not a programmer by trade, everything I've learned has been out of self-interest, so please feel free to comment or make suggestions if you think it can be improved.

1) The observable class:
Code: [Select]
class Observable(dict):
    """
    A class of thing that can be observed.  When its notifyObservers()
    method is called with an event, it passes that event on to its
    observers.
    """
    def addObserver(self, event, observer):
        # 'in' works on Python 2 and 3; dict.has_key() was removed in Python 3.
        if event in self:
            if observer not in self[event]:
                self[event].append(observer)
        else: self[event] = [observer]
    def removeObserver(self, event, observer):
        # .get() avoids a KeyError when nobody ever subscribed to this event.
        if observer in self.get(event, []):
            self[event].remove(observer)
    def notifyObservers(self, event):
        if event in self:
            for observer in self[event]:
                observer.observeEvent(event, self.msg_name)

Notice we start by subclassing 'dict'. This makes the observable object inherit all that a dictionary object can do, and we'll be extending this with a few functions.

Inheriting dict means the daemon itself can self-store who's subscribed to what data. So code like -- self['foo'] = 'bar' -- would be the same as creating a dictionary and assigning the 'bar' value to a key named 'foo'.

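Lastly, notice we have methods to add, remove, and notify other objects of updates. You end up literally adding a reference to the other objects themselves to this internal dictionary of subscribers, and notifyObservers() calls each subscriber's observeEvent() method with the event and the daemon's msg_name (which each daemon subclass is expected to set).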
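To see the subscribe / notify cycle on its own, here is a compact restatement that runs standalone on Python 3. `Recorder` and the "DEMO" feed name are hypothetical, made up purely for the demonstration; the method names match the class above.

```python
class Observable(dict):
    """Maps event name -> list of subscribed observers."""
    msg_name = "DEMO"  # each daemon would set its own feed name, e.g. "L1"

    def addObserver(self, event, observer):
        observers = self.setdefault(event, [])
        if observer not in observers:
            observers.append(observer)

    def removeObserver(self, event, observer):
        if observer in self.get(event, []):
            self[event].remove(observer)

    def notifyObservers(self, event):
        # Iterate over a copy so observers may unsubscribe while being notified.
        for observer in list(self.get(event, [])):
            observer.observeEvent(event, self.msg_name)

class Recorder(object):
    """Hypothetical observer that just remembers what it was told."""
    def __init__(self):
        self.seen = []

    def observeEvent(self, event, msg):
        self.seen.append((event, msg))
```

Subscribing a `Recorder` to "EUR_USD" and then calling `notifyObservers("EUR_USD")` appends one entry to its `seen` list, while notifications for symbols it never subscribed to are simply ignored.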

2) Now we inherit this new class to make our daemon: 
Code: [Select]
import threading
import traceback

class L1_Quote_Daemon(Observable):
    """
    A daemon that listens on an assigned UDP port for L1 quote updates.
    Quotes are stored in memory and observers get notified that there's a quote
    update.
    """
    def __init__(self, parent):
        self.msg_name = "L1"
        self.parent = parent
        t = threading.Thread(target=self.run)
        t.daemon = True  # same effect as the older t.setDaemon(True)
        t.start()
        
    def run(self):
        import select, socket
        bufferSize = 4096 
        port = 50152
        a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        a.bind(('127.0.0.1', port))
        a.setblocking(1)
        while True:
            result = select.select([a],[],[])
            try:
                msg = result[0][0].recv(bufferSize)
                ###
                ### Code that was here deals with the UDP packet/message, and extracts the important data like new bid and ask prices.
                ### This block would also store said data into memory that's readable by other parts of the application, as well as the trading algo.
                ###
                self.notifyObservers(symbol)
            except:
                traceback.print_exc() #working with sockets can be a pain, so I make sure to catch all faults and write the info to console.
                continue

A couple of things to note here:

My source of data in this case is a UDP socket connection to a custom port. I cut out the part that breaks the UDP packets down since specifics about the environment I'm working with aren't the focus. Your source can be nearly anything that can wait for data without spinning in a loop (for example: using RTD on a Windows platform to communicate with eSignal, using one of Python's FIX libraries to listen for new FIX messages, using pywin32's DDE client to listen to DDE values change from ThinkOrSwim or Excel... etc... etc...)

I run the listening in a thread that it creates itself. That way I just create an object in my main program and it takes care of itself. Its internal while loop sits inside select() until data arrives, so it won't waste CPU cycles while waiting for new data, and since it lives in its own thread it doesn't hold up the rest of the application (this relates to socket / network programming.)

Since any fault will kill the daemon, I wrap the work into a try/except statement and make sure to log any faults that arise. You don't want garbage data to kill your quote updates after all.. and as much as you might trust your own code, the data you're dealing with might not be as clean as you'd hope (.. on that note: IDC/eSignal can sit on a tack, and even Oanda's restful API isn't perfect in this regard either..timeouts suck.) 

Finally, we need to call this daemon into existence. Somewhere in the main thread of your application, usually as part of a startup sequence, we should find a line like this:

Code: [Select]
        self.L1_D = L1_Quote_Daemon(self)
"self." because this would typically be called within a class that manages all your trades or at least the highest level of your algo running (that is, you want to be able to reference it from other objects, so we gotta attach it to some context that's consistent.)

3) So now we have the daemon covered, we need something to listen to it: 
Code: [Select]
class algo(object):
    """I'm a simple script that prints the latest B/A quotes to console"""
    def __init__(self, parent_self, symbol):
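        self.name = getattr(self, 'strat_id', type(self).__name__)  # strat_id isn't defined in this snippet, so fall back to the class name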
        self.parent = parent_self
        self.symbol = symbol

        self.subscribe()
        
    def subscribe(self):
        self.parent.L1_D.addObserver(self.symbol, self)
        
    def unsubscribe(self):
        self.parent.L1_D.removeObserver(self.symbol, self)
            
    def OnQuote(self): #L1 server pokes this method.
        bid = float(self.parent.L1_feed[self.symbol][1])
        ask = float(self.parent.L1_feed[self.symbol][2])
        print bid, ask
    
    def OnDepth(self): #L2 server pokes this method.
        pass #code would be here to handle depth of market updates
    
    def OnOrder(self): #ORD server pokes this method.
        pass #code would be here to update order ticket updates
    
    def OnTick(self): #TOS server pokes this method.
        pass #code would be here to handle time and sales updates
    
    def ObserveEvent(self, event, msg):
        """dispatch messages to event handlers"""
        msg_dispatch = {'ORD':self.OnOrder,
                        'L1':self.OnQuote,
                        'L2':self.OnDepth,
                        'TOS':self.OnTick,}
        msg_dispatch[msg]() 

There's a lot here to cover, but we can see that when we bring this algo into existence it starts (via __init__) by knowing the variables of its parent, which include the daemon and the quotes stored in memory.

We also run a subscribe function that has the algo tell the L1 daemon that it wants to listen for L1 updates on the 'subject' of the symbol it was given when we created the algo object. (This is important, as we don't want updates when any symbol updates its quote, we just want to listen for a single symbol.) Notice we pass the symbol we want, plus 'self', to the quote daemon.. we are literally pointing to the observing/algo object itself when it subscribes. In Python, 'self' inside a method is the object the method was called on, so passing 'self' means we're passing a reference to the object itself. This way the daemon knows that this specific object wants to be told when the symbol has a quote update.

Finally, the ObserveEvent method is a catch-all for any type of update we might want to subscribe to. This way the Observable class only has to call one thing, instead of having to write out different observable classes and special calls for each type of daemon we want to make. The algo or object that subscribes for updates handles the data based on what type of message comes in... if it's a 'L1' message type, the appropriate method is called via a dispatch table.

We could (and should) extend this further so that a message type with no handler set gets dealt with gracefully. As it stands, an unknown message type will just raise a KeyError.
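One way that extension could look, as a rough sketch only: look the handler up with dict.get() and fall back to a catch-all when nothing is registered. The DispatchingObserver class, the OnUnknown method and the 'seen' list are all hypothetical names invented for this example, not part of my actual code.

```python
class DispatchingObserver(object):
    """Hypothetical observer showing a dispatch table with a fallback handler."""

    def __init__(self):
        self.seen = []  # records which handler ran, for illustration only
        self.msg_dispatch = {'L1': self.OnQuote,
                             'TOS': self.OnTick}

    def OnQuote(self):
        self.seen.append('quote')

    def OnTick(self):
        self.seen.append('tick')

    def OnUnknown(self, msg):
        # hypothetical fallback: note the unhandled type instead of raising KeyError
        self.seen.append('unhandled:' + msg)

    def ObserveEvent(self, event, msg):
        """dispatch messages to event handlers, tolerating unknown types"""
        handler = self.msg_dispatch.get(msg)  # None when no handler is registered
        if handler is None:
            self.OnUnknown(msg)
        else:
            handler()
```

With this in place, an 'L2' message sent to an object that only registered 'L1' and 'TOS' ends up in OnUnknown rather than killing the caller. Whether you log, ignore, or raise there is a design choice.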

BTW, dispatch tables are awesome.. I much prefer Python's hash-based dictionaries for creating them, but the first time I mentioned dispatch tables was in my old algo journal, where I provided an error lookup dispatch table written in MQL4 for MT4 here.

My next post on this thread might be just about dispatch tables in python, that's how much I like them. :P

The code block for "OnQuote" can do much more than just print out the latest bid and ask quotes.. this is where the algo's logic would reside. We can control what happens at each quote update at various stages of the algo's logic by setting some sort of state on the algo itself. For instance, if the self.state variable is set to 'entry_filled', maybe the OnQuote method would be testing for a stop out condition to be met. Python 2.7 has no switch statement, so a series of if / elif / else statements (or another dispatch table keyed on the state) could accomplish this level of control.
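Here's a small sketch of that state idea using if / elif. Everything in it is an assumption made for illustration: the StatefulAlgo class, the state names other than 'entry_filled', the entry and stop rules, and the fact that OnQuote takes bid and ask as arguments (in the algo class above it reads them from the parent's quote storage instead).

```python
class StatefulAlgo(object):
    """Hypothetical algo whose OnQuote behaviour depends on self.state."""

    def __init__(self, entry_price, stop_price):
        self.state = 'waiting_for_entry'
        self.entry_price = entry_price
        self.stop_price = stop_price

    def OnQuote(self, bid, ask):
        if self.state == 'waiting_for_entry':
            if ask <= self.entry_price:
                self.state = 'entry_filled'   # pretend a buy order filled at the ask
        elif self.state == 'entry_filled':
            if bid <= self.stop_price:
                self.state = 'stopped_out'    # pretend the position was closed at the bid
        # any other state (e.g. 'stopped_out') ignores further quotes
```

The same quote update does different work depending on where the strategy is in its lifecycle, and nothing has to poll in a loop to find out.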

The results:
Now, I didn't convert all my loop-based algo work into observer pattern / event-based objects for nothing. Before, 50 concurrently running strategies across 50 different securities took up 50%+ CPU (these were more than just basic strategies that would need to check for updates every 5ms or so). Now the exact same strategies run over the same 50 symbols with the CPU ranging from 0-1% utilization, with the odd spike to 2-3% when heavy bursts of orders/actions take place at once. This was all on commodity hardware, a low-end 2nd Gen Core i5 with 8GB of RAM; I can only imagine the additional headroom I'd have running this on a production Xeon system.

That about wraps it up for now.. I'll review this post tomorrow with fresh eyes to catch any typos and to re-write parts I think could be more clear, but if you have any questions or comments please feel free to post them. :D
« Last Edit: November 22, 2015, 07:32:16 pm by Jack »

Offline jonnycab

  • **
  • User post count 87
    • View Profile

A few general comments:

a more Pythonic way of dealing with string concatenation would be to use .format(). For example, in your zmq code:

Code: [Select]
print("Received reply {req} [ {msg} ]".format(req=request, msg=message))
when dealing with file IO, use Python's with construct. It is also a good idea to open all files using binary mode; this helps with code portability...

Code: [Select]
with open(_file, 'rb') as _f:
    data = _f.read()
    _f.seek(0)  # seek() needs an offset; 0 rewinds to the start of the file
    ...

It'll handle closing files for you however the block is left, including when an exception is raised part way through...
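A quick way to see this for yourself, as a small sketch: the temp file, its made-up contents and the simulated ValueError exist only to make the example self-contained.

```python
import os
import tempfile

# write a small throwaway file so the example is self-contained
fd, path = tempfile.mkstemp()
os.write(fd, b'EURUSD,1.1000,1.1002\n')
os.close(fd)

try:
    with open(path, 'rb') as f:
        first_line = f.readline()
        raise ValueError('simulated failure while processing the file')
except ValueError:
    pass

file_was_closed = f.closed  # True: the with block closed the file on the way out
os.remove(path)
```

The file object is already closed by the time the except clause runs, without a single explicit close() call on it.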

It's nice to see actual software design patterns being used, good work!

Offline Jack

  • *****
  • User post count 1,159
    • View Profile
    •  
    • FXGears

I tend to use a few ways of print formatting in Python.. not quite sure what the advantages are to sticking with PEP8 here. Is it just stylistic?

Also, I've been using the 'with open' in all new code since writing that old post. Much prefer it myself... as you said, it closes up loose ends nicely.

Offline jonnycab

  • **
  • User post count 87
    • View Profile

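while the + and %s/%d/etc... with tuples are still supported, using .format() is preferred. the Python 3.0 release notes talked about eventually deprecating the % operator, although that hasn't actually happened (and + isn't going anywhere)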

.format() is also the standard in Python 3. It'll make your code easier to port when you end up moving to Python 3 if you aren't doing so already.

it'll also make your code more maintainable if you stick to a single style that is also forward compatible.

it gives more flexibility as well, you can do stuff like this:

combine the tuple method with dictionary approach
pass in a dict and have it fill in the items you need/want

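Code: [Select]
# combine positional and keyword arguments
"{item}, {0}".format('There', item='Hi')  # 'Hi, There'

# pass in a dict and let .format() pick out only the keys it needs
d = {
    'building': 'a',
    'my': 'b',
    'string': 'c',
    'from': 1,
    'dictionary': 2,
    'parameters': 3,
    'other': 4,
    'junk': 5,
    'not': 6,
    'required': 7,
}
"{building} {my} {string} {from} {dictionary} {parameters}".format(**d)  # 'a b c 1 2 3'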

granted some of it is more useful than others, but the dict stuff is by far the biggest reason you should be using it. seeing as you're dealing with dictionaries quite a lot anyway, this goes a long way to helping you build up your strings in certain areas.

saves on typing too: no more "%()s" wrapping around every keyword if you want to use % with a dict. it's also more readable
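For a side-by-side look at the two dict styles, here's a tiny sketch. The quote dict and its values are made up for illustration.

```python
quote = {'symbol': 'EURUSD', 'bid': 1.1, 'ask': 1.1002, 'volume': 250}  # made-up values

# old style: every key is wrapped in %( )s
old_style = "%(symbol)s bid=%(bid)s ask=%(ask)s" % quote

# .format(): plain {key} placeholders; unused keys such as 'volume' are ignored
new_style = "{symbol} bid={bid} ask={ask}".format(**quote)
```

Both lines build the same string from the same dict, so which one to use comes down to which you find easier to read and maintain.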

glad to hear you're using with, it is a very useful feature

Offline Jeronimo

  • [banned]
  • **
  • User post count 93
  • [banned]
    • View Profile

As always, clueless as to what you are talking about. But, always great to see great minds at work.

J
