Business Intelligence Tools for Small Companies: A Guide to Free and Low-Cost Solutions

Juan Valladares and I have finished our book and it is now published. You can get it from major retailers or from the publisher's website (Apress): http://www.apress.com/gp/book/9781484225677

It is also available from Amazon: https://www.amazon.com/Business-Intelligence-Tools-Small-Companies/dp/1484225678

The book:

  • Teaches how to implement and manage the business intelligence/data warehousing (BI/DWH) infrastructure for a small company
  • Provides practice extracting data from any enterprise resource planning (ERP) tool
  • Uses open-source extract-transform-load (ETL) tools to process and integrate BI data
  • Shows how to query, report, and analyze BI data using open-source visualization and dashboard tools
  • Requires no previous knowledge

Learn how to transition from Excel-based business intelligence (BI) analysis to enterprise stacks of open-source BI tools. Select and implement the best free and freemium open-source BI tools for your company’s needs and design, implement, and integrate BI automation across the full stack using agile methodologies.

Business Intelligence Tools for Small Companies provides hands-on demonstrations of open-source tools suitable for the BI requirements of small businesses. The authors draw on their deep experience as BI consultants, developers, and administrators to guide you through the extract-transform-load/data warehousing (ETL/DWH) sequence of extracting data from an enterprise resource planning (ERP) database freely available on the Internet, transforming the data, manipulating them, and loading them into a relational database.

The authors demonstrate how to extract, report, and dashboard key performance indicators (KPIs) in a visually appealing format from the relational database management system (RDBMS). They model the selection and implementation of free and freemium tools such as Pentaho Data Integrator and Talend for ETL, Oracle XE and MySQL/MariaDB for RDBMS, and Qliksense, Power BI, and MicroStrategy Desktop for reporting. This richly illustrated guide models the deployment of a small company BI stack on an inexpensive cloud platform such as AWS.

What You’ll Learn

You will learn how to manage, integrate, and automate the processes of BI by selecting and implementing tools to:

  • Implement and manage the business intelligence/data warehousing (BI/DWH) infrastructure
  • Extract data from any enterprise resource planning (ERP) tool
  • Process and integrate BI data using open-source extract-transform-load (ETL) tools
  • Query, report, and analyze BI data using open-source visualization and dashboard tools
  • Use a MOLAP tool to define next year’s budget, integrating real data with target scenarios
  • Deploy BI solutions and big data experiments inexpensively on cloud platforms

Who This Book Is For
Engineers, DBAs, analysts, consultants, and managers at small companies with limited resources but whose BI requirements have outgrown the limitations of Excel spreadsheets; personnel in mid-sized companies with established BI systems who are exploring technological updates and more cost-efficient solutions

Introduction to the maths of bookmaking (with python code)

Introduction

In this article I will show you how to do some simple calculations on the odds that bookmakers offer, and how to work with them so that the real chance of each outcome can be used to build a new set of prices. Basically, what we will do is the following:

  • retrieve the odds of a horse race
  • calculate the overround applied
  • determine the true odds
  • generate a new set of odds with the desired overround, using one of several techniques:
    • First approach for pricing: Apply the overround linearly
    • A better approach: Apply the overround based on the chance of winning
    • The real deal: Apply the overround based on a model

Retrieve the Odds

For the example in this article I will use the odds of a horse race held at Doncaster on 27 June 2015. This was the last race on the card, a class 4 handicap with 7 runners, but any race or sport would do.

The odds on offer at the time of writing were the following (taken from oddschecker):

Rio Ronaldo 3.25 3.0 3.0 3.25 3.0 3.25 3.25 3.0 3.0 2.75 3.25 3.25
Beau Eile 3.5 3.25 3.25 3.25 3.5 3.25 3.25 3.25 3.25 3.25 3.5 3.5
Bahamian Sunrise 4.0 4.0 3.75 3.75 4.0 3.75 3.75 3.75 4.0 4.0 3.5 3.75
Silver Rainbow 13.0 13.0 9.0 13.0 12.0 11.0 11.0 13.0 11.0 11.0 10.0 9.0
Snow Cloud 15.0 15.0 12.0 13.0 10.0 12.0 13.0 12.0 12.0 15.0 9.0 12.0
Equally Fast 17.0 17.0 17.0 15.0 17.0 17.0 17.0 17.0 15.0 17.0 13.0 17.0
Mc Diamond 67.0 67.0 41.0 34.0 67.0 51.0 41.0 34.0 51.0 67.0 41.0 41.0

In this article we will choose the best (or joint best) price available for each runner, but any set of odds can be chosen.

So we construct our list of best prices with the following values: [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]

maxPrices = [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]
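The best-price list above can also be derived from the table rather than typed in by hand. The following is only a minimal sketch: the names `odds_by_runner` and `best_prices` are made up for illustration, and since the table does not name the bookmakers, each runner simply maps to its row of quoted prices.

```python
# Prices per runner, copied from the table above (one entry per bookmaker column).
odds_by_runner = {
    "Rio Ronaldo": [3.25, 3.0, 3.0, 3.25, 3.0, 3.25, 3.25, 3.0, 3.0, 2.75, 3.25, 3.25],
    "Beau Eile": [3.5, 3.25, 3.25, 3.25, 3.5, 3.25, 3.25, 3.25, 3.25, 3.25, 3.5, 3.5],
    "Bahamian Sunrise": [4.0, 4.0, 3.75, 3.75, 4.0, 3.75, 3.75, 3.75, 4.0, 4.0, 3.5, 3.75],
    "Silver Rainbow": [13.0, 13.0, 9.0, 13.0, 12.0, 11.0, 11.0, 13.0, 11.0, 11.0, 10.0, 9.0],
    "Snow Cloud": [15.0, 15.0, 12.0, 13.0, 10.0, 12.0, 13.0, 12.0, 12.0, 15.0, 9.0, 12.0],
    "Equally Fast": [17.0, 17.0, 17.0, 15.0, 17.0, 17.0, 17.0, 17.0, 15.0, 17.0, 13.0, 17.0],
    "Mc Diamond": [67.0, 67.0, 41.0, 34.0, 67.0, 51.0, 41.0, 34.0, 51.0, 67.0, 41.0, 41.0],
}


def best_prices(odds):
    """Return the highest price on offer for each runner, in table order."""
    return [max(prices) for prices in odds.values()]


best = best_prices(odds_by_runner)
print(best)
```

The result should match the hand-built list, so either can be used in the rest of the article.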

Calculate the Overround of a set of outcomes

Calculating the overround of a set of prices is easy. We iterate through the list of prices, calculate the chance of winning implied by each one, add them up, and see by how much the total exceeds 1 (or 100% if we are working with percentages).

To work out the probability of each outcome to win we need to do the following division:

1 / odds

Then we sum all these probabilities to get the overround of the race:

overround = 0
for price in maxPrices:
    overround = overround + 1/price
print("Total overround is",overround) 

which gives us the following output: Total overround is 1.0607452395424302
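As a complement to the loop above, here is a hedged sketch of the same calculation split into two small helpers, so the implied probability of each runner can be inspected on its own. The function names `implied_probabilities` and `overround_of` are illustrative and not part of the original code.

```python
def implied_probabilities(prices):
    """Convert decimal odds into the win probabilities they imply (1 / odds)."""
    return [1 / price for price in prices]


def overround_of(prices):
    """Sum the implied probabilities; anything above 1 is the bookmaker's margin."""
    return sum(implied_probabilities(prices))


max_prices = [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]

for price, prob in zip(max_prices, implied_probabilities(max_prices)):
    print(f"odds {price:>5} -> implied chance {prob:.4f}")

book = overround_of(max_prices)
margin_pct = (book - 1) * 100
print(f"Total overround is {book:.4f}, i.e. a margin of {margin_pct:.2f}%")
```

With the best prices of this race the margin comes out at a little over 6%, in line with the output shown above.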

Determine the true odds

To calculate the fair price we multiply each current price by the overround calculated in the previous step. If we were working with probabilities instead, we would divide each probability by the overround, which is equivalent.

fairPrice = []
for price in maxPrices:
    fairPrice = fairPrice + [price * overround]
print("fairPrice",fairPrice)

The new fair price list without the overround is the following:

fairPrice [3.4474220285128983, 3.7126083383985056, 4.242980958169721, 13.789688114051593, 15.911178593136452, 18.032669072221314, 71.06993104934283]
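A quick sanity check, sketched here under the assumption that the fair prices are computed exactly as above: the implied probabilities of the fair prices should add up to 1, and each of them should equal the original implied probability divided by the overround. The snake_case variable names are illustrative.

```python
import math

max_prices = [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]

# Overround of the original book, as in the previous step.
overround = sum(1 / price for price in max_prices)

# Fair prices: each price stretched by the overround.
fair_prices = [price * overround for price in max_prices]

# Fair probabilities: each implied probability divided by the overround.
fair_probs = [(1 / price) / overround for price in max_prices]

# A fair book has no margin, so its implied probabilities sum to 1.
fair_book = sum(1 / price for price in fair_prices)
print("Fair book sums to", fair_book)

# Both routes (scaling prices or scaling probabilities) agree.
for fair_price, fair_prob in zip(fair_prices, fair_probs):
    assert math.isclose(1 / fair_price, fair_prob)
```

Because of floating-point rounding the printed sum may differ from 1 in the last decimal places, which is why the comparison uses `math.isclose`.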

Generate a new set of odds

The next step is to generate a new set of odds. This can be done with different techniques. We cover the following in this article:

First approach for pricing: Apply the overround linearly

This solution is not the most useful one, but in some cases it may work. It consists of dividing the total overround equally amongst all the outcomes. This is usually not a good idea, as we can end up with inflated prices for the favourites compared with the outsiders. And as we know, money is likely to go on those heading the market. So from a bookmaking point of view, it does not make much sense.

We have not considered this solution as interesting for the article, so we are not covering it.
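Although this approach is not pursued further, a short sketch may help to see the problem described above. It assumes that "linearly" means adding the same slice of extra probability to every runner; the function names and the 5% target are illustrative, and the proportional version is included only for comparison.

```python
def apply_overround_linearly(fair_prices, target_overround):
    """Add an equal share of the margin to every runner's fair probability."""
    extra = (target_overround - 1) / len(fair_prices)
    return [1 / (1 / price + extra) for price in fair_prices]


def apply_overround_proportionally(fair_prices, target_overround):
    """Scale every fair probability by the same factor (for comparison only)."""
    return [price / target_overround for price in fair_prices]


max_prices = [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]
overround = sum(1 / price for price in max_prices)
fair_prices = [price * overround for price in max_prices]

linear = apply_overround_linearly(fair_prices, 1.05)
proportional = apply_overround_proportionally(fair_prices, 1.05)

print("favourite: linear", round(linear[0], 2), "proportional", round(proportional[0], 2))
print("outsider:  linear", round(linear[-1], 2), "proportional", round(proportional[-1], 2))
```

Both books carry the same 5% margin, but the linear one leaves the favourite at a longer price (about 3.36 against 3.28) and cuts the outsider heavily (about 47 against 67.7), which is the imbalance the text warns about.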

A better approach: Apply the overround based on the chance of winning

In this section we present a better approach. Instead of dividing the overround into equal parts, we divide it in proportion to each outcome's chance of winning, so every fair price is shortened by the same factor. This partially compensates for the problem of the previous method and will usually be more than enough, although it is still not a perfect solution.

In our example, we will apply a 5% overround to the fair prices calculated in the previous step.

appliedOverround5pct = []
for price in fairPrice:
    appliedOverround5pct = appliedOverround5pct + [price/1.05]
print("appliedOverround 5%",appliedOverround5pct)

The new list with a 5% overround is the following. As you can see, prices are slightly higher than they were originally, as the overround is about 1 percentage point lower:
[3.2832590747741888, 3.5358174651414336, 4.040934245875924, 13.133036299096755, 15.153503422034715, 17.17397054497268, 67.68564861842174]
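The 5% figure is just one choice. As a hedged sketch, the same step can be wrapped in a helper that takes the target overround as a parameter and checks the result; `apply_overround` and `book_total` are hypothetical names, and the 10% target is only there to show a second value.

```python
import math


def book_total(prices):
    """Sum of implied probabilities for a set of decimal odds."""
    return sum(1 / price for price in prices)


def apply_overround(fair_prices, target_overround):
    """Shorten every fair price by the same factor to reach the target book."""
    return [price / target_overround for price in fair_prices]


max_prices = [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]
fair_prices = [price * book_total(max_prices) for price in max_prices]

for target in (1.05, 1.10):
    priced = apply_overround(fair_prices, target)
    # The resulting book should carry exactly the requested margin.
    assert math.isclose(book_total(priced), target)
    print(f"target {target:.2f}: favourite {priced[0]:.3f}, outsider {priced[-1]:.3f}")
```

Because every price is divided by the same number, the relative ordering and the ratios between runners stay exactly as in the fair book.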

The real deal: Apply the overround based on a model

This solution entails building a model from prices without overround and the corresponding winning results. Based on a large number of outcomes, we would be able to model and predict the overround to apply from this historical data.

Since this involves creating a model, which usually means some complexity, and the examples presented are good enough to get started, we leave it out of the scope of this article.

The full code is here:

def pythonPriceOverround():
	maxPrices = [3.25, 3.5, 4.0, 13.0, 15.0, 17.0, 67.0]

	print(maxPrices)
	overround = 0
	for price in maxPrices:
		overround = overround + 1/price

	print("Total overround is",overround)

	fairPrice = []
	for price in maxPrices:
		fairPrice = fairPrice + [price * overround]
	print("fairPrice",fairPrice)

	appliedOverround5pct = []

	for price in fairPrice:
		appliedOverround5pct = appliedOverround5pct + [price/1.05]
	print("appliedOverround 5%",appliedOverround5pct)

	# Check that the overround is now indeed 5%
	overround = 0
	for price in appliedOverround5pct:
		overround = overround + 1/price

	print("Total overround is",overround)

# Run the example
pythonPriceOverround()