Installation Guide

Install and start a Hive Hero Analytics server

Intro

There are two basic things you will need to have to run Hive Hero Analytics (HHA). A PostgreSQL database server and a HHA server (These can be hosted together on one system or separately). To simplify things we recommend you install HHA on a VPS (Virtual Private Server) provider, such Digital Ocean, to get started and familiar with HHA. Once you are comfortable doing this you could install it on any environment or even locally.

Detailed in this section are instructions on how to install it on Digital Ocean, and generically on your own mac, linux and windows environment.

Digital Ocean

Digital Ocean is an infrastructure as a service, service provider. They provide servers that can be used to host a Hive Hero Analytics (HHA) server and PostgreSQL database. We recommend using a service like this to ensure that you have a clean slate to begin an install. We also recommend it because it best simulates what would be required to get a HHA server up and running in a production environment.

1. Create your server

  • Login to digital ocean or create an account.
  • In the DigitalOcean console, click “Create Droplet”. Create one with the following attributes:
    • Image: Ubuntu 16.04 (Default)
    • Size: 512 MB
    • Region: Choose the one closet to you (lower latency).
    • Additional options: [x] User data.

  • Add the following user data.
#cloud-config

package_upgrade: true

packages:
  - postgresql
  - postgresql-contrib
  - unzip

runcmd:
  - touch /tmp/cloud-config.log
  - printf "\nHHA - Creating the database\n" >> /tmp/cloud-config.log

  # Create the database
  - sudo -u postgres createdb hha
  - sudo -u postgres bash -c "psql -c \"CREATE USER hha_usr WITH PASSWORD 'hha_pwd' SUPERUSER;\""

  # Install Hive Hero Analytics
  - printf "\nHHA - Installing Hive Hero\n" >> /tmp/cloud-config.log
  - cd /tmp
  - curl -O https://www.hivehero.com/downloads/analytics/current/linux-amd64.zip
  - mkdir -p /usr/local/hha
  - unzip /tmp/linux-amd64.zip -d /usr/local/hha
  - printf "/nHHA INSTALL 2/n" >> /tmp/cloud-config.log

  # Initialise the database
  - printf "\nHHA - Initialising the Hive Hero Database\n" >> /tmp/cloud-config.log
  - /usr/local/hha/hha -new_database -config=/usr/local/hha/config.toml

  # Create and start the Hive Hero Analytics SystemD Service
  - printf "\nHHA - Starting the Hive Hero Service\n" >> /tmp/cloud-config.log
  - mv /usr/local/hha/hha_systemd.service /lib/systemd/system/hha.service
  - systemctl start hha
  • Click create and wait for it to complete.
  • Note down the IP address of the droplet you created for upcoming steps.
  • Note down the username and password of your droplet (unless you specified using an SSH Key)
    • The username is “root” by default
    • The password may be emailed to you

2. Connecting to your database

  • Download PgAdmin
  • Create an SSH tunnel (below)
  • Connect using PgAdmin (below)

Create an SSH tunnel - Mac OS X

Open the “terminal” app and enter the following, substituting the “root” and/or “DROPLET_ip” with the details from step 1. This will create an SSH tunnel that is secure enough for you to read and analyse your data.

ssh -L 1111:localhost:5432 -C -N [email protected]_ip

Create an SSH tunnel - Windows

  • Download Putty
  • Open Putty. In the “Sessions” category (on the left)
    • Enter your droplet IP address in the “Host Name” field
  • On the “Connection -> SSH -> Tunnels” category enter the following
    • Source port: “1111”
    • Destination: “127.0.0.1:5432” (Select Local and Auto)
    • Click Add
  • Click the Open button at the bottom

You should now have an SSH Tunnel into your droplet.

Connect using PgAdmin

  • After creating an SSH tunnel (above) open PgAdmin and create a new server using the following:

    • General Tab
    • Name: HHA
    • Connection Tab
    • Host: localhost
    • Port: 1111
    • Username: hha_usr
    • Password: hha_pwd
  • You should see something like the following:

3. Send test data to your HHA instance

  • Open the test_util.html file, included in zip file distribution of HHA, in a browser. Follow the instructions on the test_util page to generate test data.

  • Go back to your database, outlined in the previous section, and you should see some test data appear.

For a more detailed guide for sending data please refer to the “Web Implementation” section below.

4. Optional Steps

  • Configure domain name with your domain name provider
    • Point your domain to the HHA server through a DNS configuration
  • Follow the web implementation guide below.

Mac OS X

This is a rough outline of the tasks required to install Hive Hero Analytics on Mac OS X. To do this you will need to be a bit comfortable installing and configuring software in the terminal on Mac OS X.

  1. Install PostgreSQL
    • We recommend you use Homebrew
  2. Download the Hive Hero Analytics binary
    • Extract the zip file to a folder. e.g “/usr/local/var/hivehero_analytcs”
    • Edit the config.toml file with database connection details
  3. Initialise the database
    • Use the command “./hha -new_database”
  4. Run the Hive Hero Analytics binary.
    • Use the command “./hha”
    • OR
    • Use the command “./hha -config=/usr/local/var/hivehero_analytcs/config.toml”
  5. Test the instance by opening the included “test_util.html” in your favourite browser.
  6. Connect to the database using PgAdmin and query the test data

Windows

This is a rough outline of the tasks required to install Hive Hero Analytics on a Microsoft Windows operating system.

  1. Install PostgreSQL
    • This can be installed on a separate machine instance (or even a Linux server).
  2. Download the Hive Hero Analytics binary
    • Extract the zip file to a folder. e.g “Program Files/Hive Hero Analytics”
    • Edit the config.toml file with database connection details
  3. Initialise the database
    • Use the command “hha -new_database”
  4. Run the Hive Hero Analytics binary.
    • Use the command “hha”
    • OR
    • Use the command “hha -config=‘C:/Program Files/Hive Hero Analytics/config.toml’”
  5. Test the instance by opening the included “test_util.html” in your favourite browser.
  6. Connect to the database using PgAdmin and query the test data

Generic Linux

This is a rough outline of the tasks required to install Hive Hero Analytics on a linux operating system. To do this you will need to be a bit comfortable installing and configuring software on linux operating systems.

To give you a head start we’ve included a sample Ubuntu Bash script that would complete most of the steps outlined.

#!/bin/bash

# Update and install required software
apt-get -y update
sleep 1 # Sometimes apt locks right after an update
apt-get -y install postgresql postgresql-contrib unzip

# Create the database
sudo -u postgres createdb hha
sudo -u postgres bash -c "psql -c \"CREATE USER hha_usr WITH PASSWORD 'hha_pwd' SUPERUSER;\""

# Install Hive Hero Analytics
wget -P /tmp/ https://www.hivehero.com/downloads/analytics/current/linux-amd64.zip
mkdir /usr/local/hha
unzip /tmp/linux-amd64.zip -d /usr/local/hha

# Initialise the database
/usr/local/hha/hha -new_database -config=/usr/local/hha/config.toml

# Create and start the Hive Hero Analytics SystemD Service
mv /usr/local/hha/hha_systemd.service /lib/systemd/system/hha.service
systemctl start hha

NOTE: If you need to debug the systemd service you can use the “journalctl -u hha” command in a terminal

  1. Install PostgreSQL
    • This can be installed on a separate machine instance.
  2. Download the Hive Hero Analytics binary
    • Extract the zip file to a folder. e.g “/usr/local/var/hivehero_analytcs”
    • Edit the config.toml file with database connection details
  3. Initialise the database
    • Use the command “./hha -new_database”
  4. Run the Hive Hero Analytics binary.
    • Use the command “./hha”
    • OR
    • Use the command “./hha -config=/usr/local/var/hivehero_analytcs/config.toml”
  5. Test the instance by opening the included “test_util.html” in your favourite browser.
  6. Connect to the database using PgAdmin and query the test data

Additional Steps

  • Configure domain name
    • Point your domain to the HHA server through DNS configuration
  • Follow the web implementation guide below.

Uninstalling

Linux

  • Delete the service
    • Stop the service “systemctl stop hha”
    • Delete the service “sudo rm /lib/systemd/system/hha.service”
  • Delete the HHA binary
    • “rm -rf /usr/local/hha”

Mac OS X

  • Delete the HHA binary
    • In terminal run the following “sudo rm -rf /usr/local/var/hivehero_analytcs”

Windows

  • Delete the HHA binary
    • Open Explorer and delete the folder “C:/Program Files/Hive Hero Analytics”

Web Implementation

All you need to know to capture web data

The following sections include JavaScript code examples which should be executed or consolidated in a single JavaScript file included on each web page.

Basics

When you have a HHA server setup you will need to modify your existing webpages to start collecting data.

Out of the box the Hive Hero Analytics server automatically captures views, pages, sessions and visitors. By adding the following HTML onto your web pages, preferably inside the head tag or before the closing body tag, those things will be collected.

You will need to replace “localhost:9001” with the domain you have choosen to host your HHA server on.

<script async src="http://localhost:9001/c.js"></script>

Alternatively if you have a tag management system or just want to include it via JavaScript we recommend using the snippet of code below.

function(t){var e=document.createElement("script");e.type="text/javascript",e.async=1,e.src=t;var c=document.getElementsByTagName("script")[0];c.parentNode.insertBefore(e,c)}("http://localhost:9001/c.js");

Note. Automatic views can be disabled. This is described in the section views section.

Queue Code

Any other configuration or custom tracking must be added after the following “Queue” JavaScript. This queue essentially holds all code that needs to execute as soon as the HHA page instance is ready and able to send data to the server.

When actions are executed in the queue, ids, addresses, and referrer fields, amongst others, will automatically be added and will appear in the database.

window.hha = window.hha || function() {
    (hha.q = hha.q || []).push(arguments)
}

Please note that if you have changed the web_collector namespace in the server configuration you will need to replace window.hha with window.YOUR_NAMESPACE.

Logging

Log messages can be viewed in the developer console of your browser. There are several levels of logging listed below.

When set the level persists for 365 days. We only recommend setting logging higher than 1 for development, debugging, and implementation purposes.

Level 0 - No Logging

All logging is turned off. Not recommended.

hha("log", "level", 0);

Level 1 - Error - DEFAULT

Messages from errors only.

hha("log", "level", 1);

Level 2 - Warn

Any warning messages.

hha("log", "level", 2);

Level 3 - Debug

Debugging level messages.

hha("log", "level", 3);

Level 4 - Trace

Detailed messages of what is happening internally.

hha("log", "level", 4);

Events

Any user interaction, whether it be a click or a keyboard press, can be captured using events. These events end up in the web_events table for analysis.

To capture events you will need to trigger the following JavaScript (after the queue code) when the event you want to capture occurs. You can add a “type” and additional data to describe the context.

hha("send", "event", "TYPE", {ANY OPTIONAL DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “event” Yes
3 Event Type Any String Yes
4 Data Values Single level JSON object No
5 Data Modifiers Single level JSON object No

Example:

hha("send", "event", "click", {
  Position: "column1",
  Action: "video:play"
});

Data Modifiers

To learn more about data modifiers please skip ahead to the Modifiers section.

Impressions

An impression is usually used to indicate the presence of a piece of content. e.g to track if a specific ad was shown or a certain marketing campaign offer was shown. An impression is different from a view because it usually tracks a piece of content rather than a whole page consisting of multiple pieces of content.

To capture an impression trigger the following JavaScript code when the user sees or loads your ad or piece of content (after the queue code).

hha("send", "impression", "TYPE", {ANY OPTIONAL DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “impression” Yes
3 Impression Type Any String Yes
4 Data Values Single level JSON object No
5 Data Modifiers Single level JSON object No

Example:

hha("send", "impression", "ad", {
  Name: "popular-campaign",
  Type: "hero-content",
  Position: "center"
});

Data Modifiers

To learn more about data modifiers please skip ahead to the Modifiers section.

Views

A view is usually triggered when an entire web page is shown. Views are automatically captured after a web page has rendered. You may want to capture additional views if you have a dynamic page like a photo gallery or if you have a single page web application with multiple dynamic “views”.

To capture a view trigger the following JavaScript code when the user sees a new view (after the queue code).

hha("send", "view", {ANY OPTIONAL DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “view” Yes
3 Data Values Single level JSON object No
4 Data Modifiers Single level JSON object No

OR

hha("send", "view", "TYPE", {ANY OPTIONAL DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “view” Yes
3 View Type Any String Yes
4 Data Values Single level JSON object No
5 Data Modifiers Single level JSON object No

Examples:

hha("send", "view", {
  Department: "Mens",
  Section: "Clothing",
  Item: "26"
});

OR

hha("send", "view", "video", {
  Title: "TITLE of Video"
});

Disabling Automatic Views

You can suppress or disable automatic views by adding the following code immediately after the queue code.

hha("config", "set", "autoPV", 0);

Note. Ensure the queue code and this config code runs before or immediately after the HHA script is loaded.

Data Modifiers

To learn more about data modifiers please skip ahead to the Modifiers section.

Page Data

A page instance can have many pieces of data associated with it at any time during its lifetime. An example of this is, page level conversion, an error in a component on the page or form data entered. Page data is accumulated or overwritten progressively.

To capture page instance data at any point, trigger the following JavaScript code (after the queue code).

hha("send", "pageData", {ANY DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “pageData” Yes
3 Data Values Single level JSON object Yes
4 Data Modifiers Single level JSON object No

Example:

hha("send", "pageData", {
    Conversion1: 1,
    FormField1: "ABC",
    VideoError: "VideoABC"
});

Page Custom IDs

A page instance can have many IDs associated with it. Product IDs, User IDs, Content IDs, Ad IDs and any other ID can be added to the custom IDs field for a page instance. This field is specifically designed for IDs to separate data (previously mentioned) and IDs from their “concerns”. It also makes it easier for optimizing database performance (indexing).

To capture page instance custom ids at any point, trigger the following JavaScript code (after the queue code).

hha("send", "pageCustomIDs", {CUSTOM IDS}, {CUSTOM ID MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “pageCustomIDs” Yes
3 Custom IDs Single level JSON object Yes
4 Custom ID Modifiers Single level JSON object No

Example:

hha("send", "pageCustomIDs", {
  "ProductID1": "123456678",
  "ProductID2": "cfe4c053-42cf-431d-ad0d-1964512160b2",
  "ContentID1": "fc08bb1c-8676-49d8-a525-70d971e70858"
});

Session Data

A session instance can have many pieces of data associated with it at any time during its lifetime. An example of this is, session level conversion, session preferences, or entry page tracking. Session data is accumulated or overwritten progressively through out the session lifetime.

To capture session instance data at any point, trigger the following JavaScript code (after the queue code).

hha("send", "sessionData", {ANY DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “sessionData” Yes
3 Data Values Single level JSON object Yes
4 Data Modifiers Single level JSON object No

Example:

hha("send", "sessionData", {
    TotalPurchased: 110,
    TotalItems: 5,
});

Session Custom IDs

A session instance can have many IDs associated with it. Product IDs, User IDs, Content IDs, Ad IDs and any other ID can be added to the custom IDs field for a session instance. This field is specifically designed for IDs to separate data (previously mentioned) and IDs from their “concerns”. It also makes it easier for optimizing database performance (indexing).

To capture session instance custom ids at any point, trigger the following JavaScript code (after the queue code).

hha("send", "sessionCustomIDs", {CUSTOM IDS}, {CUSTOM ID MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “sessionCustomIDs” Yes
3 Custom IDs Single level JSON object Yes
4 Custom ID Modifiers Single level JSON object No

Example:

hha("send", "sessionCustomIDs", {
  "FormID": "123456678",
  "PartnerSourceID": "cfe4c053-42cf-431d-ad0d-1964512160b2",
  "IncomingADID": "fc08bb1c-8676-49d8-a525-70d971e70858"
});

Visitor Data

A visitor instance can have many pieces of data associated with it at any time during its lifetime. An example of this is a visitor’s attributes or segments. Visitor data is accumulated or overwritten progressively through out the visitor lifetime.

To capture visitor instance data at any point, trigger the following JavaScript code (after the queue code).

hha("send", "visitorData", {ANY DATA}, {DATA MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “visitorData” Yes
3 Data Values Single level JSON object Yes
4 Data Modifiers Single level JSON object No

Example:

hha("send", "visitorData", {
    Gender: "m",
    Age: "25-30",
    Interests: "science,maths"
});

Visitor Custom IDs

A visitor instance can have many IDs associated with it. Product IDs, User IDs, Content IDs, Ad IDs and any other ID can be added to the custom IDs field for a visitor instance. This field is specifically designed for IDs to separate data (previously mentioned) and IDs from their “concerns”. It also makes it easier for optimizing database performance (indexing).

To capture visitor instance custom ids at any point, trigger the following JavaScript code (after the queue code).

hha("send", "visitorCustomIDs", {CUSTOM IDS}, {CUSTOM ID MODIFIERS});
Parameter Name Value Required
1 Action Name “send” Yes
2 Action Type “visitorCustomIDs” Yes
3 Custom IDs Single level JSON object Yes
4 Custom ID Modifiers Single level JSON object No

Example:

hha("send", "visitorCustomIDs", {
  "SourceCampaignID": "fc08bb1c-8676-49d8-a525-70d971e70858",
  "CustomerID": "123456789",
  "SegmentID": "a123bcd",
});

Modifiers

Server side modifiers enable you to modify or insert data by utilizing the HHA server.

A good example of this is time, if we wanted to add the timestamp in a data field we could do this using browser side JavaScript. The collected time however would be relative to a users browser, which could be problematic as some computer clocks are faster, slower or completely inaccurate. To solve this problem we created modifiers that insert or modify data on the HHA server. Since they are executed on the server you can be more confident that the data is much more uniform and trusted.

A modifier set is a JSON object, it can have one or more modifiers. Lets say we wanted to hash the value of the “Content” field, then we could use the following modifier set.

{
  "Content": "hash:sha256"
}

The Keys reference the fields to be modified, and the Values reference the modifier to use. In the context of sending session data we could use it as per the following example.

hha("send", "sessionData", {
  "Content": "one->two->three",
  "Body": "abcdefghijlmnopqrstuvwxyz",
  "Title": "Robert Frost - Two Roads"
}, {
  "Content": "hash:sha256",
  "Body": "hash:sha512",
});

The resultant data field in the associated web_session database record would look like the following. Note that we’ve also added a modifier to the “Body” field in the modifier set and also that the “Title” field is untouched.

{
  "Content":"633430bee6867091490178354e32d7f4e654f327202de63756c69143c5cd5590",
  "Body":"116B7EC5F19DB1B8B465C98B4744CE470E1F547783DC72F37DF1050707AC428744C0007280EAA7A4AB8A01C44A023F5CB0409C5B18A2B61D1EDAC04D7E46D549",
  "Title":"Robert Frost - Two Roads"
}

Note. Modifiers are limited to improve server performance. There are settings for both limiting the modifier set and the length of modifiers themselves. Refer to the server configuration for further detail.

Piping Modifiers

You can use multiple modifiers together by “piping” them. The effect of this is that the resultant values are passed left to right through all the modifiers.

Example

Assume we have the current data

{
  "TestKey":"TestValue"
}

Lets say you want to append the string “Last”, prepend the string “First”, and also append the current year the the “TestKey” field. You could use the following modifier set.

{
  "TestKey":"appendVal:First|prependVal:Last|appendMod:time:year"
}

The resultant data is as follows.

{
  "TestKey":"FirstTestValueLast2018"
}

Modifier Reference

Transform Modifiers

Transform modifiers take the current value provided and applies a transformation to it.

Hash

Modifier Description
“hash:sha256” Hashes the associated value using sha256
“hash:sha512” Hashes the associated value using sha512

String

Modifier Description Example
“appendVal” Appends a string “appendVal:hello world”
“prependVal” Prepends a string “prependVal:test string”
“appendMod” Appends an insert modifier “appendMod:unix:nano”
“prependMod” Prepends an insert modifier “prependMod:time:year”
  • To escape a pipe character you can use a double pipe. e.g. “appendVal:hello||world”.
  • Colons don’t need to be escaped i.e. “appendVal:hello:world” should append the string “hello:world”.

Insert Modifiers

Insert modifiers simply add or replaces existing values.

Time

All time modifiers executed within a modifier set will have the same time. i.e if you have multiple time modifiers in a modifier set they should produce exactly the same time.

Modifier Description
“unix:nano” Unix time in nanoseconds
“unix:micro” Unix time in microseconds
“unix:milli” Unix time in milliseconds
“unix:second” Unix time in seconds
“time:yearDay” The day of the year 1-365 (366 for leap years)
“time:yearWeek” The week of the year 1-52
“time:yearMonth” The month of the year 1-12
“time:year” The current year
“time:namedDay” The current named day i.e. monday - friday
“time:namedMonth” The current named month i.e. january - december

Pings

A ping request is automatically sent to the Hive Hero Analytics server on a periodic basis. This ping helps track page, session and visitor times.

Ping Interval

By default a ping is sent every 5 seconds. To change this you will need to use the following code.

hha("config", "set", "ping", 5);

The fourth parameter is the interval in seconds.

Disabling Pings

You can disable the ping using the following code.

hha("config", "set", "ping", 0);

Database Structure

Detailed information about the data captured in the database

To understand the data and how to query it, a good understanding of the database structure is recommended. A list of the tables is below.

  • web_views
  • web_events
  • web_impressions
  • web_pages
  • web_pages_data_updates
  • web_pages_custom_id_updates
  • web_visitors
  • web_visitors_data_updates
  • web_visitors_custom_id_updates
  • web_sessions
  • web_sessions_data_updates
  • web_sessions_custom_id_updates
  • web_rejects
  • web_versions

The following sections outline each table, their associated fields and some common terms used.

Common Terms

UUID - A Universally Unique Identifier that is unique across all data. I is commonly represented as a text string.

Collector - The code/framework that collects data from the client (browser).

Web Events

An Event or a Page Event is a record triggered on a web page to record a specific action or event. Every row in this table represents an event.

Column Data Type Description
id BIGSERIAL A unique incremental id for each event
id.request TEXT A UUID that identifies the http request that collected the event
id.page TEXT A UUID that identifies the page instance that the event originated from
id.session TEXT A UUID that identifies the browser session instance that the event originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the event originated from
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the event originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The event is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
origin.remote_address.ip INET The IP address where the event originated from
origin.remote_address.port INTEGER The network port where the event originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the event originated from.
address TEXT The users current browser page URL where the event originated from.
address.protocol TEXT The protocol (HTTP/HTTPS) portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/ article.html
address.hostname TEXT The hostname (or domain name) portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html
address.port INTEGER The network port portion of the above address. If no network port is specified it is usually 80. e.g https://hivehero.com :80/pages/2015/09/31/ article.html
address.pathname TEXT The file path portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html
address.query_string TEXT The query string portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html ?key=value&key2=value2
address.hash TEXT The address hash portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html#SomeHash
referrer TEXT The address of the referring page. i.e The address of the previous page before visiting the current one where this event was generated.
referrer.protocol TEXT The protocol portion of the above referrer
referrer.hostname TEXT The hostname (or domain name) portion of the above referrer
referrer.port INTEGER The network port portion of the above referrer
referrer.pathname TEXT The file path portion of the above referrer
referrer.query_string TEXT The query string portion of the above referrer
type TEXT A category or type that can be arbitrarily used by developers to help categorize events. e.g “video:play” could indicate a video play event as apposed to a “button:click” event. This field is indexed for fast querying.
data JSONB An arbitrary place for data to be stored. All data has a key and a value associated with it. The JSON object cannot be nested beyond the first level.
received TIMESTAMP WITH TIME ZONE The time the server first received the event.
created TIMESTAMP WITH TIME ZONE The time the event was created in the database.

Web Impressions

An impression is usually used to indicate the presence of a piece of content. e.g to track if a specific ad was shown or a certain marketing campaign offer was shown. An impression is different from a view because it usually tracks a piece of content rather than a whole page consisting of multiple pieces of content.

Column Data Type Description
id BIGSERIAL PRIMARY KEY A unique incremental id for each impression
id.request TEXT A UUID that identifies the http request that collected the impression
id.page TEXT A UUID that identifies the page instance that the impression originated from
id.session TEXT A UUID that identifies the browser session instance that the impression originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the impression originated from
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the impression originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The impression is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
origin.remote_address.ip INET The IP address where the impression originated from
origin.remote_address.port INTEGER The network port where the impression originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the impression originated from.
address TEXT The users current browser page URL where the impression originated from.
address.protocol TEXT The protocol (HTTP/HTTPS) portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/ article.html
address.hostname TEXT The hostname (or domain name) portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html
address.port INTEGER The network port portion of the above address. If no network port is specified it is usually 80. e.g https://hivehero.com :80/pages/2015/09/31/ article.html
address.pathname TEXT The file path portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html
address.query_string TEXT The query string portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html ?key=value&key2=value2
address.hash TEXT The address hash portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html#SomeHash
referrer TEXT The address of the referring page. i.e The address of the previous page before visiting the current one where this impression was generated.
referrer.protocol TEXT The protocol portion of the above referrer
referrer.hostname TEXT The hostname (or domain name) portion of the above referrer
referrer.port INTEGER The network port portion of the above referrer
referrer.pathname TEXT The file path portion of the above referrer
referrer.query_string TEXT The query string portion of the above referrer
type TEXT A category or type that can be arbitrarily used by developers to help categorize impressions. e.g “external:ad” could indicate an ad impression as apposed to a “internal:offer” or “internal:banner” impression. This field is indexed for fast querying.
data JSONB An arbitrary place for data to be stored. All data has a key and a value associated with it. The JSON object cannot be nested beyond the first level.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the impression.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Views

A View or a Page View is record created every time someone loads or reloads a web page. Every row in this table represents a View.

Column Data Type Description
id BIGSERIAL A unique incremental id for each view
id.request TEXT A UUID that identifies the http request that collected the view
id.page TEXT A UUID that identifies the page instance that the view originated from
id.session TEXT A UUID that identifies the browser session instance that the view originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the view originated from
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the view originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The view is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
origin.remote_address.ip INET The IP address where the view originated from
origin.remote_address.port INTEGER The network port where the view originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the view originated from.
address TEXT The users current browser page URL where the view originated from.
address.protocol TEXT The protocol (HTTP/HTTPS) portion of the above address. e.g https :// hivehero.com :80 /pages/2015/09/31/ article.html
address.hostname TEXT The hostname (or domain name) portion of the above address. e.g https:// hivehero.com :80 /pages/2015/09/31/ article.html
address.port INTEGER The network port portion of the above address. e.g https:// hivehero.com :80 /pages/2015/09/31/ article.html
address.pathname TEXT The file path portion of the above address. e.g https:// hivehero.com :80 /pages/2015/09/31/ article.html
address.query_string TEXT The query string portion of the above address. e.g https:// hivehero.com :80 /pages/2015/09/31/ article.html ?key=value&key2=value2
address.hash TEXT The address hash portion of the above address. e.g https:// hivehero.com :80 /pages/2015/09/31/ article.html #SomeHash
referrer TEXT The address of the referring page. i.e The address of the previous page before visiting the current one where this view was generated.
referrer.protocol TEXT The protocol portion of the above referrer
referrer.hostname TEXT The hostname (or domain name) portion of the above referrer
referrer.port INTEGER The network port portion of the above referrer
referrer.pathname TEXT The file path portion of the above referrer
referrer.query_string TEXT The query string portion of the above referrer
type TEXT A category or type that can be arbitrarily used by developers to help categorize views. e.g “video” could indicate a video view as apposed to a “page” view. This field is indexed for fast querying.
data JSONB An arbitrary place for data to be stored. All data has a key and a value associated with it. The JSON object cannot be nested beyond the first level.
received TIMESTAMP WITH TIME ZONE The time the server first received the view.
created TIMESTAMP WITH TIME ZONE The time the view was created in the database.

Web Pages

A page is defined as an instance of a web page loaded in a browser. It is different from a view as multiple views can be triggered on an instance of a page. e.g when a single page web app is loaded a “page” instance is captured; the app may have multiple screens and a “view” is captured when a user sees one of those screens.

A page record’s data is cumulatively collected from multiple messages across multiple http requests. The multiple components that make up the record are outlined below. A page record maybe incomplete if the user browses away from the page, stops the page or looses internet connectivity.

A complete page record will give invaluable information about the users browser and computer, the performance of your web pages, and any technical issues with your web pages.

Start Fields

The “start.” fields prefixed with “start.” are collected immediately when a user requests a web page. It provides technical performance numbers and is useful when pages fail or are slow to load.

Combined with “complete.” fields you should be ale to gauge if there are any missed opportunities due to technical and performance issues.

Technical details of the performance numbers collected can be found in the Mozilla documentation.

Detail Fields

Any page instance information about the user’s browser or computer can be found in the “detail.” fields.

Complete Fields

“complete.” fields like the “start.” fields collect technical performance data. This data is collected when the page finishes loading and displaying to the user.

Combined with “start.” fields you should be ale to gauge if there are any missed opportunities due to technical and performance issues.

Technical details of the performance numbers collected can be found in the Mozilla documentation.

Counter and Seen Fields

Counter fields count how many pings, events, impressions and views happen on the particular page instance. This can be used to gauge page traffic density and also the time a user spends on a page instance.

Detailed Field Descriptions

Column Data Type Description
id BIGSERIAL PRIMARY KEY A unique incremental id for each page instance
id.page TEXT A UUID that identifies the page instance
id.session TEXT A UUID that identifies the browser session instance that the page originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the page originated from
start.id.request TEXT A UUID that identifies the http request that the “start.” information originated from
start.origin.remote_address.ip INET The IP address where the “start.” request originated from
start.origin.remote_address.port INTEGER The network port where the “start.” request originated from
start.origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the start request originated from.
start.collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth collector request the “start.” message originated from i.e the 5th request.
start.collector.message_num INTEGER There are multiple message slots in a request from a collector. The “start.” fields are created from a message and the message number is the slot in the request i.e the 2nd slot.
start.collector.time_started BIGINT The time, in milliseconds, from when the user navigates to the page to the time the collector instance starts to execute
start.performance.redirect_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser starts executing redirects
start.performance.redirect_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser finishes executing redirects
start.performance.fetch_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser starts the request to download the page
start.performance.domain_lookup_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser starts resolving the page’s domain
start.performance.domain_lookup_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser finishes resolving the page’s domain
start.performance.connect_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser makes a connection to download the page
start.performance.connect_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser finishes making a connection to download the page
start.performance.request_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the request to download the page is made
start.performance.response_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time when a response has started being received
start.performance.response_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time when a response has been received
start.received TIMESTAMP WITH TIME ZONE The time the HHA server received the “start.” message.
detail.id.request TEXT A UUID that identifies the http request that the “.detail” information originated from
detail.origin.remote_address.ip INET The IP address where the “detail.” request originated from
detail.origin.remote_address.port INTEGER The network port address where the “detail.” request originated from
detail.origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the “detail.” request originated from.
detail.collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the “detail.” message originated from i.e the 5th request.
detail.collector.message_num INTEGER There are multiple message slots in a request from a collector. The “detail.” data is created from a message and the message number is the request slot i.e the 2nd slot.
detail.address TEXT The users current browser page URL or page address.
detail.address.protocol TEXT The protocol (HTTP/HTTPS) portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html
detail.address.hostname TEXT The hostname (or domain name) portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/ article.html
detail.address.port INTEGER The network port portion of the above address. If no network port is specified it is usually 80. e.g https://hivehero.com:80/pages/2015/09/31/ article.html
detail.address.pathname TEXT The file path portion of the above address. e.g https:// hivehero.com :80/pages/2015/09/31/article.html
detail.address.query_string TEXT The query string portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/article.html?key=value&key2=value2
detail.address.hash TEXT The address hash portion of the above address. e.g https://hivehero.com:80/pages/2015/09/31/ article.html#SomeHash
detail.referrer TEXT The address of the referring page. i.e The address of the previous page before loading the current one.
detail.referrer.protocol TEXT The protocol portion of the above referrer
detail.referrer.hostname TEXT The hostname (or domain name) portion of the above referrer
detail.referrer.port INTEGER The network port portion of the above referrer
detail.referrer.pathname TEXT The file path portion of the above referrer
detail.referrer.query_string TEXT The query string portion of the above referrer
detail.screen.width INTEGER User’s computer screen width
detail.screen.height INTEGER User’s computer screen height
detail.screen.visible_width INTEGER Browser’s visible width
detail.screen.visible_height INTEGER Browser’s visible height
detail.browser.name TEXT The name of the browser e.g chrome
detail.browser.version TEXT The version of the browser
detail.browser.user_agent TEXT The browser user agent string used to determine the browser version and name
detail.browser.accept_languages TEXT[] A list of languages the browser is set to accept e.g en_au
detail.browser.accept_languages_quality TEXT[] A list of the language quality or priorities relative to the accept_languages
detail.browser.plugins TEXT[] A list of the browsers plugins enabled by the user
detail.os.name TEXT The operating system the browser is running in. e.g windows
detail.os.version TEXT The version of the operating system
detail.received TIMESTAMP WITH TIME ZONE The time the HHA server received the “detail.” message.
complete.id.request TEXT A UUID that identifies the http request that the “complete.” information originated from
complete.origin.remote_address.ip INET The IP address where the “complete.” request originated from. Should be the same as the “start.” field.
complete.origin.remote_address.port INTEGER The network port where the “complete.” request originated from
complete.origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the start request originated from.
complete.collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth collector request the “complete.” message originated from i.e the 5th request.
complete.collector.message_num INTEGER There are multiple message slots in a request from a collector. The “complete.” fields are created from a message and the message number is the slot in the request i.e the 2nd slot.
complete.collector.time_started BIGINT The time, in milliseconds, from when the user navigates to the page to the time the collector instance starts to execute
complete.performance.redirect_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser starts executing redirects
complete.performance.redirect_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser finishes executing redirects
complete.performance.fetch_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser starts the request to download the page
complete.performance.domain_lookup_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser starts resolving the page’s domain
complete.performance.domain_lookup_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser finishes resolving the page’s domain
complete.performance.connect_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser makes a connection to download the page
complete.performance.connect_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time the browser finishes making a connection to download the page
complete.performance.request_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time the request to download the page is made
complete.performance.response_start BIGINT The time, in milliseconds, from when the user navigates to the page to the time when a response has started being received
complete.performance.response_end BIGINT The time, in milliseconds, from when the user navigates to the page to the time when a response has been received
complete.performance.dom_loading BIGINT The time, in milliseconds, from when the user navigates to the page to when it’s document object model starts to load. The user will start to see the page render at this point
complete.performance.dom_interactive BIGINT The time, in milliseconds, from when the user navigates to the page to when it starts to become interactive.
complete.performance.dom_content_loaded_event_start BIGINT The time, in milliseconds, from when the user navigates to the page to when scripts start to run
complete.performance.dom_content_loaded_event_end BIGINT The time, in milliseconds, from when the user navigates to the page to when all scripts queued are triggered
complete.performance.dom_complete BIGINT The time, in milliseconds, from when the user navigates to the page to when the document object model is complete. The page, from a user’s perspective, is generally fully interactive at this point.
complete.performance.load_event_start BIGINT The time, in milliseconds, from when the user navigates to the page to when all content and scripts are loaded on the page.
complete.performance.load_event_end BIGINT The time, in milliseconds, from when the user navigates to the page to when the load event and all scripts are triggered by the load event have been completed.
complete.received TIMESTAMP WITH TIME ZONE The time the HHA server received the “complete.” message.
data JSONB Accumulated custom data associated with the page is stored here. This JSONB object should have a single level with simple keys and values
data.writes INTEGER The number of accumulated updates made to the data field
data.last_received TIMESTAMP WITH TIME ZONE The last time the data field was updated or written to
custom_ids JSONB Accumulated custom ids should be stored here. e.g. {“customerID”:“123456”} . This JSONB object should have a single level with simple string keys and values
custom_ids.writes INTEGER The number of accumulated updates made to the custom_ids field
custom_ids.last_received TIMESTAMP WITH TIME ZONE The last time the custom_ids field was updated or written to
counter.pings INTEGER The number of ping messages received from the page instance
counter.events INTEGER The number of event messages received from the page instance
counter.impressions INTEGER The number of impression messages received from the page instance
counter.views INTEGER The number of page view messages received from the page instance
seen.first TIMESTAMP WITH TIME ZONE Time the user was first seen on the page. This is usually the same as the “start.received” field as the “start.” fields are sent first. If there is network congestion or latency the “complete.received” may be the same as this field and messages and requests maybe received out of order.
seen.last TIMESTAMP WITH TIME ZONE Time the user was last seen on the page. This is determined by the last ping received, it therefore will be accurate to plus or minus the ping interval.
updated TIMESTAMP WITH TIME ZONE The time the current row was last updated in the database.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Pages Data Updates

The data field in the web_pages table accumulates data. To maintain history of all updates this table records each individual transaction that goes into making the final data field.

Column Data Type Description
id BIGSERIAL A unique incremental id for each data update
id.request TEXT A UUID that identifies the http request that collected the data update
id.page TEXT A UUID that identifies the page instance that the data update originated from
id.session TEXT A UUID that identifies the browser session instance that the data update originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the data update originated from
origin.remote_address.ip INET The IP address where the data update originated from
origin.remote_address.port INTEGER The network port where the data update originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the data update originated from.
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the data update originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The data update is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
data JSONB The data sent to overwrite or add the data field in the web_pages table.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the data update.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Pages Custom ID Updates

The custom_ids field in the web_pages table accumulates ids. To maintain history of all updates this table records each individual transaction that goes into making the final custom_ids field in the web_pages table.

Column Data Type Description
id BIGSERIAL A unique incremental id for each custom id update
id.request TEXT A UUID that identifies the http request that collected the custom id update
id.page TEXT A UUID that identifies the page instance that the custom id update originated from
id.session TEXT A UUID that identifies the browser session instance that the custom id update originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the custom id update originated from
origin.remote_address.ip INET The IP address where the custom id update originated from
origin.remote_address.port INTEGER The network port where the custom id update originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the custom id update originated from.
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the custom id update originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The custom id update is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
custom_ids JSONB The custom_ids sent to overwrite or add to the custom_ids field in the web_pages table.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the custom id update.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Sessions

A session is defined as a continuous stream of interactions that does not stop for a defined period of time, and expires after a set period of time. e.g a session can expire after 30mins of inactivity and cannot be more than 3hrs long. Therefore multiple events, impressions, views, and pages can be created during a session. Each row in this table is a session.

A session’s data field and custom_ids field are cumulatively collected over the course of a session.

Column Data Type Description
id BIGSERIAL PRIMARY KEY A unique incremental id for each session instance
id.session TEXT A UUID that identifies the browser session instance
id.visitor TEXT A UUID that identifies the browser visitor instance that the session originated from
data JSONB Accumulated custom data associated with the session is stored here. This JSONB object should have a single level with simple keys and values
data.writes INTEGER The number of accumulated updates made to the data field
data.last_received TIMESTAMP WITH TIME ZONE The last time the data field was updated or written to
custom_ids JSONB Accumulated custom ids should be stored here. e.g. {“customerID”:“123456”} . This JSONB object should have a single level with simple string keys and values
custom_ids.writes INTEGER The number of accumulated updates made to the custom_ids field
custom_ids.last_received TIMESTAMP WITH TIME ZONE The last time the custom_ids field was updated or written to
counter.pings INTEGER The number of ping messages received in the session instance
counter.events INTEGER The number of event messages received in the session instance
counter.impressions INTEGER The number of impression messages received in the session instance
counter.views INTEGER The number of page view messages received in the session instance
seen.first TIMESTAMP WITH TIME ZONE Time the user was first seen in the session, by the server.
seen.last TIMESTAMP WITH TIME ZONE Time the user was last seen in the session by the server. This is determined by the last ping received, it therefore will be accurate to plus or minus the ping interval.
updated TIMESTAMP WITH TIME ZONE The time the current row was last updated in the database.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Sessions Data Updates

The data field in the web_sessions table accumulates data. To maintain history of all updates this table records each individual transaction that goes into making the final data field.

Column Data Type Description
id BIGSERIAL A unique incremental id for each data update
id.request TEXT A UUID that identifies the http request that collected the data update
id.page TEXT A UUID that identifies the page instance that the data update originated from
id.session TEXT A UUID that identifies the browser session instance that the data update originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the data update originated from
origin.remote_address.ip INET The IP address where the data update originated from
origin.remote_address.port INTEGER The network port where the data update originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the data update originated from.
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the data update originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The data update is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
data JSONB The data sent to overwrite or add to the data field in the web_sessions table.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the data update.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Sessions Custom ID Updates

The custom_ids field in the web_sessions table accumulates ids. To maintain history of all updates this table records each individual transaction that goes into making the final custom_ids field in the web_sessions table.

Column Data Type Description
id BIGSERIAL A unique incremental id for each custom id update
id.request TEXT A UUID that identifies the http request that collected the custom id update
id.page TEXT A UUID that identifies the page instance that the custom id update originated from
id.session TEXT A UUID that identifies the browser session instance that the custom id update originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the custom id update originated from
origin.remote_address.ip INET The IP address where the custom id update originated from
origin.remote_address.port INTEGER The network port where the custom id update originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the custom id update originated from.
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the custom id update originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The custom id update is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
custom_ids JSONB The custom_ids sent to overwrite or add to the custom_ids field in the web_sessions table.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the custom id update.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Visitors

A visitor or a unique browser is essentially a way to track recurring users over longer periods of time. Multiple events, impressions, views, pages, and sessions can be attributed to a visitor. Each row in this table is a visitor.

A visitor is the best possible approximation of a user or person without knowing their identity via collecting personally identifiable information i.e names, address, emails etc.

Column Data Type Description
id BIGSERIAL PRIMARY KEY A unique incremental id for each visitor instance
id.visitor TEXT A UUID that identifies the browser visitor instance
data JSONB Accumulated custom data associated with the visitor is stored here. This JSONB object should have a single level with simple keys and values
data.writes INTEGER The number of accumulated updates made to the data field
data.last_received TIMESTAMP WITH TIME ZONE The last time the data field was updated or written to
custom_ids JSONB Accumulated custom ids should be stored here. e.g. {“customerID”:“123456”} . This JSONB object should have a single level with simple string keys and values
custom_ids.writes INTEGER The number of accumulated updates made to the custom_ids field
custom_ids.last_received TIMESTAMP WITH TIME ZONE The last time the custom_ids field was updated or written to
counter.pings INTEGER The number of ping messages received in the visitor instance
counter.events INTEGER The number of event messages received in the visitor instance
counter.impressions INTEGER The number of impression messages received in the visitor instance
counter.views INTEGER The number of page view messages received in the visitor instance
seen.first TIMESTAMP WITH TIME ZONE Time the visitor was first seen by the server.
seen.last TIMESTAMP WITH TIME ZONE Time the visitor was last seen by the server. This is determined by the last ping received, it therefore will be accurate to plus or minus the ping interval.
updated TIMESTAMP WITH TIME ZONE The time the current row was last updated in the database.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Visitors Data Updates

The data field in the web_visitors table accumulates data. To maintain history of all updates this table records each individual transaction that goes into making the final data field.

Column Data Type Description
id BIGSERIAL A unique incremental id for each data update
id.request TEXT A UUID that identifies the http request that collected the data update
id.page TEXT A UUID that identifies the page instance that the data update originated from
id.session TEXT A UUID that identifies the browser session instance that the data update originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the data update originated from
origin.remote_address.ip INET The IP address where the data update originated from
origin.remote_address.port INTEGER The network port where the data update originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the data update originated from.
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the data update originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The data update is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
data JSONB The data sent to overwrite or add to the data field in the web_visitors table.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the data update.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Visitors Custom ID Updates

The custom_ids field in the web_visitors table accumulates ids. To maintain history of all updates this table records each individual transaction that goes into making the final custom_ids field in the web_visitors table.

Column Data Type Description
id BIGSERIAL A unique incremental id for each custom id update
id.request TEXT A UUID that identifies the http request that collected the custom id update
id.page TEXT A UUID that identifies the page instance that the custom id update originated from
id.session TEXT A UUID that identifies the browser session instance that the custom id update originated from
id.visitor TEXT A UUID that identifies the browser visitor instance that the custom id update originated from
origin.remote_address.ip INET The IP address where the custom id update originated from
origin.remote_address.port INTEGER The network port where the custom id update originated from
origin.x_forwarded_for_ips INET[] An array of IP addresses indicating the path, if any, where the custom id update originated from.
collector.request_num INTEGER A collector instance sends many HTTP requests. This request number is nth request the custom id update originated from i.e the 5th request.
collector.message_num INTEGER There are multiple message slots in a request from a collector. The custom id update is created from a message and the message number is the slot that the message occupied i.e the 2nd slot.
custom_ids JSONB The custom_ids sent to overwrite or add to the custom_ids field in the web_visitors table.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the custom id update.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Rejects

All requests that fail to validate and parse end up in this table.

Column Data Type Description
id BIGSERIAL A unique incremental id for each data update
request.raw TEXT This is the raw body of the request. It maybe truncated if it is larger than the configured limit.
request.parsed JSONB This is a parsed JSON version of the request. This maybe null or empty if parsing failed.
errors TEXT[] A list of the errors that caused this request to be rejected.
received TIMESTAMP WITH TIME ZONE The time the HHA server first received the request.
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Web Versions

Any Updates to the database schema should be logged in this table.

Column Data Type Description
id BIGSERIAL A unique incremental id for each data update
version TEXT The version of the database
description TEXT A description of version and the changes made
sql_update TEXT The actual SQL run when updating from the previous version
created TIMESTAMP WITH TIME ZONE The time the current row was created in the database.

Reporting Queries

Simple queries to show you how to get answers.

Events

How many events of a specific type occurred?

SELECT
  count(*)
/* Show the count of all rows */ FROM
  web_events
/* Retrieve data from the web_events table, as each row represents an event */ WHERE
  received >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the event was received exactly on or after 2017-01-01 00:00:00. The received column contains timestamps */   AND
  received <= '2017-01-02 00:00:00'::TIMESTAMP
/* Where the event was received exactly on or before 2017-01-02 00:00:00. The received column contains timestamps */   AND
  type like 'expand'
/* Where the event type is 'expand' i.e. we may have triggered the event when a user expanded a form field */

Impressions

How many impressions of a specific type occurred?

SELECT
  count(*)
/* Show the count of all rows */ FROM
  web_impressions
/* Retrieve data from the web_impressions table, as each row represents an impression */ WHERE
  received >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the impression was received exactly on or after 2017-01-01 00:00:00. The received column contains timestamps */   AND
  received <= '2017-01-02 00:00:00'::TIMESTAMP
/* Where the impression was received exactly on or before 2017-01-02 00:00:00. The received column contains timestamps */   AND
  type like 'external:ad'
/* Where the impression type is 'external:ad' i.e. Triggered when an Ad on an external website is viewed */

Views

How many pages were viewed in a specific time range?

SELECT
  count(*)
/* Show the count of all rows */ FROM
  web_views
/* Retrieve data from the web_views table, as each row represents a page view */ WHERE
  received >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the view was received exactly on or after 2017-01-01 00:00:00. The received column contains timestamps */   AND
  received <= '2017-01-02 00:00:00'::TIMESTAMP
/* Where the view was received exactly on or before 2017-01-02 00:00:00. The received column contains timestamps */

Pages

What are the 20 slowest loading pages?

SELECT
  "detail.address" AS "Address",
/* List the address of each page instance */
  avg("complete.performance.load_event_end") AS "Average Time"
/* List the average load times for each address */ FROM
  web_pages
/* Retrieve data from the web_pages table, as each row represents an page instance */ WHERE
  "complete.received" >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the completed page data was received exactly on or after 2017-01-01 00:00:00. */   AND
  "complete.received" <= '2017-01-02 00:00:00'::TIMESTAMP
/* Where the completed page data was received exactly on or before 2017-01-02 00:00:00. */ GROUP BY "detail.address" /* Group the data by address (to calculate the average) */ ORDER BY "Average Time" DESC /* The slowest pages first */ LIMIT 20 /* Limit to the 20 rows of data */

On average, list the top 20 page instances that are open the longest.

SELECT
  "detail.address",
/* List the address of each page instance */   avg(EXTRACT(MILLISECONDS FROM "seen.last" - "seen.first")) AS "Avg Time Open" /* Subtract the when we last saw the page instance, "seen.last", with when we first saw the page instance, "seen.first". This will produce an interval. Then get the average milliseconds from the interval and label the column "Avg Time Open" */ FROM
  web_pages
/* Retrieve data from the web_pages table, as each row represents an page instance */ WHERE
  "complete.received" >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the view was received exactly on or after 2017-01-01 00:00:00. */   AND
  "complete.received" <= '2018-01-02 00:00:00'::TIMESTAMP
/* Where the view was received exactly on or before 2017-01-02 00:00:00. */ GROUP BY "detail.address" /* GROUP BY - Group the data by address */ ORDER BY "Avg Time Open" DESC /* The largest numbers first */ LIMIT 20 /* Limit to the 20 rows of data */

Sessions

How long is an average session in seconds?

SELECT
  avg(EXTRACT(SECONDS FROM "seen.last" - "seen.first")) AS "Avg Time"
/* Subtract the when we last saw the session instance, "seen.last", with when we first saw the session instance, "seen.first". This will produce an interval. Then get the average seconds from the interval and label the column "Avg Time" */ FROM
  web_sessions
/* Retrieve data from the web_sessions table, as each row represents a session */ WHERE
  "created" >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the session was created on or after 2017-01-01 00:00:00. */   AND
  "created" <= '2018-01-02 00:00:00'::TIMESTAMP
/* Where the session was created on or before 2017-01-02 00:00:00. */

How many “views” happen on average in a session?

SELECT
  avg("counter.views")
/* Get the average views */ FROM
  web_sessions
/* Retrieve data from the web_sessions table, as each row represents a session */ WHERE
  "created" >= '2017-01-01 00:00:00'::TIMESTAMP
/* Where the session was created on or after 2017-01-01 00:00:00. */   AND
  "created" <= '2017-01-02 00:00:00'::TIMESTAMP
/* Where the session was created on or before 2017-01-02 00:00:00. */

Visitors

How many visitors started a session in the last month?

You actually don’t need the web_visitors table to do this. We can get this using just the web_sessions table.

SELECT
  count(*)
/* Get the count of all rows returned */ FROM
  web_sessions
WHERE
  web_sessions."created" >= date_trunc('month', now()) - INTERVAL '1 month'
/* Where the session was created on or after the first day of the last month */   AND
  web_sessions."created" <  date_trunc('month', now())
/* Where the session was created before the first day of this month. */

How may sessions do visitors have on average? From visitors created in the last two days.

SELECT
  AVG("counter.sessions")
/* Get the average of all session counts */ FROM
  web_visitors
WHERE
  "created" >= date_trunc('day', now()) - INTERVAL '3 days'
/* Where the visitor was created on or after 3 days ago */   AND
  "created" < date_trunc('day', now())
/* before today */

Funnel Analysis

A funnel is a technique to help determine how entities travel through a specific journey to reach a goal. A funnel can be refined so that you can determine where entities drop out before reaching the goal.

A traditional funnel analysis has defined visitor steps or actions. Each step in the visitors journey takes them one step closer to a goal. The analysis usually focuses on how many visitors reach each step and tries to identify the step where visitors drop off most before reaching the goal.

Cart Promotion Checkout

Funnel steps - Step 1. Viewed email promotion landing page - Step 2. Added an item to the card - Step 3. Viewed the checkout page - Step 4. Completed the purchase

WITH
/*  Step 1. Retrieve all records that qualify. i.e all visitors that viewed the email promotion landing page */
step1 AS (
    SELECT web_views."id.visitor" AS "v", web_views."created" AS "c"
    FROM web_views
    WHERE "address.pathname" = '/email/campaign1.html'
),
/* Step 2. Retrieve all "car:add:item" events that visitors triggered, make sure they happened after step 1 for each visitor. */ step2 AS (
    SELECT web_events."id.visitor" AS "v", web_events."created" AS "c"
    FROM web_events INNER JOIN step1 ON web_views."id.visitor" = step1.v
    WHERE
        web_events."type" = 'cart:add:item'
    AND
        web_events."created" > step1.c
),
/* Step 3. Retrieve all views from the "/car/checkout" page. Make sure they happen after step2 for each visitor. */ step3 AS (
    SELECT web_views."id.visitor" AS "v", web_views."created" AS "c"
    FROM web_views INNER JOIN step2 ON web_views."id.visitor" = step2.v
    WHERE
        web_views."address.pathname" = '/cart/checkout'
    AND
        web_views."created" > step2.c
),
/* Step 4. Retrieve all views from the "/car/checkout/thankyou" page. Make sure they happen after step3 for each visitor. */ step4 AS (
    SELECT web_views."id.visitor" AS "v", web_views."created" AS "c"
    FROM web_views INNER JOIN step3 ON web_views."id.visitor" = step3.v
    WHERE
        web_views."address.pathname" = '/cart/checkout/thankyou'
    AND
        web_views."created" > step3.c
)
/* Collate all results into one table */ SELECT
  (SELECT COUNT(DISTINCT v) FROM step1) AS "Step1",
  (SELECT COUNT(DISTINCT v) FROM step2) AS "Step2",
  (SELECT COUNT(DISTINCT v) FROM step3) AS "Step3",
  (SELECT COUNT(DISTINCT v) FROM step4) AS "Step4"


The result of the query should look the following table:

Step1 Step2 Step3 Step4
10942 901 120 110

Just by looking at the numbers produced in the previous table it seems quite obvious that visitors are finding it difficult to progress from Step2 to Step3. To determine the root cause you may want to introduce additional steps in the funnel until you are satisfied enough to that you know how to fix or improve the visitor journey.

Server Configuration

Configure a Hive Hero Analytics Server to your needs

All configuration is controlled via the config.toml file specified when starting the HHA server on the command line.

hh-analytics -config=config.toml

The following configuration options can all be modified and added to the config.toml file.

Logging Configuration

Levels

Logging levels allows you to set how much detail HHA server will log to the console. In the config.toml file you can configure logging as per the example below:

[logging]
level = 3
# trace=3,info=2,warning=1,error=0 (Default is 3)
  • 3 - will give you the most trace level detail
  • 2 - will give you the informational detail
  • 1 - will give you only error logging

Server Configuration

Port

The network port the server will run from.

[server]
port = 9001

To test the server is running correctly, open a browser and go to localhost:[PORT]/c.js , this should return a response with JavaScript. Please ensure you replace “[PORT]” with the configured port in the toml file.

Database Configuration

PostgreSQL database server configuration. The HHA server needs a PostgreSQL server in order to collect data, this section outlines its configuration. An working example of this configuration is below:

[postgresql]
host = "localhost"
port = 5432 # defaults to port 5432
connect_timeout = 0 # zero means an indefinite wait

user = "HHAServer"
password = ""
database = "hha"

sslmode = "disable" # disabled SSL
# sslmode - Whether or not to use SSL
# sslmode DEFAULT is "require"
# sslmode = "disable" - No SSL
# sslmode = "require" - Always SSL (skip verification)
# sslmode = "verify-ca" - Always SSL (verify that the certificate presented by the server was signed by a trusted CA)
# sslmode = "verify-full" - Always SSL (verify that the certification presented by the server was signed by a trusted CA and the server host name matches the one in the certificate)

Connectivity

  • host - Host - The host can be an IP address or a server/computer name.
  • port - Port - The port the database is running on. The port defaults to 5432.
  • connect_timeout - Connection Timeout - Maximum time to wait for a connection, in seconds. Zero or not specified means an indefinite wait.
[postgresql]
host = "localhost"
port = 5432 # defaults to port 5432
connect_timeout = 0 # Zero means an indefinite wait.

Credentials

  • user - User - database access username (requires fill read and write access)
  • password - Password - to access the database
  • database - Database - postgresSQL database to use
[postgresql]
user = "HHAUser"
password = "NoMoreSecrets"
database = "HHA"

Security

  • sslmode - The level of SSL network protocol level encryption to use for communication between the HHA server and the database
    • sslmode DEFAULT is “require”
    • sslmode = “disable” - No SSL
    • sslmode = “require” - Always SSL (skip verification)
    • sslmode = “verify-ca” - Always SSL (verify that the certificate presented by the server was signed by a trusted CA)
    • sslmode = “verify-full” - Always SSL (verify that the certification presented by the server was signed by a trusted CA and the server host name matches the one in the certificate)
  • sslcert - SSL Certificate - Cert file location. The file must contain PEM encoded data.
  • sslkey - SSL Key File - Key file location. The file must contain PEM encoded data.
  • sslrootcert - SSL Root Certificate - The location of the root certificate file. The file must contain PEM encoded data.
[postgresql]
sslmode = "disable"
sslcert = ""
sslkey = ""
sslrootcert = ""

Visitor Configuration

Configuration associated with visitor data and id collection. Working example is below:

[web_visitor]
data_db_limit = 4000 #DEFAULT is 4000
data_payload_limit = 2000 #DEFAULT is 2000
data_modifier_limit = 5 #DEFAULT is 5
data_modifier_length_limit = 50 #DEFAULT is 50
custom_ids_db_limit = 4000 #DEFAULT is 4000
custom_ids_payload_limit = 2000 #DEFAULT is 2000
custom_ids_modifier_limit = 5 #DEFAULT is 5
custom_ids_modifier_length_limit = 50 #DEFAULT is 50

Data

  • data_db_limit - Character length limit of a visitors JSON data object. Per visitor data will not accumulate after this limit is reached.
    • data_db_limit DEFAULT is 4000
  • data_payload_limit - JSON data updates larger than this are ignored
    • data_payload_limit DEFAULT is 2000
  • data_modifier_limit - Number of server side data modifiers that can be executed in one call
    • data_modifier_limit DEFAULT is 5
  • data_modifier_length_limit - The maximum length of a data modifier string
    • data_modifier_length_limit DEFAULT is 50
[web_visitor]
data_db_limit = 4000
data_payload_limit = 2000
data_modifier_limit = 5
data_modifier_length_limit = 50

Custom IDs

  • custom_ids_db_limit - Character length limit of a visitors JSON custom ids object. Per visitor custom ids will not accumulate after this limit is reached.
    • custom_ids_db_limit DEFAULT is 4000
  • custom_ids_payload - JSON custom ids updates larger than this are ignored
    • custom_ids_payload DEFAULT is 2000
  • custom_ids_modifier_limit - Number of server side custom ids modifiers that can be executed in one call
    • custom_ids_modifier_limit DEFAULT is 5
  • custom_ids_modifier_length_limit - The maximum length of a custom ids modifier string
    • custom_ids_modifier_length_limit DEFAULT is 50
[web_visitor]
custom_ids_db_limit = 4000
custom_ids_payload_limit = 2000
custom_ids_modifier_limit = 5
custom_ids_modifier_length_limit = 50

Session Configuration

Configuration associated with session data and id collection. Working example is below:

[web_session]
data_db_limit = 4000 #DEFAULT is 4000
data_payload_limit = 2000 #DEFAULT is 2000
data_modifier_limit = 5 #DEFAULT is 5
data_modifier_length_limit = 50 #DEFAULT is 50
custom_ids_db_limit = 4000 #DEFAULT is 4000
custom_ids_payload_limit = 2000 #DEFAULT is 2000
custom_ids_modifier_limit = 5 #DEFAULT is 5
custom_ids_modifier_length_limit = 50 #DEFAULT is 50

Data

  • data_db_limit - Character length limit of a sessions JSON data object. Per session data will not accumulate after this limit is reached.
    • data_db_limit DEFAULT is 4000
  • data_payload_limit - JSON data updates larger than this are ignored
    • data_payload_limit DEFAULT is 2000
  • data_modifier_limit - Number of server side data modifiers that can be executed in one call
    • data_modifier_limit DEFAULT is 5
  • data_modifier_length_limit - The maximum length of a data modifier string
    • data_modifier_length_limit DEFAULT is 50
[web_session]
data_db_limit = 4000
data_payload_limit = 2000
data_modifier_limit = 5
data_modifier_length_limit = 50

Custom IDs

  • custom_ids_db_limit - Character length limit of a sessions JSON custom ids object. Per session custom ids will not accumulate after this limit is reached.
    • custom_ids_db_limit DEFAULT is 4000
  • custom_ids_payload - JSON custom ids updates larger than this are ignored
    • custom_ids_payload DEFAULT is 2000
  • custom_ids_modifier_limit - Number of server side custom ids modifiers that can be executed in one call
    • custom_ids_modifier_limit DEFAULT is 5
  • custom_ids_modifier_length_limit - The maximum length of a custom ids modifier string
    • custom_ids_modifier_length_limit DEFAULT is 50
[web_session]
custom_ids_db_limit = 4000
custom_ids_payload_limit = 2000
custom_ids_modifier_limit = 5
custom_ids_modifier_length_limit = 50

Page Configuration

Configuration associated with page data and id collection. Working example is below:

[web_page]
data_db_limit = 4000 #DEFAULT is 4000
data_payload_limit = 2000 #DEFAULT is 2000
data_modifier_limit = 5 #DEFAULT is 5
data_modifier_length_limit = 50 #DEFAULT is 50
custom_ids_db_limit = 4000 #DEFAULT is 4000
custom_ids_payload_limit = 2000 #DEFAULT is 2000
custom_ids_modifier_limit = 5 #DEFAULT is 5
custom_ids_modifier_length_limit = 50 #DEFAULT is 50

Data

  • data_db_limit - Character length limit of a pages JSON data object. Per page data will not accumulate after this limit is reached.
    • data_db_limit DEFAULT is 4000
  • data_payload_limit - JSON data updates larger than this are ignored
    • data_payload_limit DEFAULT is 2000
  • data_modifier_limit - Number of server side data modifiers that can be executed in one call
    • data_modifier_limit DEFAULT is 5
  • data_modifier_length_limit - The maximum length of a data modifier string
    • data_modifier_length_limit DEFAULT is 50
[web_page]
data_db_limit = 4000
data_payload_limit = 2000
data_modifier_limit = 5
data_modifier_length_limit = 50

Custom IDs

  • custom_ids_db_limit - Character length limit of a pages JSON custom ids object. Per page custom ids will not accumulate after this limit is reached.
    • custom_ids_db_limit DEFAULT is 4000
  • custom_ids_payload - JSON custom ids updates larger than this are ignored
    • custom_ids_payload DEFAULT is 2000
  • custom_ids_modifier_limit - Number of server side custom ids modifiers that can be executed in one call
    • custom_ids_modifier_limit DEFAULT is 5
  • custom_ids_modifier_length_limit - The maximum length of a custom ids modifier string
    • custom_ids_modifier_length_limit DEFAULT is 50
[web_page]
custom_ids_db_limit = 4000
custom_ids_payload_limit = 2000
custom_ids_modifier_limit = 5
custom_ids_modifier_length_limit = 50

View Configuration

Configuration associated with page view data. Working example is below:

[web_view]
data_payload_limit = 2000 #DEFAULT is 2000
data_modifier_limit = 5 #DEFAULT is 5
data_modifier_length_limit = 50 #DEFAULT is 50

Data

  • data_payload_limit - JSON data updates larger than this are ignored
    • data_payload_limit DEFAULT is 2000
  • data_modifier_limit - Number of server side data modifiers that can be executed in one call
    • data_modifier_limit DEFAULT is 5
  • data_modifier_length_limit - The maximum length of a data modifier string
    • data_modifier_length_limit DEFAULT is 50

Event Configuration

Configuration associated with page event data. Working example is below:

[web_event]
data_payload_limit = 2000 #DEFAULT is 2000
data_modifier_limit = 5 #DEFAULT is 5
data_modifier_length_limit = 50 #DEFAULT is 50

Data

  • data_payload_limit - JSON data updates larger than this are ignored
    • data_payload_limit DEFAULT is 2000
  • data_modifier_limit - Number of server side data modifiers that can be executed in one call
    • data_modifier_limit DEFAULT is 5
  • data_modifier_length_limit - The maximum length of a data modifier string
    • data_modifier_length_limit DEFAULT is 50

Impression Configuration

Configuration associated with impression data. Working example is below:

[web_event]
data_payload_limit = 2000 #DEFAULT is 2000
data_modifier_limit = 5 #DEFAULT is 5
data_modifier_length_limit = 50 #DEFAULT is 50

Data

  • data_payload_limit - JSON data updates larger than this are ignored
    • data_payload_limit DEFAULT is 2000
  • data_modifier_limit - Number of server side data modifiers that can be executed in one call
    • data_modifier_limit DEFAULT is 5
  • data_modifier_length_limit - The maximum length of a data modifier string
    • data_modifier_length_limit DEFAULT is 50

Collector

The Collector is the code that runs in the user’s device to collect data.

Namespace

Each collector can have its own unique namespace. Renaming this configuration allows for multiple HHA servers to collect data separately without collision.

[web_collector]
namespace = "HHAQ"
# namespace - JavaScript namespace to replace window.HHAQ with window.YOURNAMESAPCE. Defaults to "HHAQ" when not set.

Privacy

Do Not Track

Turning on Do Not Track support allows users to opt out of all tracking and analytics via the settings in their associated browsers.

[web_privacy]
dnt_support = true # Do Not Track header support. When this is true all data collection stops when the header is detected.