
Setup Rakam

We use a configuration file, config.properties, that specifies the backend storage and the modules that will be used. You can either use a custom configuration file or use pre-defined configurations for the most common use cases. We provide Dockerfiles and integrations with cloud services such as Heroku, AWS and DigitalOcean, but you can also download Rakam and install it on your own on-premise servers. Here is the guide.
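
As a rough sketch only (the property names below are illustrative assumptions, not definitive keys; check the setup guide and the pre-defined configurations for the exact properties your backend expects), a minimal config.properties pairs a storage backend with the options it needs:

# Illustrative example -- consult the setup guide for the exact property names.
store.adapter=postgresql
store.adapter.postgresql.url=postgres://rakam:password@127.0.0.1:5432/rakam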

Collecting Events

Rakam provides client libraries in several programming languages so that you can collect events easily. You can also send events to Rakam in JSON format through the RESTful web service endpoint that Rakam provides.

You can find the integrations for collecting events here and the documentation for the RESTful API here.

You can think of an event as a row in a table. Similar to tables in RDBMSs, Rakam has collections that store events. In order to use Rakam, you need to create a project with a unique name and send events to that project. Collections also have schemas, but you are not forced to define a schema before collecting events: Rakam can generate the schema automatically at runtime so that you don't have to do it manually.

We aim to make schema evolution easy, since analytical systems almost always need it and handling schema evolution manually at runtime can be hard. As you start to send your events to Rakam, it creates the schemas based on the value types of the fields that you send as event properties. Let's say you sent the following event to Rakam:

{
    "collection": "pageview",
    "properties": {
        "user_agent": "Firefox",
        "locale": "TR-tr",
        "url": "http://mysite.com/blog-post",
        "referrer": "http://google.com/?q=term",
        "ip": "186.45.356.33",
        "platform": "web",
        "page_duration": 5,
        "session_id": "c88d7bad-af01-433f-bef1-dc107cee4334"
        "_user": "fdsy8fy7d"
        "_time": 1480759056906
    }
}
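
As a sketch of how such a payload might be posted to the REST API (the endpoint path, port and write-key placement below are assumptions; verify them against the RESTful API documentation linked above), a raw HTTP call could look like this:

# Illustrative only -- the endpoint path, default port and key handling may differ; see the API docs.
curl -X POST "http://127.0.0.1:9999/event/collect" \
     -H "Content-Type: application/json" \
     -d '{
           "api": { "api_key": "YOUR_WRITE_KEY" },
           "collection": "pageview",
           "properties": { "user_agent": "Firefox", "url": "http://mysite.com/blog-post" }
         }'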

Rakam will generate the schema for this event collection based on the values that you sent and then create the collection in the backend storage. For example, the PostgreSQL backend will execute a CREATE TABLE query similar to this one:

CREATE TABLE pageview (
    user_agent     TEXT,
    locale         TEXT,
    url            TEXT,
    referrer       TEXT,
    ip             TEXT,
    platform       TEXT,
    page_duration  BIGINT,
    session_id     TEXT,
    _user          TEXT,
    _time          TIMESTAMP
);

Rakam will check the fields; if they already exist and the values match the existing schema, the event will be sent to the storage backend as you sent it. However, let's say the field ip was created previously with the type integer. Since a string can't be cast to an integer, the field ip will be ignored.
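
To give a feel for how this evolution might look in the PostgreSQL backend (a sketch only; utm_source below is a made-up field name and the actual DDL Rakam generates may differ), a later event that introduces a previously unseen string property would extend the table roughly like this:

-- A later pageview event carries a new property, e.g. "utm_source": "newsletter".
-- The column is added with a matching type; existing rows simply hold NULL for it.
ALTER TABLE pageview ADD COLUMN utm_source TEXT;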

The schema evolution feature may cause a security problem if you don't have control over the client side that sends the events to Rakam. For example, if you install the JavaScript tracker on a website that sends events directly from users' browsers to Rakam, an attacker may send randomly generated fields to Rakam, so you may end up with event collections that have hundreds of fields. Therefore we suggest disabling dynamic schemas with disable_dynamic_schema=true once you have created the schemas of your event collections, or disabling this feature completely in production. You can always use your master_key in order to keep a flexible schema, but the clients should not be able to change the schema.
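
Assuming the flag lives in the same config.properties file as the rest of your configuration (a minimal sketch):

# Reject fields that are not already part of a collection's schema.
disable_dynamic_schema=true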

Enrichment & Sanitization

When the event data is parsed and built, the second step is mapping events with Event Mappers. Since the data will be stored in a denormalized format, you may pre-process the events before storing them in the backend storage. This pre-processing step may attach new values based on existing ones or delete existing field values. One of the common use cases of event mappers is geolocation data. For example, it's possible to get users' IP addresses in browsers, but it's not always possible to get geolocation data (country, city, latitude, longitude, etc.) via JavaScript running in browsers. You need to attach geolocation data using a database that resolves IPs and returns geolocation data. Rakam provides a set of event mappers such as IP-to-geolocation, current time and referrer URL resolution. You can enable these mappers using the configuration file (see: configuration file). It's also quite easy to add new event mappers depending on your needs (see: Event Enrichment).
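
As a rough before/after illustration of the IP-to-geolocation mapper (the enriched field names and values below are illustrative, not the mapper's actual output), an incoming event like this:

{
    "ip": "186.45.35.33",
    "url": "http://mysite.com/blog-post"
}

could be stored with geolocation fields attached:

{
    "ip": "186.45.35.33",
    "url": "http://mysite.com/blog-post",
    "_country": "Turkey",
    "_city": "Istanbul",
    "_latitude": 41.01,
    "_longitude": 28.97
}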

Event Storage

Now the event is ready to be stored. The last step for the events that you send to Rakam is writing them to the storage implementation that you configured via the configuration file (config.properties). Depending on the implementation, you will be able to query the events with SQL within a short period. (See: PostgreSQL event storage, Presto event storage.)

Analyzing Events

You're ready to analyze your events. You can either use the Analysis API or Rakam's user interface (docs) to create query tables from SQL queries. The methods you can use are explained in the analyze data section.
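
For example, once the pageview collection above has some data, a query along these lines (a sketch; the exact SQL dialect and functions depend on the storage backend you configured) reports daily pageview counts and the average page duration:

SELECT CAST(_time AS DATE) AS day,
       COUNT(*)            AS pageviews,
       AVG(page_duration)  AS avg_page_duration
FROM pageview
GROUP BY 1
ORDER BY 1;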