Correlation and Regression Analysis

Linear regression and correlation analysis are key methods for deriving insights from data. They are used for predictions and forecasting but can be challenging to understand and differentiate.

Some typical applications of correlation and regression analysis are identifying the activation moment, analyzing cohort retention, and evaluating onboarding.

In this post, I will explain the basics of both methods, highlight their primary use cases, and provide steps and tools for conducting your own analysis.

Correlation analysis

Correlation analysis measures the strength and direction of the relationship between two variables. The result always falls between -1.0 and 1.0. A value of 1.0 represents the strongest positive correlation, -1.0 represents the strongest negative correlation (explained below), and 0 indicates no linear correlation.
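To make the -1.0 to 1.0 range concrete, here is a minimal sketch that computes a correlation coefficient by hand. It assumes the Pearson coefficient (the post does not name a specific one, but Pearson is the most common choice), and the function name and all numbers are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variance numerators).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Invented weekly numbers, for illustration only.
installs = [100, 120, 140, 160, 180]
signups = [10, 12, 14, 16, 18]   # rises in step with installs
churned = [50, 45, 40, 35, 30]   # falls as installs rise

r_pos = pearson_r(installs, signups)
r_neg = pearson_r(installs, churned)
print(round(r_pos, 3), round(r_neg, 3))
```

Because both toy series are perfectly linear in installs, the two results sit at the extremes of the range. Real product data will almost always land somewhere in between.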

Correlation doesn't mean causation, but it's the easiest and quickest way to find which features or metrics are most related to high or low user engagement.

In my work, I like to start by analyzing the correlation between different metrics as soon as I begin working with the data in order to find any connections.
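A first pass like this can be sketched as ranking every metric by how strongly it correlates with an engagement measure. The metric names and values below are hypothetical, and the Pearson coefficient is an assumption on my part; treat this as a starting point, not a finished analysis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-user metrics; every value is invented for illustration.
metrics = {
    "comments_per_post": [1, 2, 3, 4, 5],
    "notifications_sent": [5, 3, 4, 1, 2],
    "support_tickets": [3, 1, 4, 2, 5],
}
engagement = [2, 4, 6, 8, 10]  # hypothetical engagement score per user

# Sort by absolute value so strong inverse relationships rank high too.
ranked = sorted(
    ((name, pearson_r(values, engagement)) for name, values in metrics.items()),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```

Sorting by absolute value is a deliberate choice here: a metric that moves strongly in the opposite direction from engagement is just as worth investigating as one that moves with it.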

For example, at Beaconstac, I used correlation analysis to understand which user actions/behaviors drive higher sign-ups from the free QR Code generator widget on the marketing website. The widget gives the user a taste of the product and acts as a very important traffic magnet for us via SEO.

Examples of great use cases for doing correlation analysis:

Is there any relationship between two features or two user activities (e.g., booking a hotel and purchasing a concert ticket, or creating a document template and sending a message to a collaborator)?
Do they increase and decrease together (e.g., does an increase in comments per post correlate with an increase in post shares, or is the decline in activated trials related to the price change)?
Are they dependent or independent (e.g., the day’s weather and your app usage, activations by day of the week, or market recession and your subscription churn)?

“Positive” and “Inverse” correlation

Positive correlation—an increase in X is correlated with an increase in Y:

An increase in installs relates to an increase in signups
An increase in notifications relates to an increase in daily active users

Inverse correlation (or negative correlation), where an increase in X is correlated with a decrease in Y: