Issues with Ratio metrics analysis #231
Comments
Thanks Tatiana! If you can build a reproducible example, with simulated data and the AnalysisPlan definitions and executions included in a .py file, that'd be great!
Btw, have you tried DeltaMethod for ratio metrics?
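For reference, a minimal sketch of what the delta-method route looks like with the library's API. This condenses the configuration from the full reproduction script later in this thread; the metric and column names are illustrative, not prescriptive:

```python
from cluster_experiments import AnalysisPlan, RatioMetric, Variant

# Minimal sketch: metric and column names are illustrative, borrowed from the
# reproduction script later in this thread.
aov = RatioMetric(
    alias='AOV',
    numerator_name='total_order_value',
    denominator_name='total_orders',
)
plan = AnalysisPlan.from_metrics(
    metrics=[aov],
    variants=[
        Variant('control', is_control=True),
        Variant('treatment_1', is_control=False),
    ],
    variant_col='experiment_group',
    alpha=0.05,
    analysis_type='delta',
    analysis_config={"cluster_cols": ["customer_id"], "scale_col": "total_orders"},
)
```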
@LGonzalezGomez you've worked on this, thoughts?
@david26694 I can't attach a .py file here, but here is the code to reproduce with simulated data:

[code snippet omitted]
@david26694 @tbikeeva I will check this!
@tbikeeva I think there's a bug with ratio metrics; I'd avoid using them for now. Essentially, the analysis is run with the numerator and not the denominator, except for the treatment and control means, which use both. I can check with @ludovico-lanni when he comes back. For the time being, I see two options:

[options omitted]
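To make the bug concrete: a correct ratio-metric analysis has to use the numerator and denominator jointly, because the variance of the ratio depends on their covariance. Here is a standalone sketch of the delta-method standard error for a cluster-level ratio; this illustrates the statistics and is not the library's internal code:

```python
import numpy as np

def delta_method_se(num: np.ndarray, den: np.ndarray) -> float:
    """Standard error of sum(num) / sum(den) across clusters,
    via a first-order Taylor expansion (the delta method)."""
    n = len(num)
    mu_n, mu_d = num.mean(), den.mean()
    var_n, var_d = num.var(ddof=1), den.var(ddof=1)
    cov_nd = np.cov(num, den)[0, 1]  # the numerator-denominator covariance that OLS-on-numerator ignores
    ratio = mu_n / mu_d
    # Var(ratio) ~= ratio^2 * (Var(N)/mu_N^2 + Var(D)/mu_D^2 - 2*Cov(N,D)/(mu_N*mu_D)) / n
    var_ratio = ratio**2 * (var_n / mu_n**2 + var_d / mu_d**2 - 2 * cov_nd / (mu_n * mu_d)) / n
    return float(np.sqrt(var_ratio))
```

Running OLS on the numerator alone drops the Var(D) and Cov(N, D) terms, which is consistent with the mismatched scorecards reported in this issue.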
hey @tbikeeva, I've had a look:

```python
#%%
import numpy as np
import pandas as pd
from cluster_experiments import (
    AnalysisPlan,
    RatioMetric,
    SimpleMetric,
    Variant,
)
from cluster_experiments.inference.analysis_results import AnalysisPlanResults
#%%
# Constants
NUM_ORDERS = 10_000
NUM_CUSTOMERS = 3_000
EXPERIMENT_GROUPS = ['control', 'treatment_1']
GROUP_SIZE = NUM_CUSTOMERS // len(EXPERIMENT_GROUPS)
# Generate customer_ids
customer_ids = np.arange(1, NUM_CUSTOMERS + 1)
# Shuffle and split customer_ids into experiment groups
np.random.shuffle(customer_ids)
experiment_group = np.repeat(EXPERIMENT_GROUPS, GROUP_SIZE)
experiment_group = np.concatenate(
    (experiment_group, np.random.choice(EXPERIMENT_GROUPS, NUM_CUSTOMERS - len(experiment_group)))
)
# Assign customers to groups
customer_group_mapping = dict(zip(customer_ids, experiment_group))
# Generate orders
order_ids = np.arange(1, NUM_ORDERS + 1)
customers = np.random.choice(customer_ids, NUM_ORDERS)
order_values = np.abs(np.random.normal(loc=10, scale=2, size=NUM_ORDERS)) # Normally distributed around 10 and positive
# Create DataFrame
data = {
    'order_id': order_ids,
    'customer_id': customers,
    'experiment_group': [customer_group_mapping[customer_id] for customer_id in customers],
    'order_value': order_values,
}
df = pd.DataFrame(data)
df = (
    df.groupby(['customer_id', 'experiment_group'])
    .agg({'order_value': 'sum', 'order_id': 'count'})
    .rename(columns={'order_value': 'total_order_value', 'order_id': 'total_orders'})
    .reset_index()
)
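# At this point df has one row per customer: total_order_value (sum of their
# orders), total_orders (order count), and their assigned experiment_group.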
#%%
df['AOV'] = df['total_order_value'] / df['total_orders']
df.head(3)
#%%
metric__total_order_value = SimpleMetric(
    alias='TOTAL ORDER VALUE',
    name='total_order_value'
)
metric__total_orders = SimpleMetric(
    alias='TOTAL ORDERS',
    name='total_orders'
)
metric__aov_ratio = RatioMetric(
    alias='AOV RATIO METRIC',
    numerator_name='total_order_value',
    denominator_name='total_orders'
)
metric__aov_simple = SimpleMetric(
    alias='AOV SIMPLE METRIC',
    name='AOV'
)
variants = [
    Variant('control', is_control=True),
    Variant('treatment_1', is_control=False),
]
#%%
analysis_plan_config = [
    (metric__total_orders, 'ols', df),
    (metric__total_order_value, 'ols', df),
    (metric__aov_ratio, 'delta', df),  # this is correct
    (metric__aov_ratio, 'ols', df),  # this has a bug
    (metric__aov_simple, 'ols', df),  # while there is no bug, the estimand is weird, since we're doing average of averages
]
scorecards = AnalysisPlanResults()
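# Run each (metric, analysis_type) combination and accumulate everything into
# a single scorecard so the results can be compared side by side.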
for metric, analysis_type, metric_df in analysis_plan_config:
    analysis_config = (
        {"cluster_cols": ["customer_id"], "scale_col": "total_orders"}
        if analysis_type == 'delta'
        else None
    )
    analysis_plan = AnalysisPlan.from_metrics(
        metrics=[metric],
        variants=variants,
        variant_col='experiment_group',
        alpha=0.05,
        analysis_type=analysis_type,
        analysis_config=analysis_config,
    )
    results = analysis_plan.analyze(exp_data=metric_df, verbose=True)
    scorecards += results
#%% regex to filter columns that don't contain 'dimension'
print(
    scorecards.to_dataframe()
    .drop(columns=["control_variant_name", "treatment_variant_name", "alpha"])
    .filter(regex="^(?!.*dimension).*$", axis=1)
)
```

Please run this and let me know what you think. Bottom line: we should not use OLS with ratio metrics; the only analysis type that works with them is the delta method. I'll do a release of the library where this fails.
@david26694 Looks good! I modified it a bit to set scale_col to the metric's denominator_name (rather than hard-coding total_orders), and it worked, thank you!
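A hypothetical drop-in for the analysis_config assignment in the loop above, along the lines described; it assumes RatioMetric keeps the denominator_name it was constructed with as an attribute:

```python
# Hypothetical tweak: derive scale_col from the metric itself instead of
# hard-coding "total_orders". Assumes RatioMetric exposes denominator_name.
if analysis_type == 'delta' and isinstance(metric, RatioMetric):
    analysis_config = {
        "cluster_cols": ["customer_id"],
        "scale_col": metric.denominator_name,
    }
else:
    analysis_config = None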
Hi team! I think I spotted a bug: when using ratio metrics for experiment analysis, it looks like the denominator values are not taken into account when computing the ATE and p-value for the scorecard. See the screen below:

[scorecard screenshot omitted]

This is how the metrics were defined:

[screenshot omitted]

And this is how they were analyzed:

[screenshot omitted]

When calculating this metric on customer level and then running a simple metric test, I get very different results:

[screenshot omitted]

Let me know if you need any additional data or queries, thank you!
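The screenshots above weren't captured in this transcript. As a rough sketch, the customer-level comparison described corresponds to something like the following, with df, variants, and the column names borrowed from the reproduction script earlier in the thread:

```python
# Rough sketch of the customer-level check described above; df, variants, and
# the column names are borrowed from the reproduction script in this thread.
from cluster_experiments import AnalysisPlan, SimpleMetric

df['AOV'] = df['total_order_value'] / df['total_orders']  # per-customer average order value

aov_simple = SimpleMetric(alias='AOV SIMPLE METRIC', name='AOV')
plan = AnalysisPlan.from_metrics(
    metrics=[aov_simple],
    variants=variants,
    variant_col='experiment_group',
    alpha=0.05,
    analysis_type='ols',
)
print(plan.analyze(exp_data=df, verbose=True).to_dataframe())
```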