(solved)[Error] Could not utilize other features to build model for estimation

Dear organizers,

I am trying to incorporate other features to the model, for example GENDER (I guess the Datatype is "Categorical" because you didn't provide me the information to distinguish the Datatype). When I try to estimate it by pushing "RUN G-Formula estimation" button, I received the following messages:

1. if I select GENDER as Time-Varying Covariates:

```
Estimation Error
/app/venv/lib/python3.12/site-packages/pygformula/parametric_gformula/histories.py:86: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pool[lagged_cov] = np.where(pool[time_name] >= lagged_nums[i],
/app/venv/lib/python3.12/site-packages/pygformula/parametric_gformula/histories.py:78: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pool[lagged_cov] = np.where(pool[time_name] >= lagged_nums[i],
/app/venv/lib/python3.12/site-packages/pygformula/parametric_gformula/histories.py:80: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pool[lagged_cov] = pd.Categorical(pool[lagged_cov])
Error during g-formula execution: p < 0, p > 1 or p is NaN
```

2. if I select GENDER as Baseline Covariates:

```
Estimation Error
Error during g-formula execution: The baseline covariate for each individual should be the same value at all time steps.
```

Could you please help me solve the issue?
Sincerely yours, Tsai-Min

Created by TSAI-MIN (在民) CHEN (陳) chentsaimin
Hi Tsai-Min and Haoze,

ETHNICITY, RACE, and RACE_SPECIFY have similar types of missingness to GENDER. While testing the dataset, we saw most models fail to converge when these variables were included, so we removed them from the data dictionary. We can add them back, but there will be model convergence issues.

We have fixed the error with time-varying variables. The R and Python gformula packages handle model formulas differently. The Python implementation requires at least one covariate in each time-varying covariate model, so exogenous nodes (variables with no parent variables) with a formula 'FEATURE ~ 1' cause the Python implementation to fail, whereas the R version runs.

Solution: Each time-varying covariate model now includes a lagged version of itself (for example, STEROIDS ~ lag1_STEROIDS) to meet Python's requirements and prevent this error.

Please try your model submissions after 9AM Pacific Time 6/2/2025. Thank you for reporting this issue.
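For anyone building formulas by hand, the rewrite described above can be sketched roughly as follows. This is only an illustration of the idea, not the platform's actual code: the helper `add_self_lag` is hypothetical, and the second formula in the example list (with `OXYGEN` and `t0`) is made up purely to show that non-empty formulas pass through unchanged.

```python
def add_self_lag(formula: str, lag: int = 1) -> str:
    """Rewrite an intercept-only model 'X ~ 1' as 'X ~ lag<lag>_X'.

    Formulas that already have a right-hand-side covariate are returned
    unchanged (apart from whitespace normalization around '~').
    Hypothetical helper for illustration only.
    """
    lhs, rhs = (part.strip() for part in formula.split("~", 1))
    if rhs == "1":
        # Exogenous node: give the model its own lag as the single covariate.
        return f"{lhs} ~ lag{lag}_{lhs}"
    return f"{lhs} ~ {rhs}"


# 'STEROIDS ~ 1' is the failing pattern from the post above;
# the OXYGEN formula is an invented example.
covmodels = ["STEROIDS ~ 1", "OXYGEN ~ lag1_OXYGEN + t0"]
fixed = [add_self_lag(f) for f in covmodels]
print(fixed)
```

Only the intercept-only formula is changed, so models that already depend on parent variables keep their original specification.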
Dear organizers,

I encountered the same problem as @chentsaimin. Even if I have set all variables to be time-varying, I still receive the error message

```
Error during g-formula execution: The baseline covariate for each individual should be the same value at all time steps.
```

Could you please help me with this issue?

Best, Haoze
Dear organizers, Thanks for your clarification. I have another issue with other variables: ETHNICITY, RACE, RACE_SPECIFY and so on. I could not find them in the "Feature" of "Data Layering" of "Data" from the left panel. Could you please help me find them? Yours, Tsai-Min
Hello,

Both errors describe a missingness issue with the GENDER variable. There are individuals who have a missing value at the first time-point and a non-missing value at a subsequent time-point. In keeping with the principle that we should 'not use information from the future', we did not backfill missing information from future time-points. Backfilling information for variables that are likely to be relatively stable can be a reasonable feature engineering choice; however, we did not pursue it.

One of the strengths of structural causal models is the ability to choose a dataset-specific adjustment set. The solution to this issue is to 1) build a structural causal model that captures GENDER's important downstream causal mechanisms prior to the intervention and outcome and 2) include nodes associated with dataset features that a) block biasing paths and b) allow the models to converge.
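If you have the data in long format locally, a quick way to spot the pattern described above is to look for individuals whose GENDER value changes across time steps or is missing at the first time-point. The sketch below uses toy data; the column names `id` and `t0` and both helper functions are assumptions for illustration, and it is not the check that the g-formula package itself performs.

```python
import numpy as np
import pandas as pd

# Toy long-format data; "id" and "t0" are assumed column names.
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "t0":     [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "GENDER": [np.nan, "F", "F", "M", "M", "M", np.nan, np.nan, np.nan],
})


def inconsistent_baseline_ids(data, covariate, id_col="id"):
    """Ids whose covariate is not one identical value at every time step.

    Missing values are counted as a distinct value (dropna=False).
    """
    n_values = data.groupby(id_col)[covariate].nunique(dropna=False)
    return n_values[n_values > 1].index.tolist()


def missing_at_first_time_ids(data, covariate, id_col="id", time_col="t0"):
    """Ids whose covariate is missing at their earliest time-point."""
    # idxmin gives the row label of the earliest time step for each id.
    first_rows = data.loc[data.groupby(id_col)[time_col].idxmin()]
    return first_rows.loc[first_rows[covariate].isna(), id_col].tolist()


print(inconsistent_baseline_ids(df, "GENDER"))
print(missing_at_first_time_ids(df, "GENDER"))
```

Individuals that appear in both lists are the ones with a missing first value and a non-missing later value, which is the case that a baseline covariate cannot represent without backfilling.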
