Z-score normalizing an R data frame consecutively





I would like to normalize an R data.frame by computing the z-score using the function scale().





However, I am not sure whether this approach is subject to "look-ahead bias", a finance term for using information that would not have been known or available during the period being analyzed.



These are stock returns, and I want to use this data for a "backtest" (a finance term for validation). I want to make sure that each period's z-score uses only data available up to that point, not the mean and standard deviation of the entire series.



Does anyone know how to perform the calculation for this? Or is there a different approach?





Could you provide a sample of your data using reprex::reprex() or dput()? It sounds like you don't want to standardize columns all at once using all the data, but rather standardize them in periods or chunks. Is this correct?
– MHammer
Jul 2 at 3:04








That is correct. I will create a minimal example and add it to the question.
– Niccola Tartaglia
Jul 2 at 3:45




1 Answer



You can normalize data or create new features using normalization without worrying about "look-ahead" bias. It's very common.



You just have to avoid using any data that would not have been available in the period being analyzed.
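Without a reproducible example from the question, here is only a minimal sketch of one way to do that: an expanding-window z-score, where period t is standardized with the mean and standard deviation of periods 1 through t. The data frame `returns`, the tickers `AAA` and `BBB`, and the helper `expanding_z()` are all made up for illustration.

```r
# Hypothetical data: 10 periods of returns for two made-up tickers.
set.seed(1)
returns <- data.frame(AAA = rnorm(10, 0, 0.02),
                      BBB = rnorm(10, 0, 0.03))

# Expanding-window z-score (illustrative helper, not from any package):
# the z-score for period t uses only observations 1..t.
expanding_z <- function(x) {
  sapply(seq_along(x), function(t) {
    if (t < 2) return(NA_real_)   # sd() is undefined for a single observation
    window <- x[1:t]              # data available up to and including period t
    (x[t] - mean(window)) / sd(window)
  })
}

# Apply the helper column by column.
z_scores <- as.data.frame(lapply(returns, expanding_z))
```

The first row is NA because a standard deviation needs at least two observations. If the current period's value should not influence its own mean and standard deviation, the window could be changed to `x[1:(t - 1)]`, in which case the first two rows would need to be NA instead.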



Much like with target encoding or other feature engineering techniques, you simply create those features on a training subset of your historical data, then validate them on a validation split. You may also consider k-fold cross-validation.
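As a rough sketch of the training/validation idea, assuming a simple chronological 70/30 split: fit the centering and scaling parameters with scale() on the training rows only, then reuse them on the validation rows. The data, the tickers and the split point are made up for illustration.

```r
# Hypothetical data: 100 periods of returns for two made-up tickers.
set.seed(42)
returns <- data.frame(AAA = rnorm(100, 0, 0.02),
                      BBB = rnorm(100, 0, 0.03))

# Chronological split (assumed 70/30): earlier rows train, later rows validate.
train <- returns[1:70, ]
valid <- returns[71:100, ]

# scale() stores the column means and standard deviations it used as attributes.
train_scaled <- scale(train)
train_center <- attr(train_scaled, "scaled:center")
train_spread <- attr(train_scaled, "scaled:scale")

# Standardize the validation rows with the training parameters only,
# so no information from the validation period leaks into the z-scores.
valid_scaled <- scale(valid, center = train_center, scale = train_spread)
```

Note that scale() returns a matrix, so wrap the result in as.data.frame() if a data frame is needed downstream.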



If you'd like to augment your question with a reproducible example, I can show you.





