Download here (Apache Parquet file; as of 25.01.23) | The data is preliminary and may change as we update the paper/model!
We currently provide the recovered percentiles of missing firm characteristics in the file raw_infill.pq. The file has 10 columns:
date: in format YYYY-MM-01
id: in format crsp_PERMNO
char: name of the characteristic, following the convention in Jensen, Kelly, Pedersen (2021)
perc: recovered percentile of the missing entry
lower: lower bound of the recovered percentile, in raw values
upper: upper bound of the recovered percentile, in raw values
mid: midpoint of lower and upper, as an estimate of the raw value of the recovered characteristic
mean: mean of the raw observed entries for other firms within the recovered percentile
median: median of the raw observed entries for other firms within the recovered percentile
missingness [%]: share of firms, in percent, for which the target characteristic is missing in a given month (date)
NOTE: The file covers only the recovered (originally missing) entries.
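A minimal sketch of how the file might be loaded and prepared in Python with pandas. It assumes that raw_infill.pq has been downloaded to the working directory, that a Parquet engine such as pyarrow is installed, and that ids consist of the prefix crsp_ followed by the integer PERMNO. The helper names load_infill and add_permno are hypothetical, and the toy rows use made-up values purely to show the column logic.

```python
import pandas as pd


def load_infill(path="raw_infill.pq"):
    """Read the infill file; requires a Parquet engine such as pyarrow."""
    return pd.read_parquet(path)


def add_permno(df):
    """Parse `date` and extract the PERMNO from ids of the form 'crsp_PERMNO'.

    Assumption: the part after the last underscore is an integer PERMNO.
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out["permno"] = out["id"].str.split("_").str[-1].astype(int)
    return out


# Toy rows with made-up values (not taken from the real file).
toy = pd.DataFrame(
    {
        "date": ["2000-01-01", "2000-01-01"],
        "id": ["crsp_10001", "crsp_10002"],
        "char": ["be_me", "be_me"],
        "lower": [0.50, 1.10],
        "upper": [0.60, 1.30],
    }
)

# `mid` as described above: the midpoint of `lower` and `upper`.
toy["mid"] = (toy["lower"] + toy["upper"]) / 2
toy = add_permno(toy)
print(toy[["date", "permno", "char", "mid"]])
```

With the real file, the same steps would be `df = add_permno(load_infill())`, after which `df[df["char"] == "be_me"]` selects a single characteristic.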
We now also provide the estimated probability distributions across the percentiles of a target characteristic for each missing entry in the raw dataset. Since these files are quite large, we have broken them up by decade:
Link to folder (Apache Parquet files; as of 25.01.23)
CAUTION: The files are large (15 GB in total).