Preprocess returns
Data cleaning ("washing") utilities for portfolio analysis.
fill_date(strategy_id=None, data=None, need_start=pd.Timestamp('2015-01-01', tz='UTC'), need_end=pd.Timestamp.now(tz='UTC'), time_column='ts', netvalue_column='net_value')
Fills missing dates in a pandas DataFrame with specified values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
strategy_id | int | The ID of the strategy. | None |
data | pandas.DataFrame | The DataFrame to fill missing dates in. | None |
need_start | pandas.Timestamp | The start date to fill missing dates from. | pd.Timestamp('2015-01-01', tz='UTC') |
need_end | pandas.Timestamp | The end date to fill missing dates to. | pd.Timestamp.now(tz='UTC') |
time_column | str | The name of the column containing the timestamps. | 'ts' |
netvalue_column | str | The name of the column containing the net values. | 'net_value' |
Returns:
Type | Description |
---|---|
pandas.DataFrame | The DataFrame with missing dates filled in. |
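The idea of filling missing dates can be illustrated with plain pandas. The sketch below is not the onequant implementation: the helper name `fill_date_sketch`, the daily frequency, and the choice to forward-fill (then back-fill leading gaps) are all assumptions made for illustration, since the page does not state which values are used for the filled rows.

```python
import pandas as pd


def fill_date_sketch(data, need_start, need_end,
                     time_column="ts", netvalue_column="net_value"):
    """Hypothetical stand-in for fill_date; assumes daily data and ffill/bfill."""
    # Build the complete daily calendar between the requested bounds.
    full_index = pd.date_range(need_start.normalize(), need_end.normalize(),
                               freq="D", name=time_column)
    # Normalize timestamps to midnight UTC and keep one row per day.
    series = (
        data.assign(**{time_column: pd.to_datetime(data[time_column], utc=True).dt.normalize()})
        .drop_duplicates(subset=time_column, keep="last")
        .set_index(time_column)[netvalue_column]
    )
    # Missing days take the previous net value; leading gaps take the first one.
    filled = series.reindex(full_index).ffill().bfill()
    return filled.reset_index()


data = pd.DataFrame({
    "ts": pd.to_datetime(["2021-01-02", "2021-01-04"], utc=True),
    "net_value": [1.0, 1.1],
})
out = fill_date_sketch(data,
                       pd.Timestamp("2021-01-01", tz="UTC"),
                       pd.Timestamp("2021-01-05", tz="UTC"))
print(out)
```

With the two observations above, the result has five daily rows, and the gaps on 1, 3 and 5 January are filled from the neighbouring values.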
Source code in onequant/data_wash/preprocess_returns.py
filter_returns_by_corr(corr, cutoff=0.9, exact=None)
A Python implementation of the R function findCorrelation() from the caret package.
Requires numpy and pandas.
It searches through a correlation matrix and returns a list of column names
to remove in order to reduce pairwise correlations.
For the documentation of the R function, see
https://www.rdocumentation.org/packages/caret/topics/findCorrelation
and for the source code of findCorrelation(), see
https://github.com/topepo/caret/blob/master/pkg/caret/R/findCorrelation.R
Parameters:
Name | Type | Description | Default |
---|---|---|---|
corr | pandas.DataFrame | A correlation matrix as a pandas DataFrame. | required |
cutoff | float | The pairwise absolute correlation cutoff. | 0.9 |
exact | bool | Whether the average correlations are recomputed at each step. | None |
Returns:
Type | Description |
---|---|
list | The column names to remove. |
Example:

    import pandas as pd
    from onequant.data_wash.preprocess_returns import filter_returns_by_corr

    # Symmetric correlation matrix for five variables.
    R1 = pd.DataFrame({
        'x1': [1.0, 0.86, 0.56, 0.32, 0.85],
        'x2': [0.86, 1.0, 0.01, 0.74, 0.32],
        'x3': [0.56, 0.01, 1.0, 0.65, 0.91],
        'x4': [0.32, 0.74, 0.65, 1.0, 0.36],
        'x5': [0.85, 0.32, 0.91, 0.36, 1.0]
    }, index=['x1', 'x2', 'x3', 'x4', 'x5'])

    filter_returns_by_corr(R1, cutoff=0.6, exact=False)  # ['x4', 'x5', 'x1', 'x3']
    filter_returns_by_corr(R1, cutoff=0.6, exact=True)   # ['x1', 'x5', 'x4']
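To make the underlying idea concrete, here is a deliberately simplified sketch of correlation-based pruning. It is not the caret algorithm and not the onequant code, so it will not necessarily reproduce the outputs shown above. The helper name `drop_by_mean_abs_corr` and the rule for skipping pairs that already lost a member are assumptions. For each pair whose absolute correlation exceeds the cutoff, it drops the member with the larger mean absolute correlation.

```python
import pandas as pd


def drop_by_mean_abs_corr(corr, cutoff=0.9):
    """Hypothetical, simplified pruning rule (not caret's findCorrelation)."""
    abs_corr = corr.abs()
    # Average absolute correlation of each variable with all variables.
    mean_abs = abs_corr.mean(axis=0)
    cols = list(corr.columns)
    to_drop = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            a, b = cols[i], cols[j]
            # Skip pairs where one member has already been dropped.
            if a in to_drop or b in to_drop:
                continue
            if abs_corr.loc[a, b] > cutoff:
                # Drop the variable that is more correlated with everything else.
                to_drop.append(a if mean_abs[a] > mean_abs[b] else b)
    return to_drop


corr = pd.DataFrame(
    [[1.0, 0.95, 0.2],
     [0.95, 1.0, 0.3],
     [0.2, 0.3, 1.0]],
    index=list("abc"), columns=list("abc"),
)
dropped = drop_by_mean_abs_corr(corr, cutoff=0.9)
print(dropped)  # ['b']
```

Here only the pair (a, b) exceeds the cutoff, and b is dropped because its mean absolute correlation (0.75) is higher than that of a (about 0.72).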
Source code in onequant/data_wash/preprocess_returns.py
filter_returns_by_weights(returns, weights, min_weights=0.005)
This function filters the returns by weights.
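Parameters:
Name | Type | Description | Default |
---|---|---|---|
returns | pandas.DataFrame | A dataframe containing the returns. | required |
weights | pandas.DataFrame | A dataframe containing the weights. | required |
min_weights | float | The minimum weight. | 0.005 |
Returns:
Type | Description |
---|---|
pandas.DataFrame | A dataframe containing the cumulative return. |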
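The page does not spell out how the weights are applied, so the following is only a sketch under stated assumptions: `weights` shares column names with `returns`, a column is kept when its largest weight reaches `min_weights`, and the cumulative return is compounded from simple periodic returns. The helper name `filter_by_weights_sketch` is hypothetical and is not part of onequant.

```python
import pandas as pd


def filter_by_weights_sketch(returns, weights, min_weights=0.005):
    """Hypothetical illustration; the real filtering rule may differ."""
    # Assumption: a column qualifies if its maximum weight reaches the threshold.
    mask = weights.max(axis=0) >= min_weights
    keep = [c for c in returns.columns if c in mask.index and mask[c]]
    # Assumption: returns are simple periodic returns, compounded over time.
    return (1.0 + returns[keep]).cumprod() - 1.0


returns = pd.DataFrame({"a": [0.1, 0.1], "b": [0.0, 0.5]})
weights = pd.DataFrame({"a": [0.6, 0.7], "b": [0.001, 0.002]})
cum = filter_by_weights_sketch(returns, weights, min_weights=0.005)
print(cum)
```

In this toy input, column b never reaches the 0.005 threshold and is removed, leaving the compounded return of column a.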
Source code in onequant/data_wash/preprocess_returns.py