
How To Prepare Large Datasets With Patsy's API?

I'm running a logistic regression and having trouble using Patsy's API to prepare the data when it is bigger than a small sample. Using the dmatrices function directly on a DataFra

Solution 1:

y and dta are DesignInfo objects: they encode all the information needed to take a row of a data frame and convert it to a row of a design matrix. They do not, though, hold any of your actual data. To get a piece of your design matrix, you have to give them a piece of your data, and you get back the matching rows of the outcome and design matrices. To use them, you need to do something like

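from patsy import dmatrices

# y and dta are the DesignInfo objects; iter_maker() yields one chunk of data at a time
for data_chunk in iter_maker():
    y_chunk, design_chunk = dmatrices((y, dta), data_chunk,
                                      NA_action="drop", return_type="dataframe")
    # do something with y_chunk and design_chunk
    # ...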
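
For a fuller picture, here is a minimal end-to-end sketch of the loop above. It assumes y and dta were produced by patsy's incr_dbuilders (the question is cut off, so that is a guess), and the small DataFrame, its column names (outcome, x, group), the formula, and the chunk size are all made up for illustration. In real use, iter_maker would read chunks from disk instead of slicing an in-memory frame.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices, incr_dbuilders

# Hypothetical toy data standing in for a dataset too large to load at once.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "outcome": rng.integers(0, 2, size=12),
    "x": rng.normal(size=12),
    "group": ["a", "b", "c"] * 4,
})

def iter_maker(chunk_size=4):
    """Return a fresh iterator over chunks; patsy may call this more than once."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

# First pass: learn the design (categorical levels, column layout) without
# materialising the full design matrix.
y, dta = incr_dbuilders("outcome ~ x + C(group)", iter_maker)

# The DesignInfo objects know the column layout but carry no data.
print(dta.column_names)

# Second pass: build the matrices one chunk at a time.
chunks = []
for data_chunk in iter_maker():
    y_chunk, design_chunk = dmatrices((y, dta), data_chunk,
                                      NA_action="drop", return_type="dataframe")
    chunks.append((y_chunk, design_chunk))
```

Because the layout is fixed by the first pass, every design_chunk has the same columns in the same order, even if a given chunk happens not to contain every level of group. That is what makes it safe to feed the chunks one by one to an incremental fitting routine, or to write them out and stack them later.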
