Categorical types
The main point of the categorical data type is to define the set of acceptable values that your pd.Series can contain. The CSV - strategies for reading large files recipe in Chapter 4, The pandas I/O System, will show you an example where this can result in significant memory savings. More generally, the use case is to have pandas convert string values like foo, bar, and baz into integer codes 0, 1, and 2, respectively, which can be stored much more efficiently.
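As a rough illustration of the code mapping described above, the following sketch builds a small categorical pd.Series and inspects its codes. The sample values, the explicit category order, and the repeated 30,000-row Series are assumptions made for this example. The order is fixed explicitly because pandas otherwise infers categories in sorted order, which would not give foo the code 0.

```python
import pandas as pd

# Fix the category order so that foo, bar, and baz map to 0, 1, and 2.
cat_dtype = pd.CategoricalDtype(categories=["foo", "bar", "baz"])
ser = pd.Series(["foo", "bar", "baz", "foo"], dtype=cat_dtype)

# Integer code stored for each row, and the lookup table the codes refer to.
codes = ser.cat.codes.tolist()
categories = ser.cat.categories.tolist()
print(codes)
print(categories)

# Illustrative memory comparison on a larger Series with few distinct values.
repeated = pd.Series(["foo", "bar", "baz"] * 10_000, dtype=object)
object_bytes = repeated.memory_usage(deep=True)
category_bytes = repeated.astype(cat_dtype).memory_usage(deep=True)
print(object_bytes, category_bytes)
```

The exact byte counts depend on your pandas version and platform, but the categorical version only needs one small integer per row plus a single copy of each distinct string.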
How to do it
So far, we have always passed pd.XXDtype() as the dtype= argument. That can still work for categorical data types, but unfortunately it does not handle missing values consistently (see There’s more… for a deeper dive into this). Instead, we have to opt for one of two alternative approaches to create a pd.CategoricalDtype that uses the pd.NA missing value indicator.
With either approach, you will want to start with a pd.Series of data using pd.StringDtype...
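A minimal sketch of that starting point might look like the following. The sample values and the .astype conversion are assumptions chosen for illustration, and they are not necessarily the exact approaches this recipe goes on to describe.

```python
import pandas as pd

# Hypothetical sample data with one missing value, stored as pd.StringDtype.
values_ser = pd.Series(["foo", "bar", "baz", None], dtype=pd.StringDtype())

# Assumed conversion route: cast the string Series to a categorical dtype.
ser = values_ser.astype(pd.CategoricalDtype())

# The missing entry stays missing and does not become a category.
missing_mask = ser.isna().tolist()
print(missing_mask)
print(ser.cat.categories.tolist())

# On recent pandas versions the categories are expected to keep the string
# dtype; treat this as something to verify on your own installation.
category_dtype = ser.cat.categories.dtype
print(category_dtype)
```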