PyArrow List types
Life would be so simple if every bit of data you came across fit neatly into a single cell of a pd.DataFrame, but inevitably you will run into cases where it does not. For a moment, let’s imagine trying to analyze the employees who work at a company:
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Janice", "Jim", "Michael"],
    "years_exp": [10, 2, 4, 8, 6],
})
df
      name  years_exp
0    Alice         10
1      Bob          2
2   Janice          4
3      Jim          8
4  Michael          6
This type of data is pretty easy to work with – you could easily sum or average the years of experience across all employees. But what if we also wanted to know that Bob and Michael report to Alice, while Janice reports to Jim?
Our picturesque view of the world has suddenly come crashing down – how could we possibly express this in pd.DataFrame...