Introduction To Pandas Data Structures
Here are some quick notes on two of the Pandas data structures: Series and Data Frame.
A Series is a one-dimensional container holding an array of data and its corresponding labels (its index).
There are several ways to create or initialize a Pandas Series:
From an existing Python list or NumPy array:
import numpy as np
from pandas import Series
s = Series([1,2,3,4,5,6,7])
s1 = Series(np.array([1,2,3,4,5,6,7]))
If you do not provide an index, a default index is provided, consisting of the integers 0 to N-1, where N is the length of your data:
In : s.index
Out: Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64')
You can also initialize a Series with a dict (you could even think of a Series as an ordered-dict):
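As a minimal sketch of the dict case (the fruit names and prices are made-up sample data, not from the original notes), the keys become the index labels and the values become the data:

```python
from pandas import Series

# Hypothetical sample data: dict keys become index labels.
prices = {"apple": 1.25, "banana": 0.5, "cherry": 3.0}
s2 = Series(prices)

print(s2)
print(s2["banana"])  # label-based lookup, much like a dict
```

Lookups by label work the same way they would on the original dict, which is why the ordered-dict comparison is a handy mental model.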
Finally, you can initialize a series by explicitly defining the data and the index:
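Something like the following should work, assuming you pass the labels through the constructor's index argument (the values and labels here are just placeholders):

```python
from pandas import Series

# Data and index are given explicitly and must have the same length.
s3 = Series([10, 20, 30], index=["a", "b", "c"])

print(s3)
print(s3["b"])  # look up a value by its label
```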
A Series index can also be altered in place:
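One way to do this, sketched with placeholder labels, is to assign a new list of labels to the index attribute. The new list has to be the same length as the Series:

```python
from pandas import Series

s4 = Series([10, 20, 30])   # starts with the default 0..N-1 index
s4.index = ["x", "y", "z"]  # replace the labels in place

print(s4)
```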
Pandas Data Frames are like matrices where each column can be of a different type. Since a Data Frame can be thought of as a dictionary of Series, you can initialize one using a dictionary of Python lists:
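Here is a small sketch of that idea. The column names and values are invented for illustration; each dict key becomes a column, and each list must have the same length:

```python
from pandas import DataFrame

# Hypothetical sample data: one list per column, mixed types across columns.
data = {
    "name": ["a", "b", "c"],
    "score": [1.5, 2.5, 3.5],
    "passed": [True, False, True],
}
df = DataFrame(data)

print(df)
print(df.dtypes)  # each column keeps its own type
print(df.index)   # no index given, so a default integer index is used
```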
Again, you will see that if you do not provide an index, it defaults to the same standard integer index as the Series data structure does. You can specify the index in the Data Frame constructor.
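A sketch of passing the index to the constructor, using placeholder row labels:

```python
from pandas import DataFrame

# The index argument supplies one label per row.
df2 = DataFrame(
    {"score": [1.5, 2.5, 3.5], "passed": [True, False, True]},
    index=["r1", "r2", "r3"],
)

print(df2)
```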
You can access a single column or multiple columns using the following format:
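Roughly, a single column name in square brackets gives back a Series, while a list of names gives back a smaller Data Frame. The frame below is made-up sample data:

```python
from pandas import DataFrame

df3 = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

one_col = df3["a"]           # single column -> Series
two_cols = df3[["a", "c"]]   # list of columns -> Data Frame

print(type(one_col).__name__)
print(two_cols)
```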
Column names can be changed in place, or specified using the columns argument in the Data Frame constructor:
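Both options can be sketched like this (the column names are placeholders):

```python
from pandas import DataFrame

# Name the columns up front via the constructor...
df4 = DataFrame([[1, 2], [3, 4]], columns=["x", "y"])
original_names = list(df4.columns)

# ...or replace them in place afterwards (same number of names required).
df4.columns = ["left", "right"]

print(original_names)
print(df4)
```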
Rows in a Data Frame can be accessed in a number of different ways:
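The original examples are not shown here, so as a sketch, these are four common options: by label with loc, by position with iloc, by a positional slice, and by a boolean mask. The data is invented for illustration:

```python
from pandas import DataFrame

df5 = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r1", "r2", "r3"])

by_label = df5.loc["r2"]        # one row by index label -> Series
by_position = df5.iloc[0]       # one row by integer position -> Series
first_two = df5.iloc[0:2]       # slice of rows by position -> Data Frame
filtered = df5[df5["a"] > 1]    # rows where the condition holds

print(by_label)
print(filtered)
```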
Adding a new column is pretty easy too. For example, if you want to create a new column holding a single value, 3.14, you can do that like this:
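A minimal sketch, assuming a small throwaway frame and a hypothetical column name "pi": assigning a scalar to a new column name fills every row with that value.

```python
from pandas import DataFrame

df6 = DataFrame({"a": [1, 2, 3]})
df6["pi"] = 3.14  # the scalar is broadcast to every row

print(df6)
```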
Or, you can create a new column using a list or np.array, as long as its length matches the number of rows:
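For instance, with placeholder column names and values:

```python
import numpy as np
from pandas import DataFrame

df7 = DataFrame({"a": [1, 2, 3]})

# Both assignments work because each has exactly three values.
df7["from_list"] = [10, 20, 30]
df7["from_array"] = np.array([0.1, 0.2, 0.3])

print(df7)
```

If the list or array has the wrong length, pandas raises a ValueError instead of silently padding or truncating.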
And check this out, you can transpose a Data Frame by using the
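The sentence above is cut off in these notes, so the following is only a guess at what was meant: it assumes the T attribute, which is one standard way to transpose a Data Frame. The sample data is invented.

```python
from pandas import DataFrame

df8 = DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

# Assumption: the truncated text referred to the T attribute.
transposed = df8.T  # rows become columns and columns become rows

print(transposed)
```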
You can also easily export the Data Frame to json:
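A sketch using to_json with its default settings, on made-up data. With no arguments it returns a JSON string keyed by column and then by index label; parsing it back with the standard json module is only there to show the structure:

```python
import json
from pandas import DataFrame

df9 = DataFrame({"a": [1, 2], "b": ["x", "y"]})

json_text = df9.to_json()       # returns a JSON string
parsed = json.loads(json_text)  # parse it back to inspect the layout

print(json_text)
```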
Re-indexing is a pretty useful feature that can be applied to both Series and Data Frames. Reindexing creates a new object that conforms to a new index:
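As a rough illustration with placeholder labels: reindex returns a new object whose rows follow the new index, and any label that was not in the original gets a missing value (NaN) unless you pass fill_value.

```python
from pandas import DataFrame, Series

s5 = Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s5_new = s5.reindex(["c", "a", "d"])  # "d" is new, so it becomes NaN

df10 = DataFrame({"v": [1, 2]}, index=["r1", "r2"])
df10_new = df10.reindex(["r2", "r1", "r9"], fill_value=0)  # fill the new row with 0

print(s5_new)
print(df10_new)
```

The original s5 and df10 are left untouched, since reindex does not modify the object it is called on.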