
Pandas is a powerful and popular data manipulation library in Python, and indexes are one of its essential features. An index holds the labels that identify each row of a DataFrame (or each element of a Series). Labels are usually unique, but pandas does not require them to be. Indexes play a central role in selecting, combining, and reshaping data. In this tutorial, we will discuss how to set, reset, and use indexes in Pandas, covering the common scenarios where they are useful. By the end of this tutorial, you should have a good understanding of how to work with indexes for efficient data manipulation and analysis.
- How to Create a Pandas DataFrame with a Default Index
- How to Set a Column as the Index in Pandas DataFrame
- How to Reset the Index of a Pandas DataFrame
- How to Set Multiple Columns as Index in Pandas DataFrame
- How to Change the Name of an Index in Pandas DataFrame
- How to Select Data using Index in Pandas DataFrame
- How to Sort a Pandas DataFrame by Index
- How to Merge DataFrames based on Index in Pandas
- How to Create a Hierarchical Index in Pandas DataFrame
- How to Reshape Data using Index in Pandas DataFrame
How to Create a Pandas DataFrame with a Default Index
In Pandas, a default index is created automatically when we create a DataFrame without specifying one. The default index is a sequence of integers running from 0 to n-1, where n is the number of rows in the DataFrame. We can also specify a custom index when creating a DataFrame.
To create a Pandas DataFrame with a default index, we can call the pd.DataFrame() constructor without an index argument. Here is an example:
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})
# Printing the DataFrame
print(df)
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   25  Female
2    Bob   35    Male
In the above example, we have created a DataFrame with three columns – ‘Name’, ‘Age’, and ‘Gender’. Since we did not specify an index, Pandas has created a default integer index running from 0 to n-1, where n is the number of rows in the DataFrame. We can access the rows using these integer labels.
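A minimal sketch of the difference between the default index and a custom one passed through the index parameter. The sample data mirrors the example above, and df_custom is only an illustrative name.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [30, 25, 35],
        'Gender': ['Male', 'Female', 'Male']}

# No index argument: pandas builds a RangeIndex with labels 0, 1, 2
df = pd.DataFrame(data)
print(df.index)           # RangeIndex(start=0, stop=3, step=1)
print(df.loc[1, 'Name'])  # row label 1 holds Alice

# Passing index= replaces the default integer labels with custom ones
df_custom = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df_custom.loc['b', 'Name'])  # Alice
```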
How to Set a Column as the Index in Pandas DataFrame
In Pandas, we can set a column as the index of a DataFrame using the set_index() method. By setting a column as the index, we can access the rows using the values of that column instead of the default integer index.
Here is an example of how to set a column as the index in a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})
# Setting the 'Name' column as the index
df.set_index('Name', inplace=True)
# Printing the DataFrame
print(df)
Output:
       Age  Gender
Name
John    30    Male
Alice   25  Female
Bob     35    Male
In the above example, we have set the ‘Name’ column as the index of the DataFrame using the set_index() method. We have also used the inplace=True parameter to modify the original DataFrame; without it, set_index() returns a new DataFrame and leaves the original unchanged. Now we can access the rows of the DataFrame using the values of the ‘Name’ column.
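As a small sketch using the same sample data, the snippet below skips inplace=True and assigns the result of set_index() to a new variable (df_by_name, an illustrative name), then looks rows up by name with .loc.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})

# Without inplace=True, set_index() returns a new DataFrame; df keeps its 'Name' column
df_by_name = df.set_index('Name')

# Rows are now looked up by name
print(df_by_name.loc['Alice'])
print(df_by_name.loc['Bob', 'Age'])  # 35
```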
How to Reset the Index of a Pandas DataFrame
In Pandas, we can reset the index of a DataFrame using the reset_index() method, which replaces the current index with the default integer index running from 0 to n-1, where n is the number of rows in the DataFrame.
Here is an example of how to reset the index of a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])
# Resetting the index
df.reset_index(inplace=True)
# Printing the DataFrame
print(df)
Output:
  index   Name  Age  Gender
0     a   John   30    Male
1     b  Alice   25  Female
2     c    Bob   35    Male
In the above example, we have created a DataFrame with a custom index using the index parameter. We have then reset the index to the default integer index using the reset_index() method. We have also used the inplace=True parameter to modify the original DataFrame instead of creating a new one.
Note: The old index becomes a new column in the DataFrame after resetting the index. If we don’t want to keep the old index as a column, we can use the drop=True parameter with the reset_index() method.
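A minimal sketch of drop=True on a custom-labelled DataFrame like the one above. The second half shows a common situation where it helps: renumbering rows after a filter has left gaps in the labels (the Age > 26 filter is only an illustration).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35]},
                  index=['a', 'b', 'c'])

# drop=True discards the old labels instead of inserting them as an 'index' column
df_reset = df.reset_index(drop=True)
print(df_reset)

# Filtering keeps the original labels (here 'a' and 'c'); reset_index renumbers them 0, 1
older = df[df['Age'] > 26].reset_index(drop=True)
print(older)
```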
How to Set Multiple Columns as Index in Pandas DataFrame
In Pandas, we can set multiple columns as the index of a DataFrame using the set_index() method. By setting multiple columns as the index, we can access the rows using the combination of values of those columns instead of the default integer index.
Here is an example of how to set multiple columns as the index in a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male'],
                   'Country': ['USA', 'UK', 'USA']})
# Setting the 'Country' and 'Gender' columns as the index
df.set_index(['Country', 'Gender'], inplace=True)
# Printing the DataFrame
print(df)
Output:
                 Name  Age
Country Gender
USA     Male     John   30
UK      Female  Alice   25
USA     Male      Bob   35
In the above example, we have set the ‘Country’ and ‘Gender’ columns as the index of the DataFrame by passing a list of column names to the set_index() method. Now we can access the rows of the DataFrame using the combination of values of the ‘Country’ and ‘Gender’ columns.
Note: Passing a list of column names, as above, already produces a hierarchical index (a MultiIndex); no list of lists is needed. Hierarchical indexes are covered in more detail later in this tutorial.
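The sketch below, built on the same sample data, shows how a two-level index can be queried. It sorts the index first, because label lookups on an unsorted MultiIndex can be slower and may emit a PerformanceWarning. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male'],
                   'Country': ['USA', 'UK', 'USA']})

# Build the two-level index and sort it so label lookups are efficient
indexed = df.set_index(['Country', 'Gender']).sort_index()

# Each row label is now a (Country, Gender) tuple
print(indexed.index.tolist())

# A full tuple key selects every row matching both levels (two rows here)
usa_male = indexed.loc[('USA', 'Male')]
print(usa_male)

# A partial key selects on the outer level only
print(indexed.loc['UK'])
```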
How to Change the Name of an Index in Pandas DataFrame
In Pandas, we can change the name of an index using the rename_axis() method, which sets or changes the name of the row index (or, with axis='columns', the name of the column axis).
Here is an example of how to change the name of an index in a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])
# Changing the name of the index
df.rename_axis('IndexLabel', inplace=True)
# Printing the DataFrame
print(df)
Output:
             Name  Age  Gender
IndexLabel
a            John   30    Male
b           Alice   25  Female
c             Bob   35    Male
In the above example, we have created a DataFrame with a custom index using the index parameter. We have then changed the name of the index to ‘IndexLabel’ using the rename_axis() method. We have also used the inplace=True parameter to modify the original DataFrame instead of creating a new one.
Note: We can also change the name of the index by directly assigning a new name to the df.index.name attribute.
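A minimal sketch of both approaches, plus rename_axis() with axis='columns' to name the column axis. RowID and Field are placeholder names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35]},
                  index=['a', 'b', 'c'])

# Option 1: assign the name directly (modifies df)
df.index.name = 'IndexLabel'

# Option 2: rename_axis() without inplace returns a renamed copy
renamed = df.rename_axis('RowID')

# axis='columns' names the column axis instead of the row index
named_cols = df.rename_axis('Field', axis='columns')

print(df.index.name)       # IndexLabel
print(renamed.index.name)  # RowID
```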
How to Select Data using Index in Pandas DataFrame
In Pandas, we can select data by index label using the .loc indexer. Note that loc is used with square brackets rather than called like a method; it selects rows and columns by label or by a boolean array.
Here is an example of how to select data using the index of a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])
# Selecting a single row using the index label
print(df.loc['a'])
# Selecting multiple rows using a list of index labels
print(df.loc[['a', 'c']])
# Selecting rows and columns using the index label and column label
print(df.loc['a', 'Age'])
# Selecting a subset of rows and columns using the index label and column label
print(df.loc[['a', 'c'], ['Name', 'Age']])
Output:
Name      John
Age         30
Gender    Male
Name: a, dtype: object
   Name  Age  Gender
a  John   30    Male
c   Bob   35    Male
30
   Name  Age
a  John   30
c   Bob   35
In the above example, we have selected data using the index of a Pandas DataFrame. We have used .loc to select a single row using the index label ‘a’ (returned as a Series), multiple rows using a list of index labels ‘a’ and ‘c’, a single value using the index label ‘a’ and column label ‘Age’, and a subset of rows and columns using lists of index labels and column labels.
Note: We can also select data by integer position, rather than by label, using the .iloc indexer instead of .loc.
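A short sketch contrasting the two indexers on the same DataFrame. One difference worth remembering: a label slice with .loc includes its end label, while a position slice with .iloc excludes its end position, as in ordinary Python slicing.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# loc works with labels; iloc works with 0-based integer positions
print(df.loc['b', 'Name'])  # Alice
print(df.iloc[1, 0])        # Alice (row position 1, column position 0)

# Label slices include the end label; position slices exclude the end position
print(df.loc['a':'b'])      # rows a and b
print(df.iloc[0:1])         # row a only
```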
How to Sort a Pandas DataFrame by Index
A DataFrame can be sorted by its index using the sort_index() method, which sorts the rows by their index labels in ascending or descending order.
Here is an example of how to sort a Pandas DataFrame by its index:
import pandas as pd
# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['c', 'a', 'b'])
# Sorting the DataFrame by index in ascending order
df.sort_index(inplace=True)
# Printing the DataFrame
print(df)
Output:
    Name  Age  Gender
a  Alice   25  Female
b    Bob   35    Male
c   John   30    Male
In the above example, we have sorted the DataFrame by its index using the sort_index() method. We have used the inplace=True parameter to modify the original DataFrame instead of creating a new one. Now the DataFrame is sorted by its index in ascending order.
Note: We can also sort the DataFrame by index in descending order by passing the ascending=False parameter to the sort_index() method.
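A minimal sketch of descending order without inplace=True, so df keeps its original order. It also shows axis=1, which sorts the column labels instead of the row labels.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35]},
                  index=['c', 'a', 'b'])

# ascending=False reverses the order; a sorted copy is returned
df_desc = df.sort_index(ascending=False)
print(df_desc)

# axis=1 sorts the columns by their labels instead of the rows
cols_sorted = df.sort_index(axis=1)
print(cols_sorted)
```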
How to Merge DataFrames based on Index in Pandas
Two DataFrames can be merged based on their indexes using the pd.merge() function (also available as the DataFrame.merge() method) with the left_index=True and right_index=True parameters. Rows are then matched on their index labels rather than on column values.
Here is an example of how to merge two DataFrames based on their indexes:
import pandas as pd
# Creating the first DataFrame with a custom index
df1 = pd.DataFrame({'Age': [30, 25, 35],
                    'Gender': ['Male', 'Female', 'Male']},
                   index=['a', 'b', 'c'])
# Creating the second DataFrame with a custom index
df2 = pd.DataFrame({'Country': ['USA', 'UK', 'USA'],
                    'Salary': [50000, 60000, 55000]},
                   index=['a', 'b', 'c'])
# Merging the two DataFrames based on their indexes
df_merged = pd.merge(df1, df2, left_index=True, right_index=True)
# Printing the merged DataFrame
print(df_merged)
Output:
   Age  Gender  Country  Salary
a   30    Male      USA   50000
b   25  Female       UK   60000
c   35    Male      USA   55000
In the above example, we have merged two DataFrames based on their indexes using pd.merge(). Passing left_index=True and right_index=True tells merge() to match rows by index label, so the rows labelled ‘a’, ‘b’, and ‘c’ in both DataFrames are combined into one.
Note: We can also merge on the index of one DataFrame and a column of the other by combining left_index=True with right_on (or left_on with right_index=True). The left_on and right_on parameters also accept index level names.
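A sketch of merging an index against a column, assuming the keys live in a column called ID in the second DataFrame (a hypothetical name). It also shows DataFrame.join(), which joins on the index by default and performs a left join unless told otherwise.

```python
import pandas as pd

# Left frame keyed by its index
df1 = pd.DataFrame({'Age': [30, 25, 35]}, index=['a', 'b', 'c'])

# Right frame keeps the matching keys in an ordinary column (hypothetical name 'ID')
df2 = pd.DataFrame({'ID': ['a', 'b', 'c'],
                    'Salary': [50000, 60000, 55000]})

# Match df1's index against df2's 'ID' column
merged = pd.merge(df1, df2, left_index=True, right_on='ID')
print(merged[['ID', 'Age', 'Salary']])

# join() matches on the index of both frames, so move 'ID' into df2's index first
joined = df1.join(df2.set_index('ID'))
print(joined)
```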
How to Create a Hierarchical Index in Pandas DataFrame
In Pandas, we can create a hierarchical index in a DataFrame using the set_index() method with a list of column names. A hierarchical index, also known as a MultiIndex, labels each row with more than one level of values. It is useful when our data has multiple levels of grouping.
Here is an example of how to create a hierarchical index in a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Sara', 'David', 'Mary'],
                   'Age': [30, 25, 35, 40, 20, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
                   'City': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA']})
# Creating a hierarchical index using the 'City' and 'Gender' columns
df.set_index(['City', 'Gender'], inplace=True)
# Printing the DataFrame
print(df)
Output:
              Name  Age
City Gender
NY   Male     John   30
     Female  Alice   25
LA   Male      Bob   35
     Female   Sara   40
NY   Male    David   20
LA   Female   Mary   28
In the above example, we have created a DataFrame with four columns – ‘Name’, ‘Age’, ‘Gender’, and ‘City’. We have then created a hierarchical index using the set_index() method with the ‘City’ and ‘Gender’ columns. Now the DataFrame is indexed by two levels, ‘City’ and ‘Gender’. In the printed output, an outer label that repeats on consecutive rows is shown only once.
Note: The order of the column names in the list determines the order of the index levels: the first name becomes the outermost level, and the last becomes the innermost level.
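A sketch of common queries on the City/Gender index built above: selecting by the outer level with .loc, selecting by the inner level with xs(), and aggregating per level with groupby(level=...). The index is sorted first so label lookups stay efficient; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Sara', 'David', 'Mary'],
                   'Age': [30, 25, 35, 40, 20, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
                   'City': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA']})

# Build the City/Gender index and sort it
df = df.set_index(['City', 'Gender']).sort_index()

# Outer level: all NY rows (the City level is dropped from the result)
ny = df.loc['NY']

# Inner level: xs() selects by a named level
women = df.xs('Female', level='Gender')

# Aggregate per value of one index level
mean_age_by_city = df.groupby(level='City')['Age'].mean()
print(mean_age_by_city)
```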
How to Reshape Data using Index in Pandas DataFrame
In Pandas, we can reshape data around the index using the stack() and unstack() methods. The stack() method moves the column labels into the innermost level of the row index, converting data from a wide format to a long format. The unstack() method does the reverse, moving the innermost index level back into the columns and converting data from a long format to a wide format.
Here is an example of how to reshape data using index in a Pandas DataFrame:
import pandas as pd
# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])
# Reshaping the DataFrame from wide to long format using stack()
df_stacked = df.stack()
# Printing the stacked DataFrame
print(df_stacked)
# Reshaping the DataFrame from long to wide format using unstack()
df_unstacked = df_stacked.unstack()
# Printing the unstacked DataFrame
print(df_unstacked)
Output:
a  Name        John
   Age           30
   Gender      Male
b  Name       Alice
   Age           25
   Gender    Female
c  Name         Bob
   Age           35
   Gender      Male
dtype: object
    Name  Age  Gender
a   John   30    Male
b  Alice   25  Female
c    Bob   35    Male
In the above example, we have reshaped the data using the index. First, stack() moved the column labels (‘Name’, ‘Age’, and ‘Gender’) into a new inner level of the row index. Because the original DataFrame had a single level of columns, the result is a Series with a two-level index and one entry per original cell, which is a longer and narrower shape.
Then, unstack() moved that inner index level back into the columns, turning the Series back into a wide DataFrame with the same layout as the original.
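stack() and unstack() are often most useful on grouped data that already carries a MultiIndex. The sketch below reuses the City/Gender sample from the hierarchical index section, computes an average age per group, and then pivots either level into the columns with unstack(). The choice of aggregation is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Sara', 'David', 'Mary'],
                   'Age': [30, 25, 35, 40, 20, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
                   'City': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA']})

# Average age per (City, Gender): a Series with a two-level index
avg_age = df.groupby(['City', 'Gender'])['Age'].mean()

# unstack() with no argument moves the innermost level (Gender) into the columns
by_city = avg_age.unstack()
print(by_city)

# Naming a level pivots that level instead, here the outer City level
by_gender = avg_age.unstack('City')
print(by_gender)
```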