
Pandas is a popular Python library for manipulating structured data, and the index is one of its core features. An index holds the labels that identify each row of a DataFrame or each element of a Series; labels are usually unique, although Pandas does not require them to be. Indexes drive label-based selection, sorting, merging, and reshaping. In this tutorial, we will discuss how to set, reset, and use indexes in Pandas, covering the scenarios listed below. By the end of this tutorial, you will have a good understanding of how to work with indexes in Pandas and how to use them for efficient data manipulation and analysis.

  1. How to Create a Pandas DataFrame with a Default Index
  2. How to Set a Column as the Index in Pandas DataFrame
  3. How to Reset the Index of a Pandas DataFrame
  4. How to Set Multiple Columns as Index in Pandas DataFrame
  5. How to Change the Name of an Index in Pandas DataFrame
  6. How to Select Data using Index in Pandas DataFrame
  7. How to Sort a Pandas DataFrame by Index
  8. How to Merge DataFrames based on Index in Pandas
  9. How to Create a Hierarchical Index in Pandas DataFrame
  10. How to Reshape Data using Index in Pandas DataFrame

How to Create a Pandas DataFrame with a Default Index

In Pandas, a default index is created automatically when we create a DataFrame. The default index is a sequence of integers starting from 0 to n-1, where n is the number of rows in the DataFrame. However, we can also specify a custom index while creating a DataFrame.

To create a Pandas DataFrame with a default index, we can call the pd.DataFrame() constructor without passing an index argument. Here is an example:

import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})

# Printing the DataFrame
print(df)

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   25  Female
2    Bob   35    Male

In the above example, we have created a DataFrame with three columns – ‘Name’, ‘Age’, and ‘Gender’. Since we did not specify any index, Pandas has created a default index starting from 0 to n-1, where n is the number of rows in the DataFrame. We can access the rows using these default indexes.
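To see what the default index actually is, the following minimal sketch reuses the example data, inspects df.index, and reads one row by its default label. The variable name first_row is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})

# The default index is a RangeIndex running from 0 to n-1
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# Rows can be looked up by their default integer label
first_row = df.loc[0]
print(first_row['Name'])  # John
```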

How to Set a Column as the Index in Pandas DataFrame

In Pandas, we can set a column as the index of a DataFrame using the set_index() method. By setting a column as the index, we can access the rows using the values of that column instead of the default integer index.

Here is an example of how to set a column as the index in a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})

# Setting the 'Name' column as the index
df.set_index('Name', inplace=True)

# Printing the DataFrame
print(df)

Output:

       Age  Gender
Name             
John    30    Male
Alice   25  Female
Bob     35    Male

In the above example, we have set the ‘Name’ column as the index of the DataFrame using the set_index() method. We have also used the inplace=True parameter to modify the original DataFrame instead of creating a new one. Now we can access the rows of the DataFrame using the values of the ‘Name’ column.
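As a minimal sketch of the alternative to inplace=True, set_index() can also return a new DataFrame and leave the original untouched. The variable names indexed and alice_age are illustrative and do not come from the example above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']})

# Without inplace=True, set_index() returns a new DataFrame
indexed = df.set_index('Name')

# The original DataFrame still has 'Name' as a regular column
print('Name' in df.columns)       # True
print('Name' in indexed.columns)  # False

# Look up a single value by row label and column label
alice_age = indexed.loc['Alice', 'Age']
print(alice_age)  # 25
```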

How to Reset the Index of a Pandas DataFrame

In Pandas, we can reset the index of a DataFrame using the reset_index() method. By default, reset_index() moves the current index into a regular column and replaces it with the default integer index from 0 to n-1, where n is the number of rows in the DataFrame.

Here is an example of how to reset the index of a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# Resetting the index
df.reset_index(inplace=True)

# Printing the DataFrame
print(df)

Output:

  index   Name  Age  Gender
0     a   John   30    Male
1     b  Alice   25  Female
2     c    Bob   35    Male

In the above example, we have created a DataFrame with a custom index using the index parameter. We have then reset the index to the default integer index using the reset_index() method. We have also used the inplace=True parameter to modify the original DataFrame instead of creating a new one.

Note: The old index becomes a new column in the DataFrame after resetting the index. If we don’t want to keep the old index as a column, we can use the drop=True parameter with the reset_index() method.
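The drop=True variant mentioned in the note can be sketched as follows. This assumes the same example data as above; the name dropped is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# drop=True discards the old index instead of turning it into a column
dropped = df.reset_index(drop=True)

print(dropped.columns.tolist())  # ['Name', 'Age', 'Gender']
print(dropped.index.tolist())    # [0, 1, 2]
```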

How to Set Multiple Columns as Index in Pandas DataFrame

In Pandas, we can set multiple columns as the index of a DataFrame using the set_index() method. By setting multiple columns as the index, we can access the rows using the combination of values of those columns instead of the default integer index.

Here is an example of how to set multiple columns as the index in a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male'],
                   'Country': ['USA', 'UK', 'USA']})

# Setting the 'Country' and 'Gender' columns as the index
df.set_index(['Country', 'Gender'], inplace=True)

# Printing the DataFrame
print(df)

Output:

                 Name  Age
Country Gender            
USA     Male     John   30
UK      Female  Alice   25
USA     Male      Bob   35

In the above example, we have set the ‘Country’ and ‘Gender’ columns as the index of the DataFrame using the set_index() method. We have passed a list of column names to the set_index() method to set multiple columns as the index. Now we can access the rows of the DataFrame using the combination of values of ‘Country’ and ‘Gender’ columns.

Note: Passing a list of column names to set_index() already produces a hierarchical index (a MultiIndex). The order of the list determines the levels: the first column becomes the outer level and the last column becomes the innermost level. In this example the (Country, Gender) pairs are not unique, so a lookup such as ('USA', 'Male') returns more than one row.
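Selecting rows by a combination of index values can be sketched as below. This is one possible approach rather than the only one: it sorts the index first (an added step, which keeps MultiIndex lookups efficient), and the names indexed, usa_male, and uk_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male'],
                   'Country': ['USA', 'UK', 'USA']})

# Build the two-level index and sort it
indexed = df.set_index(['Country', 'Gender']).sort_index()

# A full (Country, Gender) tuple selects every row with that combination
usa_male = indexed.loc[('USA', 'Male'), :]
print(usa_male)

# A single outer-level label selects all rows for that country
uk_rows = indexed.loc['UK']
print(uk_rows)
```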

How to Change the Name of an Index in Pandas DataFrame

In Pandas, we can change the name of an index using the rename_axis() method. The rename_axis() method is used to set or change the name of the index or column labels.

Here is an example of how to change the name of an index in a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# Changing the name of the index
df.rename_axis('IndexLabel', inplace=True)

# Printing the DataFrame
print(df)

Output:

             Name  Age  Gender
IndexLabel                    
a            John   30    Male
b           Alice   25  Female
c             Bob   35    Male

In the above example, we have created a DataFrame with a custom index using the index parameter. We have then changed the name of the index to ‘IndexLabel’ using the rename_axis() method. We have also used the inplace=True parameter to modify the original DataFrame instead of creating a new one.

Note: We can also change the name of the index by directly assigning a new name to the df.index.name attribute.
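The attribute assignment from the note looks like this in practice. The sketch also shows rename_axis() returning a renamed copy when inplace is omitted; the label 'RowId' and the variable renamed are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# Assign the index name directly
df.index.name = 'IndexLabel'
print(df.index.name)  # IndexLabel

# rename_axis() without inplace=True returns a renamed copy
renamed = df.rename_axis('RowId')
print(renamed.index.name)  # RowId
print(df.index.name)       # IndexLabel (original is unchanged)
```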

How to Select Data using Index in Pandas DataFrame

In Pandas, we can select data by index label using the loc[] indexer. loc[] is used with square brackets rather than called like a method, and it selects rows and columns by label, by label slice, or by a boolean array.

Here is an example of how to select data using the index of a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# Selecting a single row using the index label
print(df.loc['a'])

# Selecting multiple rows using a list of index labels
print(df.loc[['a', 'c']])

# Selecting rows and columns using the index label and column label
print(df.loc['a', 'Age'])

# Selecting a subset of rows and columns using the index label and column label
print(df.loc[['a', 'c'], ['Name', 'Age']])

Output:

Name      John
Age         30
Gender    Male
Name: a, dtype: object

   Name  Age Gender
a  John   30   Male
c   Bob   35   Male

30

    Name  Age
a  John   30
c   Bob   35

In the above example, we have selected data using the index of a Pandas DataFrame. We have used the loc[] indexer to select a single row using the index label ‘a’, multiple rows using a list of index labels ‘a’ and ‘c’, a single value using the index label ‘a’ and column label ‘Age’, and a subset of rows and columns using a list of index labels and column labels.

Note: To select data by integer position rather than by label, we can use the iloc[] indexer instead of loc[].
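A short sketch of position-based selection with iloc[], using the same example data. It also contrasts slicing behaviour: label slices with loc[] include the end label, while position slices with iloc[] exclude the end position. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# First row by position
first = df.iloc[0]
print(first['Name'])  # John

# Rows at positions 0 and 2, columns at positions 0 and 1
subset = df.iloc[[0, 2], [0, 1]]
print(subset)

# Label slices include the end label; position slices exclude the end
label_slice = df.loc['a':'c']  # rows a, b, c
pos_slice = df.iloc[0:2]       # rows at positions 0 and 1
print(len(label_slice), len(pos_slice))  # 3 2
```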

How to Sort a Pandas DataFrame by Index

A DataFrame can be sorted by its index using the sort_index() method. The sort_index() method sorts the DataFrame by its index in ascending or descending order.

Here is an example of how to sort a Pandas DataFrame by its index:

import pandas as pd

# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['c', 'a', 'b'])

# Sorting the DataFrame by index in ascending order
df.sort_index(inplace=True)

# Printing the DataFrame
print(df)

Output:

    Name  Age  Gender
a  Alice   25  Female
b    Bob   35    Male
c   John   30    Male

In the above example, we have sorted the DataFrame by its index using the sort_index() method. We have used the inplace=True parameter to modify the original DataFrame instead of creating a new one. Now the DataFrame is sorted by its index in ascending order.

Note: We can also sort the DataFrame by index in descending order by passing the ascending=False parameter to the sort_index() method.
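Descending order can be sketched like this, assuming the same unsorted example index. The name desc is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['c', 'a', 'b'])

# ascending=False sorts the index labels from last to first
desc = df.sort_index(ascending=False)

print(desc.index.tolist())    # ['c', 'b', 'a']
print(desc['Name'].tolist())  # ['John', 'Bob', 'Alice']
```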

How to Merge DataFrames based on Index in Pandas

Two DataFrames can be merged on their indexes using the pd.merge() function (or the equivalent DataFrame.merge() method) with the left_index=True and right_index=True parameters. Rows are matched on their index labels, and by default only labels present in both DataFrames are kept (an inner join); the how parameter changes this behaviour.

Here is an example of how to merge two DataFrames based on their indexes:

import pandas as pd

# Creating the first DataFrame with a custom index
df1 = pd.DataFrame({'Age': [30, 25, 35],
                    'Gender': ['Male', 'Female', 'Male']},
                   index=['a', 'b', 'c'])

# Creating the second DataFrame with a custom index
df2 = pd.DataFrame({'Country': ['USA', 'UK', 'USA'],
                    'Salary': [50000, 60000, 55000]},
                   index=['a', 'b', 'c'])

# Merging the two DataFrames based on their indexes
df_merged = pd.merge(df1, df2, left_index=True, right_index=True)

# Printing the merged DataFrame
print(df_merged)

Output:

   Age  Gender Country  Salary
a   30    Male     USA   50000
b   25  Female      UK   60000
c   35    Male     USA   55000

In the above example, we have merged two DataFrames based on their indexes using the merge() method. We have passed the left_index=True and right_index=True parameters to merge the DataFrames based on their indexes. Now the two DataFrames are merged into one based on their index values.

Note: We can also match the index of one DataFrame against a column of the other by combining left_index=True with right_on (or left_on with right_index=True) in merge().
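Matching an index against a column might look like the sketch below. The DataFrames people and salaries and the column name 'PersonId' are invented for illustration; the 'd' entry has no match in people and is dropped by the default inner join.

```python
import pandas as pd

# Left DataFrame: the person identifier lives in the index
people = pd.DataFrame({'Age': [30, 25, 35]},
                      index=['a', 'b', 'c'])

# Right DataFrame: the person identifier is a regular column
salaries = pd.DataFrame({'PersonId': ['a', 'b', 'd'],
                         'Salary': [50000, 60000, 55000]})

# Match the left index against the right 'PersonId' column (inner join)
merged = pd.merge(people, salaries, left_index=True, right_on='PersonId')

print(merged[['PersonId', 'Age', 'Salary']])
```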

How to Create a Hierarchical Index in Pandas DataFrame

In Pandas, we can create a hierarchical index in a DataFrame using the set_index() method with a list of column names. A hierarchical index, also known as a multi-index, is a way of indexing data using more than one column. It is useful when we have multiple levels of grouping in our data.

Here is an example of how to create a hierarchical index in a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Sara', 'David', 'Mary'],
                   'Age': [30, 25, 35, 40, 20, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
                   'City': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA']})

# Creating a hierarchical index using the 'City' and 'Gender' columns
df.set_index(['City', 'Gender'], inplace=True)

# Printing the DataFrame
print(df)

Output:

              Name  Age
City Gender           
NY   Male     John   30
     Female  Alice   25
LA   Male      Bob   35
     Female   Sara   40
NY   Male    David   20
LA   Female   Mary   28

In the above example, we have created a DataFrame with four columns – ‘Name’, ‘Age’, ‘Gender’, and ‘City’. We have then created a hierarchical index using the set_index() method with the ‘City’ and ‘Gender’ columns. Now the DataFrame is indexed by two columns, ‘City’ and ‘Gender’, forming a hierarchical index.

Note: We can create a multi-level hierarchical index by passing a list of column names to the set_index() method. The first column name in the list becomes the outer level of the index, and the last column name becomes the innermost level of the index.
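Once the hierarchical index exists, rows can be pulled out by either level. The sketch below is one way to do it, assuming the same six-row example; sorting the index first is an added step, and the names indexed, ny, and females are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob', 'Sara', 'David', 'Mary'],
                   'Age': [30, 25, 35, 40, 20, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
                   'City': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA']})

indexed = df.set_index(['City', 'Gender']).sort_index()

# Outer level: all rows for one city (result is indexed by Gender)
ny = indexed.loc['NY']
print(ny)

# Inner level: xs() takes a cross-section by level name (result is indexed by City)
females = indexed.xs('Female', level='Gender')
print(females)
```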

How to Reshape Data using Index in Pandas DataFrame

In Pandas, we can reshape data through the index of a DataFrame using the stack() and unstack() methods. The stack() method moves the column labels into the innermost level of the row index, converting the data from a wide format to a long format. The unstack() method does the reverse: it moves the innermost index level back into columns, converting the data from a long format to a wide format.

Here is an example of how to reshape data using index in a Pandas DataFrame:

import pandas as pd

# Creating a DataFrame with a custom index
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# Reshaping the DataFrame from wide to long format using stack()
df_stacked = df.stack()

# Printing the stacked DataFrame
print(df_stacked)

# Reshaping the DataFrame from long to wide format using unstack()
df_unstacked = df_stacked.unstack()

# Printing the unstacked DataFrame
print(df_unstacked)

Output:

a  Name        John
   Age           30
   Gender      Male
b  Name       Alice
   Age           25
   Gender    Female
c  Name         Bob
   Age           35
   Gender      Male
dtype: object

    Name Age  Gender
a   John  30    Male
b  Alice  25  Female
c    Bob  35    Male

In the above example, we have reshaped the DataFrame using its index. First, we have converted it from a wide format to a long format using the stack() method. The stack() method has moved the column labels into a second, inner index level, so the result is a Series with a two-level index that is longer and narrower than the original DataFrame.

Then, we have converted the data back from a long format to a wide format using the unstack() method. The unstack() method has moved the inner index level back into columns, giving a wider and shorter DataFrame. Because the stacked Series mixed strings and integers, every column of the round-tripped DataFrame has the object dtype, including ‘Age’.
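One side effect of a stack() and unstack() round trip is that columns of mixed data come back with the object dtype. The sketch below checks this and uses infer_objects() as one possible way to recover the integer dtype; the names round_trip and restored are illustrative. Depending on the installed Pandas version, stack() may also emit a FutureWarning about its new implementation.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [30, 25, 35],
                   'Gender': ['Male', 'Female', 'Male']},
                  index=['a', 'b', 'c'])

# Stack to long format, then unstack back to wide format
round_trip = df.stack().unstack()

# 'Age' is now stored as generic Python objects, not integers
print(round_trip.dtypes)

# infer_objects() tries to find a better dtype for object columns
restored = round_trip.infer_objects()
print(restored.dtypes)
```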
