Python Pandas DataFrame and Series Basics – Selecting Rows and Columns


Pandas is a popular data manipulation library for Python. It provides data structures for efficiently storing and manipulating data in memory. Two of the main data structures in Pandas are the DataFrame and Series. In this tutorial, we will cover the basics of working with DataFrame and Series in Pandas. We will learn how to select specific rows and columns, filter data using Boolean indexing, drop unnecessary rows and columns, rename columns, sort data, apply functions to rows or columns, merge data from different sources, group data by one or more columns, and aggregate data.

  1. How To Create a DataFrame and Series in Pandas
  2. How To Select Rows and Columns by Label or Position
  3. How To Use Boolean Indexing to Filter Rows
  4. How To Drop Rows and Columns from a DataFrame
  5. How To Rename Columns or Indexes in a DataFrame
  6. How To Sort a DataFrame by One or Multiple Columns
  7. How To Apply Functions to DataFrame Rows or Columns
  8. How To Merge Two DataFrames based on a Common Column
  9. How To Group Data in a DataFrame by One or Multiple Columns
  10. How To Aggregate Data in a DataFrame by One or Multiple Columns
  11. Conclusion and Summary

How To Create a DataFrame and Series in Pandas

To create a DataFrame or Series in Pandas, we can use a variety of input formats such as lists, arrays, dictionaries, or other Pandas data structures.

  1. Creating a Series: A Series is a one-dimensional array-like object that can hold any data type. To create a Series, we can pass a list or an array of values to the Series() function. Here’s an example:
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50])
print(s)

Output:

0    10
1    20
2    30
3    40
4    50
dtype: int64
  2. Creating a DataFrame: A DataFrame is a two-dimensional table-like object that can hold multiple Series or arrays. To create a DataFrame, we can pass a dictionary or a list of dictionaries to the DataFrame() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
print(df)

Output:

    name  age    country
0   John   30        USA
1   Jane   25     Canada
2    Bob   40         UK
3  Alice   35  Australia

In this example, we created a DataFrame with three columns: ‘name’, ‘age’, and ‘country’. The values for each column were passed as lists in a dictionary.

We can also create a DataFrame from a list of dictionaries. Each dictionary in the list represents a row in the DataFrame. Here’s an example:

import pandas as pd
data = [{'name': 'John', 'age': 30, 'country': 'USA'},
        {'name': 'Jane', 'age': 25, 'country': 'Canada'},
        {'name': 'Bob', 'age': 40, 'country': 'UK'},
        {'name': 'Alice', 'age': 35, 'country': 'Australia'}]
df = pd.DataFrame(data)
print(df)

Output:

    name  age    country
0   John   30        USA
1   Jane   25     Canada
2    Bob   40         UK
3  Alice   35  Australia

In this example, we created the same DataFrame as before, but using a list of dictionaries instead of a dictionary of lists.
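
A Series does not have to use the default 0, 1, 2, ... index. As a small sketch (the variable names ages and ages_from_dict are only illustrative), we can pass explicit labels through the index parameter, or build the Series from a dictionary so that the keys become the labels:

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
ages = pd.Series([30, 25, 40], index=['John', 'Jane', 'Bob'])

# A Series built from a dictionary: the keys become the index labels
ages_from_dict = pd.Series({'John': 30, 'Jane': 25, 'Bob': 40})

print(ages)
print(ages['Jane'])                 # look up a single value by its label
print(ages.equals(ages_from_dict))  # check that both constructions match
```

Labeled Series like these are what we get back when we select a single row or column from a DataFrame, which is why the row selections in the next section print a name and an index alongside the values.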

How To Select Rows and Columns by Label or Position

In Pandas, we can select specific rows and columns from a DataFrame using either the label or the position of the row or column.

  1. Selecting Rows by Label: To select rows by label, we can use the loc[] function (strictly speaking an indexer, which is why it takes square brackets rather than parentheses). The loc[] function accepts a row label or a list of row labels as input. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc['B'])

Output:

name         Jane
age            25
country    Canada
Name: B, dtype: object

In this example, we selected the row with the label ‘B’ using the loc[] function. We can also select multiple rows using a list of labels:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc[['B', 'D']])

Output:

    name  age    country
B   Jane   25     Canada
D  Alice   35  Australia
  2. Selecting Rows by Position: To select rows by position, we can use the iloc[] function. The iloc[] function accepts a row position or a list of row positions as input. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.iloc[1])

Output:

name         Jane
age            25
country    Canada
Name: B, dtype: object

In this example, we selected the second row (position 1) using the iloc[] function. We can also select multiple rows using a list of positions:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.iloc[[1, 3]])

Output:

    name  age    country
B   Jane   25     Canada
D  Alice   35  Australia
  3. Selecting Columns by Label or Position: To select columns by label or position, we can use the loc[] or iloc[] function with a colon (:) to select all rows. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc[:, 'name'])
print(df.iloc[:, 1])

Output:

A     John
B     Jane
C      Bob
D    Alice
Name: name, dtype: object
A    30
B    25
C    40
D    35
Name: age, dtype: int64

In this example, we selected the ‘name’ column using the loc[] function with a colon to select all rows. We also selected the second column (position 1) using the iloc[] function with a colon to select all rows. We can also select multiple columns by passing a list of labels or positions:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc[:, ['name', 'country']])
print(df.iloc[:, [0, 2]])

Output:

    name    country
A   John        USA
B   Jane     Canada
C    Bob         UK
D  Alice  Australia
    name    country
A   John        USA
B   Jane     Canada
C    Bob         UK
D  Alice  Australia

In this example, we selected the ‘name’ and ‘country’ columns using the loc[] function with a list of labels. We also selected the first and third columns (positions 0 and 2) using the iloc[] function with a list of positions.
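
Rows and columns can also be selected in a single step by giving loc[] or iloc[] both a row selector and a column selector. The sketch below reuses the same sample data (the names by_label and by_position are only illustrative) and shows one difference worth remembering: a label slice in loc[] includes both endpoints, while a position slice in iloc[] leaves out the stop position, just like a Python list.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# loc[] slices by label and includes BOTH endpoints ('B' and 'C')
by_label = df.loc['B':'C', ['name', 'age']]

# iloc[] slices by position and excludes the stop position (rows 1 and 2 only)
by_position = df.iloc[1:3, 0:2]

print(by_label)
print(by_label.equals(by_position))  # compare the two selections

# One row label plus one column label returns a single value
print(df.loc['D', 'country'])
```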

How To Use Boolean Indexing to Filter Rows

In Pandas, we can use Boolean indexing to filter rows based on a certain condition. A comparison such as df['age'] > 30 returns a Boolean Series with True or False values indicating whether each element in the original Series satisfies the condition. We can then use this Boolean Series to select the rows that satisfy the condition.

Here’s an example:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
bool_series = df['age'] > 30
print(bool_series)

Output:

0    False
1    False
2     True
3     True
Name: age, dtype: bool

In this example, we created a Boolean Series that checks whether each element in the ‘age’ column of the DataFrame is greater than 30. We can then use this Boolean Series to select the rows that satisfy the condition:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
bool_series = df['age'] > 30
filtered_df = df[bool_series]
print(filtered_df)

Output:

    name  age    country
2    Bob   40         UK
3  Alice   35  Australia

In this example, we selected the rows where the ‘age’ column is greater than 30 by passing the Boolean Series inside square brackets, as in df[bool_series]. The resulting DataFrame contains only the rows that satisfy the condition.

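We can also use the element-wise logical operators AND (&) and OR (|) to combine multiple conditions. Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators such as > and !=. Here’s an example: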

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
bool_series = (df['age'] > 30) & (df['country'] != 'USA')
filtered_df = df[bool_series]
print(filtered_df)

Output:

    name  age    country
2    Bob   40         UK
3  Alice   35  Australia

In this example, we selected the rows where the ‘age’ column is greater than 30 and the ‘country’ column is not equal to ‘USA’. We used the logical AND operator (&) to combine the two conditions. The resulting DataFrame contains only the rows that satisfy both conditions.
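
The OR operator (|) works the same way, and two related tools are often useful alongside it: the NOT operator (~), which inverts a Boolean Series, and the isin() function, which tests membership in a list of values. The following is a small sketch on the same sample data; the variable names are only illustrative.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)

# OR: rows where the person is under 30 or lives in the UK
either = df[(df['age'] < 30) | (df['country'] == 'UK')]

# NOT: ~ flips True and False, so this keeps everyone outside the USA
not_usa = df[~(df['country'] == 'USA')]

# isin(): rows whose country appears in the given list
in_list = df[df['country'].isin(['USA', 'Canada'])]

print(either)
print(not_usa)
print(in_list)
```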

How To Drop Rows and Columns from a DataFrame

We can remove rows or columns from a DataFrame using the drop() function. The drop() function returns a new DataFrame with the specified rows or columns removed.

  1. Dropping Rows: To drop one or more rows from a DataFrame, we need to specify the row labels and axis=0 in the drop() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
new_df = df.drop(['B', 'D'], axis=0)
print(new_df)

Output:

   name  age country
A  John   30     USA
C   Bob   40      UK

In this example, we dropped the rows with labels ‘B’ and ‘D’ using the drop() function with axis=0. The resulting DataFrame contains only the remaining rows.

  2. Dropping Columns: To drop one or more columns from a DataFrame, we need to specify the column labels and axis=1 in the drop() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
new_df = df.drop(['age', 'country'], axis=1)
print(new_df)

Output:

    name
A   John
B   Jane
C    Bob
D  Alice

In this example, we dropped the columns with labels ‘age’ and ‘country’ using the drop() function with axis=1. The resulting DataFrame contains only the remaining column ‘name’.

Note that the drop() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to set the inplace parameter to True:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
df.drop(['age', 'country'], axis=1, inplace=True)
print(df)

Output:

    name
A   John
B   Jane
C    Bob
D  Alice

In this example, we modified the original DataFrame by setting inplace=True. The resulting DataFrame contains only the remaining column ‘name’.
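
As an alternative to the axis argument, drop() also accepts the index and columns keywords, which many readers find easier to follow. The sketch below uses the same sample data (the variable names are only illustrative). It also shows errors='ignore', which skips labels that do not exist instead of raising an error; the ‘salary’ column here is a made-up label used only to demonstrate that.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# columns=... is equivalent to passing the labels with axis=1
without_age = df.drop(columns=['age'])

# index=... is equivalent to passing the labels with axis=0
without_rows = df.drop(index=['B', 'D'])

# 'salary' is not a real column here; errors='ignore' skips it silently
safe = df.drop(columns=['salary'], errors='ignore')

print(without_age)
print(without_rows)
print(safe.equals(df))  # check whether anything was removed
```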

How To Rename Columns or Indexes in a DataFrame

The rename() function returns a new DataFrame with the specified columns or indexes renamed.

  1. Renaming Columns: To rename one or more columns in a DataFrame, we need to pass a dictionary with the old column names as keys and the new column names as values to the columns parameter of the rename() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
new_df = df.rename(columns={'name': 'first_name', 'country': 'nation'}, inplace=False)
print(new_df)

Output:

  first_name  age     nation
0       John   30        USA
1       Jane   25     Canada
2        Bob   40         UK
3      Alice   35  Australia

In this example, we renamed the ‘name’ column to ‘first_name’ and the ‘country’ column to ‘nation’ using the columns parameter of the rename() function. The resulting DataFrame contains the renamed columns.

  2. Renaming Indexes: To rename the index in a DataFrame, we need to pass a dictionary with the old index values as keys and the new index values as values to the index parameter of the rename() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
new_df = df.rename(index={'A': 'one', 'B': 'two', 'C': 'three', 'D': 'four'}, inplace=False)
print(new_df)

Output:

        name  age    country
one     John   30        USA
two     Jane   25     Canada
three    Bob   40         UK
four   Alice   35  Australia

In this example, we renamed the index values using the index parameter of the rename() function. The resulting DataFrame contains the renamed indexes.

Note that the rename() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to set the inplace parameter to True:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
df.rename(columns={'name': 'first_name', 'country': 'nation'}, inplace=True)
df.rename(index={'A': 'one', 'B': 'two', 'C': 'three', 'D': 'four'}, inplace=True)
print(df)

Output:

      first_name  age     nation
one         John   30        USA
two         Jane   25     Canada
three        Bob   40         UK
four       Alice   35  Australia

In this example, we modified the original DataFrame by setting inplace=True. The resulting DataFrame contains the renamed columns and indexes.
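
Besides a dictionary, rename() also accepts a function, which is applied to every column name or index label. This is handy when all labels need the same change. Here is a minimal sketch on the same sample data; the variable names and the ‘row_’ prefix are only illustrative.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# Apply str.upper to every column name
upper_cols = df.rename(columns=str.upper)

# Apply a lambda to every index label
prefixed = df.rename(index=lambda label: 'row_' + label)

print(upper_cols)
print(prefixed)
```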

How To Sort a DataFrame by One or Multiple Columns

In Pandas, we can sort a DataFrame by one or multiple columns using the sort_values() function. The sort_values() function returns a new DataFrame with the rows sorted by the specified columns.

  1. Sorting by One Column: To sort a DataFrame by one column, we need to pass the column label to the sort_values() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
new_df = df.sort_values('age', ascending=True)
print(new_df)

Output:

    name  age    country
1   Jane   25     Canada
0   John   30        USA
3  Alice   35  Australia
2    Bob   40         UK

In this example, we sorted the DataFrame by the ‘age’ column in ascending order using the sort_values() function. The resulting DataFrame contains the rows sorted by the ‘age’ column.

  2. Sorting by Multiple Columns: To sort a DataFrame by multiple columns, we need to pass a list of column labels to the sort_values() function. The DataFrame will be sorted by the first column in the list, then by the second column, and so on. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
new_df = df.sort_values(['country', 'age'], ascending=[True, False])
print(new_df)

Output:

    name  age    country
3  Alice   35  Australia
1   Jane   25     Canada
2    Bob   40         UK
0   John   30        USA

In this example, we sorted the DataFrame by the ‘country’ column in ascending order and then by the ‘age’ column in descending order using the sort_values() function. Because every country appears only once in this data, the ‘age’ column never has to break a tie, so the rows simply end up in alphabetical order by country.

Note that the sort_values() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to set the inplace parameter to True:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
df.sort_values(['country', 'age'], ascending=[True, False], inplace=True)
print(df)

Output:

    name  age    country
3  Alice   35  Australia
1   Jane   25     Canada
2    Bob   40         UK
0   John   30        USA

In this example, we modified the original DataFrame by setting inplace=True. The resulting DataFrame contains the sorted rows.
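
The second sort column only matters when the first one contains repeated values. To make that visible, the sketch below uses a slightly different, made-up dataset in which countries repeat (the extra person ‘Eve’ and the variable names are assumptions for illustration only). It also uses reset_index(drop=True) to renumber the rows after sorting.

```python
import pandas as pd

# Hypothetical data with repeated countries, so the second key breaks ties
data = {'name': ['John', 'Jane', 'Bob', 'Alice', 'Eve'],
        'age': [30, 25, 40, 35, 22],
        'country': ['USA', 'Canada', 'USA', 'Canada', 'USA']}
df = pd.DataFrame(data)

# Country A-Z first; within each country, oldest person first
sorted_df = df.sort_values(['country', 'age'], ascending=[True, False])
print(sorted_df)

# The original row labels travel with the rows; reset_index renumbers them
renumbered = sorted_df.reset_index(drop=True)
print(renumbered)
```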

How To Apply Functions to DataFrame Rows or Columns

We can apply functions to DataFrame rows or columns using the apply() function. Called on a Series, apply() runs the function on each element and returns a new Series. Called on a DataFrame, it passes each column (axis=0, the default) or each row (axis=1) to the function as a Series.

  1. Applying a Function to a Column: To apply a function to every value in a single column, we select the column as a Series and call apply() on it. Because a Series has only one dimension, no axis argument is needed. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)
new_col = df['age'].apply(lambda x: x * 2)
print(new_col)

Output:

0    60
1    50
2    80
3    70
Name: age, dtype: int64

In this example, we applied a lambda function that multiplies each element in the ‘age’ column by 2 by calling apply() on the ‘age’ Series. The result is a new Series with the doubled values.

  2. Applying a Function to a Row: To apply a function to a row of a DataFrame, we need to pass the function to the apply() function with axis=1. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)
new_row = df.apply(lambda x: x['name'] + ' is ' + str(x['age']) + ' years old', axis=1)
print(new_row)

Output:

0     John is 30 years old
1     Jane is 25 years old
2      Bob is 40 years old
3    Alice is 35 years old
dtype: object

In this example, we applied a lambda function that concatenates the values of the ‘name’ and ‘age’ columns for each row using the apply() function with axis=1. The resulting Series contains one string per row.

Note that the apply() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to assign the result to a column or row:

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)
df['age'] = df['age'].apply(lambda x: x * 2)
df['description'] = df.apply(lambda x: x['name'] + ' is ' + str(x['age']) + ' years old', axis=1)
print(df)

Output:

    name  age            description
0   John   60   John is 60 years old
1   Jane   50   Jane is 50 years old
2    Bob   80    Bob is 80 years old
3  Alice   70  Alice is 70 years old

In this example, we modified the original DataFrame by assigning the results of the apply() calls back to it. The ‘age’ column was overwritten with the doubled values, and ‘description’ was added as a new column.
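
For simple arithmetic and string concatenation like the examples above, apply() is often not required, because Pandas can operate on whole columns at once. The following sketch (variable names are only illustrative) produces the same results with column-wise operations, which are usually shorter to write and tend to run faster on large data.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)

# Element-wise apply() and whole-column arithmetic give the same Series
doubled_apply = df['age'].apply(lambda x: x * 2)
doubled_vectorized = df['age'] * 2
print(doubled_apply.equals(doubled_vectorized))

# String columns can be concatenated directly; astype(str) converts the numbers
description = df['name'] + ' is ' + df['age'].astype(str) + ' years old'
print(description)
```

The apply() function is still the right tool when the logic cannot be expressed with built-in column operations.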

How To Merge Two DataFrames based on a Common Column

In Pandas, we can merge two DataFrames based on a common column using the merge() function. The merge() function combines the rows of two DataFrames into a single DataFrame based on the values of a specified column.

  1. Merging Two DataFrames with a Common Column: To merge two DataFrames based on a common column, we need to pass the two DataFrames and the name of the common column to the merge() function. Here’s an example:
import pandas as pd
data1 = {'name': ['John', 'Jane', 'Bob', 'Alice'],
         'age': [30, 25, 40, 35]}
df1 = pd.DataFrame(data1)
data2 = {'name': ['John', 'Jane', 'Bob', 'Alice'],
         'country': ['USA', 'Canada', 'UK', 'Australia']}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='name')
print(merged_df)

Output:

    name  age    country
0   John   30        USA
1   Jane   25     Canada
2    Bob   40         UK
3  Alice   35  Australia

In this example, we merged two DataFrames ‘df1’ and ‘df2’ based on the ‘name’ column using the merge() function. The resulting DataFrame contains the merged rows with columns from both DataFrames.

  2. Merging Two DataFrames with Different Column Names: If the common column has different names in the two DataFrames, we need to pass the names of the columns to the merge() function using the left_on and right_on parameters. Here’s an example:
import pandas as pd
data1 = {'first_name': ['John', 'Jane', 'Bob', 'Alice'],
         'age': [30, 25, 40, 35]}
df1 = pd.DataFrame(data1)
data2 = {'name': ['John', 'Jane', 'Bob', 'Alice'],
         'country': ['USA', 'Canada', 'UK', 'Australia']}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, left_on='first_name', right_on='name')
print(merged_df)

Output:

  first_name  age   name    country
0       John   30   John        USA
1       Jane   25   Jane     Canada
2        Bob   40    Bob         UK
3      Alice   35  Alice  Australia

In this example, we merged two DataFrames ‘df1’ and ‘df2’ based on the ‘first_name’ column in ‘df1’ and the ‘name’ column in ‘df2’ using the merge() function with left_on and right_on parameters. The resulting DataFrame contains the merged rows with columns from both DataFrames.
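
In both examples above, every name appears in both DataFrames, so no rows are lost. When the keys only partly overlap, the how parameter of merge() decides which rows are kept. By default it is 'inner', which keeps only the keys found in both DataFrames. The sketch below uses made-up data with partly overlapping names to compare 'inner', 'left', and 'outer'; the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical data: 'John' is only in df1 and 'Alice' is only in df2
df1 = pd.DataFrame({'name': ['John', 'Jane', 'Bob'],
                    'age': [30, 25, 40]})
df2 = pd.DataFrame({'name': ['Jane', 'Bob', 'Alice'],
                    'country': ['Canada', 'UK', 'Australia']})

# inner (the default): only names present in both DataFrames
inner = pd.merge(df1, df2, on='name', how='inner')

# left: every row of df1; missing countries are filled with NaN
left = pd.merge(df1, df2, on='name', how='left')

# outer: every name from either DataFrame; gaps are filled with NaN
outer = pd.merge(df1, df2, on='name', how='outer')

print(inner)
print(left)
print(outer)
```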

How To Group Data in a DataFrame by One or Multiple Columns

The groupby() function splits the DataFrame into groups based on the values of the specified column(s) and returns a GroupBy object. Calling an aggregation such as mean() on that object applies it to each group and returns a new DataFrame with the results.

  1. Grouping Data by One Column: To group data in a DataFrame by one column, we need to pass the name of the column to the groupby() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)
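# numeric_only=True skips the non-numeric 'name' column
grouped_df = df.groupby('gender').mean(numeric_only=True)
print(grouped_df)

Output:

         age
gender      
female  30.0
male    35.0

In this example, we grouped the DataFrame ‘df’ by the ‘gender’ column using the groupby() function and calculated the mean of the ‘age’ column for each group. Since pandas 2.0, mean() raises a TypeError when the groups contain non-numeric columns such as ‘name’, so we pass numeric_only=True to average only the numeric columns. The resulting DataFrame contains the mean age of males and females.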

  2. Grouping Data by Multiple Columns: To group data in a DataFrame by multiple columns, we need to pass a list of column names to the groupby() function. The DataFrame will be grouped by the first column in the list, then by the second column, and so on. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female'],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
# numeric_only=True skips the non-numeric 'name' column
grouped_df = df.groupby(['gender', 'country']).mean(numeric_only=True)
print(grouped_df)

Output:

                   age
gender country        
female Australia  35.0
       Canada     25.0
male   UK         40.0
       USA        30.0

In this example, we grouped the DataFrame ‘df’ by the ‘gender’ and ‘country’ columns using the groupby() function and calculated the mean of the ‘age’ column for each group. The resulting DataFrame contains the mean age of males and females in each country.

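Note that the groupby() function does not modify the original DataFrame. An aggregation such as mean() returns one row per group. If we instead want one value for every row of the original DataFrame, we can use the transform() function, which returns a result with the same index as the original: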

import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female'],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
# Select the numeric 'age' column first so the non-numeric 'name' column is not averaged
df_grouped = df.groupby(['gender', 'country'])[['age']].transform('mean')
print(df_grouped)

Output:

    age
0  30.0
1  25.0
2  40.0
3  35.0

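In this example, the original DataFrame ‘df’ is left unchanged, and the result of transform() is stored in a new DataFrame ‘df_grouped’. It has one row for each row of ‘df’, holding the mean age of the group that row belongs to. Because each combination of gender and country occurs only once here, every group mean equals the row’s own age.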
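
A common use of transform() is to attach a group statistic to the original DataFrame as a new column, so that each row can be compared with its own group. The sketch below groups by ‘gender’ only, so that each group has two members; the column names ‘group_mean_age’ and ‘diff_from_group’ are made up for illustration.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)

# transform('mean') returns one value per original row, aligned on the index
df['group_mean_age'] = df.groupby('gender')['age'].transform('mean')

# Each person's age relative to the average of their own group
df['diff_from_group'] = df['age'] - df['group_mean_age']

print(df)
```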

How To Aggregate Data in a DataFrame by One or Multiple Columns

We can aggregate data in a DataFrame by one or multiple columns using the groupby() function with an aggregation function. The aggregation function applies a function to each group of rows and returns a scalar value that represents the aggregated value of the group.

  1. Aggregating Data by One Column: To aggregate data in a DataFrame by one column, we need to pass the name of the column to the groupby() function and an aggregation function to the agg() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)
grouped_df = df.groupby('gender').agg({'age': ['min', 'max', 'mean']})
print(grouped_df)

Output:

        age           
        min  max  mean
gender                
female   25   35  30.0
male     30   40  35.0

In this example, we aggregated the ‘age’ column of the DataFrame ‘df’ by the ‘gender’ column using the groupby() function and calculated the minimum, maximum, and mean age of each group using the agg() function. The resulting DataFrame contains the aggregated values for each group.

  2. Aggregating Data by Multiple Columns: To aggregate data in a DataFrame by multiple columns, we need to pass a list of column names to the groupby() function and an aggregation function to the agg() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female'],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
grouped_df = df.groupby(['gender', 'country']).agg({'age': ['min', 'max', 'mean']})
print(grouped_df)

Output:

                  age           
                  min  max  mean
gender country                  
female Australia   35   35  35.0
       Canada      25   25  25.0
male   UK          40   40  40.0
       USA         30   30  30.0

In this example, we aggregated the ‘age’ column of the DataFrame ‘df’ by the ‘gender’ and ‘country’ columns using the groupby() function and calculated the minimum, maximum, and mean age of each group using the agg() function. The resulting DataFrame contains the aggregated values for each group.
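
The dictionary form of agg() produces two-level column headers, as seen in the outputs above. If flat, descriptive column names are preferred, agg() also supports named aggregation, where each keyword argument is an output column name and each value is a (column, function) pair. Here is a minimal sketch; the output names youngest, oldest, average_age, and people are our own choices.

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)

# keyword = (source column, aggregation function)
summary = df.groupby('gender').agg(
    youngest=('age', 'min'),
    oldest=('age', 'max'),
    average_age=('age', 'mean'),
    people=('name', 'count'),
)

print(summary)
```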

Conclusion and Summary

In this tutorial, we covered the basics of selecting rows and columns in a Pandas DataFrame and Series. We learned how to create a DataFrame and Series, select rows and columns by label or position, use boolean indexing to filter rows, drop rows and columns, rename columns and indexes, sort a DataFrame by one or multiple columns, apply functions to DataFrame rows or columns, merge two DataFrames based on a common column, group data in a DataFrame by one or multiple columns, and aggregate data in a DataFrame by one or multiple columns.

Pandas is a powerful library for data manipulation and analysis in Python, and the techniques covered in this tutorial are just the tip of the iceberg. With Pandas, you can perform advanced data cleaning, transformation, and visualization tasks on large datasets with ease. We hope this tutorial has helped you get started with Pandas and has provided you with a solid foundation for further exploration of its capabilities.
