
Pandas is a popular data manipulation library for Python. It provides data structures for efficiently storing and manipulating data in memory. Two of the main data structures in Pandas are the DataFrame and Series. In this tutorial, we will cover the basics of working with DataFrame and Series in Pandas. We will learn how to select specific rows and columns, filter data using Boolean indexing, drop unnecessary rows and columns, rename columns, sort data, apply functions to rows or columns, merge data from different sources, group data by one or more columns, and aggregate data.
- How To Create a DataFrame and Series in Pandas
- How To Select Rows and Columns by Label or Position
- How To Use Boolean Indexing to Filter Rows
- How To Drop Rows and Columns from a DataFrame
- How To Rename Columns or Indexes in a DataFrame
- How To Sort a DataFrame by One or Multiple Columns
- How To Apply Functions to DataFrame Rows or Columns
- How To Merge Two DataFrames based on a Common Column
- How To Group Data in a DataFrame by One or Multiple Columns
- How To Aggregate Data in a DataFrame by One or Multiple Columns
- Conclusion and Summary
How To Create a DataFrame and Series in Pandas
To create a DataFrame or Series in Pandas, we can use a variety of input formats such as lists, arrays, dictionaries, or other Pandas data structures.
- Creating a Series: A Series is a one-dimensional array-like object that can hold any data type. To create a Series, we can pass a list or an array of values to the Series() function. Here’s an example:
import pandas as pd
s = pd.Series([10, 20, 30, 40, 50])
print(s)
Output:
0 10
1 20
2 30
3 40
4 50
dtype: int64
- Creating a DataFrame: A DataFrame is a two-dimensional table-like object that can hold multiple Series or arrays. To create a DataFrame, we can pass a dictionary or a list of dictionaries to the DataFrame() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
print(df)
Output:
name age country
0 John 30 USA
1 Jane 25 Canada
2 Bob 40 UK
3 Alice 35 Australia
In this example, we created a DataFrame with three columns: ‘name’, ‘age’, and ‘country’. The values for each column were passed as lists in a dictionary.
We can also create a DataFrame from a list of dictionaries. Each dictionary in the list represents a row in the DataFrame. Here’s an example:
import pandas as pd
data = [{'name': 'John', 'age': 30, 'country': 'USA'},
{'name': 'Jane', 'age': 25, 'country': 'Canada'},
{'name': 'Bob', 'age': 40, 'country': 'UK'},
{'name': 'Alice', 'age': 35, 'country': 'Australia'}]
df = pd.DataFrame(data)
print(df)
Output:
name age country
0 John 30 USA
1 Jane 25 Canada
2 Bob 40 UK
3 Alice 35 Australia
In this example, we created the same DataFrame as before, but using a list of dictionaries instead of a dictionary of lists.
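The other input formats mentioned at the start of this section work in much the same way. As a minimal sketch (the variable names and the 'x'/'y' column labels are illustrative, not from the examples above), a Series can be built from a dictionary and a DataFrame from a two-dimensional NumPy array:

```python
import numpy as np
import pandas as pd

# A Series built from a dictionary uses the keys as its index labels.
ages = pd.Series({'John': 30, 'Jane': 25, 'Bob': 40})

# A DataFrame built from a 2-D NumPy array has no column names of its own,
# so we supply them with the columns parameter ('x' and 'y' are made up here).
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])

print(ages)
print(df_from_array)
```

With a dictionary, values can then be looked up by key, for example ages['Jane'].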
How To Select Rows and Columns by Label or Position
In Pandas, we can select specific rows and columns from a DataFrame using either the label or the position of the row or column.
- Selecting Rows by Label: To select rows by label, we can use the loc[] indexer. loc[] accepts a single row label, a list of row labels, or a slice of labels. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc['B'])
Output:
name Jane
age 25
country Canada
Name: B, dtype: object
In this example, we selected the row with the label ‘B’ using the loc[] function. We can also select multiple rows using a list of labels:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc[['B', 'D']])
Output:
name age country
B Jane 25 Canada
D Alice 35 Australia
- Selecting Rows by Position: To select rows by position, we can use the iloc[] indexer. iloc[] accepts a single integer position, a list of positions, or a slice of positions, all counted from 0. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.iloc[1])
Output:
name Jane
age 25
country Canada
Name: B, dtype: object
In this example, we selected the second row (position 1) using the iloc[] function. We can also select multiple rows using a list of positions:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.iloc[[1, 3]])
Output:
name age country
B Jane 25 Canada
D Alice 35 Australia
- Selecting Columns by Label or Position: To select columns by label or position, we can use the loc[] or iloc[] function with a colon (:) to select all rows. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc[:, 'name'])
print(df.iloc[:, 1])
Output:
A John
B Jane
C Bob
D Alice
Name: name, dtype: object
A 30
B 25
C 40
D 35
Name: age, dtype: int64
In this example, we selected the ‘name’ column using the loc[] function with a colon to select all rows. We also selected the second column (position 1) using the iloc[] function with a colon to select all rows. We can also select multiple columns by passing a list of labels or positions:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
print(df.loc[:, ['name', 'country']])
print(df.iloc[:, [0, 2]])
Output:
name country
A John USA
B Jane Canada
C Bob UK
D Alice Australia
name country
A John USA
B Jane Canada
C Bob UK
D Alice Australia
In this example, we selected the ‘name’ and ‘country’ columns using the loc[] function with a list of labels. We also selected the first and third columns (positions 0 and 2) using the iloc[] function with a list of positions.
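Row and column selectors can also be combined in a single loc[] or iloc[] call, and both accept slices. The sketch below reuses the same sample data; the variable names are illustrative. One detail worth noting is that label slices include the end label, while position slices exclude the stop position:

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# A row label and a column label together select a single value.
jane_age = df.loc['B', 'age']

# Label slices include both endpoints: rows B, C and D.
by_label = df.loc['B':'D', ['name', 'age']]

# Position slices exclude the stop position: rows at positions 1 and 2 only.
by_position = df.iloc[1:3, 0:2]

print(jane_age)
print(by_label)
print(by_position)
```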
How To Use Boolean Indexing to Filter Rows
In Pandas, we can use Boolean indexing to filter rows based on a certain condition. Applying a comparison to a column produces a Boolean Series with True or False values indicating whether each element satisfies the condition. We can then pass this Boolean Series to the DataFrame to select only the rows where it is True.
Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
bool_series = df['age'] > 30
print(bool_series)
Output:
0 False
1 False
2 True
3 True
Name: age, dtype: bool
In this example, we created a Boolean Series that checks whether each element in the ‘age’ column of the DataFrame is greater than 30. We can then use this Boolean Series to select the rows that satisfy the condition:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
bool_series = df['age'] > 30
filtered_df = df[bool_series]
print(filtered_df)
Output:
name age country
2 Bob 40 UK
3 Alice 35 Australia
In this example, we selected the rows where the ‘age’ column is greater than 30 by passing the Boolean Series to the DataFrame[] operator. The resulting DataFrame contains only the rows that satisfy the condition.
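We can also use the element-wise logical operators AND (&) and OR (|) to combine multiple conditions. Each condition must be wrapped in parentheses, because & and | bind more tightly than comparison operators, and Python's own and/or keywords do not work on a Series. Here’s an example: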
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
bool_series = (df['age'] > 30) & (df['country'] != 'USA')
filtered_df = df[bool_series]
print(filtered_df)
Output:
name age country
2 Bob 40 UK
3 Alice 35 Australia
In this example, we selected the rows where the ‘age’ column is greater than 30 and the ‘country’ column is not equal to ‘USA’. We used the logical AND operator (&) to combine the two conditions. The resulting DataFrame contains only the rows that satisfy both conditions.
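The OR operator works the same way, and two related tools are often useful alongside it: the ~ operator, which inverts a Boolean Series, and the isin() method, which tests membership in a list of values. A brief sketch on the same data (variable names are illustrative):

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)

# OR: rows where age is under 30 or the country is 'UK'.
young_or_uk = df[(df['age'] < 30) | (df['country'] == 'UK')]

# NOT: ~ flips every True/False value in the Boolean Series.
not_usa = df[~(df['country'] == 'USA')]

# isin(): True where the value appears in the given list.
in_list = df[df['country'].isin(['Canada', 'Australia'])]

print(young_or_uk)
print(not_usa)
print(in_list)
```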
How To Drop Rows and Columns from a DataFrame
We can remove rows or columns from a DataFrame using the drop() function. The drop() function returns a new DataFrame with the specified rows or columns removed.
- Dropping Rows: To drop one or more rows from a DataFrame, we need to specify the row labels and axis=0 in the drop() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
new_df = df.drop(['B', 'D'], axis=0)
print(new_df)
Output:
name age country
A John 30 USA
C Bob 40 UK
In this example, we dropped the rows with labels ‘B’ and ‘D’ using the drop() function with axis=0. The resulting DataFrame contains only the remaining rows.
- Dropping Columns: To drop one or more columns from a DataFrame, we need to specify the column labels and axis=1 in the drop() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
new_df = df.drop(['age', 'country'], axis=1)
print(new_df)
Output:
name
A John
B Jane
C Bob
D Alice
In this example, we dropped the columns with labels ‘age’ and ‘country’ using the drop() function with axis=1. The resulting DataFrame contains only the remaining column ‘name’.
Note that the drop() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to set the inplace parameter to True:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
df.drop(['age', 'country'], axis=1, inplace=True)
print(df)
Output:
name
A John
B Jane
C Bob
D Alice
In this example, we modified the original DataFrame by setting inplace=True. The resulting DataFrame contains only the remaining column ‘name’.
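As an alternative to axis, drop() also accepts index and columns keyword arguments that name the axis directly. The sketch below shows that form, along with errors='ignore', which skips labels that are not present. The 'salary' label is a made-up column that does not exist in the data:

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# index= and columns= say which axis each label belongs to,
# so rows and columns can be dropped in one call without axis=.
trimmed = df.drop(index=['A'], columns=['country'])

# By default, dropping a missing label raises a KeyError.
# errors='ignore' silently skips it ('salary' is a hypothetical label).
unchanged = df.drop(columns=['salary'], errors='ignore')

print(trimmed)
print(unchanged)
```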
How To Rename Columns or Indexes in a DataFrame
The rename() function returns a new DataFrame with the specified columns or indexes renamed.
- Renaming Columns: To rename one or more columns in a DataFrame, we need to pass a dictionary with the old column names as keys and the new column names as values to the columns parameter of the rename() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
new_df = df.rename(columns={'name': 'first_name', 'country': 'nation'}, inplace=False)
print(new_df)
Output:
first_name age nation
0 John 30 USA
1 Jane 25 Canada
2 Bob 40 UK
3 Alice 35 Australia
In this example, we renamed the ‘name’ column to ‘first_name’ and the ‘country’ column to ‘nation’ using the columns parameter of the rename() function. Passing inplace=False (the default) returns a new DataFrame with the renamed columns.
- Renaming Indexes: To rename index labels in a DataFrame, we need to pass a dictionary with the old index values as keys and the new index values as values to the index parameter of the rename() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
new_df = df.rename(index={'A': 'one', 'B': 'two', 'C': 'three', 'D': 'four'}, inplace=False)
print(new_df)
Output:
name age country
one John 30 USA
two Jane 25 Canada
three Bob 40 UK
four Alice 35 Australia
In this example, we renamed the index values using the index parameter of the rename() function. The resulting DataFrame contains the renamed index labels.
Note that the rename() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to set the inplace parameter to True:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
df.rename(columns={'name': 'first_name', 'country': 'nation'}, inplace=True)
df.rename(index={'A': 'one', 'B': 'two', 'C': 'three', 'D': 'four'}, inplace=True)
print(df)
Output:
first_name age nation
one John 30 USA
two Jane 25 Canada
three Bob 40 UK
four Alice 35 Australia
In this example, we modified the original DataFrame by setting inplace=True. The resulting DataFrame contains the renamed columns and indexes.
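Besides dictionaries, rename() also accepts a function, which is called on every label. This can be handy when all labels need the same transformation. A small sketch, where the 'row_' prefix is just an illustrative choice:

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)

# str.upper is called on each column label.
upper_df = df.rename(columns=str.upper)

# A lambda works too; here it prefixes each (integer) index label.
prefixed = df.rename(index=lambda label: f'row_{label}')

print(upper_df.columns.tolist())
print(prefixed.index.tolist())
```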
How To Sort a DataFrame by One or Multiple Columns
In Pandas, we can sort a DataFrame by one or multiple columns using the sort_values() function. The sort_values() function returns a new DataFrame with the rows sorted by the specified columns.
- Sorting by One Column: To sort a DataFrame by one column, we need to pass the column label to the sort_values() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
new_df = df.sort_values('age', ascending=True)
print(new_df)
Output:
name age country
1 Jane 25 Canada
0 John 30 USA
3 Alice 35 Australia
2 Bob 40 UK
In this example, we sorted the DataFrame by the ‘age’ column in ascending order using the sort_values() function. The resulting DataFrame contains the rows sorted by the ‘age’ column.
- Sorting by Multiple Columns: To sort a DataFrame by multiple columns, we need to pass a list of column labels to the sort_values() function. The DataFrame will be sorted by the first column in the list, then by the second column, and so on. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
new_df = df.sort_values(['country', 'age'], ascending=[True, False])
print(new_df)
Output:
name age country
3 Alice 35 Australia
1 Jane 25 Canada
2 Bob 40 UK
0 John 30 USA
In this example, we sorted the DataFrame by the ‘country’ column in ascending order and then by the ‘age’ column in descending order using the sort_values() function. Because every country in this data is unique, the secondary sort on ‘age’ never has to break a tie; it would only matter if two rows shared the same country.
Note that the sort_values() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to set the inplace parameter to True:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
df.sort_values(['country', 'age'], ascending=[True, False], inplace=True)
print(df)
Output:
name age country
3 Alice 35 Australia
1 Jane 25 Canada
2 Bob 40 UK
0 John 30 USA
In this example, we modified the original DataFrame by setting inplace=True. The resulting DataFrame contains the sorted rows.
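Sorting keeps the original index labels attached to each row, which is why the index appears out of order in the outputs above. Two options for dealing with that are sketched below, assuming pandas 1.0 or later for the ignore_index parameter (variable names are illustrative):

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)

# ignore_index=True relabels the sorted rows 0, 1, 2, ...
oldest_first = df.sort_values('age', ascending=False, ignore_index=True)

# sort_index() orders rows by index label, undoing a previous value sort.
restored = df.sort_values('age').sort_index()

print(oldest_first)
print(restored)
```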
How To Apply Functions to DataFrame Rows or Columns
We can apply functions to DataFrame rows or columns using the apply() function. On a Series, apply() calls the function on each element and returns a new Series. On a DataFrame, apply() calls the function on each column (axis=0, the default) or on each row (axis=1).
- Applying a Function to a Column: To apply a function to every value in a single column, we select the column (which is a Series) and pass the function to its apply() method. Series.apply() works element by element and takes no axis argument. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)
new_col = df['age'].apply(lambda x: x * 2)
print(new_col)
Output:
0 60
1 50
2 80
3 70
Name: age, dtype: int64
In this example, we applied a lambda function that multiplies each element in the ‘age’ column by 2 using the apply() method of the Series. The resulting Series contains the doubled values; the original column is unchanged.
- Applying a Function to a Row: To apply a function to a row of a DataFrame, we need to pass the function to the apply() function with axis=1. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)
new_row = df.apply(lambda x: x['name'] + ' is ' + str(x['age']) + ' years old', axis=1)
print(new_row)
Output:
0 John is 30 years old
1 Jane is 25 years old
2 Bob is 40 years old
3 Alice is 35 years old
dtype: object
In this example, we applied a lambda function that concatenates the values of the ‘name’ and ‘age’ columns for each row using the apply() function with axis=1. The resulting Series contains one string per row.
Note that the apply() function does not modify the original DataFrame. If we want to modify the original DataFrame, we need to assign the result to a column or row:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35]}
df = pd.DataFrame(data)
df['age'] = df['age'].apply(lambda x: x * 2)
df['description'] = df.apply(lambda x: x['name'] + ' is ' + str(x['age']) + ' years old', axis=1)
print(df)
Output:
name age description
0 John 60 John is 60 years old
1 Jane 50 Jane is 50 years old
2 Bob 80 Bob is 80 years old
3 Alice 70 Alice is 70 years old
In this example, we modified the original DataFrame by assigning the results of the apply() calls back to it: the ‘age’ column was overwritten with the doubled values, and a new ‘description’ column was added.
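To make the axis behavior of DataFrame.apply() concrete, here is a small sketch on a numeric-only DataFrame. The 'score' column and its values are hypothetical, added only so there are two numeric columns to work with. It also shows that simple arithmetic can often be written directly on columns without apply():

```python
import pandas as pd

# 'score' is a made-up column used purely for illustration.
scores = pd.DataFrame({'age': [30, 25, 40, 35],
                       'score': [80, 90, 70, 60]})

# axis=0 (the default): the function receives each column as a Series.
col_range = scores.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_sum = scores.apply(lambda row: row['age'] + row['score'], axis=1)

# The same row-wise sum written as column arithmetic, no apply() needed.
vectorized_sum = scores['age'] + scores['score']

print(col_range)
print(row_sum)
print(vectorized_sum)
```

For large DataFrames, the column-arithmetic form is usually the faster of the two, since apply() with axis=1 calls a Python function once per row.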
How To Merge Two DataFrames based on a Common Column
In Pandas, we can merge two DataFrames based on a common column using the merge() function. The merge() function combines the rows of two DataFrames into a single DataFrame based on the values of a specified column. By default it performs an inner join, keeping only the rows whose key value appears in both DataFrames.
- Merging Two DataFrames with a Common Column: To merge two DataFrames based on a common column, we need to pass the two DataFrames and the name of the common column to the merge() function. Here’s an example:
import pandas as pd
data1 = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35]}
df1 = pd.DataFrame(data1)
data2 = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, on='name')
print(merged_df)
Output:
name age country
0 John 30 USA
1 Jane 25 Canada
2 Bob 40 UK
3 Alice 35 Australia
In this example, we merged two DataFrames ‘df1’ and ‘df2’ based on the ‘name’ column using the merge() function. The resulting DataFrame contains the merged rows with columns from both DataFrames.
- Merging Two DataFrames with Different Column Names: If the common column has different names in the two DataFrames, we need to pass the names of the columns to the merge() function using the left_on and right_on parameters. Here’s an example:
import pandas as pd
data1 = {'first_name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35]}
df1 = pd.DataFrame(data1)
data2 = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df2 = pd.DataFrame(data2)
merged_df = pd.merge(df1, df2, left_on='first_name', right_on='name')
print(merged_df)
Output:
first_name age name country
0 John 30 John USA
1 Jane 25 Jane Canada
2 Bob 40 Bob UK
3 Alice 35 Alice Australia
In this example, we merged two DataFrames ‘df1’ and ‘df2’ based on the ‘first_name’ column in ‘df1’ and the ‘name’ column in ‘df2’ using the merge() function with left_on and right_on parameters. The resulting DataFrame contains the merged rows with columns from both DataFrames.
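In both examples above, every name appears in both DataFrames, so the type of join does not matter. When the keys only partly overlap, the how parameter of merge() controls which rows are kept. The following sketch uses deliberately mismatched sample data (John is missing from df2, Alice from df1) to show the difference; indicator=True adds a ‘_merge’ column recording where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John', 'Jane', 'Bob'],
                    'age': [30, 25, 40]})
df2 = pd.DataFrame({'name': ['Jane', 'Bob', 'Alice'],
                    'country': ['Canada', 'UK', 'Australia']})

# Inner join (the default): only names present in both DataFrames.
inner = pd.merge(df1, df2, on='name')

# Left join: every row of df1; missing countries become NaN.
left = pd.merge(df1, df2, on='name', how='left')

# Outer join: every name from either side, with a '_merge' column
# holding 'left_only', 'right_only' or 'both'.
outer = pd.merge(df1, df2, on='name', how='outer', indicator=True)

print(inner)
print(left)
print(outer)
```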
How To Group Data in a DataFrame by One or Multiple Columns
The groupby() function splits the DataFrame into groups based on the values of the specified column(s) and returns a GroupBy object. Calling an aggregation method such as mean() on that object applies the function to each group and returns a new DataFrame with the results.
- Grouping Data by One Column: To group data in a DataFrame by one column, we need to pass the name of the column to the groupby() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)
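grouped_df = df.groupby('gender').mean(numeric_only=True)
print(grouped_df)
Output:
age
gender
female 30.0
male 35.0
In this example, we grouped the DataFrame ‘df’ by the ‘gender’ column using the groupby() function and calculated the mean of the ‘age’ column for each group. We passed numeric_only=True so that the non-numeric ‘name’ column is skipped; in pandas 2.0 and later, calling mean() on groups that include a string column raises a TypeError. The resulting DataFrame contains the mean age of males and females.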
- Grouping Data by Multiple Columns: To group data in a DataFrame by multiple columns, we need to pass a list of column names to the groupby() function. The DataFrame will be grouped by the first column in the list, then by the second column, and so on. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'gender': ['male', 'female', 'male', 'female'],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
grouped_df = df.groupby(['gender', 'country']).mean(numeric_only=True)
print(grouped_df)
Output:
age
gender country
female Australia 35.0
Canada 25.0
male UK 40.0
USA 30.0
In this example, we grouped the DataFrame ‘df’ by the ‘gender’ and ‘country’ columns using the groupby() function and calculated the mean of the ‘age’ column for each group. The resulting DataFrame contains the mean age of males and females in each country.
Note that the groupby() function does not modify the original DataFrame. An aggregation such as mean() returns one row per group. To get a result with the same shape and index as the original DataFrame, we can use transform(), which broadcasts each group's value back to the rows of that group:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'gender': ['male', 'female', 'male', 'female'],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
df_grouped = df.groupby(['gender', 'country'])[['age']].transform('mean')
print(df_grouped)
Output:
age
0 30.0
1 25.0
2 40.0
3 35.0
In this example, the original DataFrame ‘df’ is left unchanged; the result of transform() is assigned to a new DataFrame ‘df_grouped’ that has one row per original row. We selected the ‘age’ column before calling transform() because the mean of the non-numeric ‘name’ column is undefined. Since every (gender, country) pair is unique in this data, each group mean equals that row's own age.
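It can help to look at the grouped object itself before aggregating. The sketch below, on the same sample data, counts the rows in each group with size(), iterates over the groups, and selects a single column before taking the mean (variable names are illustrative):

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)

grouped = df.groupby('gender')

# size() returns the number of rows in each group.
counts = grouped.size()

# Iterating yields (group key, sub-DataFrame) pairs.
for gender, group in grouped:
    print(gender, list(group['name']))

# Selecting a column first restricts the aggregation to that column.
mean_age = grouped['age'].mean()

print(counts)
print(mean_age)
```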
How To Aggregate Data in a DataFrame by One or Multiple Columns
We can aggregate data in a DataFrame by one or multiple columns using the groupby() function with an aggregation function. The aggregation function applies a function to each group of rows and returns a scalar value that represents the aggregated value of the group.
- Aggregating Data by One Column: To aggregate data in a DataFrame by one column, we need to pass the name of the column to the groupby() function and an aggregation function to the agg() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)
grouped_df = df.groupby('gender').agg({'age': ['min', 'max', 'mean']})
print(grouped_df)
Output:
age
min max mean
gender
female 25 35 30.0
male 30 40 35.0
In this example, we aggregated the ‘age’ column of the DataFrame ‘df’ by the ‘gender’ column using the groupby() function and calculated the minimum, maximum, and mean age of each group using the agg() function. The resulting DataFrame contains the aggregated values for each group.
- Aggregating Data by Multiple Columns: To aggregate data in a DataFrame by multiple columns, we need to pass a list of column names to the groupby() function and an aggregation function to the agg() function. Here’s an example:
import pandas as pd
data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
'age': [30, 25, 40, 35],
'gender': ['male', 'female', 'male', 'female'],
'country': ['USA', 'Canada', 'UK', 'Australia']}
df = pd.DataFrame(data)
grouped_df = df.groupby(['gender', 'country']).agg({'age': ['min', 'max', 'mean']})
print(grouped_df)
Output:
age
min max mean
gender country
female Australia 35 35 35.0
Canada 25 25 25.0
male UK 40 40 40.0
USA 30 30 30.0
In this example, we aggregated the ‘age’ column of the DataFrame ‘df’ by the ‘gender’ and ‘country’ columns using the groupby() function and calculated the minimum, maximum, and mean age of each group using the agg() function. The resulting DataFrame contains the aggregated values for each group.
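The dictionary form above produces two-level column headers (‘age’ over ‘min’, ‘max’, ‘mean’). If flat, descriptive column names are preferred, agg() also supports named aggregation, available in pandas 0.25 and later. In this sketch the output column names min_age, max_age and mean_age are our own choices:

```python
import pandas as pd

data = {'name': ['John', 'Jane', 'Bob', 'Alice'],
        'age': [30, 25, 40, 35],
        'gender': ['male', 'female', 'male', 'female']}
df = pd.DataFrame(data)

# Each keyword is an output column name; each value is a
# (source column, aggregation function) pair.
summary = df.groupby('gender').agg(
    min_age=('age', 'min'),
    max_age=('age', 'max'),
    mean_age=('age', 'mean'),
)

print(summary)
```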
Conclusion and Summary
In this tutorial, we covered the basics of working with the Pandas DataFrame and Series. We learned how to create a DataFrame and Series, select rows and columns by label or position, use Boolean indexing to filter rows, drop rows and columns, rename columns and indexes, sort a DataFrame by one or multiple columns, apply functions to DataFrame rows or columns, merge two DataFrames based on a common column, group data in a DataFrame by one or multiple columns, and aggregate data in a DataFrame by one or multiple columns.
Pandas is a powerful library for data manipulation and analysis in Python, and the techniques covered in this tutorial are just the tip of the iceberg. With Pandas, you can perform advanced data cleaning, transformation, and visualization tasks on large datasets with ease. We hope this tutorial has helped you get started with Pandas and has provided you with a solid foundation for further exploration of its capabilities.