Click to share! ⬇️

Pandas is a popular open-source data analysis and manipulation library for Python. It offers powerful tools for data cleaning, exploration, and manipulation, making it a go-to tool for many data scientists and analysts. In this tutorial, we will focus on updating rows and columns in a Pandas DataFrame. We will explore how to modify data within DataFrames based on certain conditions and using functions, lambda expressions, and more. We will also discuss how to handle missing values when updating data and how to update data in multiple DataFrames simultaneously. By the end of this tutorial, you will have a good understanding of how to update data in Pandas DataFrames and be able to use these skills to manipulate and analyze your own datasets.

  1. How To Access Data in a Pandas DataFrame
  2. How To Update Rows in a Pandas DataFrame
  3. How To Update Columns in a Pandas DataFrame
  4. How To Modify Data Based on Specific Conditions
  5. How To Handle Missing Values When Updating Data
  6. How To Update Data Using Functions and Lambda Expressions
  7. How To Update Data in Multiple DataFrames Simultaneously
  8. How To Save Updated Data to a New File

How To Access Data in a Pandas DataFrame

Before we can update data in a Pandas DataFrame, we need to know how to access it. Pandas provides several ways to access data in a DataFrame:

  1. Indexing: We can use the square bracket notation to access specific rows or columns of a DataFrame by their labels or positions.
  2. Slicing: We can use slicing to access a range of rows or columns of a DataFrame.
  3. Boolean indexing: We can use Boolean indexing to access rows of a DataFrame that meet certain conditions.
  4. .loc and .iloc: The .loc and .iloc attributes provide more advanced indexing capabilities, allowing us to access specific rows and columns of a DataFrame by label or position, respectively.

Here is an example of how to access data in a Pandas DataFrame using these methods:

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the entire DataFrame
print(df)

# access the 'Name' column using indexing
print(df['Name'])

# access the first three rows using slicing
print(df[:3])

# access rows where Age is greater than 30 using Boolean indexing
print(df[df['Age'] > 30])

# access the 'Name' and 'City' columns using .loc
print(df.loc[:, ['Name', 'City']])

# access the first two rows and first two columns using .iloc
print(df.iloc[:2, :2])

Output:

       Name  Age      City
0     Alice   25  New York
1       Bob   30     Paris
2   Charlie   35    London
3     David   40     Tokyo
4     Emily   45    Sydney
0      Alice
1        Bob
2    Charlie
3      David
4      Emily
Name: Name, dtype: object
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
      Name  Age    City
2  Charlie   35  London
3    David   40   Tokyo
4    Emily   45  Sydney
       Name      City
0     Alice  New York
1       Bob     Paris
2   Charlie    London
3     David     Tokyo
4     Emily    Sydney
      Name  Age
0    Alice   25
1      Bob   30

As shown in this example, accessing data in a Pandas DataFrame is easy and intuitive. Once we have accessed the data we need, we can modify it using the techniques we will cover in the following sections.

How To Update Rows in a Pandas DataFrame

Updating rows in a Pandas DataFrame is a common operation that we might need to perform while working with data. We can update a single row or multiple rows at once using several techniques:

  1. Indexing: We can use indexing to access a specific row or set of rows, and then update the values of specific columns in those rows.
  2. .loc and .iloc: The .loc and .iloc attributes also allow us to update specific rows and columns of a DataFrame.
  3. .apply and .applymap: We can use the .apply and .applymap methods to apply a function to each row or element of a DataFrame, respectively, and update the values based on the function’s output.

Here is an example of how to update rows in a Pandas DataFrame using these methods:

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the original DataFrame
print('Original DataFrame:')
print(df)

# update the values of the 'Age' column for the rows where Name is 'Bob' and 'David' using indexing
df.loc[df['Name'].isin(['Bob', 'David']), 'Age'] = [31, 41]

# update the values of the 'City' column for the row where Name is 'Charlie' using .iloc
df.iloc[2, 2] = 'Manchester'

# define a function to update the values of the 'Age' column based on a condition, and apply it to each row using .apply
def update_age(row):
    if row['Age'] > 30:
        return row['Age'] + 5
    else:
        return row['Age']
    
df['Age'] = df.apply(update_age, axis=1)

# print the updated DataFrame
print('Updated DataFrame:')
print(df)

Output:

Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
3    David   40     Tokyo
4    Emily   45    Sydney
Updated DataFrame:
      Name  Age        City
0    Alice   25    New York
1      Bob   31       Paris
2  Charlie   40  Manchester
3    David   46       Tokyo
4    Emily   50      Sydney

As shown in this example, updating rows in a Pandas DataFrame is straightforward. By using the appropriate indexing and slicing methods, we can select the rows and columns we need to update and modify their values. We can also use functions to update the values based on certain conditions.

How To Update Columns in a Pandas DataFrame

Updating columns in a Pandas DataFrame is similar to updating rows, but instead of selecting and modifying specific rows, we select and modify specific columns. We can update a single column or multiple columns at once using the following techniques:

  1. Indexing: We can use indexing to select a specific column or set of columns, and then update the values of those columns.
  2. .loc and .iloc: The .loc and .iloc attributes also allow us to update specific rows and columns of a DataFrame.
  3. .apply and .applymap: We can use the .apply and .applymap methods to apply a function to each column or element of a DataFrame, respectively, and update the values based on the function’s output.

Here is an example of how to update columns in a Pandas DataFrame using these methods:

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the original DataFrame
print('Original DataFrame:')
print(df)

# update the values of the 'Age' and 'City' columns using indexing
df['Age'] = [26, 31, 36, 41, 46]
df['City'] = ['New York', 'Paris', 'Manchester', 'Tokyo', 'Sydney']

# update the values of the 'Name' column for the row where Age is 35 using .iloc
df.iloc[2, 0] = 'Charlotte'

# define a function to update the values of a column based on a condition, and apply it to each column using .apply
def update_city(column):
    if column.name == 'City':
        return column.str.upper()
    else:
        return column
    
df = df.apply(update_city)

# print the updated DataFrame
print('Updated DataFrame:')
print(df)

Output:

Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
3    David   40     Tokyo
4    Emily   45    Sydney
Updated DataFrame:
        Name  Age        City
0      Alice   26    NEW YORK
1        Bob   31       PARIS
2  Charlotte   36  MANCHESTER
3      David   41       TOKYO
4      Emily   46      SYDNEY

As shown in this example, updating columns in a Pandas DataFrame is also straightforward. By selecting the appropriate columns and modifying their values using indexing or functions, we can update the data in the DataFrame as needed.

How To Modify Data Based on Specific Conditions

Sometimes we need to modify data in a Pandas DataFrame based on specific conditions, such as updating all values greater than a certain threshold or replacing missing values with a default value. Here are some techniques for modifying data based on specific conditions:

  1. Boolean indexing: We can use Boolean indexing to select rows or columns that meet certain conditions, and then update their values.
  2. .loc and .iloc: The .loc and .iloc attributes also allow us to select and modify specific rows and columns based on conditions.
  3. .apply and .applymap: We can use the .apply and .applymap methods to apply a function to each row or element of a DataFrame, respectively, and update the values based on the function’s output.

Here is an example of how to modify data in a Pandas DataFrame based on specific conditions using these methods:

import pandas as pd
import numpy as np

# create a DataFrame with some missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, np.nan, 35, 40, np.nan],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the original DataFrame
print('Original DataFrame:')
print(df)

# update all missing values in the 'Age' column with 30 using Boolean indexing
df.loc[df['Age'].isnull(), 'Age'] = 30

# replace all values greater than 35 in the 'Age' column with 35 using .loc
df.loc[df['Age'] > 35, 'Age'] = 35

# define a function to modify a row based on a condition, and apply it to each row using .apply
def modify_row(row):
    if row['City'] == 'New York':
        row['Age'] = 28
    return row
    
df = df.apply(modify_row, axis=1)

# print the updated DataFrame
print('Updated DataFrame:')
print(df)

Output:

Original DataFrame:
      Name   Age      City
0    Alice  25.0  New York
1      Bob   NaN     Paris
2  Charlie  35.0    London
3    David  40.0     Tokyo
4    Emily   NaN    Sydney
Updated DataFrame:
      Name   Age      City
0    Alice  28.0  New York
1      Bob  30.0     Paris
2  Charlie  35.0    London
3    David  35.0     Tokyo
4    Emily  30.0    Sydney

As shown in this example, modifying data in a Pandas DataFrame based on specific conditions is simple using Boolean indexing, .loc and .iloc, or applying functions to rows or elements. We can use these techniques to perform a wide range of data cleaning and manipulation tasks.

How To Handle Missing Values When Updating Data

Handling missing values is an important task when working with data. In a Pandas DataFrame, missing values are typically represented as NaN (Not a Number) values. Here are some techniques for handling missing values when updating data in a Pandas DataFrame:

  1. Fill missing values with a default value: We can use the .fillna method to fill missing values with a specific value, such as the mean or median of the column.
  2. Drop rows or columns with missing values: We can use the .dropna method to remove rows or columns with missing values.
  3. Interpolate missing values: We can use the .interpolate method to interpolate missing values based on the values of adjacent rows or columns.

Here is an example of how to handle missing values when updating data in a Pandas DataFrame using these methods:

import pandas as pd
import numpy as np

# create a DataFrame with some missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, np.nan, 35, 40, np.nan],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the original DataFrame
print('Original DataFrame:')
print(df)

# fill missing values in the 'Age' column with the mean age using .fillna
df['Age'] = df['Age'].fillna(df['Age'].mean())

# drop rows with missing values using .dropna
df = df.dropna()

# interpolate missing values in the 'Age' column using .interpolate
df['Age'] = df['Age'].interpolate()

# print the updated DataFrame
print('Updated DataFrame:')
print(df)

Output:

Original DataFrame:
      Name   Age      City
0    Alice  25.0  New York
1      Bob   NaN     Paris
2  Charlie  35.0    London
3    David  40.0     Tokyo
4    Emily   NaN    Sydney
Updated DataFrame:
      Name   Age      City
0    Alice  25.0  New York
2  Charlie  35.0    London
3    David  40.0     Tokyo

As shown in this example, handling missing values when updating data in a Pandas DataFrame is simple using the .fillna, .dropna, and .interpolate methods. We can choose the appropriate method based on our specific needs and the characteristics of the data.

How To Update Data Using Functions and Lambda Expressions

Functions and lambda expressions are powerful tools for updating data in a Pandas DataFrame. They allow us to apply complex operations to the data and modify it in a flexible and efficient way. Here are some techniques for updating data using functions and lambda expressions:

  1. Define a function to modify the data: We can define a custom function that takes a DataFrame as input, performs some operations on it, and returns the modified DataFrame.
  2. Apply a function to specific columns: We can use the .apply method to apply a function to specific columns of a DataFrame.
  3. Use a lambda expression: We can use a lambda expression to define a simple function inline, without the need to define a separate function.

Here is an example of how to update data using functions and lambda expressions in a Pandas DataFrame:

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the original DataFrame
print('Original DataFrame:')
print(df)

# define a function to modify the 'Age' column
def modify_age(df):
    df['Age'] = df['Age'].apply(lambda x: x + 5)
    return df

# apply the modify_age function to the DataFrame
df = modify_age(df)

# use a lambda expression to modify the 'City' column
df['City'] = df['City'].apply(lambda x: x.upper())

# print the updated DataFrame
print('Updated DataFrame:')
print(df)

Output:

Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
3    David   40     Tokyo
4    Emily   45    Sydney
Updated DataFrame:
      Name  Age      City
0    Alice   30  NEW YORK
1      Bob   35     PARIS
2  Charlie   40    LONDON
3    David   45     TOKYO
4    Emily   50    SYDNEY

As shown in this example, updating data using functions and lambda expressions in a Pandas DataFrame is flexible and powerful. We can define custom functions to perform complex operations on the data and use lambda expressions for simple operations. We can apply these techniques to update specific columns or the entire DataFrame as needed.

How To Update Data in Multiple DataFrames Simultaneously

In some cases, we may need to update data in multiple Pandas DataFrames simultaneously, such as when merging or concatenating DataFrames. Here are some techniques for updating data in multiple DataFrames simultaneously:

  1. Define a function to modify the data: We can define a custom function that takes one or more DataFrames as input, performs some operations on them, and returns the modified DataFrames.
  2. Use the .update method: The .update method allows us to modify one DataFrame with the values from another DataFrame, using a shared index.
  3. Use the .merge method: The .merge method allows us to merge multiple DataFrames based on a common column or index.

Here is an example of how to update data in multiple DataFrames simultaneously using these methods:

import pandas as pd

# create two DataFrames
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

df2 = pd.DataFrame({
    'Name': ['Bob', 'Charlie', 'David'],
    'Salary': [50000, 60000, 70000]
})

# print the original DataFrames
print('Original DataFrames:')
print(df1)
print(df2)

# define a function to modify both DataFrames
def modify_data(df1, df2):
    # use .set_index to set the 'Name' column as the index for both DataFrames
    df1 = df1.set_index('Name')
    df2 = df2.set_index('Name')
    
    # update the 'Age' column in df1 based on the 'Name' column in df2 using .update
    df1.update(df2['Salary'])
    
    # reset the index for both DataFrames
    df1 = df1.reset_index()
    df2 = df2.reset_index()
    
    return df1, df2

# apply the modify_data function to the DataFrames
df1, df2 = modify_data(df1, df2)

# print the updated DataFrames
print('Updated DataFrames:')
print(df1)
print(df2)

Output:

Original DataFrames:
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
3    David   40     Tokyo
4    Emily   45    Sydney
      Name  Salary
0      Bob   50000
1  Charlie   60000
2    David   70000
Updated DataFrames:
      Name     Age      City
0    Alice    25.0  New York
1      Bob  50000.0     Paris
2  Charlie  60000.0    London
3    David  70000.0     Tokyo
4    Emily    45.0    Sydney
      Name  Salary
0      Bob   50000
1  Charlie   60000
2    David   70000

As shown in this example, updating data in multiple DataFrames simultaneously requires some additional steps, such as setting and resetting the index of the DataFrames. However, with the appropriate techniques, such as defining custom functions and using the .update and .merge methods, we can efficiently update the data in multiple DataFrames.

How To Save Updated Data to a New File

Once we have updated the data in a Pandas DataFrame, we may want to save it to a new file for future use or analysis. Here are some techniques for saving updated data to a new file:

  1. Use the .to_csv method: The .to_csv method allows us to save a DataFrame to a CSV file.
  2. Use the .to_excel method: The .to_excel method allows us to save a DataFrame to an Excel file.
  3. Use the .to_pickle method: The .to_pickle method allows us to save a DataFrame to a binary file in pickle format.

Here is an example of how to save updated data to a new file using these methods:

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# print the original DataFrame
print('Original DataFrame:')
print(df)

# update the values of the 'Age' column
df['Age'] = [26, 31, 36, 41, 46]

# save the updated DataFrame to a CSV file
df.to_csv('updated_data.csv', index=False)

# save the updated DataFrame to an Excel file
df.to_excel('updated_data.xlsx', index=False)

# save the updated DataFrame to a binary file in pickle format
df.to_pickle('updated_data.pkl')

# read the CSV file back into a DataFrame and print it
df_csv = pd.read_csv('updated_data.csv')
print('DataFrame from CSV file:')
print(df_csv)

# read the Excel file back into a DataFrame and print it
df_excel = pd.read_excel('updated_data.xlsx')
print('DataFrame from Excel file:')
print(df_excel)

# read the pickle file back into a DataFrame and print it
df_pickle = pd.read_pickle('updated_data.pkl')
print('DataFrame from pickle file:')
print(df_pickle)

Output:

Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30     Paris
2  Charlie   35    London
3    David   40     Tokyo
4    Emily   45    Sydney
DataFrame from CSV file:
      Name  Age      City
0    Alice   26  New York
1      Bob   31     Paris
2  Charlie   36    London
3    David   41     Tokyo
4    Emily   46    Sydney
DataFrame from Excel file:
      Name  Age      City
0    Alice   26  New York
1      Bob   31     Paris
2  Charlie   36    London
3    David   41     Tokyo
4    Emily   46    Sydney
DataFrame from pickle file:
      Name  Age      City
0    Alice   26  New York
1      Bob   31     Paris
2  Charlie   36    London
3    David   41     Tokyo
4    Emily   46    Sydney

As shown in this example, saving updated data to a new file in Pandas is straightforward using the .to_csv, .to_excel, and .to_pickle methods. We can choose the appropriate method based on the file format and our specific needs. We can also read the file back into a DataFrame using the corresponding read method to verify that the data was saved correctly.

Click to share! ⬇️