• Magdalena Konkiewicz

Learn how to use pandas inplace parameter once and for all


Image by 995645 from Pixabay

Introduction

I have noticed that beginners, and sometimes even more experienced data scientists, get confused about how to use the inplace parameter in pandas when manipulating data frames.

What is more interesting, I have not seen many articles or tutorials that explain the concept. It somehow seems to be treated as assumed knowledge or a self-explanatory concept. Unfortunately, it is not so straightforward for everyone, so this article tries to explain what the inplace parameter is and how to use it properly.



Functions that use the inplace parameter

Let’s have a look at some examples of functions that use inplace:

  • fillna()

  • dropna()

  • sort_values()

  • reset_index()

  • sort_index()

  • rename()

I have created this list off the top of my head, and there are probably more functions that take inplace as a parameter. I do not remember all of them by heart, but pretty much any pandas DataFrame function that has an inplace parameter behaves in a similar way. This means the logic you learn in this article will apply to them as well.
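As a quick illustration of that shared behavior, here is a minimal sketch that calls a few of the listed methods with inplace=True and collects what each call returns. The `toy` data frame and its column names are made up for this example, and the behavior shown assumes a pandas version in which these methods return None when inplace=True.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({'a': [3.0, 1.0, 2.0], 'b': ['x', 'y', 'z']})

# Each call changes `toy` itself; collect what the calls hand back
results = [
    toy.sort_values('a', inplace=True),
    toy.reset_index(drop=True, inplace=True),
    toy.rename(columns={'b': 'label'}, inplace=True),
    toy.sort_index(ascending=False, inplace=True),
]

print(results)             # a list of None values
print(list(toy.columns))   # the rename took effect on `toy`
print(toy['a'].tolist())   # rows reordered by the final sort_index
```

None of the calls returns a data frame, yet `toy` ends up renamed and reordered, which is the pattern the rest of the article relies on.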



Create a sample data frame

In order to illustrate the usage of inplace we will create a sample data frame.



import pandas as pd
import numpy as np
client_dictionary = {'name': ['Michael', 'Ana', 'Sean', 'Carl', 'Bob'], 
                     'second name': [None, 'Angel', 'Ben', 'Frank', 'Daniel'],
                     'birth place': ['New York', 'New York', 'Los Angeles', 'New York', 'New York'],
                     'age': [10, 35, 56, None, 28],
                     'number of children': [0, None, 2, 1, 1]}
df = pd.DataFrame(client_dictionary)
df.head()


We have created a data frame with five rows and the following columns: name, second name, birth place, age, and number of children. Note that there are some missing values (NaNs) in the second name, age, and number of children columns.

We will now demonstrate how the dropna() function works with the inplace parameter. Because we would like to examine two different variants, we are going to create two copies of our original data frame.



df_1 = df.copy()
df_2 = df.copy()



dropna() with inplace = True

Let’s start with the variation where inplace=True. The code below drops all rows that contain missing values.



df_1.dropna(inplace=True)

If you run this in a Jupyter notebook you will see that the cell has no output. This is because a function called with inplace=True does not return anything (it returns None). It applies the desired operation to the existing data frame and does it ‘in place’, that is, on the original data frame.

If you run the head() function on the data frame you should see that the three rows containing missing values have been dropped, leaving two rows.



df_1.head()
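
To confirm this without relying on notebook output, the sketch below rebuilds the same sample data and captures the return value of dropna(inplace=True). The helper names `rows_before` and `returned` are introduced here for illustration only.

```python
import pandas as pd

# Same sample data as in the article
client_dictionary = {'name': ['Michael', 'Ana', 'Sean', 'Carl', 'Bob'],
                     'second name': [None, 'Angel', 'Ben', 'Frank', 'Daniel'],
                     'birth place': ['New York', 'New York', 'Los Angeles', 'New York', 'New York'],
                     'age': [10, 35, 56, None, 28],
                     'number of children': [0, None, 2, 1, 1]}
df_1 = pd.DataFrame(client_dictionary)

rows_before = len(df_1)
returned = df_1.dropna(inplace=True)  # modifies df_1 itself

print(returned)                 # None: nothing is returned
print(rows_before, len(df_1))   # 5 2
print(df_1['name'].tolist())    # ['Sean', 'Bob']
```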


dropna() with inplace = False (default)

Now let’s run the same code with inplace = False. Note that this time we are going to use the df_2 version of the data frame.



df_2.dropna(inplace=False)



If you run this code in a Jupyter notebook you will see that there is an output this time. The function with inplace = False returns a data frame with the rows dropped.

Remember that when inplace was set to True nothing was returned but the original data frame was modified.

So what happens to the original data frame this time? Let’s call the head() function to check.



df_2.head()



The original data frame is unchanged! So what happened?

When you use inplace=False, a new object is created and changed instead of the original data frame. If you wanted to update the original data frame to reflect the dropped rows, you would have to reassign the result to the original data frame as shown in the code below.



df_2 = df_2.dropna(inplace=False)

But wait! This is exactly what we were doing when using inplace=True. Yes, this last line of code is equivalent to the following line:



df_2.dropna(inplace=True)

The latter is more concise because it skips the explicit reassignment and changes the data frame that the variable refers to. It is therefore a reasonable choice if your intention is to change the original data frame. Keep in mind that pandas may still build a new object internally, so inplace=True should not be counted on to save memory.
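
The equivalence of the two forms can be checked with a small sketch. The `people` data frame and the variable names below are assumptions made for this example; the one practical difference it shows is that reassignment leaves any other reference to the old object untouched, while inplace=True changes the object itself.

```python
import pandas as pd

# Hypothetical data with one missing value
people = pd.DataFrame({'name': ['Ana', 'Sean', 'Carl'], 'age': [35, 56, None]})

reassigned = people.copy()
old_reference = reassigned          # second name for the same object
reassigned = reassigned.dropna()    # inplace=False is the default

modified = people.copy()
same_reference = modified           # second name for the same object
modified.dropna(inplace=True)

print(reassigned.equals(modified))  # True: both end up with the same rows
print(len(old_reference))           # 3: the old object was not changed
print(len(same_reference))          # 2: the object itself was changed
```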

Simple, no?

So why are there so many mistakes with it? I am not sure, probably because some people still do not understand how to use this parameter properly. Let’s have a look at some common mistakes.


Image by Syaibatul Hamdi from Pixabay

Common mistakes

  • using inplace = True with a slice of the dataframe


I have noticed this several times. Let’s go back to our data frame example from the beginning of this article. There are three columns that have None values: second name, age, and number of children.

What do we do if we just want to drop rows with None in the second name and age columns and leave the number of children column out of the check?

I have seen people try to do the following:



df[['second name', 'age']].dropna(inplace=True)

This is probably something you do not want to do!

In fact, depending on your pandas version, this should show a warning (typically a SettingWithCopyWarning).


This warning is shown because the pandas designers are nice and actually try to warn you against doing something that you probably do not want to do. The code changes a slice of the data frame that has only two columns, not the original data frame. The reason is that you have selected a slice of the data frame and applied dropna() to that slice rather than to the original data frame.

In order to correct it, use dropna() on the whole data frame with the subset parameter.



df.dropna(inplace=True, subset=['second name', 'age'])
df.head()


This should result in the rows with null values in the second name and age columns being dropped from the data frame.
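
Both behaviors can be verified side by side. The sketch below is only an illustration: it rebuilds the sample data, silences whatever warning the slice version may raise (an assumption, since the warning depends on the pandas version), and records the row counts in helper variables introduced here.

```python
import warnings
import pandas as pd

# Same sample data as in the article
client_dictionary = {'name': ['Michael', 'Ana', 'Sean', 'Carl', 'Bob'],
                     'second name': [None, 'Angel', 'Ben', 'Frank', 'Daniel'],
                     'birth place': ['New York', 'New York', 'Los Angeles', 'New York', 'New York'],
                     'age': [10, 35, 56, None, 28],
                     'number of children': [0, None, 2, 1, 1]}
df = pd.DataFrame(client_dictionary)

# The mistake: dropna() runs on a temporary two-column slice
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # hide the warning for this demo only
    df[['second name', 'age']].dropna(inplace=True)
rows_after_slice = len(df)

# The fix: restrict the check to two columns with subset
df.dropna(subset=['second name', 'age'], inplace=True)
rows_after_subset = len(df)

print(rows_after_slice)      # 5: the original frame was not touched
print(rows_after_subset)     # 3: Michael and Carl are gone
print(df['name'].tolist())   # ['Ana', 'Sean', 'Bob']
```

Note that Ana stays in the result even though her number of children is missing, because that column is not part of subset.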



  • assigning the result of inplace = True back to a variable

I have seen this several times as well.



df = df.dropna(inplace=True)

This is again something you should never do! You are just reassigning None to df. Remember that when you use inplace=True nothing is returned. Therefore this code assigns None to df, and the data frame is lost.
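
A short sketch makes the problem visible. The small data frames here are made up for the example, and the behavior assumes a pandas version in which dropna(inplace=True) returns None.

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({'name': ['Ana', 'Carl'], 'age': [35, None]})

# The mistake: the return value of an inplace call is None
df = df.dropna(inplace=True)
print(df is None)   # True: the variable no longer points at a data frame

# One way to fix it: reassign the result of the default (inplace=False) call
fixed = pd.DataFrame({'name': ['Ana', 'Carl'], 'age': [35, None]})
fixed = fixed.dropna()
print(len(fixed))   # 1
```

Use either the reassignment or inplace=True on its own, never both in the same statement.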



Summary

I hope that this article has demystified the inplace parameter for you and that you will be able to use it properly in your code. Happy data frame manipulation!
