• Magdalena Konkiewicz

Pandas data manipulation functions: apply(), map() and applymap()



Once you start working with pandas, you will notice that you usually need to transform your data set before you can work with it. It is almost never the case that you load a data set and can proceed with it in its original form. There are several reasons why you may need a transformation, for example:

  • changing units (e.g. the data set is in kg and you need it in pounds),

  • doing some mathematical transformation (log, root, etc.),

  • lowercasing or capitalizing strings,

  • and more…

This article will explain how you can use apply(), map(), and applymap() to achieve this, and when to use one rather than another.



Arithmetic operations on Series

Before deep-diving into apply(), map(), and applymap(), it is important to note that simple arithmetic operations can be done without using any of these functions. Those are:

  • addition

  • subtraction

  • multiplication

  • division

  • or any mix of the above
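The operations listed above can be sketched on a tiny, made-up Series before moving on to real data. The values, the variable names and the rounded conversion factor below are assumptions chosen for illustration only:

```python
import pandas as pd

# Hypothetical weights in kilograms (toy data, not part of the Iris set)
weights_kg = pd.Series([1.0, 2.5, 10.0])

# Multiplying a Series by a scalar is applied to every element
KG_TO_LB = 2.20462  # approximate conversion factor
weights_lb = weights_kg * KG_TO_LB

# Operations can be mixed freely; the result is again a Series
adjusted = (weights_kg + 0.5) * 2 - 1

print(weights_lb.tolist())
print(adjusted.tolist())
```

No helper function is needed here because pandas applies the arithmetic to the whole Series at once.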

Let’s load the Iris data set and see some examples:



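from sklearn import datasets
import pandas as pd

# Load the Iris measurements into a DataFrame, one column per feature
iris_data = datasets.load_iris()
df_iris = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
# Add the class labels (0, 1, 2) as a 'target' column
df_iris['target'] = pd.Series(iris_data.target)
df_iris.head()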



It is very easy to do basic arithmetic operations. If you would like to compute the sepal length to sepal width ratio, you can just divide one column by the other:



# Column name follows the order of the division: length / width
df_iris['sepal_length_sepal_width_ratio'] = df_iris['sepal length (cm)'] / df_iris['sepal width (cm)']
df_iris.head()



As you can see, we have a new column that is the result of this basic arithmetic operation. But what happens if you want to use more complex operations such as log, floor, cube, etc.?

This is when apply(), map() and applymap() become your friends.


apply() on Series

Let’s start by seeing how apply() can be used on a Series object. We are going to pretend we are interested in taking the log of sepal length. It is very easy to do with apply(). We just need to pass one argument, a callable that we want to apply to each element in the Series. In our case this is math.log:



import math
df_iris['sepal_length_log'] = df_iris['sepal length (cm)'].apply(math.log)
df_iris.head()



As you can see we have successfully created a new column called sepal_length_log and the log was applied to each sepal length entry.
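As a side note, for common mathematical functions NumPy offers vectorized versions that accept a whole Series. The sketch below uses made-up values and assumes NumPy is installed (it comes along with pandas); it checks that both routes give the same numbers:

```python
import math

import numpy as np
import pandas as pd

# Toy sepal lengths, made up for illustration
lengths = pd.Series([5.1, 4.9, 4.7])

# Element by element, calling math.log once per value
log_apply = lengths.apply(math.log)

# Vectorized: NumPy processes the whole Series in one call
log_numpy = np.log(lengths)

print(np.allclose(log_apply, log_numpy))
```

On large data sets the vectorized call is typically faster, while apply() stays handy for functions that have no vectorized equivalent.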

Next, let’s look at a more complicated case: how apply() works on a data frame.





apply() on DataFrame

The apply() function works a bit differently on a DataFrame than on a Series. The function is applied either to each whole row or to each whole column, depending on the axis you specify (axis=0, the default, passes each column; axis=1 passes each row). Again, it takes a callable as an argument. Its most popular usage is to send an entire row as an argument to the function. Let’s see some examples:



df_iris['all_dimensions'] = df_iris.drop('target', axis=1).apply(sum, axis=1)
df_iris.head()



In the example above we have created the ‘all_dimensions’ column with apply(). To do that, I dropped the ‘target’ column and summed all the other column values for each row using the sum() function. Note that I had to pass axis=1 in order to apply the function to each row. Keep in mind that this sum also includes the derived columns created earlier (the ratio and the log), since they are part of the data frame at this point.
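To make the axis parameter concrete, here is a minimal sketch on a made-up two-column data frame. The names toy, col_totals and row_totals are hypothetical and used only for illustration:

```python
import pandas as pd

# Toy data frame with two numeric columns
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the function receives each column in turn
col_totals = toy.apply(sum, axis=0)

# axis=1: the function receives each row in turn
row_totals = toy.apply(sum, axis=1)

print(col_totals.tolist())  # one total per column
print(row_totals.tolist())  # one total per row
```

With axis=0 the result is indexed by the column names, and with axis=1 it is indexed like the rows of the data frame.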

The above usage may not be the most useful but the power of apply lies in defining your own function. Let’s illustrate this here.

Imagine that you want to just add sepal data together given the whole row. You could define your function like this:



def sepal_sum(row):
    return row['sepal length (cm)'] + row['sepal width (cm)']

And now you could use apply() to apply the function to each row like this:



df_iris['sepal_dimensions'] = df_iris.drop('target', axis=1).apply(sepal_sum, axis=1)
df_iris.head()



That is very useful! And it will become even more useful once you start using your own, more complicated functions.
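As a sketch of what a slightly more complicated row function might look like, the example below adds a condition and an extra parameter. The function size_label, its threshold parameter and the toy values are all hypothetical; extra keyword arguments given to apply() are forwarded to the function:

```python
import pandas as pd

# Toy rows using the same column names as the Iris data frame
toy = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0, 6.3],
    "sepal width (cm)": [3.5, 3.2, 3.3],
})


def size_label(row, threshold):
    """Label a row 'large' when its two sepal measurements sum to more than threshold."""
    total = row["sepal length (cm)"] + row["sepal width (cm)"]
    return "large" if total > threshold else "small"


# threshold=9.0 is passed through apply() to size_label for every row
toy["size"] = toy.apply(size_label, axis=1, threshold=9.0)
print(toy)
```

Logic like this, with branching on several columns of the same row, is where a row-wise apply() tends to be most convenient.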


applymap()

When apply() is used on a DataFrame, the function argument becomes a whole row or a whole column, depending on the axis you define. But what if you would like to apply some function to each element of the data frame and not to each row or column? This is when applymap() becomes useful.

Imagine that someone has made a mistake and you would like to add 1 to each entry in your data because you have found out that there was a consistent error in measurement. Let’s start with defining a helper function:



def add_one(item):
    return item + 1


And let’s use applymap() to apply it to every element of the iris data frame, excluding the target column.



df_iris.drop('target', axis=1).applymap(add_one).head()



Now if you compare this output to the original one you will notice that one was added to every entry. It shows how powerful applymap can be.
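One version note: in pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map(). The sketch below, on a made-up data frame, picks whichever method is available and compares the result with plain arithmetic, which gives the same answer for a simple case like adding one:

```python
import pandas as pd

# Toy data frame, made up for illustration
toy = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})


def add_one(item):
    return item + 1


# DataFrame.map exists in pandas 2.1+; fall back to applymap on older versions
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
shifted_elementwise = elementwise(add_one)

# For simple arithmetic, operating on the whole data frame is equivalent
shifted_vectorized = toy + 1

print(shifted_elementwise.equals(shifted_vectorized))
```

The element-wise route pays off when the function cannot be written as simple arithmetic, for example when formatting each value as a string.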



map()

The map() function covered here is a Series method, and when you pass it a function it works the same way as apply() on a Series. Let’s use it with the log example we used before to demonstrate apply() on a Series. The only change is that in the code we swap apply for map:



import math
df_iris['sepal_length_log'] = df_iris['sepal length (cm)'].map(math.log)
df_iris.head()

There is a difference between map() and apply() even when they are applied to a Series, though. When working with a Series, apply() can only take a callable as an argument. map(), on the other hand, can also take a collection such as a dictionary and use it to change the values in a Series according to the mapping. Let’s see an example:



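df_iris.target = df_iris.target.map({0: 'species1', 1: 'species2', 2: 'species3'})
df_iris.head()



As you can see, we have used a dictionary here to change 0, 1 and 2 into ‘species1’, ‘species2’ and ‘species3’ respectively. The dictionary needs an entry for every value in the column, and the Iris target contains three classes.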
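One detail worth keeping in mind with dictionary mappings: any value that has no matching key in the dictionary becomes NaN. A minimal sketch on a made-up Series of labels (the variable names are hypothetical):

```python
import pandas as pd

# Toy labels with three distinct values
target = pd.Series([0, 1, 2, 1])

# The key 2 is missing from this dictionary, so that entry becomes NaN
partial = target.map({0: "species1", 1: "species2"})

# Every value has a key, so nothing is lost
full = target.map({0: "species1", 1: "species2", 2: "species3"})

print(partial.isna().sum())
print(full.tolist())
```

Checking isna().sum() after a dictionary-based map() is a cheap way to spot values that were left out of the mapping.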



apply(), applymap() and map() summary

Let’s sum up the most important differences here.

apply() works on a row/column basis on a DataFrame, or on individual elements when applied to a Series.

applymap() works on individual elements of a DataFrame.

map() works like apply() on a Series, but it can take not only callables as arguments but also collections such as dictionaries.

They are all extremely useful, and mastering their usage will without a doubt make you a better Data Scientist.

I hope you have enjoyed this article and learned something new today!

