Difference between map, applymap and apply methods in Pandas

Difference between map, applymap and apply methods in Pandas

3 min read
python
pandas
dataframe
vectorization

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!

Photo by Roberto Nickson | Unsplash
3 posted solutions
684

jeremiahbuddha wrote:

apply works on a row / column basis of a DataFrame
applymap works element-wise on a DataFrame
map works element-wise on a Series


Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommended this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object
93

MarredCheese wrote:

Quick Summary

  • DataFrame.apply operates on entire rows or columns at a time.

  • DataFrame.applymap, Series.apply, and Series.map operate on one element at time.

Series.apply and Series.map are similar and often interchangeable. Some of their slight differences are discussed in osa's answer below.

45

Sergey Orshanskiy wrote:

Adding to the other answers, in a Series there are also map and apply.

Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[31]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

Also if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.

series.apply(download_file_for_every_element) 

Map can use not only a function, but also a dictionary or another series. Let's say you want to manipulate permutations.

Take

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64

Submit your solution

© 2023 Dev's Feed