Pandas Cheat Sheet Github

This is a guide to many pandas tutorials, geared mainly for new users.

Internal guides: pandas' own 10 Minutes to pandas; more complex recipes are in the Cookbook; there is also a handy pandas cheat sheet.

Community guides: pandas Cookbook by Julia Evans.

Quick and Dirty Pandas Cheat Sheet: I don't regularly write new scripts in pandas. This repo is to help me when I need to tweak a script I haven't changed in a while.

Table of Contents

unique - find unique values
Working with time series data

unique - find unique values

Find the unique values of a column in the dataset.

Company	Person	Sales
0	GOOG	Sam	200
1	GOOG	Charlie	120
2	MSFT	Amy	340
3	MSFT	Vanessa	124
4	FB	Carl	243
5	FB	Sarah	350
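The code for this step is not shown here. A minimal sketch, assuming the table above is held in a DataFrame named df (the variable name is an assumption) and that Company is the column of interest:

```python
import pandas as pd

# Rebuild the sample table shown above.
df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# unique() works on a single column and keeps the order of first appearance.
companies = list(df["Company"].unique())
print(companies)  # ['GOOG', 'MSFT', 'FB']
```

To de-duplicate whole rows rather than one column, df.drop_duplicates() is the usual tool.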

nunique - find number of unique values

This saves you from building the unique array yourself and taking its length.
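A short sketch of the comparison, assuming the same sample table in a DataFrame named df (a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# One call instead of len(df["Company"].unique()).
n_companies = df["Company"].nunique()
print(n_companies)  # 3

# The two-step version gives the same answer here because no value is missing.
n_two_step = len(df["Company"].unique())
```

Note that nunique() ignores missing values by default, while unique() includes them, so the two counts can differ on columns that contain NaN.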

value_counts - find unique values and number of occurrences
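No example survives for this step. A minimal sketch, again assuming the sample table is in a DataFrame named df and counting the Company column:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# Returns a Series indexed by the distinct values, holding how often each occurs.
counts = df["Company"].value_counts()
print(counts)
```

In this sample every company appears twice, so all three counts are 2.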

apply - batch process column values

Calling apply() is similar to calling Python's built-in map(): it applies an operation to every record of a selected column. For instance, to find the squared sales, do the following:
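A minimal sketch of that step, assuming the sample table is in a DataFrame named df and that the new column is called sq_sales as in the output below:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# apply() calls the lambda once per value in the Sales column.
df["sq_sales"] = df["Sales"].apply(lambda x: x ** 2)
print(df)
```

For simple arithmetic like this, the vectorized form df["Sales"] ** 2 gives the same result and is usually faster.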

Company	Person	Sales	sq_sales
0	GOOG	Sam	200	40000
1	GOOG	Charlie	120	14400
2	MSFT	Amy	340	115600
3	MSFT	Vanessa	124	15376
4	FB	Carl	243	59049
5	FB	Sarah	350	122500

We can also define a named function and pass it to apply(). The function can take the values of one or more columns to calculate a new column.
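The original function is not shown, so the following is only a sketch. It assumes a hypothetical helper named cube_sales that combines the Sales and sq_sales columns row by row to produce the cu_sales column seen in the output below:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})
df["sq_sales"] = df["Sales"] ** 2

def cube_sales(row):
    # row is a single DataFrame row, so several columns are available at once.
    return row["Sales"] * row["sq_sales"]

# axis=1 passes each row (rather than each column) to the function.
df["cu_sales"] = df.apply(cube_sales, axis=1)
print(df)
```

Row-wise apply is flexible but slow on large tables; when the calculation can be written with whole columns, that form is usually preferable.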

Company	Person	Sales	sq_sales	cu_sales
0	GOOG	Sam	200	40000	8000000
1	GOOG	Charlie	120	14400	1728000
2	MSFT	Amy	340	115600	39304000
3	MSFT	Vanessa	124	15376	1906624
4	FB	Carl	243	59049	14348907
5	FB	Sarah	350	122500	42875000

sort_values - sort rows by column values
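The table that follows is ordered by Sales from lowest to highest. A sketch of a call that would produce that ordering, assuming the sample table is in a DataFrame named df (the exact call used originally is not shown):

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# sort_values() returns a new DataFrame; ascending order is the default.
by_sales = df.sort_values("Sales")
print(by_sales)

# The original row labels travel with their rows.
print(list(by_sales.index))  # [1, 3, 0, 4, 2, 5]
```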

Company	Person	Sales	sq_sales	cu_sales
1	GOOG	Charlie	120	14400	1728000
3	MSFT	Vanessa	124	15376	1906624
0	GOOG	Sam	200	40000	8000000
4	FB	Carl	243	59049	14348907
2	MSFT	Amy	340	115600	39304000
5	FB	Sarah	350	122500	42875000

Note how the index remains attached to the original rows.
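The next table is ordered by Company first and by Sales within each company. A sketch that reproduces that ordering, assuming the same df; the reset_index step at the end is an optional extra, not something the original shows:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# A list of column names sorts by the first column, then breaks ties with the next.
by_company = df.sort_values(["Company", "Sales"])
print(list(by_company.index))  # [4, 5, 1, 0, 3, 2]

# Optional: discard the old labels and renumber the rows from 0.
renumbered = by_company.reset_index(drop=True)
```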

Company	Person	Sales	sq_sales	cu_sales
4	FB	Carl	243	59049	14348907
5	FB	Sarah	350	122500	42875000
1	GOOG	Charlie	120	14400	1728000
0	GOOG	Sam	200	40000	8000000
3	MSFT	Vanessa	124	15376	1906624
2	MSFT	Amy	340	115600	39304000

isnull - finding null values throughout the DataFrame
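A minimal sketch, assuming the sample table (with the two derived columns) is in a DataFrame named df. The per-column summary at the end is an addition for convenience:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})
df["sq_sales"] = df["Sales"] ** 2
df["cu_sales"] = df["Sales"] ** 3

# Same shape as df, with True wherever a value is missing.
null_mask = df.isnull()
print(null_mask)

# Number of missing values in each column.
null_counts = null_mask.sum()
print(null_counts)
```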

Company	Person	Sales	sq_sales	cu_sales
0	False	False	False	False	False
1	False	False	False	False	False
2	False	False	False	False	False
3	False	False	False	False	False
4	False	False	False	False	False
5	False	False	False	False	False

Working with time series data

This section explains how to specify the data types of columns while reading data, and how to define column converters that parse values pandas cannot type on its own.

Unnamed: 0	Registration Date	Country	Organization	Current customer?	What would you like to learn?
0	0	11/08/2019 06:09 PM EST	Jamaica	The University of the West Indies	NaN	NaN
1	1	11/08/2019 06:09 PM EST	Japan	iLand6 Co.,Ltd.	no	I am interested ArcGIS.
2	2	11/08/2019 05:56 PM EST	Canada	Safe Software Inc	yes	data science workflos
3	3	11/08/2019 05:51 PM EST	Canada	Le Groupe GeoInfo Inc	yes	general information
4	4	11/08/2019 05:26 PM EST	Canada	Safe Software Inc.	NaN	NaN

The Registration Date column should be of type datetime and the Current customer? column should be of type bool. But are they?

Both come back as generic object columns. Let us re-read the file, this time stating what the data types should be.
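The original file and read call are not shown, so the sketch below makes several assumptions: the data is a CSV, stood in for here by an in-memory string holding three rows from the table above; the helper names parse_date and parse_bool are hypothetical; and "yes" is the only value treated as True.

```python
import io
import pandas as pd

# Hypothetical stand-in for the registrations file.
csv_text = """Registration Date,Country,Organization,Current customer?
11/08/2019 06:09 PM EST,Jamaica,The University of the West Indies,
11/08/2019 06:09 PM EST,Japan,"iLand6 Co.,Ltd.",no
11/08/2019 05:56 PM EST,Canada,Safe Software Inc,yes
"""

def parse_date(text):
    # Drop the trailing time-zone label, then parse the 12-hour timestamp.
    return pd.to_datetime(text.replace(" EST", ""), format="%m/%d/%Y %I:%M %p")

def parse_bool(text):
    # Empty cells and "no" both become False.
    return text.strip().lower() == "yes"

# A plain read leaves the date column as text.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# dtype fixes a column's type directly; converters run a function on each raw cell.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Country": "string"},
    converters={"Registration Date": parse_date, "Current customer?": parse_bool},
)
# Make sure the column as a whole has a datetime dtype, not just datetime elements.
typed["Registration Date"] = pd.to_datetime(typed["Registration Date"])
print(typed.dtypes)
```

The converter discards the EST label, so the resulting timestamps are time-zone naive, which matches the output shown further down.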

Plotting time series

Now that Registration Date is a datetime column, we can plot the number of registrants over time. Before that, we need to set it as the index.
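A minimal sketch of setting the index, using a small hand-built frame (named regs here, a hypothetical name) in place of the real registrations data:

```python
import pandas as pd

regs = pd.DataFrame({
    "Registration Date": pd.to_datetime([
        "2019-11-08 18:09", "2019-11-08 18:09", "2019-11-08 17:56",
    ]),
    "Country": ["Jamaica", "Japan", "Canada"],
})

# set_index() moves the column into the index and returns a new DataFrame.
regs = regs.set_index("Registration Date")
print(regs)
```

With a datetime index in place, time-based operations such as sorting by date and plotting against time work directly on the index.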

Country	Organization	Current customer?	What would you like to learn?
Registration Date
2019-11-08 18:09:00	Jamaica	The University of the West Indies	False	NaN
2019-11-08 18:09:00	Japan	iLand6 Co.,Ltd.	False	I am interested ArcGIS.
2019-11-08 17:56:00	Canada	Safe Software Inc	True	data science workflos
2019-11-08 17:51:00	Canada	Le Groupe GeoInfo Inc	True	general information
2019-11-08 17:26:00	Canada	Safe Software Inc.	False	NaN

Add a counter column to the dataframe

Note: count up only after sorting by date. Otherwise the counter values will not follow chronological order.
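The original code for the counter is not shown. One way to sketch it, assuming a date-indexed frame named regs and a new column named counter (both names are assumptions), with timestamps taken from the table above:

```python
import pandas as pd

regs = pd.DataFrame(
    {"Country": ["Jamaica", "Japan", "Canada", "Canada", "Canada"]},
    index=pd.to_datetime([
        "2019-11-08 18:09", "2019-11-08 18:09", "2019-11-08 17:56",
        "2019-11-08 17:51", "2019-11-08 17:26",
    ]),
)
regs.index.name = "Registration Date"

# Sort chronologically first; the data arrives newest-first.
regs = regs.sort_index()

# Running count of registrants: 1 for the earliest row, up to len(regs).
regs["counter"] = list(range(1, len(regs) + 1))
print(regs)
```

Once the counter is in place, regs["counter"].plot() draws cumulative registrations against time, provided matplotlib is installed.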