import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"D:\BaiduNetdiskDownload\data\employees.csv")
df.head()
Data link:
https://download.csdn.net/download/qq_43494013/91334882?spm=1001.2014.3001.5503
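# .copy() gives an independent frame, so adding a column later does not trigger SettingWithCopyWarning
df1 = df.head(10)[['employee_id', 'salary']].copy()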
df1
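pd.cut(df1['salary'], bins=2)
With bins=n, pd.cut splits the observed range into n equal-width intervals. The outer edges are the minimum and maximum of the data (pandas extends the range by 0.1% so that the extreme values are included). value_counts() then returns the number of elements in each interval.
pd.cut(df1['salary'], bins=2).value_counts()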
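To make the equal-width behaviour concrete, here is a minimal sketch on made-up salaries (the values below are hypothetical and are not taken from employees.csv). It uses retbins=True to return the computed edges alongside the binned values.

```python
import pandas as pd

# Hypothetical salaries standing in for the real column
salary = pd.Series([3000, 4500, 6000, 9000, 12000, 24000], name="salary")

# retbins=True also returns the array of bin edges that pd.cut computed
binned, edges = pd.cut(salary, bins=2, retbins=True)

# Count the rows per interval, listed in interval order
counts = binned.value_counts().sort_index()

print(edges)   # lowest edge sits slightly below the minimum salary
print(counts)
```

The midpoint edge is halfway between the minimum and the maximum, so a single large salary can leave most rows in the first interval.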
Custom intervals
pd.cut(df1['salary'], bins=[0, 10000, 20000, 30000])
pd.cut(df1['salary'], bins=[0, 10000, 20000, 30000]).value_counts()
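One detail worth checking with custom edges is which side of each interval is closed. The sketch below uses hypothetical salaries chosen to sit exactly on the edges. By default the intervals are right-closed, so values outside (0, 30000] become NaN.

```python
import pandas as pd

# Hypothetical salaries placed on and around the bin edges
salary = pd.Series([0, 8000, 10000, 10001, 30000, 35000])
edges = [0, 10000, 20000, 30000]

# Default: intervals are (0, 10000], (10000, 20000], (20000, 30000]
default = pd.cut(salary, bins=edges)

# include_lowest=True closes the first interval on the left, so 0 is kept
with_lowest = pd.cut(salary, bins=edges, include_lowest=True)

# right=False flips to [0, 10000), [10000, 20000), [20000, 30000)
left_closed = pd.cut(salary, bins=edges, right=False)

print(pd.DataFrame({"salary": salary, "default": default,
                    "with_lowest": with_lowest, "left_closed": left_closed}))
```

Rows that fall outside every interval are not an error. They show up as NaN, so it can help to check isna().sum() after binning.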
Adding labels
# labels needs one entry per interval, i.e. len(bins) - 1
df1['income_range'] = pd.cut(df1['salary'], bins=[0, 10000, 20000, 30000], labels=['low', 'medium', 'high'])
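A labelled result from pd.cut is an ordered categorical, which makes it usable for comparisons and grouping. The following is a small sketch with a hypothetical frame (df_demo and its values are invented for illustration).

```python
import pandas as pd

# Hypothetical frame; the column names mirror the ones used above
df_demo = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "salary": [4200, 9000, 17000, 24000],
})

df_demo["income_range"] = pd.cut(
    df_demo["salary"],
    bins=[0, 10000, 20000, 30000],
    labels=["low", "medium", "high"],
)

# The categories are ordered, so comparisons against a label work
above_low = df_demo[df_demo["income_range"] > "low"]

# Mean salary per band; observed=True skips bands with no rows
mean_by_band = df_demo.groupby("income_range", observed=True)["salary"].mean()

print(above_low)
print(mean_by_band)
```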
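Equal-frequency binning
qcut: pd.qcut places the edges at quantiles, so each bin holds roughly the same number of values.
Implementation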
df1 = df[['employee_id','salary']].head(10)
df1
pd.qcut(df1['salary'],3)
pd.qcut(df1['salary'],3).value_counts()
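The difference between cut and qcut is easiest to see on skewed data. This sketch uses hypothetical salaries with one large outlier and compares the per-bin counts from both functions.

```python
import pandas as pd

# Hypothetical, right-skewed salaries (one large outlier)
salary = pd.Series([3000, 3200, 3500, 4000, 4200, 5000, 6000, 8000, 24000])

# Equal-width bins: edges are evenly spaced between min and max
equal_width = pd.cut(salary, bins=3).value_counts().sort_index()

# Equal-frequency bins: edges are the 1/3 and 2/3 quantiles
equal_freq = pd.qcut(salary, q=3).value_counts().sort_index()

print(equal_width)
print(equal_freq)
```

With equal widths the outlier stretches the range and most rows land in the first bin, while qcut keeps the bins balanced at the cost of uneven interval widths. If many values are tied, qcut can raise an error about duplicate edges; passing duplicates='drop' is one way to handle that.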
Binning the sleep data
Data link:
https://download.csdn.net/download/qq_43494013/91336841?spm=1001.2014.3001.5503
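# Raw string so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r"D:\BaiduNetdiskDownload\data\sleep.csv")
# .copy() avoids SettingWithCopyWarning when a column is added below
df1 = df.head(10)[['person_id', 'sleep_quality']].copy()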
df1
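# First pass: three equal-width intervals, shown as interval notation
df1['sleep_quality_band'] = pd.cut(df1['sleep_quality'], bins=3)
df1
# Second pass: the same three intervals, replaced by readable labels
df1['sleep_quality_band'] = pd.cut(df1['sleep_quality'], bins=3, labels=['poor', 'average', 'good'])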
df1
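One caveat with bins=3 is that the edges depend on the minimum and maximum of whichever rows are passed in, so the same score can receive a different label in a different sample. The sketch below illustrates this with hypothetical scores, assuming a 1 to 10 scale; the fixed edges [0, 4, 7, 10] are an assumption made for illustration, not thresholds taken from the dataset.

```python
import pandas as pd

labels = ["poor", "average", "good"]

# Two hypothetical samples of sleep-quality scores
sample_a = pd.Series([4, 5, 6, 7, 8, 9])
sample_b = pd.Series([1, 2, 3, 4, 5, 6])

# Data-driven edges: computed separately from each sample's min and max
a_auto = pd.cut(sample_a, bins=3, labels=labels)
b_auto = pd.cut(sample_b, bins=3, labels=labels)

# Fixed edges (assumed thresholds): the same score always gets the same label
fixed_edges = [0, 4, 7, 10]
a_fixed = pd.cut(sample_a, bins=fixed_edges, labels=labels)
b_fixed = pd.cut(sample_b, bins=fixed_edges, labels=labels)

# Compare the label given to the score 6 in each case
print(a_auto.iloc[2], b_auto.iloc[5])    # data-driven edges
print(a_fixed.iloc[2], b_fixed.iloc[5])  # fixed edges
```

If the labels are meant to carry a stable meaning across subsets such as head(10) and the full file, explicit edges are usually the safer choice.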