[python]Pandas中的join方式
268 Views
连表查询方式共包括4种:inner、left、right、outer。那么这四种方式有什么区别?
假设有以下两个dataframe:A、B。A有属性user_id、sku_id,B有属性sku_id、cate。尝试使用4种方式按照sku_id连表操作。
A | user_id | sku_id | B | sku_id | label | |
0 | 0 | 101 | 0 | 101 | 1 | |
1 | 0 | 109 | 1 | 102 | 1 | |
2 | 1 | 102 | 2 | 103 | 2 | |
3 | 1 | 100 | 3 | 104 | 3 | |
4 | 1 | 99 | 4 | 105 | 3 | |
5 | 2 | 111 | 5 | 106 | 4 | |
6 | 3 | 112 | 6 | 107 | 5 | |
7 | 4 | 108 | 7 | 108 | 5 |
1.inner:df1和df2的左右位置无影响,取df1和df2共有的sku_id,并添加df1的属性user_id和df2的属性label,下表:df1 inner join df2
(A|B) | sku_id | label | user_id |
0 | 101 | 1 | 0 |
1 | 102 | 1 | 1 |
2 | 108 | 5 | 4 |
2.left:df1和df2的左右位置有影响,如df1左连接df2,则取df1的所有行,如果该行sku_id的df2可查到,则添加df2的label,否则添加空值,下表:df1 left join df2
(A|B) | sku_id | user_id | label |
0 | 101 | 0 | 1 |
1 | 109 | 0 | NaN |
2 | 102 | 1 | 1 |
3 | 100 | 1 | NaN |
4 | 99 | 1 | NaN |
5 | 111 | 2 | NaN |
6 | 112 | 3 | NaN |
7 | 108 | 4 | 5 |
3.right:df1和df2的左右位置有影响,同左连接,df1 left join df2==df2 right join df1
4.outer:df1和df2的左右位置无影响,如df1的sku_id与df2匹配,则合并df1的行与df2的行,否则填充空值
(A|B) | sku_id | user_id | label |
0 | 101 | 0 | 1 |
1 | 109 | 0 | NaN |
2 | 102 | 1 | 1 |
3 | 100 | 1 | NaN |
4 | 99 | 1 | NaN |
5 | 111 | 2 | NaN |
6 | 112 | 3 | NaN |
7 | 108 | 4 | 5 |
8 | 103 | NaN | 2 |
9 | 104 | NaN | 3 |
10 | 105 | NaN | 3 |
11 | 106 | NaN | 4 |
12 | 107 | NaN | 4 |