
2. Installing Spark and Python Exercises

I. Installing Spark

Check the base environment: Hadoop and the JDK
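
One quick way to confirm that both prerequisites are reachable is to look them up on the PATH. This is only a sketch: it assumes the commands are named hadoop and java, and the helper find_tools is a hypothetical name, not something from the original post.

```python
import shutil


def find_tools(names):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Command names are assumptions; adjust them to match the local installation.
    for name, path in find_tools(["hadoop", "java"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A result of None for either command usually means the corresponding bin directory has not been added to PATH yet.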

Configuration files

Test-run Python code

II. Python programming exercise: word-frequency count of an English text

Prepare the text file: heal-the-world.txt

There's a place in your heart

And I know that it is love

And this place could be much brighter than tomorrow

And if you really try

You'll find there's no need to cry

In this place you'll feel

There's no hurt or sorrow

There are ways to get there

If you care enough for the living

Make a little space

Make a better place

Heal the world

Make it a better place

For you and for me

And the entire human race

There are people dying

If you care enough for the living

Make it a better place

For you and for me

If you want to know why

There's a love that cannot lie

Love is strong

It only cares for joyful giving

If we try we shall see

In this bliss we cannot feel

Fear or dread

We stop existing and start living

Then it feels that always

Love's enough for us growing

Make a better world

Make a better world

Heal the world

Make it a better place

For you and for me

And the entire human race

There are people dying

If you care enough for the living

Make a better place for you and for me

And the dream we were conceived in

Will reveal a joyful face

And the world we once believed in

Will shine again in grace

Then why do we keep strangling life

Wound this earth, crucify its soul

Though it's plain to see

This world is heavenly be god's glow

We could fly so high

Let our spirits never die

In my heart I feel you are all my brothers

Create a world with no fear

Together we’ll cry happy tears

We see the nations turn their swords into plowshares

We could really get there

If you cared enough for the living

Make a little space

To make a better place

Heal the world

Make it a better place

For you and for me

And the entire human race

There are people dying

If you care enough for the living

Make a better place for you and for me

Heal the world

Make it a better place

For you and for me

And the entire human race

There are people dying

If you care enough for the living

Make a better place for you and for me

Heal the world

Make it a better place

For you and for me

And the entire human race

There are people dying

If you care enough for the living

Make a better place for you and for me

There are people dying

If you care enough for the living

Make a better place for you and for me

There are people dying

If you care enough for the living

Make a better place for you and for me

You and for me

You and for me

You and for me

You and for me

Read the file and preprocess it: lowercase, strip punctuation, tokenize, remove stop words (main.py)

# Read the lyrics file prepared above (the original listing opened a different
# file name, which does not match the text or the output shown below)
with open("heal-the-world.txt", "r") as f:
    text = f.read()

# Normalize case and replace ASCII punctuation with spaces
text = text.lower()
for ch in '!@#$%^&*(_)-+=\\[]}{|;:\'"`~,<.>?/':
    text = text.replace(ch, " ")

words = text.split()  # split the text on whitespace

# Read the stop-word file, one stop word per line
with open('stop_words.txt', 'r') as f:
    stop_words = [line.strip('\n') for line in f]

# Keep only the words that are not stop words
stop_set = set(stop_words)
afterwords = [word for word in words if word not in stop_set]
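
The preprocessing step can also be wrapped in a function so it can be tried on a short string without any files. This is a sketch rather than the post's own code: the name preprocess is hypothetical, and it swaps the hand-written punctuation list for string.punctuation, which covers ASCII punctuation only. A typographic apostrophe (’) is not ASCII, which is why a token such as we’ll survives intact in the output.

```python
import string


def preprocess(text, stop_words):
    """Lowercase, replace ASCII punctuation with spaces, split, drop stop words."""
    table = str.maketrans({ch: " " for ch in string.punctuation})
    words = text.lower().translate(table).split()
    stop_set = set(stop_words)  # set lookup instead of scanning a list per word
    return [word for word in words if word not in stop_set]


sample = "It's a test: the quick, brown fox!"
print(preprocess(sample, ["the", "it"]))
# ['s', 'a', 'test', 'quick', 'brown', 'fox']
```

Because the ASCII apostrophe becomes a space, contractions split into fragments such as s and ll; adding those fragments to stop_words.txt would keep them out of the final counts.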

Count the occurrences of each word, sort by frequency, and write the result to a file (main.py)

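# Count how many times each word occurs
counts = {}
for word in afterwords:
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)

# Write one "word count" pair per line; the with block closes the file
with open('count.txt', 'w') as f1:
    for word, count in items:
        f1.write(word + " " + str(count) + "\n")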
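
The same counting and sorting can be sketched with collections.Counter from the standard library. The function names word_frequencies and write_counts are hypothetical, and the sample word list is made up for illustration.

```python
from collections import Counter


def word_frequencies(words):
    """Return (word, count) pairs sorted by count, highest first."""
    return Counter(words).most_common()


def write_counts(pairs, path):
    """Write one 'word count' pair per line to the given path."""
    with open(path, "w") as out:
        for word, count in pairs:
            out.write(f"{word} {count}\n")


pairs = word_frequencies(["make", "place", "make", "world", "make", "place"])
print(pairs)  # [('make', 3), ('place', 2), ('world', 1)]
```

Words with equal counts stay in the order they were first encountered, which matches the behavior of the stable sort used in main.py.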

Output

a 22
make 18
place 17
living 10
world 10
care 8
people 7
dying 7
s 7
human 5
heal 5
entire 5
race 5
love 4
feel 3
heart 2
fear 2
space 2
ll 2
i 2
cry 2
joyful 2
crucify 1
create 1
we’ll 1
existing 1
high 1
fly 1
earth 1
face 1
find 1
turn 1
nations 1
spirits 1
ways 1
god 1
swords 1
wound 1
start 1
tomorrow 1
cared 1
brighter 1
tears 1
bliss 1
heavenly 1
glow 1
sorrow 1
reveal 1
plowshares 1
shine 1
life 1
brothers 1
lie 1
conceived 1
stop 1
hurt 1
believed 1
feels 1
strangling 1
strong 1
grace 1
plain 1
soul 1
cares 1
dread 1
happy 1
die 1
growing 1
giving 1
dream 1

III. Setting up the programming environment with PyCharm: Ubuntu 16.04 + PyCharm + Spark


Source: https://www.cnblogs.com/coder-one/p/15972584.html