Code fails with an error on the second submission
Source: 4-14 - Operator case study in practice: refactoring word count
慕九州8702158
2020-09-27
$ ./pyspark --master local[2] --jars ~/lib/elasticsearch-spark-20_2.11-6.3.0.jar
The code is:
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf

# Map a PM2.5 reading to an air-quality grade label.
def get_grade(value):
    if value <= 50 and value >= 0:
        return "健康"  # Good
    elif value <= 100:
        return "中等"  # Moderate
    elif value <= 150:
        return "对敏感人群不健康"  # Unhealthy for sensitive groups
    elif value <= 200:
        return "不健康"  # Unhealthy
    elif value <= 300:
        return "非常不健康"  # Very unhealthy
    elif value <= 500:
        return "危险"  # Hazardous
    elif value > 500:
        return "爆表"  # Beyond index
    else:
        return None

# Load the 2017 hourly readings and keep only the columns of interest.
data2017 = (spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/data/Beijing_2017_HourlyPM25_created20170803.csv")
    .select("Year", "Month", "Day", "Hour", "Value", "QC Name"))

# Wrap get_grade as a UDF, then count rows per grade and compute each grade's share.
grade_function_udf = udf(get_grade, StringType())
group2017 = data2017.withColumn("Grade", grade_function_udf(data2017["Value"])).groupBy("Grade").count()
res2017 = group2017.select("Grade", "count", group2017["count"] / data2017.count())
The error appears right after the data2017 statement:
>>> data2017 = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/data/Beijing_2017_HourlyPM25_created20170803.csv").select("Year","Month","Day","Hour","Value","QC Name")
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f8100000, 29884416, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 29884416 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/bin/hs_err_pid5227.log
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/pyspark/sql/readwriter.py", line 166, in load
    return self._df(self._jreader.load(path))
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/wangwei/app/spark-2.3.1-bin-2.6.0-cdh5.7.0/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o36.load
1 answer
Michael_PK
2020-09-27
There is insufficient memory for the Java Runtime Environment to continue
The machine has run out of memory.
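For context, the log shows the JVM asking the operating system for 29884416 bytes (about 28.5 MiB) and being refused, so free memory on the host was almost gone when the statement ran. Below is a minimal sketch for checking available memory before starting the shell. It assumes a Linux host whose /proc/meminfo exposes a MemAvailable field; the helper names parse_meminfo and available_mb are made up for illustration and are not part of Spark or PySpark.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_mb(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in MiB, or None if it cannot be determined.

    Assumption: Linux with a MemAvailable field; on other systems
    this simply returns None.
    """
    if not os.path.exists(meminfo_path):
        return None
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    kb = info.get("MemAvailable")
    return None if kb is None else kb / 1024


if __name__ == "__main__":
    print("MemAvailable (MiB):", available_mb())
```

If the reported figure is small, freeing memory first is the likely remedy. Since the error showed up on the second attempt, one possibility (not confirmed in the post) is that an earlier pyspark shell is still running and holding its heap; exiting it, or launching with a smaller driver heap through the `--driver-memory` option, may be enough.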