
Commit 9759f04

Author: Sabbir Ahmed
Merge branch 'develop' into dependabot/npm_and_yarn/moment-2.29.2
2 parents c92988b + b98cf73 commit 9759f04

1 file changed

Lines changed: 133 additions & 0 deletions

data/pyspark.json

@@ -0,0 +1,133 @@
{
  "id": "Pyspark",
  "title": "PySpark Cheatsheet",
  "slug": "Pyspark",
  "description": "PySpark is the Python API for Apache Spark, an open-source distributed computing framework used for real-time, large-scale data processing.",
  "colorPref": "#b57521",
  "contents": [
    {
      "title": "Setting up / starting a SparkSession",
      "items": [
        {
          "definition": "Initialize PySpark with a SparkSession",
          "code": [
            "from pyspark.sql import SparkSession",
            "spark = SparkSession.builder.appName('randomName').getOrCreate()"
          ]
        }
      ]
    },
    {
      "title": "Creating and loading DataFrames",
      "items": [
        {
          "definition": "Create a DataFrame",
          "code": [
            "from pyspark.sql.types import *",
            "df = spark.createDataFrame([(1, 'a'), (2, 'b')], ['num', 'letter'])"
          ]
        },
        {
          "definition": "Load a CSV file",
          "code": "df = spark.read.load('/home/Dataset/Case.csv', format='csv', sep=',', inferSchema=True, header=True)"
        },
        {
          "definition": "Load a text file",
          "code": "df = spark.read.text('people.txt')"
        },
        {
          "definition": "Load a JSON file",
          "code": "df = spark.read.json('customer.json')"
        }
      ]
    },
    {
      "title": "Data modification commands",
      "items": [
        {
          "definition": "Filter data",
          "code": "df.filter(df['age'] > 24).show()"
        },
        {
          "definition": "Drop duplicate rows",
          "code": "df.dropDuplicates()"
        },
        {
          "definition": "Remove rows containing null values",
          "code": "df.na.drop().show()"
        },
        {
          "definition": "Replace null values (here with 50, in numeric columns)",
          "code": "df.na.fill(50).show()"
        },
        {
          "definition": "Show a specific column",
          "code": "df.select('columnName').show()"
        }
      ]
    },
    {
      "title": "Data inspection commands",
      "items": [
        {
          "definition": "View column names and data types",
          "code": "df.dtypes"
        },
        {
          "definition": "Show the contents of df",
          "code": "df.show()"
        },
        {
          "definition": "View the first 10 rows of df",
          "code": "df.head(10)"
        },
        {
          "definition": "View the first row of df",
          "code": "df.first()"
        },
        {
          "definition": "Count the number of rows",
          "code": "df.count()"
        },
        {
          "definition": "View the schema of df",
          "code": "df.printSchema()"
        },
        {
          "definition": "View the logical and physical plans of df",
          "code": "df.explain(True)"
        }
      ]
    },
    {
      "title": "Conversion and output commands",
      "items": [
        {
          "definition": "Convert a DataFrame to an RDD",
          "code": "rdd1 = df.rdd"
        },
        {
          "definition": "View the contents of df as a pandas DataFrame",
          "code": "df.toPandas()"
        },
        {
          "definition": "Write a DataFrame to CSV files with a header row",
          "code": "df.write.option('header', True).csv('/home/Data')"
        },
        {
          "definition": "Save an RDD as a text file",
          "code": "textRdd.saveAsTextFile('/home/Data')"
        }
      ]
    },
    {
      "title": "Closing the SparkSession",
      "items": [
        {
          "definition": "Close the created session",
          "code": "spark.stop()"
        }
      ]
    }
  ]
}
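
Note on the file's structure: the `code` field takes two shapes in this file, a list of strings for multi-line snippets and a single string for one-liners. The following is a minimal sketch of how a consumer might normalize both, assuming only the structure visible in the diff. The helper name `code_lines` and the inline `SAMPLE` are hypothetical and not part of the repository.

```python
import json

# Hypothetical sample mirroring the shape of data/pyspark.json:
# "code" is a list of strings in one item and a plain string in the other.
SAMPLE = """
{
  "contents": [
    {
      "title": "Setup",
      "items": [
        {
          "definition": "Initialize",
          "code": [
            "from pyspark.sql import SparkSession",
            "spark = SparkSession.builder.getOrCreate()"
          ]
        },
        {"definition": "Stop", "code": "spark.stop()"}
      ]
    }
  ]
}
"""


def code_lines(item):
    """Return the item's code as a list of lines, whether stored as str or list."""
    code = item["code"]
    return [code] if isinstance(code, str) else list(code)


sheet = json.loads(SAMPLE)
for section in sheet["contents"]:
    for item in section["items"]:
        # Print each definition followed by its snippet lines, indented.
        print(item["definition"])
        for line in code_lines(item):
            print("    " + line)
```

Normalizing at read time keeps the data file compact for one-liners while letting rendering code treat every entry uniformly.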
