Common Scala operations
Hand-built hello-world project
```bash
# Directory structure (standard sbt layout; the files are listed below)
.
├── build.sbt
├── project
│   └── build.properties
└── src
    └── main
        └── scala
            └── main.scala
```
- build.properties

```properties
sbt.version=1.6.2
```
- main.scala

```scala
// This is the default entry point for a quick test
object Main extends App {
  println("Hello, World!")
}
```

```scala
// With this test code, compilation can take a very long time if the JVM heap is small;
// you may see: Consider increasing the JVM heap using `-Xmx` or try a different collector
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    println("hello scala!")
    val ss = SparkSession.builder
      .appName("example")
      .master("local")
      .getOrCreate()
    import ss.implicits._
    ss.createDataset(1 to 10).show()
    ss.close()
  }
}
```

- build.sbt
```scala
// The simplest possible sbt build file is just one line:

scalaVersion := "2.13.3"

// That is, to create a valid sbt build, all you've got to do is define the
// version of Scala you'd like your project to use.

// ============================================================================

// Lines like the above defining `scalaVersion` are called "settings". Settings
// are key/value pairs. In the case of `scalaVersion`, the key is "scalaVersion"
// and the value is "2.13.3"

// It's possible to define many kinds of settings, such as:

name := "hello-world"
organization := "ch.epfl.scala"
version := "1.0"

// Note, it's not required for you to define these three settings. These are
// mostly only necessary if you intend to publish your library's binaries on a
// place like Sonatype.

// Want to use a published library in your project?
// You can define other libraries as dependencies in your build like this:

libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.2"

// Here, `libraryDependencies` is a set of dependencies, and by using `+=`,
// we're adding the scala-parser-combinators dependency to the set of dependencies
// that sbt will go and fetch when it starts up.

// Now, in any Scala file, you can import classes, objects, etc., from
// scala-parser-combinators with a regular import.

// TIP: To find the "dependency" that you need to add to the
// `libraryDependencies` set, which in the above example looks like this:
//   "org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.2"
// You can use Scaladex, an index of all known published Scala libraries. There,
// after you find the library you want, you can just copy/paste the dependency
// information that you need into your build file. For example, on the
// scala/scala-parser-combinators Scaladex page,
// https://index.scala-lang.org/scala/scala-parser-combinators, you can copy/paste
// the sbt dependency from the sbt box on the right-hand side of the screen.

// IMPORTANT NOTE: while build files look _kind of_ like regular Scala, it's
// important to note that syntax in *.sbt files doesn't always behave like
// regular Scala. For example, notice in this build file that it's not required
// to put our settings into an enclosing object or class. Always remember that
// sbt is a bit different, semantically, than vanilla Scala.

// ============================================================================

// Most moderately interesting Scala projects don't make use of the very simple
// build file style (called "bare style") used in this build.sbt file. Most
// intermediate Scala projects make use of so-called "multi-project" builds. A
// multi-project build makes it possible to have different folders which sbt can
// be configured differently for. That is, you may wish to have different
// dependencies or different testing frameworks defined for different parts of
// your codebase. Multi-project builds make this possible.

// Here's a quick glimpse of what a multi-project build looks like for this
// build, with only one "subproject" defined, called `root`:

// lazy val root = (project in file(".")).
//   settings(
//     inThisBuild(List(
//       organization := "ch.epfl.scala",
//       scalaVersion := "2.13.3"
//     )),
//     name := "hello-world"
//   )

// To learn more about multi-project builds, head over to the official sbt
// documentation at http://www.scala-sbt.org/documentation.html
```
Common gotchas
Strings can be compared directly (`==` compares values in Scala):

```scala
val result = if ("test" == "test") true else false
```
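For readers coming from Java, a small illustrative sketch of the difference: Scala's `==` delegates to `equals` (value comparison), while `eq` compares references.

```scala
object StringCompareDemo {
  def main(args: Array[String]): Unit = {
    val a = new String("test") // a distinct object, not the interned literal
    val b = "test"
    println(a == b) // true: == compares content via equals
    println(a eq b) // false: eq compares object identity
  }
}
```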
Constructor parameters need `var` to be mutable

```scala
// name can be reassigned later because it is declared with var;
// age cannot be reassigned (val), and height (no modifier) is also immutable
class Person(var name: String, val age: Int, height: Int) {
}

// val p = new Person("Tom", 20, 180)
// p.name = "Jerry"   // OK
// p.age = 21         // compile error: reassignment to val
```

Difference between class and object
- `class` defines an ordinary class; `object` defines a singleton, which you can think of as a static class.
- When a class and an object with the same name live in the same file, the class is the companion class and the object is its companion object.
- The companion class can access all members of the companion object; outside code cannot access the companion object's private members.
- Because an object is a singleton, it can hold state shared by all instances.
- An object can define an `apply` method (written by hand), which lets callers create instances without `new`.

```scala
class Person(var name: String, val age: Int, height: Int) {
  // the companion class can read the companion object's private field
  private var personId = Person.personId
  println(s"object Person's personId:$personId")

  def getPersonId() = {
    Person.getPersonId() + 1
  }
}

object Person {
  private var personId = 0

  // apply lets callers write Person("Tom", 20, 180) instead of new Person(...)
  def apply(name: String, age: Int, height: Int) = new Person(name, age, height)

  def getPersonId() = {
    personId
  }
}
```

Pattern matching
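A minimal, illustrative sketch of matching on a value's runtime type (`describe` is a made-up name):

```scala
object MatchDemo {
  // return a description based on the value's type
  def describe(x: Any): String = x match {
    case i: Int    => s"Int: $i"
    case s: String => s"String: $s"
    case _         => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(42))   // Int: 42
    println(describe("hi")) // String: hi
    println(describe(3.14)) // something else
  }
}
```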
Used to test what a value is, then act on the result of the match (see the reference linked here).

Types beyond the basic ones

```yaml
Any: any type; any value can be passed in
Unit: empty return value (the method returns nothing meaningful)
```

Variadic parameters
```scala
def test(params: Int*) = {
  params.foreach(println(_))
}

test(1 to 5: _*)
```

`1 to 5: _*` tells the compiler to expand the input and pass it as a sequence of arguments.
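Forwarding a collected varargs parameter to another variadic method also needs `: _*`, because inside the method the parameter is an ordinary `Seq`. A small sketch (`sum` and `sumAll` are illustrative names):

```scala
object VarargsDemo {
  def sum(xs: Int*): Int = xs.sum

  // xs is a Seq[Int] here; re-expand it with : _* when forwarding
  def sumAll(xs: Int*): Int = sum(xs: _*)

  def main(args: Array[String]): Unit = {
    println(sum(1 to 5: _*)) // 15
    println(sumAll(1, 2, 3)) // 6
  }
}
```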
- Path handling

```scala
// In Spark code, a path with no scheme such as /temp/data/data.txt reads from HDFS
val hdfsFileRdd = sparkContext.textFile("/temp/data/data.txt")
// Local files are read with the file: scheme, e.g. file:/temp/data/local.txt
val localFileRdd = sparkContext.textFile("file:/temp/data/local.txt")
```
Regular expressions

```scala
// Create a regex by calling .r on a string
val numberPattern = "[0-9]+".r
numberPattern.findFirstIn("abc123") // => Some(123)
```
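Regexes also work as extractors in pattern matching, where capture groups become bound variables; an illustrative sketch:

```scala
object RegexDemo {
  def main(args: Array[String]): Unit = {
    // each group in the pattern binds a variable in the match
    val date = "([0-9]{4})-([0-9]{2})-([0-9]{2})".r
    "2022-03-08" match {
      case date(y, m, d) => println(s"year=$y month=$m day=$d")
      case _             => println("no match")
    }
  }
}
```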
Dataset operations

(Only the first line of each original snippet survives; the blocks below are minimal sketches of the usual APIs.)

Creating an RDD

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

val conf = new SparkConf().setAppName("example").setMaster("local")
val sc = new SparkContext(conf)
val rdd = sc.parallelize(1 to 10)
```

Creating a DataFrame

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setAppName("example").setMaster("local")
val spark = SparkSession.builder.config(conf).getOrCreate()
import spark.implicits._
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
```

Converting an RDD to a DataFrame

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// build Row objects plus an explicit schema, then use createDataFrame
val rowRdd = rdd.map(i => Row(i, s"value$i"))
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("value", StringType, nullable = true)
))
val df2 = spark.createDataFrame(rowRdd, schema)
```

If the error `value toDF is not a member of org.apache.spark.rdd.RDD` appears, it usually means `import spark.implicits._` is missing.

Converting a DataFrame to an RDD

```scala
// .rdd turns a DataFrame into an RDD[Row]
val backToRdd = df.rdd
```

Creating sparkStreaming

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

class Test {
  def run(): Unit = {
    val conf = new SparkConf().setAppName("streaming").setMaster("local[2]")
    // batch interval of 5 seconds
    val ssc = new StreamingContext(conf, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```
- Title: Common Scala operations
- Created: 2022-03-08 09:31:26
- Link: https://blog.212490197.xyz/article/tools/scala/regular-operation/
- Copyright: Unless otherwise noted, all posts on this blog are licensed under BY-NC-SA. Please credit the source when reposting!